Posts by Year

2018

FastQC and Trimming - Metagenomics (Geoduck) HiSeqX Reads from 20180809

  • 1 min read

Steven tasked me with assembling our geoduck metagenomics HiSeqX data. The first part of the process is examining the quality of the sequencing reads, performing quality trimming, and then checking the quality of the trimmed reads. It’s also possible (likely) that I’ll need to run another round of trimming. The process is documented in the Jupyter Notebook linked below. After these reads are cleaned up, I’ll transfer them over to our HPC nodes (Mox) and try assembling them.
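The trimming details live in the notebook, not in this excerpt. To illustrate what a 3' quality trim does, here is a minimal Python sketch of the BWA-style running-sum algorithm that cutadapt documents for its `-q` option. It assumes Phred+33 quality encoding (standard for current Illumina output). The function names and the default cutoff of 20 are illustrative choices, not the settings used for these reads.

```python
from __future__ import annotations


def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def trim_3prime(seq: str, qual: str, cutoff: int = 20) -> tuple[str, str]:
    """Trim low-quality bases from the 3' end of a read.

    Walks from the 3' end accumulating (cutoff - score); the read is cut
    where that running sum peaks, and scanning stops once it goes negative.
    """
    scores = phred33(qual)
    running = 0
    best = 0
    cut = len(scores)  # default: keep the whole read
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:  # quality has recovered; stop scanning
            break
        if running > best:
            best = running
            cut = i
    return seq[:cut], qual[:cut]
```

For example, `trim_3prime("ACGTACGT", "IIIII###")` keeps only the first five bases, because the trailing `#` characters (Q2) fall well below the cutoff.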

Installation - Microsoft Machine Learning Server (Microsoft R Open) on Emu/Roadrunner RStudio Server

  • ~1 min read

Steven recently saw an announcement that Microsoft R Open now handles multi-threaded processing (default R does not), so we were interested in trying it out. I installed MLS/MRO on Emu/Roadrunner (Apple Xserve; Ubuntu 16.04), following the Microsoft installation directions for Ubuntu. In retrospect, I think I could’ve just installed MRO, but this gets the job done as well and won’t hurt anything.

Genome Annotation - Olympia oyster genome using WQ-MAKER Instance on Jetstream

  • 8 min read

Yesterday, our Xsede Startup Application (Google Doc: https://docs.google.com/document/d/1v4ukb4M3ZY73KaBsYjcmF35pAE2pEGH9AevdXAxCONI/edit?usp=sharing) got approval for 100,000 Service Units (SUs) and 1TB of disk space on Xsede/Atmosphere/Jetstream (or, whatever it’s actually called!). The approval happened within an hour of submitting the application!

Mox - Password-less SSH!

  • 1 min read

The high performance computing (HPC) cluster (called Mox) at Univ. of Washington (UW) frustratingly requires a password when SSH-ing, even when SSH keys are in use. I have a lengthy, unintelligible password that I use for my UW account, so having to type this in any time I want to initiate a new SSH session on Mox is a painful process.

Ubuntu - Fix “No Video Signal” Issue on Emu/Roadrunner

  • 1 min read

Both Apple Xserves (Emu/Roadrunner) running Ubuntu (16.04LTS) experienced the same issue - the monitor would indicate “No Video Signal”, would go dark, and wasn’t responsive to keyboard/mouse movements. However, you could ssh into both machines w/o issue.

Total Alkalinity Calculations - Yaamini’s Ocean Chemistry Samples

  • 1 min read

I ran a subset of Yaamini’s ocean chemistry samples on our T5 Excellence titrator (Mettler Toledo) at the beginning of April. The subset consisted of samples taken from the beginning, middle, and end of the experiment, to assess whether total alkalinity (TA) varied across the experiment. If there was little variation, then there’d likely be no need to run all of the samples. However, if there were temporal differences, then all samples should be processed.
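The go/no-go logic above can be sketched in Python. Group TA values by sampling time point, compare the time-point means, and run the full sample set only if they differ by more than some tolerance. The 1% tolerance and the function names are hypothetical. In practice the tolerance should reflect the titrator’s measurement precision, and a formal test (e.g. ANOVA) would be more rigorous than comparing means. Values can be in whatever units the titration reports (e.g. µmol/kg).

```python
from __future__ import annotations

from statistics import mean


def ta_spread(ta_by_timepoint: dict[str, list[float]]) -> tuple[dict[str, float], float]:
    """Mean TA per time point, plus the spread between the highest and lowest
    time-point means as a percentage of the overall mean."""
    means = {tp: mean(vals) for tp, vals in ta_by_timepoint.items()}
    overall = mean(v for vals in ta_by_timepoint.values() for v in vals)
    spread_pct = (max(means.values()) - min(means.values())) / overall * 100
    return means, spread_pct


def needs_full_run(spread_pct: float, tolerance_pct: float = 1.0) -> bool:
    """Hypothetical decision rule: titrate every sample only if the
    time-point means differ by more than tolerance_pct percent."""
    return spread_pct > tolerance_pct
```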

Kmer Estimation - Kmergenie on Geoduck Sequence Data (default settings)

  • ~1 min read

After the last SparseAssembler assembly completed, I wanted to do another run with a different kmer size (last time it was arbitrarily set at 101). However, I didn’t really know how to choose one, particularly since this assembly consisted of mixed read lengths (50bp and 100bp). So, I ran kmergenie on all of our geoduck (Panopea generosa) sequencing data in hopes of determining a kmer size to apply to my next assembly.
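KmerGenie’s own method is more involved. Per its documentation, it builds k-mer abundance histograms for a range of k and picks the k that maximizes an estimate of distinct genomic (non-error) k-mers. The Python sketch below illustrates only the raw counting underneath, plus one consequence of mixed read lengths: a read shorter than k contributes no k-mers, so in this simple model any k above 50 ignores the 50bp reads entirely. It skips canonical (reverse-complement) k-mers and error filtering, and the function names are hypothetical.

```python
from __future__ import annotations


def distinct_kmers(reads: list[str], k: int) -> tuple[int, int]:
    """Count distinct k-mers across reads for a single k.

    Returns (distinct k-mer count, number of reads long enough to contribute).
    Reads shorter than k contribute no k-mers at all.
    """
    seen: set[str] = set()
    usable = 0
    for read in reads:
        if len(read) < k:
            continue  # e.g. a 50 bp read yields nothing once k > 50
        usable += 1
        for i in range(len(read) - k + 1):
            seen.add(read[i:i + k])
    return len(seen), usable


def scan_k(reads: list[str], ks: list[int]) -> dict[int, tuple[int, int]]:
    """Tabulate distinct k-mers and usable reads for several candidate k."""
    return {k: distinct_kmers(reads, k) for k in ks}
```

For example, `scan_k(reads, [21, 31, 51])` returns, for each k, the distinct k-mer count and how many reads were long enough to contribute.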

2017

Samples Submitted - Geoduck Tissues to Illumina for More 10x Genomics Sequencing

  • ~1 min read

Continuing their generous efforts to use our geoduck samples to test the robustness of their emerging sequencing technologies, Illumina has requested that we send more geoduck tissue so that they can try to isolate higher molecular weight DNA to complete the genome sequencing using the 10x Genomics sequencing platform.

FAIL - Missing Data on Owl!

  • 1 min read

Uh oh. It appears that some data has been removed from Owl. I noticed this earlier when trying to look at some of Sean’s data, which should be in a folder with his name in Owl/scaphapoda.

Data Management - Convert Oly PacBio H5 to FASTQ

  • ~1 min read

After working with all of this Olympia oyster genome sequencing data, I remembered that we had an old, single PacBio SMRT cell file (from June 2013). This file didn’t seem to be included in any recent assemblies of Sean’s or mine, most likely because we have it in the PacBio H5 format and not in FASTQ.
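Pulling reads out of the H5 file requires a PacBio tool or an HDF5 reader, plus knowledge of the file’s internal layout, and this excerpt covers neither. The output side of the conversion is simple, though: each read becomes a four-line FASTQ record with its quality scores encoded as Phred+33 characters. A minimal sketch, with hypothetical names:

```python
from __future__ import annotations


def to_fastq_record(read_id: str, seq: str, quals: list[int]) -> str:
    """Format one read as a 4-line FASTQ record with Phred+33 qualities."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33: add 33 to each score; cap at 93 (ASCII '~'), the highest printable value.
    qual_str = "".join(chr(min(q, 93) + 33) for q in quals)
    return f"@{read_id}\n{seq}\n+\n{qual_str}\n"
```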

RNA Isolation - Olympia oyster gonad tissue in paraffin histology blocks

  • 1 min read

My previous go at this was a little premature - I didn’t wait for Laura to fully annotate her slides/blocks. Little did I know, the tissue was mostly visceral mass and, as such, I didn’t hit much in the way of actual gonad tissue. So, I’m redoing this, now that Grace has gone through and annotated the blocks to point out gonad tissue. SN-10-16 was sent to Katherine Silliman on 20170720.

RNA Isolation - Olympia oyster gonad tissue in paraffin histology blocks

  • 2 min read

UPDATE 20170712: The RNA I isolated below is from incorrect regions of tissue. I misunderstood exactly what this tissue was and, admittedly, jumped the gun. The tissue is actually collected from the visceral mass, which contains gonad (a small amount) and digestive gland (a large amount). The RNA isolated below will be stored in one of the Shellfish RNA boxes, and I will isolate RNA from the correct regions indicated by Grace.

Goals - June 2017

  • ~1 min read

Well, my previous goal was to tidy up an existing manuscript and get it re-submitted to PeerJ. That’s pretty much done; Steven will be giving it a final once-over and formatting the rebuttal letter prior to resubmission.

Computing - Oly BGI GBS Reproducibility Fail

  • 2 min read

Since we’re preparing a manuscript that relies on BGI’s manipulation/handling of the genotype-by-sequencing data, I attempted to reproduce the demultiplexing steps BGI used prior to SNP calling/genotyping these samples.
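BGI’s actual script and library index file aren’t reproduced here. The following is only a hypothetical sketch of inline-barcode demultiplexing for an ApeKI GBS library. It assumes each read begins with a sample barcode immediately followed by the ApeKI cut-site remnant (CAGC or CTGC), requires exact barcode matches, and uses a made-up barcode table. BGI’s approach (mismatch tolerance, barcode placement, trimming) may well differ.

```python
from __future__ import annotations

import re

# ApeKI recognizes GCWGC and cuts after the first G, so (after the barcode)
# reads from an ApeKI GBS library are expected to start with CWGC: CAGC or CTGC.
CUT_REMNANT = re.compile(r"C[AT]GC")


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Assign reads to samples by an exact 5' barcode match followed by the
    ApeKI remnant; the barcode is removed from assigned reads.

    barcodes maps barcode sequence -> sample name. Reads that match no
    barcode are collected under "unassigned".
    """
    out: dict[str, list[str]] = {name: [] for name in barcodes.values()}
    out["unassigned"] = []
    # Try longer barcodes first so a short barcode that is a prefix of a
    # longer one cannot claim its reads.
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for bc in ordered:
            if read.startswith(bc) and CUT_REMNANT.match(read, len(bc)):
                out[barcodes[bc]].append(read[len(bc):])
                break
        else:  # no barcode matched
            out["unassigned"].append(read)
    return out
```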

FASTQC - Oly BGI GBS Raw Illumina Data Demultiplexed

  • ~1 min read

Last week, I ran the two raw FASTQ files through FastQC. As expected, FastQC detected “errors”. These errors are due to the presence of adapter sequences, barcodes, and the use of a restriction enzyme (ApeKI) in library preparation. In summary, it’s not surprising that FastQC was not pleased with the data, because it expects a “standard” library prep that’s already been trimmed and demultiplexed.
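One place this shows up is FastQC’s per-base sequence content module. The hypothetical Python sketch below computes a simplified version of that table: the fraction of each base at every position across reads. In barcoded ApeKI reads, the barcode and cut-site remnant at the start of each read make the early positions far from the roughly even base mix expected of a random library. That is one reason untrimmed, non-demultiplexed GBS data gets flagged.

```python
from __future__ import annotations

from collections import Counter


def per_position_composition(reads: list[str], n_positions: int) -> list[dict[str, float]]:
    """Fraction of A/C/G/T/N at each of the first n_positions bases.

    Loosely analogous to FastQC's "per base sequence content" view; reads
    shorter than a given position are simply left out at that position.
    """
    table = []
    for pos in range(n_positions):
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts.values())
        table.append({b: counts[b] / total if total else 0.0 for b in "ACGTN"})
    return table
```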

DNA Isolation - Geoduck gDNA for Illumina-initiated Sequencing Project

  • 1 min read

We were previously approached by Cindy Lawley (Illumina Market Development) for possible participation in an Illumina product development project, in which they wanted to have some geoduck tissue and DNA on hand in case Illumina green-lighted the use of geoduck for testing out the new sequencing platform on non-model organisms. Well, guess what: Illumina has given the green light for sequencing our geoduck! However, they need at least 4μg of gDNA, so I’m isolating more.

Data Management - Replacement of Corrupt BGI Oly Genome FASTQ Files

  • ~1 min read

Previously, Sean and Steven identified two potentially corrupt FASTQ files. I contacted BGI about getting replacement files, and they informed me that the FASTQ files they had delivered on three separate occasions were all identical (despite having different file names). As such, I could use one of the other versions to replace the corrupt FASTQ files. So, that’s what I did!
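How the corrupt files were spotted isn’t described here. One quick structural check for a damaged or truncated FASTQ file is sketched below. It assumes the common four-line record layout (no wrapped sequence lines) and only catches formatting damage, not silently altered bases, which is what checksums are for. The function name is hypothetical.

```python
from __future__ import annotations


def validate_fastq(lines: list[str]) -> list[str]:
    """Return a list of problems found in FASTQ text split into lines.

    Checks only structure: a line count that is a multiple of 4, an '@'
    header, a '+' separator, and matching sequence/quality lengths.
    """
    problems = []
    if len(lines) % 4 != 0:
        problems.append(f"line count {len(lines)} is not a multiple of 4 (truncated file?)")
    # Only inspect complete 4-line records.
    for start in range(0, len(lines) - len(lines) % 4, 4):
        header, seq, sep, qual = lines[start:start + 4]
        rec = start // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {rec}: header does not start with '@'")
        if not sep.startswith("+"):
            problems.append(f"record {rec}: separator line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {rec}: sequence/quality length mismatch")
    return problems
```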

Goals - January 2017

  • 1 min read

One of the long-running goals I’ve had is to get this Oly GBS data taken care of and out the door to publication. I think I will finally succeed with this, with the help of Pub-A-Thon. Don’t get too excited; it’s not what you think. It is not the drinking extravaganza that the name implies. Instead, it’s a “friendly” lab competition to get some scientific publications assembled and submitted.

2016

Data Management - Integrity Check of Final BGI Olympia Oyster & Geoduck Data

  • ~1 min read

After completing the downloads of these files from BGI, I needed to verify that the downloaded copies matched the originals. Below is a Jupyter Notebook detailing how I verified file integrity via MD5 checksums. It also highlights the importance of doing this check when working with large sequencing files (or, just large files in general), as a few of them had mis-matching MD5 checksums!
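The notebook itself isn’t reproduced here, but the core of an MD5 integrity check can be sketched in Python with the standard library’s `hashlib`. The sketch assumes an `md5sum`-style checksum file (`<hash>  <filename>` per line) sitting in the same directory as the data. The function names are illustrative.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large FASTQ files never sit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5_file(checksum_file: Path) -> dict[str, bool]:
    """Verify files listed in an md5sum-style file ("<hash>  <filename>").

    Filenames are resolved relative to the checksum file's directory.
    Returns filename -> True if the recomputed hash matches.
    """
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        actual = md5sum(checksum_file.parent / name)
        results[name] = actual.lower() == expected.lower()
    return results
```

Any `False` entry flags a download that should be fetched again.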

Goals - November 2016

  • ~1 min read

Well, I’m serious this time. My goal for this month is to complete the Oly GBS data analysis and to get the data sets and data analysis prepared/placed in satisfactory repositories in preparation for publication in Scientific Data.

Goals - October 2016

  • ~1 min read

Last month’s goals, as it turns out, were way too ambitious. This month’s goal will be to get the Oly GBS data analysis fully completed (I currently have data for individuals, but still need a summary of the data for the three populations). I’ll also get the data sets and data analysis prepared/placed in satisfactory repositories in preparation for publication in Scientific Data.

Data Received – Jay’s Coral epiRADseq - Not Demultiplexed

  • 1 min read

Previously downloaded Jay’s epiRADseq data that was provided by the Genomic Sequencing Laboratory at UC-Berkeley. It was provided already demultiplexed (which is very nice of them!). To be completionists on our end, we requested the non-demultiplexed data set.

Oyster Sampling - Olympia Oyster OA Populations at Manchester

  • 1 min read

I helped Katherine Silliman with her oyster sampling today from her ocean acidification experiment with Olympia oysters (Ostrea lurida) at the Kenneth K. Chew Center for Shellfish Research & Restoration, which is housed at the NOAA Northwest Fisheries Science Center at Manchester in a partnership with the Puget Sound Restoration Fund (PSRF; http://www.restorationfund.org/). We sampled the following tissues and stored them in 1mL RNAlater:

Computing - Amazon EC2 Cost “Analysis”

  • 3 min read

I recently moved some computing jobs over to Amazon’s Elastic Compute Cloud (EC2) in an attempt to avoid some odd computing issues/errors I kept encountering on our lab computers (Apple Xserve 3,1).
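The cost comparison itself isn’t in this excerpt. A minimal sketch of the arithmetic such an estimate involves: instance hours times the hourly rate, plus attached EBS storage, which is billed per GB-month. No prices are hard-coded because they depend on instance type, region, and date. All inputs are assumptions supplied by the caller, and the function name is hypothetical.

```python
def ec2_cost(hours: float, hourly_rate: float, volume_gb: float = 0.0,
             gb_month_rate: float = 0.0, months: float = 0.0) -> float:
    """Rough EC2 cost: instance time plus attached storage.

    Ignores data transfer and other line items; every rate is an input
    because prices vary by instance type, region, and date.
    """
    compute = hours * hourly_rate
    storage = volume_gb * gb_month_rate * months
    return compute + storage
```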

Goals - August 2016

  • ~1 min read

  • Complete Olympia oyster GBS data analysis - Progress has actually been made! After many struggles, I managed to get a PyRad analysis of the entire data set to complete. Now, I just have to figure out what to do with the output files…

Computing - Amazon EC2 Instance Out of Space?

  • ~1 min read

Running a PyRad analysis on the Olympia oyster GBS data. PyRad exited with warnings about running out of space. However, checking free disk space on the EC2 instance suggests that there’s still space left on the disk. Possibly PyRad monitors the expected disk space usage during the analysis to verify that there will be sufficient space to write to? Regardless, I will expand the EC2 instance’s volume to a larger size…
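A minimal Python sketch for checking the filesystem a tool is writing to, using only the standard library (`shutil.disk_usage` and `os.statvfs`, POSIX only). Besides free bytes, it reports free inodes. A filesystem can refuse new files once inodes run out, even while `df` still shows free space. Whether either explains the PyRad warnings here isn’t known. It’s also worth confirming that the path checked is the one PyRad actually writes to. The function name is hypothetical.

```python
from __future__ import annotations

import os
import shutil


def space_report(path: str) -> dict[str, float]:
    """Free space and free inodes for the filesystem holding `path` (POSIX only)."""
    usage = shutil.disk_usage(path)   # bytes: total, used, free
    stats = os.statvfs(path)          # f_files / f_ffree are inode counts
    return {
        "free_gb": usage.free / 1e9,
        "free_pct": 100 * usage.free / usage.total,
        "free_inodes": stats.f_ffree,
        # Some filesystems report 0 total inodes; avoid dividing by zero.
        "free_inode_pct": 100 * stats.f_ffree / stats.f_files if stats.f_files else float("nan"),
    }
```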

Dissection - Frozen Geoduck & Pacific Oyster

  • 2 min read

We’re working on a project with Washington Department of Natural Resources’ (DNR) Micah Horwith to identify potential proteomic biomarkers in geoduck (Panopea generosa) and Pacific oyster (Crassostrea gigas). One aspect of the project is determining how best to sample juvenile geoduck and Pacific oyster to minimize changes in the proteome of ctenidia tissue during sampling. Generally, live animals are shucked, tissue dissected, and then the tissue is “snap” frozen. However, Micah’s crew will be collecting animals from wild sites around Puget Sound and, because of the remote locations and the means of collection, will have limited tools and time to perform this type of sampling. Time is a significant factor that can strongly affect the proteomic status of each individual.

Docker - Improving Roberts Lab Reproducibility

  • 2 min read

In an attempt to further our lab’s ability to maximize reproducibility, I’ve been working on developing an all-encompassing Docker image. Docker runs software in containers: isolated environments that behave much like a virtual machine (i.e. a self-contained computer that runs within your computer), but share the host’s kernel rather than running a full operating system of their own. For the Roberts Lab, the advantage of using Docker is that Docker images can be customized to run a specific suite of software, and these images can then be used by anyone else in the lab, regardless of operating system (as long as Docker runs on it). In turn, if everyone is using the same Docker image (i.e. the same environment with all the same software), then we should be able to reproduce data analyses more reliably, because there won’t be differences between the software versions that people are using. Additionally, using Docker greatly simplifies the setup of new computers with the requisite software.

Data Analysis - Oly GBS Data Using Stacks 1.37

  • ~1 min read

This analysis ran (or, more properly, was attempted) for a couple of weeks and failed a few times. The failures seemed to be linked to the external hard drive I was reading/writing data to. It continually locked up, leading to “Segmentation fault” errors.

Goals - May 2016

  • ~1 min read

Well, I guess the first goal is to remember to be more consistent about writing monthly goals…

Data Management - O.lurida Raw BGI GBS FASTQ Data

  • ~1 min read

BGI had previously supplied us with demultiplexed GBS FASTQ files. However, they had not provided us with the information/data on how those files were created. I contacted them and they’ve given us the two original FASTQ files, as well as the library index file and corresponding script they used for demultiplexing all of the files. The Jupyter (iPython) notebook below updates our checksum and readme files in our server directory that’s hosting the files: http://owl.fish.washington.edu/nightingales/O_lurida/20160223_gbs/

Data Analysis - Oly GBS Data from BGI Using Stacks

  • ~1 min read

UPDATE (20160418): I’m posting this more for posterity, as Stacks continually locked up at both the “ustacks” and “cstacks” stages. These processes would take days to run (on the full 96 samples) and then the processes would become “stuck” (viewed via the top command in OS X).

SRA Submission – Genome sequencing of the Olympia oyster (Ostrea lurida)

  • ~1 min read

Adding our Olympia oyster genome sequencing (sequencing done by BGI) to the NCBI Sequence Read Archive (SRA). The current status can be seen in the screen cap below. Release date is set for a year from now, but will likely bump it up. Need Steven to review the details of the submission (BioProject, Experiment descriptions, etc.) before I initiate the public release. Will update this post with the SRA accession number once we receive it.

SRA Submission – Genome sequencing of the Pacific geoduck (Panopea generosa)

  • ~1 min read

Adding our geoduck genome sequencing (sequencing done by BGI) to the NCBI Sequence Read Archive (SRA). The current status can be seen in the screen cap below. Release date is set for a year from now, but will likely bump it up. Need Steven to review the details of the submission (BioProject, Experiment descriptions, etc.) before I initiate the public release. Will update this post with the SRA accession number once we receive it.

Data Management - O.lurida 2bRAD Dec2015 Undetermined FASTQ files

  • ~1 min read

An astute observation by Katherine Silliman revealed that, of the FASTQ files I had moved to our high-throughput sequencing archive on our server Owl, only two were labeled as “undetermined”. Based on the number of lanes we had sequenced, we should have had many more. It turns out that the undetermined FASTQ files in the different sub-folders of the Genewiz project data were not uniquely named. Thus, when I moved them over (via a bash script), the undetermined files were repeatedly overwritten, until we were left with only two FASTQ files labeled as undetermined.
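The bash script used for the move isn’t shown. As a hypothetical Python sketch of one way to avoid the collision, each copied file can be prefixed with its sub-folder path, and the copy can refuse to overwrite an existing file. The `*.fastq.gz` pattern, folder layout, and function name are assumptions.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def collect_fastqs(src_root: Path, dest: Path, pattern: str = "*.fastq.gz") -> list[Path]:
    """Copy every file matching `pattern` under src_root into one flat `dest`
    folder, prefixing each name with its sub-folder path so identically named
    files (e.g. per-lane "undetermined" files) cannot overwrite each other.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_root.rglob(pattern)):
        rel_parent = src.parent.relative_to(src_root)
        prefix = "_".join(rel_parent.parts)
        target = dest / (f"{prefix}_{src.name}" if prefix else src.name)
        if target.exists():  # refuse to clobber anything already there
            raise FileExistsError(target)
        shutil.copy2(src, target)
        copied.append(target)
    return copied
```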

2015

Data Storage - Synology DX513

  • ~1 min read

We’re running a bit low on storage on Owl (Synology DS1812+) and will be receiving a ton of data in the next few months, so we purchased a Synology DX513, an expansion unit designed specifically for seamlessly expanding our existing storage volume on Owl.

DNA Isolation - Geoduck Ctenidia gDNA

  • 1 min read

Isolated additional gDNA for the genome sequencing. In an attempt to obtain better yields, I used ctenidia (instead of adductor muscle). Additionally, to try to improve the quality (260/280 & 260/230 ratios) of the gDNA, I added a chloroform step after the initial tissue homogenization.
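The 260/280 and 260/230 ratios mentioned above come straight from spectrophotometer absorbance readings. Here is a minimal sketch of the standard arithmetic, assuming the usual dsDNA conversion (an A260 of 1.0 corresponds to roughly 50 ng/µL) and readings normalized to a 1 cm path. The function names are illustrative, and instruments such as a NanoDrop report these values directly.

```python
from __future__ import annotations


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Approximate dsDNA concentration: an A260 of 1.0 ~ 50 ng/uL (1 cm path)."""
    return a260 * 50.0 * dilution_factor


def purity_ratios(a230: float, a260: float, a280: float) -> tuple[float, float]:
    """Return (260/280, 260/230).

    Rules of thumb: ~1.8 for 260/280 and ~2.0-2.2 for 260/230 indicate
    relatively pure DNA; a low 260/280 can point to protein or phenol, a low
    260/230 to carryover of salts, carbohydrates, or phenol.
    """
    return a260 / a280, a260 / a230


def total_yield_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA in micrograms from concentration and elution volume."""
    return conc_ng_per_ul * volume_ul / 1000.0
```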

Restriction Digest – Oly gDNA for RAD-seq w/AlfI

  • 1 min read

Previously initiated the RAD-seq procedure for the sample set described below. However, the test scale PCR yielded poor results. Katherine Silliman suggested that the poor performance of the test scale PCR was likely due to low numbers of adaptor-ligated fragments. Since the input DNA is so degraded, I’ve repeated this using 9μg of input DNA (instead of the recommended 1.2μg). This should increase the number of available cleavage sites for AlfI, thus improving the number of available ligation sites for the adaptors.
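To make the cleavage-site reasoning concrete: AlfI recognizes the interrupted palindrome GCA(N6)TGC and cuts on both sides of it, so each intact site yields one short fragment for adaptor ligation. The hypothetical sketch below counts AlfI recognition sites in a sequence. Following the reasoning above, more input DNA (9μg is 7.5x the recommended 1.2μg) should mean more intact sites available for digestion and ligation.

```python
from __future__ import annotations

import re

# AlfI recognition site: GCA(N6)TGC. The site is its own reverse complement,
# so scanning one strand finds every site. The lookahead allows overlapping hits.
ALFI_SITE = re.compile(r"(?=(GCA[ACGT]{6}TGC))")


def alfi_sites(seq: str) -> list[int]:
    """0-based start positions of AlfI recognition sites in `seq`."""
    return [m.start() for m in ALFI_SITE.finditer(seq.upper())]
```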
