Posts by Year

2019

FastQC-MultiQC - Additional C.gigas WGBS Sequencing Data from Genewiz Received 20190501

  • ~1 min read

Earlier today, we received the additional C.gigas sequencing data from Genewiz. I wanted to run the data through FastQC again and get an updated report for each data set. Admittedly, it probably won’t look much different from the initial FastQC run on 20190415, because the additional sequencing was simply appended to the previous data. Since the new reads come from the same libraries (and some FastQC modules only examine a subset of the reads in each file), I’d fully expect the FastQC report to look essentially the same, apart from the total sequence count. However, we’ll have a greater number of sequences in each file, which should, in turn, increase the number of reads retained after quality trimming.
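
FastQC’s per-base plots won’t reveal the added depth, but a simple record count will. A minimal sketch, assuming standard 4-line FASTQ records (no wrapped sequence lines); the script name and file names in the usage note are hypothetical.

```python
import gzip
import sys
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming the standard 4-line layout."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        # A partial record usually means a truncated or corrupted file.
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_fastq_file(path: str) -> int:
    """Open a plain or gzip-compressed FASTQ in text mode and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


if __name__ == "__main__":
    for fastq in sys.argv[1:]:
        print(f"{fastq}\t{count_fastq_file(fastq)}")
```

For example, `python count_reads.py sample_R1.fastq.gz sample_R2.fastq.gz` (hypothetical names) prints one read count per file, which can be compared against the counts from the 20190415 delivery.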

Data Analysis - C.virginica MBD Sequencing Coverage

  • 2 min read

A while ago, Steven tasked me with addressing some questions in this GitHub issue about the sequencing coverage we get from MBD-BSseq. At the heart of it all was the question of how much usable data we actually get when we do an MBD-BSseq project. Yaamini happened to have done an MBD-BSseq project relatively recently, and she’s actively analyzing it, so we went with that data set.
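
A useful starting point for "how much usable data" is expected mean depth: total sequenced bases divided by genome size, computed once for raw reads and once for reads that survive trimming and mapping. This is a minimal sketch; the read counts and genome size below are hypothetical placeholders, `n_reads` counts every read (both mates for paired-end data), and because MBD enrichment concentrates reads in methylated regions, a genome-wide average is only a rough baseline.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def usable_summary(raw_reads: int, usable_reads: int, read_length: int,
                   genome_size: int) -> dict:
    """Compare raw vs. usable (e.g., trimmed and uniquely mapped) read counts."""
    return {
        "fraction_usable": usable_reads / raw_reads,
        "raw_coverage": mean_coverage(raw_reads, read_length, genome_size),
        "usable_coverage": mean_coverage(usable_reads, read_length, genome_size),
    }


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(usable_summary(raw_reads=40_000_000, usable_reads=10_000_000,
                         read_length=100, genome_size=500_000_000))
```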

Metagenomics Gene Prediction - P.generosa Water Samples Using MetaGeneMark on Mox to Compare pH Treatments

  • 2 min read

Continuing with a relatively quick comparison of pH treatments (pH=7.1 vs. pH=8.2), I wanted to run gene prediction on the MEGAHIT assemblies I made yesterday. I ran MetaGeneMark on the two pH-specific assemblies on Mox. This should be a very fast process (I’m talking, like a couple of minutes fast), so it enhances the annotation with very little effort and time.

Metagenome Assemblies - P.generosa Water Samples Trimmed HiSeqX Data Using Megahit on Mox to Compare pH Treatments

  • 2 min read

A report on our geoduck water metagenomics work is due later this week, and our in-depth analysis for this project using Anvi’o is likely to require at least another week to complete. Even though we have a broad overview of the metagenomic taxa present in these water samples, we don’t have the data in a format that allows comparisons across samples/treatments. So, I initiated our simplified pipeline (MEGAHIT > MetaGeneMark > BLASTn > KronaTools) to examine the metagenomic data from the two treatments (pH=7.1 and pH=8.2).
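
One way to put the two treatments side by side from the BLASTn step is to keep the best hit for each predicted gene and tally the subjects. A minimal sketch, assuming BLAST tabular output with the default `-outfmt 6` columns (qseqid first, sseqid second, bitscore twelfth); the file names are hypothetical, and this counts subject sequence IDs rather than taxa, since mapping hits to taxonomy is left to the KronaTools step.

```python
import csv
import sys
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[List[str]]) -> Dict[str, str]:
    """Map each query to its highest-bitscore subject.

    Assumes default -outfmt 6 columns: qseqid, sseqid, ..., bitscore (12th).
    """
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank or comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def tally_subjects(path: str) -> Counter:
    """Count how many query genes have each subject as their best hit."""
    with open(path, newline="") as handle:
        return Counter(best_hits(csv.reader(handle, delimiter="\t")).values())


if __name__ == "__main__":
    # Hypothetical usage: python tally_blast.py pH71.blastn.tab pH82.blastn.tab
    for path in sys.argv[1:]:
        print(path, tally_subjects(path).most_common(10))
```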

RNA Isolation and Quantification - Crab Hemolymph Using Quick-DNA-RNA Microprep Plus Kit

  • 1 min read

In the continuing struggle to isolate RNA from the Chionoecetes bairdi hemolymph preserved in RNAlater, Pam managed to find the Quick-DNA-RNA Microprep Plus Kit (ZymoResearch) as a potential option. We received a free sample of the kit, so I gave it a shot. Grace pulled 10 samples for me to try with this new kit; she had previously tried to isolate RNA from them using the RNeasy Plus Micro Kit (Qiagen) and was unable to get anything out of virtually all of them.

Transcriptome Assembly - Geoduck Tissue-specific Assembly Larvae Day5 EPI99 with HiSeq and NovaSeq Data on Mox

  • 2 min read

I previously assembled and annotated the P.generosa larval Day 5 transcriptome (20190318 - mislabeled as Juvenile Day 5 in my previous notebook entries) using just the HiSeq data from our Illumina collaboration. This was an oversight, as I didn’t realize that we also had NovaSeq RNAseq data. So, I’ve initiated another de novo assembly using Trinity that incorporates both sets of data.
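
Trinity can pool several paired-end libraries in one run by passing comma-separated file lists (no spaces) to `--left` and `--right`, listed in the same order so R1/R2 pairs line up. A minimal sketch that only builds the command; the file names, memory, and CPU values are hypothetical placeholders, not the settings used on Mox.

```python
import shlex
from typing import List, Sequence


def trinity_command(left: Sequence[str], right: Sequence[str], output_dir: str,
                    max_memory: str = "100G", cpus: int = 28) -> List[str]:
    """Build (but don't run) a Trinity command pooling several paired-end libraries.

    Defaults for max_memory and cpus are hypothetical placeholders.
    """
    if len(left) != len(right):
        raise ValueError("each R1 file needs a matching R2 file")
    if "trinity" not in output_dir.lower():
        # Recent Trinity releases refuse output directories without 'trinity' in the name.
        raise ValueError("output directory name should contain 'trinity'")
    return [
        "Trinity",
        "--seqType", "fq",
        "--max_memory", max_memory,
        "--CPU", str(cpus),
        "--left", ",".join(left),    # HiSeq and NovaSeq R1 files, same order as R2
        "--right", ",".join(right),
        "--output", output_dir,
    ]


if __name__ == "__main__":
    # Hypothetical file names for the HiSeq and NovaSeq libraries.
    cmd = trinity_command(
        left=["hiseq_larvae_day5_R1.fq.gz", "novaseq_larvae_day5_R1.fq.gz"],
        right=["hiseq_larvae_day5_R2.fq.gz", "novaseq_larvae_day5_R2.fq.gz"],
        output_dir="trinity_out_larvae_day5",
    )
    print(shlex.join(cmd))
```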

Data Management - Data Migration and Drive Expansion on Gannet

  • 1 min read

A little while ago, we installed some additional hard drives in Gannet (Synology RS3618XS) with the intention of expanding the total storage space. However, the original set of HDDs was set up as RAID10. As it turns out, RAID10 configurations cannot be expanded! So, the new set of HDDs was configured as a separate volume (Volume 2) in a RAID6 configuration. After backing up the /volume1/web directory (via rsync) to our UW Google Drive, I began the data migration.
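
The RAID10 vs. RAID6 choice also changes how much of the raw disk space is usable: RAID10 mirrors every drive (half the raw capacity), while RAID6 gives up two drives’ worth of capacity to dual parity. A minimal sketch of that arithmetic; the drive count and size are hypothetical, and real volumes lose a bit more to filesystem overhead.

```python
def usable_tb(n_drives: int, drive_tb: float, level: str) -> float:
    """Usable capacity in TB before filesystem overhead."""
    if level == "RAID10":
        # Striped mirrors: half of the drives hold copies.
        if n_drives < 4 or n_drives % 2:
            raise ValueError("RAID10 needs an even number of drives, at least 4")
        return n_drives / 2 * drive_tb
    if level == "RAID6":
        # Two drives' worth of capacity goes to dual parity.
        if n_drives < 4:
            raise ValueError("RAID6 needs at least 4 drives")
        return (n_drives - 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # Hypothetical: 8 x 10 TB drives.
    for level in ("RAID10", "RAID6"):
        print(level, usable_tb(8, 10.0, level), "TB")
```

With eight drives, RAID6 yields more usable space than RAID10 while still tolerating any two drive failures.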

Genome Assessment - BUSCO Metazoa on P.generosa v071 on Mox

  • 6 min read

Ran BUSCO on our completed annotation of the P.generosa v071 genome (GFF) (subset of sequences >10kbp). See this notebook entry for genome annotation info. BUSCO provides a nice metric of how “complete” a genome assembly (or transcriptome) is. Additionally, BUSCO is tied in with Augustus for gene prediction and generates ab initio gene models. With that said, since I just want to evaluate the completeness of this particular genome assembly, I’ll be using the annotated genome generated through two rounds of SNAP gene prediction. Otherwise, I’d use the initial MAKER annotations to generate an Augustus gene model that could be used in conjunction with the SNAP models (I’ll likely do this at a later date).
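
BUSCO’s completeness metric classifies each single-copy ortholog in the lineage set as complete (single-copy or duplicated), fragmented, or missing, and reports each class as a percentage of the total. A minimal sketch of that arithmetic, formatted to mirror the layout of BUSCO’s one-line summary; the counts are hypothetical.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO category counts as a one-line percentage summary."""
    total = single + duplicated + fragmented + missing

    def pct(count: int) -> str:
        return f"{100 * count / total:.1f}%"

    complete = single + duplicated  # "complete" = single-copy + duplicated
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{total}")


if __name__ == "__main__":
    print(busco_summary(single=50, duplicated=10, fragmented=20, missing=20))
```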

Genome Annotation - Pgenerosa_v070 MAKER on Mox

  • 7 min read

Here goes: a massive undertaking of attempting to annotate the Pgenerosa_v070 genome (FastA; 2.1GB) using MAKER on Mox! I previously started this on 20190115, but killed it in order to fix a number of issues in the script that were causing problems. I decided the changes were substantial enough that I’d just make a new working directory and notebook entry.

Data Wrangling - CpG OE Calculations on C.virginica Genes

  • 2 min read

Steven tasked me with processing ~90 FastA files containing gene sequences from C.virginica in this GitHub Issue. He needed to determine the Observed/Expected (O/E) ratio of CpGs in each FastA. He provided this example code and this link to all the files. Additionally, today, he tasked Kaitlyn with merging all of the output CpG O/E values for each sample into a single file, but I decided to tackle it anyway.
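
A minimal sketch of both steps (computing CpG O/E per sequence and merging everything into one table), using the commonly cited Gardiner-Garden & Frommer definition, O/E = CpG count / (C count × G count) × sequence length. This may differ in detail from Steven’s example code, which isn’t reproduced here; the accepted file extensions and the output path are assumptions.

```python
import csv
import math
import sys
from pathlib import Path
from typing import Iterable, Iterator, Tuple

FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}  # assumed extensions


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cpg_oe(seq: str) -> float:
    """CpG observed/expected: (CpG count / (C count * G count)) * length."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return math.nan  # undefined without any C or G
    return seq.count("CG") / (c * g) * len(seq)


def merge_oe(fasta_dir: str, out_path: str) -> None:
    """Write one TSV row per sequence across every FASTA file in fasta_dir."""
    fastas = sorted(p for p in Path(fasta_dir).iterdir() if p.suffix in FASTA_SUFFIXES)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["file", "sequence", "cpg_oe"])
        for fasta in fastas:
            with open(fasta) as handle:
                for header, seq in read_fasta(handle):
                    writer.writerow([fasta.stem, header, f"{cpg_oe(seq):.4f}"])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python cpg_oe.py fasta_dir/ all_cpg_oe.tsv
    merge_oe(sys.argv[1], sys.argv[2])
```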

Methylation Analysis - C.virginica

  • 4 min read

This is a quick and dirty (i.e. no adaptor/quality trimming) assessment of all of our Crassostrea virginica bisulfite sequencing efforts to date, in order to get a rough idea of the methylation mapping, per this GitHub issue. Ran Bismark on Mox on a series of subsets of the reads.
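
A minimal sketch of pulling a fixed-size subset from a FASTQ file, assuming standard 4-line records. Applying the same `n_reads` to R1 and R2 keeps mates paired, because both files list reads in the same order. Note that taking the first N reads is not a random sample; random subsampling with a fixed seed (e.g., `seqtk sample -s`) applied identically to R1 and R2 avoids positional bias. File names in the usage comment are hypothetical.

```python
import gzip
import sys
from itertools import islice
from typing import Iterable, Iterator


def first_records(lines: Iterable[str], n_reads: int) -> Iterator[str]:
    """Yield the lines belonging to the first n_reads 4-line FASTQ records."""
    return islice(lines, 4 * n_reads)


def subset_fastq(in_path: str, out_path: str, n_reads: int) -> None:
    """Copy the first n_reads records of a (optionally gzipped) FASTQ to out_path."""
    open_in = gzip.open if in_path.endswith(".gz") else open
    open_out = gzip.open if out_path.endswith(".gz") else open
    with open_in(in_path, "rt") as src, open_out(out_path, "wt") as dst:
        dst.writelines(first_records(src, n_reads))


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python subset.py sample_R1.fq.gz sub_R1.fq.gz 1000000
    subset_fastq(sys.argv[1], sys.argv[2], int(sys.argv[3]))
```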

DNA Isolation and Quantification - Ronit’s C.gigas Ploidy Ctenidia

  • ~1 min read

Last week, I isolated DNA from all of Ronit’s ctenidia samples; however, one sample (D18-C) didn’t isolate properly. So, I performed another isolation with another section of frozen tissue. Tissue was excised from the frozen tissue block via razor blade (weight not recorded) and pulverized under liquid nitrogen. Samples were incubated O/N @ 37°C (heating block) in 350μL of MB1 Buffer + 25μL Proteinase K, per the E.Z.N.A. Mollusc DNA Kit (Omega) instructions.

Annotation - Olurida_v081 MAKER Proteins InterProScan5 on Mox

  • 1 min read

Continuation of genome annotation of the Olympia oyster genome. Determined initial gene models using MAKER with two rounds of SNAP, relabeled with more user-friendly names, and then performed protein-level annotations using BLASTp. Next, I’m going to run InterProScan5 (IPS5) to help functionally characterize the O.lurida proteins ID’d by MAKER. Once this is complete, I’ll use MAKER to incorporate the IPS5 and BLASTp results into a much more neatly (i.e. human-readable) annotated genome!

2018

qPCR - Relative mitochondrial abundance in C.gigas diploids and triploids subjected to acute heat stress via COX1

  • 2 min read

Using the C.gigas cytochrome c oxidase (COX1) primers (SR IDs: 1713, 1714) I designed the other day, I ran a qPCR on a subset of Ronit’s diploid/triploid control/heat-shocked oyster DNA, which Shelly had previously isolated and used for a global DNA methylation assay. The goal is to get a rough assessment of whether or not there appear to be differences in relative mitochondrial abundance between these samples.
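
The excerpt doesn’t say how the COX1 Ct values are normalized. A common approach, sketched here purely as an assumption, is the comparative Ct method (2^-ΔΔCt) against a nuclear reference gene; the reference gene and the example Ct values are hypothetical, and the method assumes both assays amplify with roughly 100% efficiency (one doubling per cycle).

```python
from statistics import mean
from typing import Sequence


def relative_abundance(ct_target: float, ct_reference: float) -> float:
    """2^-(Ct_target - Ct_reference); assumes ~100% efficiency for both assays."""
    return 2 ** -(ct_target - ct_reference)


def fold_change(sample_target: Sequence[float], sample_ref: Sequence[float],
                control_target: Sequence[float], control_ref: Sequence[float]) -> float:
    """Comparative Ct (2^-ddCt) fold change of a sample group vs. a control group."""
    d_sample = mean(sample_target) - mean(sample_ref)     # delta Ct, e.g. triploids
    d_control = mean(control_target) - mean(control_ref)  # delta Ct, e.g. diploids
    return 2 ** -(d_sample - d_control)


if __name__ == "__main__":
    # Hypothetical Ct values (COX1 vs. a hypothetical nuclear reference gene).
    print(fold_change([20.1, 19.9], [25.0, 25.2], [21.0, 21.2], [25.1, 24.9]))
```

Because mitochondrial genomes are multi-copy, COX1 typically amplifies several cycles earlier than a single-copy nuclear gene, so ΔCt values are usually negative.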

FastQC and Trimming - Metagenomics (Geoduck) HiSeqX Reads from 20180809

  • 1 min read

Steven tasked me with assembling our geoduck metagenomics HiSeqX data. The first part of the process is examining the quality of the sequencing reads, performing quality trimming, and then checking the quality of the trimmed reads. It’s also possible (likely) that I’ll need to run another round of trimming. The process is documented in the Jupyter Notebook linked below. After these reads are cleaned up, I’ll transfer them over to our HPC nodes (Mox) and try assembling them.

Annotation - Olurida_v081 MAKER on Mox

  • 20 min read

Remarkably, I managed to burn through our Xsede computing resources and don’t have terribly much to show for it. Ooof! This is a major bummer, as it “only” takes ~8hrs for a WQ-MAKER job to run there, as opposed to months the last time I tried running it on Mox. Although we have used up our Xsede allocation, all is not lost! The experience of setting up/running WQ-MAKER has enlightened me on how it all works and how to run it on Mox so it will (hopefully) take far, far less time than the last Mox attempt. With that said, here we go…

Installation - Microsoft Machine Learning Server (Microsoft R Open) on Emu/Roadrunner R Studio Server

  • ~1 min read

Steven recently saw an announcement that Microsoft R Open now handles multi-threaded processing (default R does not), so we were interested in trying it out. I installed MLR/MRO on Emu/Roadrunner (Apple Xserve; Ubuntu 16.04). Followed the Microsoft installation directions for Ubuntu. In retrospect, I think I could’ve just installed MRO, but this gets the job done as well and won’t hurt anything.

Mox - Password-less SSH!

  • 1 min read

The high performance computing (HPC) cluster (called Mox) at Univ. of Washington (UW) frustratingly requires a password when SSH-ing, even when SSH keys are in use. I have a lengthy, unintelligible password that I use for my UW account, so having to type this in any time I want to initiate a new SSH session on Mox is a painful process.

Ubuntu - Fix “No Video Signal” Issue on Emu/Roadrunner

  • 1 min read

Both Apple Xserves (Emu/Roadrunner) running Ubuntu (16.04LTS) experienced the same issue - the monitor would indicate “No Video Signal”, would go dark, and wasn’t responsive to keyboard/mouse movements. However, you could ssh into both machines w/o issue.

Total Alkalinity Calculations - Yaamini’s Ocean Chemistry Samples

  • 1 min read

I ran a subset of Yaamini’s ocean chemistry samples on our T5 Excellence titrator (Mettler Toledo) at the beginning of April. The subset consisted of samples taken from the beginning, middle, and end of the experiment. The rationale was to assess whether or not total alkalinity (TA) varied across the experiment. If there was little variation, then there’d likely be no need to run all of the samples. However, if there were temporal differences, then all samples should be processed.
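
A minimal sketch of the comparison: summarize TA (μmol/kg) at each time point, then compare the spread between time-point means with the within-time-point variability. The values are hypothetical, and the threshold for "little variation" is a judgment call not stated in the post.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize_ta(ta_by_timepoint: Dict[str, List[float]]) -> Dict[str, Tuple[float, float, float]]:
    """Return (mean, SD, CV%) of TA values for each time point."""
    summary = {}
    for timepoint, values in ta_by_timepoint.items():
        m, sd = mean(values), stdev(values)  # stdev needs at least 2 values
        summary[timepoint] = (m, sd, 100 * sd / m)
    return summary


def spread_of_means(summary: Dict[str, Tuple[float, float, float]]) -> float:
    """Difference between the highest and lowest time-point means."""
    means = [m for m, _, _ in summary.values()]
    return max(means) - min(means)


if __name__ == "__main__":
    # Hypothetical TA values (μmol/kg).
    ta = {"beginning": [2000.0, 2010.0, 2020.0],
          "middle": [2005.0, 2015.0, 2012.0],
          "end": [2030.0, 2025.0, 2035.0]}
    s = summarize_ta(ta)
    print(s, spread_of_means(s))
```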

Kmer Estimation - Kmergenie on Geoduck Sequence Data (default settings)

  • ~1 min read

After the last SparseAssembler assembly completed, I wanted to do another run with a different kmer size (last time it was arbitrarily set at 101). However, I didn’t really know how to choose one, particularly since this assembly consisted of mixed read lengths (50bp and 100bp). So, I ran kmergenie on all of our geoduck (Panopea generosa) sequencing data in hopes of getting a kmer size to apply to my next assembly.
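
KmerGenie builds k-mer abundance histograms for a range of k values, estimates the number of distinct genomic (non-error) k-mers for each, and suggests the k that maximizes it. The toy sketch below is not KmerGenie’s model: it simply counts canonical k-mers and treats those seen at least `min_count` times as "solid". It also makes one constraint concrete: a read shorter than k contributes no k-mers at all, so with mixed 50bp and 100bp reads, any k above 50 ignores the 50bp reads entirely.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers; reads shorter than k contribute nothing."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _BASES:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def solid_kmers(reads: Iterable[str], k: int, min_count: int = 2) -> int:
    """Distinct k-mers seen at least min_count times (a crude error filter)."""
    return sum(1 for c in kmer_counts(reads, k).values() if c >= min_count)


if __name__ == "__main__":
    reads = ["ACGTACGTTAGC", "CGTACGTTAGCA", "ACGTAC"]  # hypothetical reads
    for k in (3, 5, 7):
        print(k, solid_kmers(reads, k))
```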

2017

Samples Submitted - Geoduck Tissues to Illumina for More 10x Genomics Sequencing

  • ~1 min read

As part of Illumina’s continuing, generous efforts to use our geoduck samples to test the robustness of their emerging sequencing technologies, they have requested that we send them more geoduck tissue so that they can try to isolate higher molecular weight DNA to complete the genome sequencing efforts using the 10x Genomics sequencing platform.

FAIL - Missing Data on Owl!

  • 1 min read

Uh oh. It appears that some data has been removed from Owl. I noticed this earlier when trying to look at some of Sean’s data. His data should be in a folder with his name in Owl/scaphapoda.

Data Management - Convert Oly PacBio H5 to FASTQ

  • ~1 min read

After working with all of this Olympia oyster genome sequencing data, I remembered that we had an old, singular PacBio SMRT cell file (from June 2013). This file didn’t seem to be included in any recent assemblies of Sean’s or mine. This is most likely because we have it in the PacBio H5 format and not in FASTQ.

RNA Isolation - Olympia oyster gonad tissue in paraffin histology blocks

  • 1 min read

My previous go at this was a little premature - I didn’t wait for Laura to fully annotate her slides/blocks. Little did I know, the tissue was mostly visceral mass and, as such, I didn’t hit much in the way of actual gonad tissue. So, I’m redoing this, now that Grace has gone through and annotated the blocks to point out gonad tissue. SN-10-16 was sent to Katherine Silliman on 20170720.

RNA Isolation - Olympia oyster gonad tissue in paraffin histology blocks

  • 2 min read

UPDATE 20170712: The RNA I isolated below is from incorrect regions of tissue. I misunderstood exactly what this tissue was and, admittedly, jumped the gun. The tissue is actually collected from the visceral mass, which contains gonad (a small amount) and digestive gland (a large amount). The RNA isolated below will be stored in one of the Shellfish RNA boxes, and I will isolate RNA from the correct regions indicated by Grace.

Goals - June 2017

  • ~1 min read

Well, my previous goal was to tidy up an existing manuscript and get it resubmitted to PeerJ. That’s pretty much done, as Steven will be giving it a final once-over and formatting the rebuttal letter prior to resubmission.

Computing - Oly BGI GBS Reproducibility Fail

  • 2 min read

Since we’re preparing a manuscript that relies on BGI’s manipulation/handling of the genotype-by-sequencing data, I attempted to reproduce the demultiplexing steps that BGI used in order to perform the SNP/genotyping on these samples.
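
Demultiplexing GBS reads boils down to matching each read’s inline barcode and checking that it is followed by the restriction-site remnant. ApeKI cuts G^CWGC, so reads should begin with the barcode followed by CWGC (W = A or T). A minimal sketch, assuming exact barcode matches only; the barcode-to-sample mapping is hypothetical, BGI’s actual parameters (mismatch tolerance, trimming) are unknown, and the quality string would need to be trimmed by the same length as the sequence.

```python
import re
from typing import Dict, Optional, Tuple

# ApeKI remnant expected right after the barcode: CAGC or CTGC.
_APEKI_REMNANT = re.compile(r"C[AT]GC")


def assign_read(seq: str, barcodes: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (sample, sequence with barcode removed), or (None, seq) if unassigned.

    Exact matches only; longer barcodes are tried first so a short barcode
    that is a prefix of a longer one can't claim its reads.
    """
    for barcode in sorted(barcodes, key=len, reverse=True):
        if seq.startswith(barcode) and _APEKI_REMNANT.match(seq, len(barcode)):
            return barcodes[barcode], seq[len(barcode):]
    return None, seq


if __name__ == "__main__":
    demo_barcodes = {"ACGT": "sample_1", "TTAGG": "sample_2"}  # hypothetical
    for read in ("ACGTCAGCAAAA", "TTAGGCTGCGG", "ACGTAAAA"):
        print(read, assign_read(read, demo_barcodes))
```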

FASTQC - Oly BGI GBS Raw Illumina Data Demultiplexed

  • ~1 min read

Last week, I ran the two raw FASTQ files through FastQC. As expected, FastQC detected “errors”. These errors are due to the presence of adapter sequences, barcodes, and the use of a restriction enzyme (ApeKI) in library preparation. In summary, it’s not surprising that FastQC was not pleased with the data, because it expects a “standard” library prep that’s already been trimmed and demultiplexed.

DNA Isolation - Geoduck gDNA for Illumina-initiated Sequencing Project

  • 1 min read

We were previously approached by Cindy Lawley (Illumina Market Development) about possible participation in an Illumina product development project; they wanted to have some geoduck tissue and DNA on hand in case Illumina green-lighted the use of geoduck for testing the new sequencing platform on non-model organisms. Well, guess what: Illumina has given the green light for sequencing our geoduck! However, they need at least 4μg of gDNA, so I’m isolating more.
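
Checking whether pooled isolations reach the 4μg (4000 ng) requirement is simple arithmetic: each isolation yields concentration × volume. A minimal sketch; the concentrations and volumes are hypothetical (in practice they would come from quantification of each isolation).

```python
from typing import Iterable, Tuple


def total_yield_ng(isolations: Iterable[Tuple[float, float]]) -> float:
    """Sum yields from (concentration in ng/μL, volume in μL) pairs."""
    return sum(conc * vol for conc, vol in isolations)


def shortfall_ng(isolations: Iterable[Tuple[float, float]], target_ng: float = 4000.0) -> float:
    """How much more gDNA (ng) is needed to reach target_ng (0 if already met)."""
    return max(0.0, target_ng - total_yield_ng(isolations))


if __name__ == "__main__":
    # Hypothetical isolations: (ng/μL, μL)
    isolations = [(50.0, 40.0), (25.0, 40.0)]
    print(total_yield_ng(isolations), "ng total;", shortfall_ng(isolations), "ng short of 4 μg")
```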

Data Management - Replacement of Corrupt BGI Oly Genome FASTQ Files

  • ~1 min read

Previously, Sean and Steven identified two potentially corrupt FASTQ files. I contacted BGI about getting replacement files, and they informed me that the versions of the FASTQ files they delivered on three separate occasions are all the same file (despite having different file names). As such, I could use one of these versions to replace the corrupt FASTQ files. So, that’s what I did!

Goals - January 2017

  • 1 min read

One of the long-running goals I’ve had is to get this Oly GBS data taken care of and out the door to publication. I think I will finally succeed with this, with the help of Pub-A-Thon. Don’t get too excited, it’s not what you think. It is not the drinking extravaganza that the name implies. Instead, it’s a “friendly” lab competition to get some scientific publications assembled and submitted.

2016

Data Management - Integrity Check of Final BGI Olympia Oyster & Geoduck Data

  • ~1 min read

After completing the downloads of these files from BGI, I needed to verify that the downloaded copies matched the originals. Below is a Jupyter Notebook detailing how I verified file integrity via MD5 checksums. It also highlights the importance of doing this check when working with large sequencing files (or, just large files in general), as a few of them had mis-matching MD5 checksums!
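
The same check can be sketched in Python with `hashlib`, reading files in chunks so multi-GB FASTQs never need to fit in memory. A minimal sketch, assuming the provider’s checksums are in a listing that follows `md5sum`’s output format ("checksum  filename"); the listing file and directory names are hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict, Iterable, List


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'checksum  filename' lines (md5sum format) into {filename: checksum}."""
    expected = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        checksum, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = checksum.lower()  # '*' marks binary mode
    return expected


def mismatches(expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Files whose observed checksum is missing or differs from the expected one."""
    return sorted(name for name, checksum in expected.items()
                  if observed.get(name) != checksum)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_md5.py md5.txt download_dir/
    listing, directory = sys.argv[1], Path(sys.argv[2])
    with open(listing) as handle:
        expected = parse_md5_listing(handle)
    observed = {name: md5sum(str(directory / name))
                for name in expected if (directory / name).exists()}
    for name in mismatches(expected, observed):
        print("MISMATCH or MISSING:", name)
```

Any file flagged this way should be re-downloaded and re-checked before use.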
