Posts by Year

2023

Trimming and QC - E5 Coral sRNA-seq Trimming Parameter Tests and Comparisons

  • 2 min read

In preparation for FastQC and trimming of the E5 coral sRNA-seq data, I noticed that my “default” trimming settings didn’t produce the results I expected. Specifically, since these are sRNAs and the NEBNext® Multiplex Small RNA Library Prep Set for Illumina (PDF) protocol indicates that the sRNAs should be ~21 - 30bp, it seemed odd that I was still ending up with read lengths of 150bp. So, I tried a couple of quick trimming comparisons on just a single pair of sRNA FastQs to use as examples to get feedback on how trimming should proceed.

Read More

Data Wrangling - P.meandrina Genome GFF to GTF Using gffread

  • ~1 min read

As part of getting P.meandrina genome info added to our Lab Handbook Genomic Resources page, I will index the P.meandrina genome file (Pocillopora_meandrina_HIv1.assembly.fasta) using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.

Read More

FastQ QC and Trimming - E5 Coral RNA-seq Data for A.pulchra P.evermanni and P.meandrina Using FastQC fastp and MultiQC on Mox

  • 5 min read

After downloading and then reorganizing the E5 coral RNA-seq data from Azenta project 30-789513166, I ran FastQC for initial quality checks, followed by trimming with fastp, and then final QC with FastQC/MultiQC. This was performed on all three species in the data sets: A.pulchra, P.evermanni, and P.meandrina. All aspects were run on Mox.

Read More

Data Management - E5 Coral RNA-seq and sRNA-seq Reorganizing and Renaming

  • ~1 min read

Downloaded the E5 coral sRNA-seq data from Azenta project 30-852430235 on 20230515 and the E5 coral RNA-seq data from Azenta project 30-789513166 on 20230516. The data required some reorganization, as the project included data from three different species (Acropora pulchra, Pocillopora meandrina, and Porites evermanni). Additionally, since the project was sequencing the same exact samples with both RNA-seq and sRNA-seq, the resulting FastQ file names ended up being the same. This seemed like it could lead to downstream mistakes and/or difficulty tracking whether someone was actually using an RNA-seq or an sRNA-seq FastQ.
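As a rough illustration of the renaming idea (not the scheme actually used in the notebook entry), the Python sketch below inserts a library-type tag before the FastQ extension so that otherwise identical file names can be told apart. The function name `tag_fastq_name`, the tag placement, and the example file name are all hypothetical.

```python
# Hypothetical sketch: disambiguate identically named FastQ files by
# inserting the library type ("RNA-seq" or "sRNA-seq") before the extension.
# The naming convention here is an assumption, not the one from the post.

FASTQ_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")
LIBRARY_TYPES = {"RNA-seq", "sRNA-seq"}


def tag_fastq_name(filename: str, library_type: str) -> str:
    """Return the file name with the library type inserted before the extension."""
    if library_type not in LIBRARY_TYPES:
        raise ValueError(f"Unknown library type: {library_type}")
    for ext in FASTQ_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            return f"{stem}-{library_type}{ext}"
    raise ValueError(f"Not a recognized FastQ file name: {filename}")


if __name__ == "__main__":
    # Made-up file name, used for illustration only.
    print(tag_fastq_name("sample1_R1_001.fastq.gz", "sRNA-seq"))
```

Putting the tag in the file name itself, rather than relying only on the directory structure, keeps the library type attached to the file if it gets copied elsewhere.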

Read More

lncRNA Identification - P.generosa lncRNAs using CPC2 and bedtools

  • ~1 min read

After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them to the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step of lncRNA identification. To do this, I used Zach’s notebook entry on lncRNA identification for guidance. I utilized the annotated GTF generated by gffcompare during the alignment/annotation step on 20230426. I used bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) and CPC2 with an arbitrary 200bp minimum length to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).
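For readers unfamiliar with the length criterion, here is a minimal Python sketch of a 200bp minimum-length filter applied to FastA records. It is only an illustration of the filtering step. It is not the notebook's code, which relied on bedtools and CPC2, and the function names and toy sequences are made up.

```python
# Hypothetical sketch of a minimum-length filter on FastA records.
# This stands in for the "200bp minimum length" criterion only; it does not
# perform any coding-potential assessment (that is CPC2's job).


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FastA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(records, min_length=200):
    """Keep only records whose sequence is at least min_length bases long."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


if __name__ == "__main__":
    # Toy sequences: one of 210 bp (kept) and one of 199 bp (dropped).
    toy = [">transcript_a", "A" * 150, "C" * 60, ">transcript_b", "G" * 199]
    for header, seq in filter_min_length(read_fasta(toy)):
        print(header, len(seq))
```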

Read More

Containers - Apptainer Explorations

  • 6 min read

At some point, our HPC nodes on Mox will be retired. When that happens, we’ll likely purchase new nodes on the newest UW cluster, Klone. Additionally, the coenv nodes are no longer available on Mox. One was decommissioned and one was “migrated” to Klone. The primary issue at hand is that the base operating system for Klone appears to be very, very basic. I’d previously attempted to build/install some bioinformatics software on Klone, but could not due to a variety of missing libraries; these libraries are available by default on Mox… Part of this isn’t surprising, as UW IT has been making a concerted effort to get users to switch to containerization - specifically using Apptainer (formerly Singularity) containers.

Read More

Transcript Alignments - P.generosa RNA-seq Alignments for lncRNA Identification Using Hisat2 StringTie and gffcompare on Mox

  • 10 min read

This is a continuation of the process for identification of lncRNAs. I aligned FastQs, which were trimmed earlier today, to our Panopea-generosa-v1.0 genome FastA using HISAT2. I used the HISAT2 genome index created on 20190723, which was created with options to identify exons and splice sites. The GFF used was from 20220323. StringTie was used to identify alternative transcripts, assign expression values, and create expression tables for use with ballgown. The job was run on Mox.

Read More

FastQ Trimming and QC - P.generosa RNA-seq Data from 20220323 on Mox

  • 4 min read

Addressing the update to this GitHub Issue regarding identifying Panopea generosa (Pacific geoduck) long non-coding RNAs (lncRNAs), I used the RNA-seq data from the Nextflow NF-Core RNAseq pipeline run on 20220323. Although that data was supposed to have been trimmed in the Nextflow NF-Core RNA-seq pipeline, the FastQC reports still show adapter contamination and some funky stuff happening at the 5’ end of the reads. So, I’ve opted to trim the “trimmed” files with fastp, using a hard 20bp trim at the 5’ end of all reads. FastQC and MultiQC were run before/after trimming. Job was run on Mox.

Read More

Data Wrangling - Append Gene Ontology Aspect to P.generosa Primary Annotation File

  • 1 min read

Steven tasked me with updating our P.generosa genome annotation file (GitHub Issue) a while back and I finally managed to get it all figured out. Although I wanted to perform most of this using the GSEABase package (PDF), as this package is geared towards storage/retrieval of gene set data, I eventually decided to abandon this approach due to the time it was taking and my lack of familiarity/understanding of how to manipulate objects in R. Despite that, GSEABase was still utilized, as it makes identifying GOslims (IDs and Terms) very simple.

Read More

Sequencing Read Taxonomic Classification - P.verrucosa E5 RNA-seq Using DIAMOND BLASTx and MEGAN daa-meganizer on Mox

  • 5 min read

After some discussion with Steven at Science Hour last week regarding the handling of endosymbiont sequences in the E5 P.verrucosa RNA-seq data, Steven thought it would be interesting to run the RNA-seq reads through MEGAN6 just to see what the taxonomic breakdown looks like. We may or may not (probably not) separate reads based on taxonomy. In the meantime, we’ll still proceed with HISAT2 alignments to the respective genomes as a means to separate the endosymbiont reads from the P.verrucosa reads.

Read More

Data Wrangling - C.goreaui Genome GFF to GTF Using gffread

  • ~1 min read

As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I also need to get the coral endosymbiont sequence. After talking with Danielle Becker in Hollie Putnam’s Lab at Univ. of Rhode Island, she pointed me to the Cladocopium goreaui genome from Chen et al., 2022 available here. Access to the genome requires agreeing to some licensing provisions (primarily the requirement to cite the publication whenever the genome is used), so I will not be providing any public links to the file. In order to index the Cladocopium goreaui genome file (Cladocopium_goreaui_genome_fa) using HISAT2 for downstream isoform analysis using StringTie and ballgown, I need a corresponding GTF to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.

Read More

Transcript Identification and Alignments - P.verrucosa RNA-seq with Pver_genome_assembly_v1.0 Using HiSat2 and Stringtie on Mox

  • 19 min read

After getting the RNA-seq data trimmed, it was time to perform alignments and determine expression levels of transcripts/isoforms using HISAT2 and StringTie, respectively. StringTie was set to output tables formatted for import into ballgown. After those two analyses were complete, I ran gffcompare, using the merged StringTie GTF and the input GFF3. I caught this in one of Danielle Becker’s scripts and thought it might be interesting. The analyses were run on Mox.

Read More

FastQ Trimming and QC - P.verrucosa RNA-seq Data from Danielle Becker in Hollie Putnam Lab Using fastp FastQC and MultiQC on Mox

  • 5 min read

After receiving the P.verrucosa RNA-seq data from Danielle Becker (Hollie Putnam’s Lab, Univ. of Rhode Island), I noticed that the trimmed reads didn’t appear to actually be trimmed. There was still adapter contamination (solely in R2 reads - suggesting the detect_adapter_for_pe option had been omitted from the fastp command?), but the reads had an average read length of 150bp - except when looking at the adapter content report!!??.

Read More

Data Wrangling - P.verrucosa Genome GFF to GTF Using gffread

  • ~1 min read

As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the P.verrucosa genome file (Pver_genome_assembly_v1.0.fasta) using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.

Read More

Data Wrangling - M.capitata Genome GFF to GTF Using gffread

  • ~1 min read

As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the M.capitata genome file (Montipora_capitata_HIv3.assembly.fasta) using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.

Read More

Data Wrangling - P.acuta Genome GFF to GTF Conversion Using gffread

  • ~1 min read

As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the P.acuta genome file using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.

Read More


2022

Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene and Pseudogene Combined BED File

  • ~1 min read

Working on the CEABIGR project, I was preparing to make a gene expression file to use in CIRCOS (GitHub Issue) when I realized that the Ballgown gene expression file (CSV; GitHub) had more genes than the C.virginica genes BED file we were using. After some sleuthing, I discovered that the discrepancy was caused by the lack of pseudogenes in the genes BED file I was using. Although it might not really have any impact on things, I thought it would still be prudent to have a BED file that completely matched all of the genes in the Ballgown gene expression file. Plus, having the pseudogenes might be of long-term usefulness if we ever decide to evaluate the role of long non-coding RNAs (lncRNAs) in this project.

Read More

Data Wrangling - Identify C.virginica Genes with Different Predominant Isoforms for CEABIGR

  • ~1 min read

During today’s discussion, Yaamini recommended we generate a list of genes with different predominant isoforms between females and males, while also adding a column with a binary indicator (e.g. 0 or 1) to mark those genes which were not different (0) or were different (1) between sexes. Steven had already generated files identifying predominant isoforms in each sex:
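A minimal Python sketch of the binary indicator described above, assuming the predominant isoform for each gene is already available per sex as a simple gene-to-isoform mapping. The function name, the dictionary layout, and the gene/isoform IDs are hypothetical and are not taken from Steven’s files.

```python
# Hypothetical sketch: flag genes whose predominant isoform differs between
# females and males (1 = different, 0 = not different).
# Input layout (gene ID -> predominant isoform ID) is an assumption.


def flag_isoform_differences(female, male):
    """Return {gene_id: 0 or 1} for genes present in both sexes."""
    shared_genes = sorted(set(female) & set(male))
    return {gene: int(female[gene] != male[gene]) for gene in shared_genes}


if __name__ == "__main__":
    # Made-up IDs, used for illustration only.
    female = {"gene1": "isoform_A", "gene2": "isoform_B", "gene3": "isoform_C"}
    male = {"gene1": "isoform_A", "gene2": "isoform_D"}
    for gene, flag in flag_isoform_differences(female, male).items():
        print(gene, flag)
```

Genes found in only one sex are left out here; how those should be handled is a separate decision that this sketch does not try to make.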

Read More

RNAseq Alignments - P.generosa Alignments and Alternative Transcript Identification Using Hisat2 and StringTie on Mox

  • 15 min read

As part of identifying long non-coding RNA (lncRNA) in Pacific geoduck (GitHub Issue), one of the first things that I wanted to do was to gather all of our geoduck RNAseq data and align it to our geoduck genome. In addition to the alignments, some of the examples I’ve been following have also utilized expression levels as one aspect of the lncRNA selection criteria, so I figured I’d get this info as well.

Read More

qPCR - Repeat of Mussel Gill Heat Stress cDNA with Ferritin Primers

  • ~1 min read

My previous qPCR on these cDNA using ferritin primers (SRIDs: 1808, 1809) resulted in no amplification. This was a bit surprising and makes me suspect that I screwed up somewhere (not adding primer(s)??), so I decided to repeat the qPCR. I made fresh working primer stocks and used 1uL of cDNA for each reaction. All reactions were run in duplicate on our CFX Connect thermal cycler (BioRad) with SsoFast EVAgreen Master Mix (BioRad). See my previous post linked above for qPCR master mix calcs.

Read More

Data Wrangling - Create Primary P.generosa Genome Annotation File

  • 1 min read

Steven asked me to create a canonical genome annotation file (GitHub Issue). I needed/wanted to create a file containing gene IDs, SwissProt (SP) IDs, gene names, gene descriptions, and gene ontology (GO) accessions. To do so, I utilized the NCBI BLAST and DIAMOND BLAST annotations generated by our GenSas P.generosa genome annotation. Per Steven’s suggestion, I used the best match (i.e. lowest e-value) for any given gene between the two files.
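The “best match” selection can be sketched in a few lines of Python. This is only an illustration of picking the lowest e-value per gene across two annotation sets. The tuple layout, function name, and example values are assumptions and do not reflect the actual file formats or results.

```python
# Hypothetical sketch: for each gene, keep the annotation with the lowest
# e-value across any number of annotation sets (e.g. two BLAST-style tables).
# Each hit is assumed to be a (gene_id, swissprot_id, evalue) tuple.


def best_hits(*annotation_sets):
    """Return {gene_id: (swissprot_id, evalue)} using the lowest e-value per gene."""
    best = {}
    for hits in annotation_sets:
        for gene_id, sp_id, evalue in hits:
            current = best.get(gene_id)
            # On ties, the first hit encountered is kept.
            if current is None or evalue < current[1]:
                best[gene_id] = (sp_id, evalue)
    return best


if __name__ == "__main__":
    # Made-up values, used for illustration only.
    set_one = [("gene1", "SP_0001", 1e-50), ("gene2", "SP_0002", 1e-5)]
    set_two = [("gene1", "SP_0003", 1e-20), ("gene2", "SP_0004", 1e-30)]
    for gene, (sp_id, evalue) in sorted(best_hits(set_one, set_two).items()):
        print(gene, sp_id, evalue)
```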

Read More

Server Maintenance - Fix Server Certificate Authentication Issues

  • 2 min read

We had been encountering issues when linking to images in GitHub (e.g. notebooks, Issues/Discussions) hosted on our servers (primarily Gannet). Images always showed up as broken links and, with some work, we could see an error message related to server authentication. More recently, I also noticed that Jupyter Notebooks hosted on our servers could not be viewed in NB Viewer. Attempting to view a Jupyter Notebook hosted on one of our servers results in a 404 error, with a note regarding server certificate problems. Finally, the most annoying issue was encountered when running the shell program wget to retrieve files from our servers. This program always threw an error regarding our server certificates. The only way to run wget without this error was to add the option --no-check-certificate (which, thankfully, was suggested by the wget error message).

Read More

Data Wrangling - P.generosa Genomic Feature FastA Creation

  • 1 min read

Steven wanted me to generate FastA files (GitHub Issue) for Panopea generosa (Pacific geoduck) coding sequences (CDS), genes, and mRNAs. One of the primary needs, though, was to have an ID that could be used for downstream table joining/mapping. I ended up using a combination of GFFutils and bedtools getfasta. I took advantage of being able to create a custom name column in BED files to generate the desired FastA description line having IDs that could identify, and map, CDS, genes, and mRNAs across FastAs and GFFs.

Read More

Differential Gene Expression - P.generosa DGE Between Tissues Using Nextflow NF-Core RNAseq Pipeline on Mox

  • 7 min read

Steven asked that I obtain relative expression values for various geoduck tissues (GitHub Issue). So, I decided to use this as an opportunity to try to use a Nextflow pipeline. There’s an RNAseq pipeline, NF-Core RNAseq, which I decided to use. The pipeline appears to be ridiculously thorough (e.g. trims, removes gDNA/rRNA contamination, allows for multiple aligners to be used, quantifies/visualizes feature assignments by reads, performs differential gene expression analysis and visualization), all in one package. Sounds great, but I did have some initial problems getting things up and running. Overall, getting things set up took longer than the actual pipeline run! Oh well, it’s a learning process, so that’s not totally unexpected.

Read More

Data Analysis - C.virginica RNAseq Zymo ZR4059 Analyzed by ZymoResearch

  • 2 min read

After realizing that the Crassostrea virginica (Eastern oyster) RNAseq data had relatively low alignment rates (see this notebook entry from 20220224 for a bit more background), I contacted ZymoResearch to see if they had any insight on what might be happening. I suspected rRNA contamination. ZymoResearch was kind enough to run the RNAseq data through their pipeline and provided us with a report. This notebook entry provides a brief overview and thoughts on the report.

Read More

Transcript Identification and Alignments - C.virginica RNAseq with NCBI Genome GCF_002022765.2 Using Hisat2 and Stringtie on Mox

  • 14 min read

After an additional round of trimming yesterday, I needed to identify alternative transcripts in the Crassostrea virginica (Eastern oyster) gonad RNAseq data we have. I previously used HISAT2 to index the NCBI Crassostrea virginica (Eastern oyster) genome and identify exon/splice sites on 20210720. Then, I used this genome index to run StringTie on Mox in order to map sequencing reads to the genome/alternative isoforms.

Read More

Trimming - Additional 20bp from C.virginica Gonad RNAseq with fastp on Mox

  • 6 min read

When I previously aligned trimmed RNAseq reads to the NCBI C.virginica genome (GCF_002022765.2) on 20210726, I specifically noted that alignment rates were consistently lower for males than females. However, I let that discrepancy distract me from the larger issue: low alignment rates. Period! This should have thrown some red flags and it eventually did after Steven asked about overall alignment rate for an alignment of this data that I performed on 20220131 in preparation for genome-guided transcriptome assembly. The overall alignment rate (in which I actually used the trimmed reads from 20210714) was ~67.6%. Realizing this was on the low side of what one would expect, I looked into things more and came across a few things which led me to make the decision to redo the trimming:

Read More

Data Wrangling - C.virginica lncRNA Extractions from NCBI GCF_002022765.2 Using GffRead

  • ~1 min read

Continuing to work on our Crassostrea virginica (Eastern oyster) project examining the effects of OA on female and male gonads (GitHub repo), Steven tasked me with parsing out long, non-coding RNAs (GitHub Issue). To do so, I relied on the NCBI genome and associated files/annotations. I used GffRead, GFFutils, and samtools. The process was documented in the following Jupyter Notebook:

Read More

Transcriptome Assembly - Genome-guided C.virginica Adult Gonad OA RNAseq Using Trinity on Mox

  • 4 min read

As part of this project, Steven’s asked that I identify long, non-coding RNAs (lncRNAs) (GitHub Issue) in the Crassostrea virginica (Eastern oyster) adult OA gonad RNAseq data we have. The initial step for this is to assemble the transcriptome. I generated the necessary BAM alignment on 20220131. Next was to actually get the transcriptome assembled. I followed the Trinity genome-guided procedure.

Read More

RNAseq Alignment - C.virginica Adult OA Gonad Data to GCF_002022765.2 Genome Using HISAT2 on Mox

  • 5 min read

As part of this project, Steven’s asked that I identify long, non-coding RNAs (lncRNAs) (GitHub Issue) in the Crassostrea virginica (Eastern oyster) adult OA gonad RNAseq data we have. The initial step for this is to assemble the transcriptome. Since there is a published genome (NCBI RefSeq GCF_002022765.2_C_virginica-3.0; https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/022/765/GCF_002022765.2_C_virginica-3.0/) for Crassostrea virginica (Eastern oyster), I will perform a genome-guided assembly using Trinity. That process requires a sorted BAM file as input. In order to generate that file, I used HISAT2. I’ve already generated the necessary HISAT2 genome index files (as of 20210720), which also identified/incorporated splice sites and exons, which the HISAT2 alignment process requires to run.

Read More

Data Wrangling - C.virginica Gonad RNAseq Transcript Counts Per Gene Per Sample Using Ballgown

  • ~1 min read

As we continue to work on the analysis of impacts of OA on Crassostrea virginica (Eastern oyster) gonads via DNA methylation and RNAseq (GitHub repo), we decided to compare the number of transcripts expressed per gene per sample (GitHub Issue). As it turns out, it was quite the challenge. Ultimately, I wasn’t able to solve it myself, and turned to StackOverflow for a solution. I should’ve just done this at the beginning, as I got a response (and solution) less than five minutes after posting! Regardless, the data wrangling progress (struggle?) was documented in the following GitHub Discussion:

Read More

Project Summary - Matt George PSMFC Mytilus Byssus Project

  • ~1 min read

This will be a “dynamic” notebook entry, whereby I will update this post continually as I process new samples, analyze new data, etc for this project. The hope is to make it easier to find all the work I’ve done for this without having to search my notebook to find individual notebook entries.

Read More

RNA Isolation - M.trossulus Gill and Phenol Gland

  • 2 min read

As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):

Read More

RNA Isolation - M.trossulus Gill and Phenol Gland

  • 1 min read

As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):

Read More

RNA Isolation - M.trossulus Gill and Phenol Gland

  • 2 min read

As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):

Read More


2021

RNA Isolation - M.trossulus Phenol Gland and Gill

  • 2 min read

As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):

Read More

Project Summary - O.nerka Berdahl Samples

  • ~1 min read

This will be a “dynamic” notebook entry, whereby I will update this post continually as I process new samples, analyze new data, etc for this project. The hope is to make it easier to find all the work I’ve done for this without having to search my notebook to find individual notebook entries.

Read More

RNA Isolation - O.nerka Berdahl Tissues

  • 3 min read

Finally got around to tackling this GitHub issue regarding isolating RNA from some Oncorhynchus nerka (sockeye salmon) tissues we have from Andrew Berdahl’s lab (a UW SAFS professor) to use for RNAseq and/or qPCR. We have blood, brain, gonad, and liver samples from individual salmon from two different groups: territorial and social individuals. We’ve decided to isolate RNA from brain, gonads, and liver from two individuals within each group. All samples are preserved in RNAlater and stored @ -80°C.

Read More

Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene BED File

  • 1 min read

When working to identify differentially expressed transcripts (DETs) and genes (DEGs) for our Crassostrea virginica (Eastern oyster) RNAseq/DNA methylation comparison of changes across sex and ocean acidification conditions (https://github.com/epigeneticstoocean/2018_L18-adult-methylation), I realized that the DEG tables I was generating had excessive gene counts because the analysis (and, in turn, the genome coordinates) were tied to transcripts. Thus, genes were counted multiple times due to the existence of multiple transcripts for a given gene, and the analysis didn’t list gene coordinate data - only transcript coordinates.

Read More

Differential Transcript Expression - C.virginica Gonad RNAseq Using Ballgown

  • 1 min read

In preparation for differential transcript analysis, I previously ran our RNAseq data through StringTie on 20210726 to identify and quantify transcripts. Identification of differentially expressed transcripts (DETs) and genes (DEGs) will be performed using ballgown. This notebook entry will be different than most others, as it will simply serve as a “landing page” to access/review the analysis, since the analysis will evolve over time and won’t exist as a single computing job with a definitive endpoint.

Read More

Assembly Indexing - C.bairdi Transcriptome cbai_transcriptome_v3.1.fasta with Hisat2 on Mox

  • 2 min read

We recently received reviews back for the Tanner crab paper submission (“Characterization of the gene repertoire and environmentally driven expression patterns in Tanner crab (Chionoecetes bairdi)”) and one of the reviewers requested a more in-depth analysis. As part of addressing this, we’ve decided to identify SNPs within the Chionoecetes bairdi (Tanner crab) transcriptome used in the paper (cbai_transcriptome_v3.1). Since the process involves aligning sequencing reads to the transcriptome, the first thing that needed to be done was to generate index files for the aligner (HISAT2, in this particular case), so I ran HISAT2 on Mox.

Read More

Computer Management - Disable Sleep and Hibernation on Raven

  • ~1 min read

We’ve been having an issue with our computer Raven where it would become inaccessible some time after a reboot. Attempts to remote in would just indicate no route to host or something like that. We realized it seemed like this was caused by a power saving setting, but changing the sleep setting in the Ubuntu GUI menu didn’t fix the issue. It also seemed like the sleep/hibernate issue was only a problem after the computer had been rebooted and no one had logged in yet…

Read More

Genome Analysis - Identification of Potential Contaminating Sequences in Panopea-generosa-v1.0 Assembly Using BlobToolKit on Mox

  • 6 min read

As part of our Panopea generosa (Pacific geoduck) genome sequencing efforts, Steven came across a tool designed to help identify if there are any contaminating sequences in your assembly. The software is BlobToolKit. The software is actually a complex pipeline of separate tools (minimap2 (https://github.com/lh3/minimap2), BLAST, DIAMOND BLAST, and BUSCO) which aligns sequencing reads and assigns taxonomy to the reads, as well as marking regions of the assembly with various taxonomic assignments.

Read More

Genome Assembly - Olurida_v090 with BGI Illumina and PacBio Hybrid Using Wengan on Mox

  • 3 min read

I was recently tasked with adding annotations for our Ostrea lurida genome assembly to NCBI. As it turns out, adding just annotation files can’t be done since the genome was initially submitted to ENA. Additionally, updating the existing ENA submission with annotations is not possible, as it requires revoking the existing genome assembly and making a brand new submission. With that being the case, I figured I’d just make a new genome submission with the annotations to NCBI. Unfortunately, there were a number of issues with our genome assembly that were going to require a fair amount of work to resolve. The primary concern was that most of the sequences are considered “low quality” by NCBI (too many and too long stretches of Ns in the sequences). Revising the assembly to make it compatible with the NCBI requirements was going to be too much, so that was abandoned.

Read More

Trimming - O.lurida BGI FastQs with FastP on Mox

  • 3 min read

After attempting to submit our Ostrea lurida (Olympia oyster) genome assembly annotations (via GFF) to NCBI, the submission process also highlighted some shortcomings of the Olurida_v081 assembly. When getting ready to submit the genome annotations to NCBI, I was required to calculate the genome coverage we had. NCBI suggested calculating this simply by counting the number of bases sequenced and dividing it by the genome size. Doing this resulted in an estimated coverage of ~55X, yet we have significant stretches of Ns throughout the assembly. I understand why this is still technically possible, but it’s just sticking in my craw. So, I’ve decided to set up a quick assembly to see what I can come up with. Of note, the canonical assembly we’ve been using relied on the scaffolded assembly provided by BGI; we never attempted our own assembly from the raw data.
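The coverage estimate described above is just total sequenced bases divided by genome size. A minimal Python sketch follows; the function name and the numbers in the example are made up for illustration and are not the actual O.lurida values.

```python
# Sketch of the simple coverage estimate: total bases sequenced / genome size.
# The example numbers below are hypothetical, chosen only to show the arithmetic.


def estimate_coverage(total_bases_sequenced: int, genome_size_bp: int) -> float:
    """Return estimated fold-coverage (e.g. 55.0 means ~55X)."""
    if genome_size_bp <= 0:
        raise ValueError("Genome size must be a positive number of bases.")
    return total_bases_sequenced / genome_size_bp


if __name__ == "__main__":
    # Hypothetical: 55 Gbp of sequence over a 1 Gbp genome.
    coverage = estimate_coverage(55_000_000_000, 1_000_000_000)
    print(f"~{coverage:.0f}X")
```

Note that this is an average across the whole genome. It says nothing about how evenly reads are distributed, which is why a high estimate can coexist with long stretches of Ns in an assembly.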

Read More

Genome Submission - Validation of Olurida_v081.fa and Annotated GFFs Prior to Submission to NCBI

  • 3 min read

Per this GitHub Issue, Steven has asked to get our Ostrea lurida (Olympia oyster) genome assembly (Olurida_v081.fa) submitted to NCBI with annotations. The first step in the submission process is to use the NCBI table2asn_GFF software to validate the FastA assembly, as well as the GFF annotations file. Once the software has been run, it will point out any errors which need to be corrected prior to submission.

Read More

Read Mapping - 10x-Genomics Trimmed FastQ Mapped to P.generosa v1.0 Assembly Using Minimap2 for BlobToolKit on Mox

  • 2 min read

To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run minimap2 according to the BlobToolKit “Getting Started” guide on Mox. This will map the trimmed 10x-Genomics reads from 20210401 to the Panopea-generosa-v1.0.fa assembly (FastA; 914MB).

Read More


2020

Transcriptome Comparisons - C.bairdi Transcriptomes Evaluations with DETONATE rsem-eval on Mox

  • 5 min read

UPDATE: I’ll lead in with the fact that this failed with an error message that I can’t figure out. This will save the reader some time. I’ve posted the problem as an Issue on the DETONATE GitHub repo; however, it’s clear that this software is no longer maintained, as the repo hasn’t been updated in >3yrs and Issues that old have gone without responses.

Read More

Alignments - C.bairdi RNAseq Transcriptome Alignments Using Bowtie2 on Mox

  • 5 min read

I had previously attempted to compare all of our C.bairdi transcriptome assemblies using DETONATE on 20200601, but, due to hitting time limits on Mox, failed to successfully get the analysis to complete. I realized that the limiting factor was performing FastQ alignments, so I decided to run this step independently to see if I could at least get that step resolved. DETONATE (rsem-eval) will accept BAM files as input, so I’m hoping I can power through this alignment step and then provide DETONATE (rsem-eval) with the BAM files.

Read More

FastQC-MultiQC - C.gigas Ploidy pH WGBS Raw Sequence Data from Haws Lab on Mox

  • 2 min read

Yesterday (20201205), we received the whole genome bisulfite sequencing (WGBS) data back from ZymoResearch from the 24 C.gigas diploid/triploid samples subjected to two different pH treatments (received from the Haws Lab on 20200820 and submitted to ZymoResearch on 20200824). As part of our standard sequencing data receipt pipeline, I needed to generate FastQC files for each sample.

Read More

Sample Submission - M.magister MBD BSseq Libraries for MiSeq at NOAA

  • 1 min read

Earlier today I quantified the libraries with the Qubit in preparation for sample pooling and sequencing. Before performing a full sequencing run, Mac wanted to select a subset of the libraries based on the experimental treatments to have an equal representation of samples. She also wanted to do a quick run on the MiSeq at NOAA to evaluate how well libraries map and to make sure libraries appear to be sequencing at relatively equal levels.

Read More

Transcriptome Assessment - Crustacean Transcriptome Completeness Evaluation Using BUSCO on Mox

  • 4 min read

Grace was recently working on writing up a manuscript which did a basic comparison of our C.bairdi transcriptome (cbai_transcriptome_v3.1) (see the Genomic Resources wiki for more deets) to two other species’ transcriptome assemblies. We wanted BUSCO evaluations as part of this comparison, but the two other species did not have BUSCO scores in their respective publications. As such, I decided to generate them myself, as BUSCO runs very quickly. The job was run on Mox.

Read More

Hard Drive Upgrade - Gannet Synology Server

  • ~1 min read

Completed upgrading the 12 x 8TB HDDs in our server, Gannet (Synology RS3618XS), to 12 x 16TB HDDs. The process was simple, but the repair process took ~20hrs for each new drive. So, the entire process required 12 separate days of pulling out one old HDD, replacing with a new HDD, and initiating the repair process in the Synology web interface.

Read More

MBD Selection - M.magister Sheared Gill gDNA 16 of 24 Samples Set 3 of 3

  • 1 min read

Click here for notebook on the first eight samples processed. Click here for the second set of eight samples processed. M.magister (Dungeness crab) gill gDNA provided by Mackenzie Gavery was previously sheared on 20201026 and three samples were subjected to additional rounds of shearing on 20201027, in preparation for methyl binding domain (MBD) selection using the MethylMiner Kit (Invitrogen).

Read More

Trimming - Shelly S.salar RNAseq Using fastp and MultiQC on Mox

  • 3 min read

In this GitHub issue, Shelly asked that I trim some Salmo salar RNAseq data she had, align it to a genome, and perform transcriptome alignment counts, using a subset of the NCBI Salmo salar RefSeq genome, GCF_000233375.1. She created a subset of this genome using only sequences designated as “chromosomes.” A link to the FastA (and a link to her notebook on creating this file) are in that GitHub issue link above. The transcriptome she has provided has not been subsetted in a similar fashion; maybe I’ll do that prior to alignment.

Read More

DNA Shearing - M.magister CH05-21 gDNA Full Shearing Test and Bioanalyzer

  • 2 min read

Yesterday, I did some shearing of Metacarcinus magister gill gDNA on a test sample (CH05-21) to determine how many cycles to run on the sonicator (Bioruptor 300; Diagenode) to achieve an average fragment length of ~350 - 500bp in preparation for MBD-BSseq. The determination from yesterday was 70 cycles (30s ON, 30s OFF; low intensity). That determination was made by first sonicating for 35 cycles, followed by successive rounds of 5 cycles each. I decided to repeat this, except by doing it in a single round of sonication.

Read More

DNA Shearing - M.magister gDNA Shear Testing and Bioanalyzer

  • 1 min read

Steven assigned me to do some MBD-BSseq library prep (GitHub Issue) for some Dungeness crab (Metacarcinus magister) DNA samples provided by Mackenzie Gavery. The DNA was isolated from juvenile (J6/J7 developmental stages) gill tissue. One of the first steps in MBD-BSseq is to fragment DNA to a desired size (~350 - 500bp in our case). However, we haven’t worked with Metacarcinus magister DNA previously, so I need to empirically determine sonicator (Bioruptor 300; Diagenode) settings for these samples.

Read More

Read Mapping - C.bairdi 201002558-2729-Q7 and 6129-403-26-Q7 Taxa-Specific NanoPore Reads to cbai_genome_v1.01.fasta Using Minimap2 on Mox

  • 2 min read

After extracting FastQ reads using seqtk on 20201013 from the various taxa I had been interested in, the next step was mapping reads to the cbai_genome_v1.01 “genome” assembly from 20200917. I found that Minimap2 will map long reads (e.g. NanoPore), in addition to short reads, so I decided to give that a rip.

Read More

Data Wrangling - C.bairdi NanoPore Reads Extractions With Seqtk on Mephisto

  • 1 min read

In my pursuit to identify which contigs/scaffolds of our C.bairdi genome assembly from 20200917 correspond to interesting taxa, based on taxonomic assignments produced by MEGAN6 on 20200928, I used MEGAN6 to extract taxa-specific reads from cbai_genome_v1.01 on 20201007 - the output is only available in FastA format. Since I want the original reads in FastQ format, I will use the FastA sequence IDs (from the FastA index file) and provide that to seqtk to extract the FastQ reads for each sample and corresponding taxa.
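The ID-based extraction described above can be sketched in plain Python. This is only an illustration of the idea, not the seqtk command used in the notebook: it assumes the FastA index is tab-delimited with the sequence ID in the first column, that FastQ records are strictly four lines each, and the function names and tiny inputs are made up for the example.

```python
import io


def read_fai_ids(fai_handle):
    """Collect sequence IDs from the first tab-delimited column of a FastA index."""
    return {line.split("\t", 1)[0] for line in fai_handle if line.strip()}


def filter_fastq(fastq_handle, wanted_ids):
    """Yield four-line FastQ records whose read ID is in wanted_ids."""
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = fastq_handle.readline()
        plus = fastq_handle.readline()
        qual = fastq_handle.readline()
        # FastQ headers start with "@"; the ID ends at the first whitespace.
        read_id = header[1:].split()[0]
        if read_id in wanted_ids:
            yield header + seq + plus + qual


# Made-up inputs for illustration only (not real data from this project).
fai_text = "read_2\t8\t8\t8\t9\n"
fastq_text = (
    "@read_1 desc\nACGTACGT\n+\nIIIIIIII\n"
    "@read_2 desc\nTTGGCCAA\n+\nIIIIIIII\n"
)

ids = read_fai_ids(io.StringIO(fai_text))
kept = list(filter_fastq(io.StringIO(fastq_text), ids))
```

With real files, the two `io.StringIO` objects would be replaced by open file handles; for large NanoPore data sets a dedicated tool like seqtk is likely the better choice.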

Read More

Taxonomic Assignments - C.bairdi 6129-403-26-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu

  • 3 min read

After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.

Read More

Taxonomic Assignments - C.bairdi 20102558-2729-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu

  • 3 min read

After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.

Read More

Data Wrangling - C.bairdi NanoPore 6129-403-26 Quality Filtering Using NanoFilt on Mox

  • 2 min read

Last week, I ran all of our Q7-filtered C.bairdi NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that there was a large quantity of bases assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophilic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.

Read More

Data Wrangling - C.bairdi NanoPore 20102558-2729 Quality Filtering Using NanoFilt on Mox

  • 2 min read

Last week, I ran all of our Q7-filtered C.bairdi NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that there was a large quantity of bases assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophilic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.

Read More

Data Wrangling - Subsetting cbai_genome_v1.0 Assembly with faidx

  • 1 min read

Previously assembled cbai_genome_v1.0.fasta with our NanoPore Q7 reads on 20200917 and noticed that there were numerous sequences that were much shorter than the expected 500bp threshold that the assembler (Flye) was supposed to spit out. I created an Issue on the Flye GitHub page to find out why. The developer responded and determined it was an issue with the assembly polisher and that sequences <500bp could be safely ignored.
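As a rough illustration of the length-based subsetting, here is a small Python sketch that lists the IDs of sequences at or above a 500bp cutoff. It is not the faidx workflow from the notebook; the function names, the `min_len` parameter, and the toy FastA text are assumptions made for the example.

```python
import io


def parse_fasta(handle):
    """Yield (sequence ID, sequence) pairs from a FastA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The ID is the header text up to the first whitespace.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def keep_long_ids(handle, min_len=500):
    """Return IDs of sequences that are at least min_len bases long."""
    return [name for name, seq in parse_fasta(handle) if len(seq) >= min_len]


# Toy assembly for illustration only: 600bp, 120bp, and a wrapped 500bp contig.
fasta_text = (
    ">contig_1\n" + "A" * 600 + "\n"
    ">contig_2\n" + "C" * 120 + "\n"
    ">contig_3\n" + "G" * 250 + "\n" + "T" * 250 + "\n"
)

long_ids = keep_long_ids(io.StringIO(fasta_text))
```

The resulting ID list could then be handed to whichever indexed-FastA extraction tool is preferred to write out the subsetted assembly.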

Read More

DNA Quantification - Re-quant Ronits C.gigas Diploid-Triploid Ctenidia gDNA Submitted to ZymoResearch

  • 1 min read

I received notice from ZymoResearch yesterday afternoon that the DNA we sent on 20200820 for this project (Quote 3534) had insufficient DNA for sequencing for most of the samples. This was, honestly, shocking. I had even submitted well over the minimum amount of DNA required (submitted 1.75ug - only needed 1ug). So, I’m not entirely sure what happened here.

Read More

Primer Design and In-Silico Testing - Geoduck Reproduction Primers

  • 1 min read

Shelly asked that I re-run the primer design pipeline that Kaitlyn had previously run to design a set of reproduction-related qPCR primers. Unfortunately, Kaitlyn’s Jupyter Notebook wasn’t backed up and she accidentally deleted it, I believe, so there’s no real record of how she designed the primers. However, I do know that she was unable to run the EMBOSS primersearch tool, which will check your primers against a set of sequences for any other matches. This is useful for confirming specificity.

Read More

Metagenomics - Data Extractions Using MEGAN6

  • 1 min read

Decided to finally take the time to methodically extract data from our metagenomics project so that I have the tables handy when I need them and I can easily share them with other people. Previously, I hadn’t done this due to limitations on looking at the data remotely. I finally downloaded all of the RMA6 files from 20191014 after being fed up with the remote desktop connection and upgrading the size of my hard drive (five of the six RMA6 files are >40GB in size).

Read More

Sequence Extractions - C.bairdi Transcriptomes v2.0 and v3.0 Excluding Alveolata with MEGAN6 on Swoose

  • ~1 min read

Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0 (RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0 (RNAseq shorthand: 2018, 2019, 2020-UW). Both of these transcriptomes were assembled without any taxonomic filter applied. DIAMOND BLASTx and conversion to MEGAN6 RMA6 files were performed yesterday (20200604).

Read More

Transcriptome Comparison - C.bairdi Transcriptomes Compared with DETONATE on Mox

  • 4 min read

We’ve produced a number of C.bairdi transcriptomes and we’re interested in doing some comparisons to try to determine which one might be “best”. I previously compared the BUSCO scores of each of these transcriptomes and now will be using the DETONATE software package to perform two different types of comparisons: compared to a reference (REF-EVAL) and determine an overall quality “score” (RSEM-EVAL). I’ll be running REF-EVAL in this notebook.

Read More

Transcriptome Assembly - C.bairdi All Pooled Arthropoda-only RNAseq Data with Trinity on Mox

  • 2 min read

For completeness’ sake, I wanted to create an additional C.bairdi transcriptome assembly that consisted of Arthropoda-only sequences from just pooled RNAseq data (since I recently generated a similar assembly without taxonomically filtered reads on 20200518). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to the Trinity command.

Read More

Transcriptome Assembly - P.trituberculatus (Japanese blue crab) NCBI SRA BioProject PRJNA597187 Data with Trinity on Mox

  • 3 min read

After generating a number of C.bairdi (Tanner crab) transcriptomes, we decided we should compare them to help decide which one should become our “canonical” version. As part of that, the Trinity wiki offers a list of tools that one can use to check the quality of transcriptome assemblies. Some of those require a transcriptome of a related species.

Read More

SRA Library Assessment - Determine RNAseq Library Strandedness from P.trituberculatus SRA BioProject PRJNA597187

  • 3 min read

We’ve produced a number of C.bairdi transcriptomes utilizing different assembly approaches (e.g. Arthropoda reads only, stranded libraries only, mixed strandedness libraries, etc) and we want to determine which of them is “best”. Trinity has a nice list of tools to assess the quality of transcriptome assemblies, but most of the tools rely on comparison to a transcriptome of a related species.

Read More

Transcriptome Assembly - C.bairdi All Pooled RNAseq Data Without Taxonomic Filters with Trinity on Mox

  • 2 min read

Steven asked that I assemble a transcriptome with just our pooled C.bairdi RNAseq data (not taxonomically filtered; see the FastQ list file linked in the Results section below). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to the Trinity command.

Read More

GO to GOslim - C.bairdi Enriched GO Terms from 20200422 DEGs

  • 6 min read

After running pairwise comparisons and identifying differentially expressed genes (DEGs) on 20200422 and finding enriched gene ontology terms, I decided to map the GO terms to Biological Process GOslims. Additionally, I decided to try another level of comparison (I’m not sure how valid it is), whereby I will count the number of GO terms assigned to each GOslim and then calculate the percentage of GO terms that get assigned to each of the GOslim categories. The idea being that it might help identify Biological Processes that are “favored” in a given set of DEGs. I decided to set up “fancy” pyramid plots to view a given set of GO-GOslims for each DEG comparison.
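The counting and percentage step can be sketched as follows. This is a minimal illustration, not the code from the notebook: the function name is hypothetical, the GO IDs and GOslim names are obviously fake placeholders, and it assumes each GO term maps to a single GOslim (in practice a term may map to several).

```python
from collections import Counter


def goslim_percentages(go_to_goslim, go_terms):
    """Return the percentage of mapped GO terms that fall in each GOslim."""
    # Count how many of the supplied GO terms land in each GOslim category.
    counts = Counter(go_to_goslim[go] for go in go_terms if go in go_to_goslim)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {slim: 100 * n / total for slim, n in counts.items()}


# Placeholder mapping and enriched-term list, made up for illustration only.
go_to_goslim = {
    "GO:ex1": "slim_A",
    "GO:ex2": "slim_A",
    "GO:ex3": "slim_B",
    "GO:ex4": "slim_A",
}
enriched_terms = ["GO:ex1", "GO:ex2", "GO:ex3", "GO:ex4", "GO:unmapped"]

percentages = goslim_percentages(go_to_goslim, enriched_terms)
```

Terms without a GOslim mapping are left out of the denominator here; whether that is the right choice depends on how the percentages are meant to be interpreted.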

Read More

NanoPore Sequencing - C.bairdi gDNA 6129_403_26

  • 1 min read

After getting high quality gDNA from Hematodinium-infected C.bairdi hemolymph on 2020210 we decided to run some of the sample on the NanoPore MinION, since the flowcells have a very short shelf life. Additionally, the results from this will also help inform us on whether this sample might be worth submitting for PacBio sequencing. And, of course, this provides us with additional sequencing data to complement our previous NanoPore runs from 20200109.

Read More

qPCR - C.bairdi RNA Check for Residual gDNA

  • 1 min read

Previously checked existing crab RNA for residual gDNA on 20200226 and identified samples with yields that were likely too low, as well as samples with residual gDNA. For those samples, it was faster/easier to just isolate more RNA and perform the in-column DNase treatment in the ZymoResearch Quick DNA/RNA Microprep Plus Kit; this keeps samples concentrated. So, I isolated more RNA on 20200306 and now need to check for residual gDNA.

Read More

Trimming/MultiQC - Methcompare Bisulfite FastQs with fastp on Mox

  • 3 min read

Steven asked me to trim a set of FastQ files, provided by Hollie Putnam, in preparation for methylation analysis using Bismark. The analysis is part of a coral project comparing DNA methylation profiles of different species, as well as comparing different sample prep protocols. There’s a dedicated GitHub repo here:

Read More

RNA Isolation and Quantification - C.bairdi RNA from Hemolymph Pellets in RNAlater

  • ~1 min read

Based on qPCR results testing for residual gDNA from 20200225, a set of 24 samples were identified that required DNase treatment and/or additional RNA. I opted to just isolate more RNA from all samples, since the kit includes a DNase step and avoids diluting the existing RNA using the Turbo DNA-free Kit that we usually use. Isolated RNA using the Quick DNA/RNA Microprep Kit (ZymoResearch; PDF) according to the manufacturer’s protocol for liquids/cells in RNAlater.

Read More

DNA Isolation, Quantification, and Gel - C.bairdi gDNA Sample 6129_403_26

  • 1 min read

In order to do some genome sequencing on C.bairdi and Hematodinium, we need high molecular weight gDNA. I attempted this twice before, using two different methods (Quick DNA/RNA Microprep Kit (ZymoResearch) on 20200122 and the E.Z.N.A Mollusc DNA Kit (Omega) on 20200108) using ~10yr old ethanol-preserved tissue provided by Pam Jensen. Both methods yielded highly degraded gDNA. So, I’m now attempting to get higher quality gDNA from the RNAlater-preserved hemolymph pellets from this experiment.

Read More

Gene Expression - Hematodinium MEGAN6 with Trinity and EdgeR

  • 2 min read

After completing annotation of the Hematodinium MEGAN6 taxonomic-specific Trinity assembly using Trinotate on 20200126, I performed differential gene expression analysis and gene ontology (GO) term enrichment analysis using Trinity’s scripts to run EdgeR and GOseq, respectively. The comparison listed below is the only comparison possible, as there were no reads present in the uninfected Hematodinium extractions.

Read More

Gene Expression - C.bairdi MEGAN6 with Trinity and EdgeR

  • 2 min read

After completing annotation of the C.bairdi MEGAN6 taxonomic-specific Trinity assembly using Trinotate on 20200126, I performed differential gene expression analysis and gene ontology (GO) term enrichment analysis using Trinity’s scripts to run EdgeR and GOseq, respectively, across all of the various treatment comparisons. The comparisons are listed below and link to each individual SBATCH script (GitHub) used to run these on Mox.

Read More

RNA Isolation and Quantification - C.bairdi Hemocyte Pellets in RNAlater Troubleshooting

  • 1 min read

After the failure to obtain RNA from any C.bairdi hemocyte pellets (out of 24 samples processed) on 20200117, I decided to isolate RNA from just a subset of that group to determine if I screwed something up last time or something. Also, I am testing two different preparations of the kit-supplied DNase I: one Kaitlyn prepped and a fresh preparation that I made. Admittedly, I’m not doing the “proper” testing by trying the different DNase preps on the same exact sample, but it’ll do. I just want to see if I get some RNA from these samples this time…

Read More

NanoPore Sequencing - Initial NanoPore MinION Lambda Sequencing Test

  • 1 min read

We recently acquired a NanoPore MinION sequencer, FLO-MIN106 flow cell and the Rapid Sequencing Kit (SQK-RAD004). The NanoPore website provides a pretty thorough and user-friendly walk-through of how to begin using the system for the first time. With that said, I believe the user needs to have a registered account with NanoPore and needs to have purchased some products to have full access to the protocols they provide.

Read More

2019