- 2023 34
- 2022 46
- 2021 48
- 2020 186
- 2019 165
- 2018 141
- 2017 80
- 2016 74
- 2015 153
- 2014 51
- 2013 13
- 2012 36
- 2011 85
- 2010 79
- 2009 235
- 2008 16
2023
Trimming and QC - E5 Coral sRNA-seq Trimming Parameter Tests and Comparisons
In preparation for FastQC
and trimming of the E5 coral sRNA-seq data, I noticed that my “default” trimming settings didn’t produce the results I expected. Specifically, since these are sRNAs and the NEBNext® Multiplex Small RNA Library Prep Set for Illumina (PDF) protocol indicates that the sRNAs should be ~21 - 30bp, it seemed odd that I was still ending up with read lengths of 150bp. So, I tried a couple of quick trimming comparisons on just a single pair of sRNA FastQs to use as examples to get feeback on how trimming should proceed.
Data Wrangling - P.meandrina Genome GFF to GTF Using gffread
As part of getting P.meandrina genome info added to our Lab Handbook Genomic Resources page, I will index the P.meandrina genome file (Pocillopora_meandrina_HIv1.assembly.fasta
) using HISAT2
, but need a GTF file to also identify exon/intro splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread
to do this on my computer. Process is documented in Jupyter Notebook linked below.
FastQ QC and Trimming - E5 Coral RNA-seq Data for A.pulchra P.evermanni and P.meandrina Using FastQC fastp and MultiQC on Mox
After downloading and then reorganizing the E5 coral RNA-seq data from Azenta project 30-789513166, I ran FastQC for initial quality checks, followed by trimming with fastp
, and then final QC with FastQC/MultiQC. This was performed on all three species in the data sets: A.pulchra, P.evermanni, and P.meandrina. All aspects were run on Mox.
Data Management - E5 Coral RNA-seq and sRNA-seq Reorganizing and Renaming
Downloaded the E5 coral sRNA-seq data from Azenta project 30-852430235 on 20230515 and the E5 coral RNA-seq data from Azenta project 30-789513166 on 20230516. The data required some reorganization, as the project included data from three different species (Acropora pulchra, Pocillopora meandrina, and Porites evermanni). Additionally, since the project was sequencing the same exact samples with both RNA-seq and sRNA-seq, the resulting FastQ files ended up being the same. This fact seemed like it could lead to potential downstream mistakes and/or difficulty tracking whether or not someone was actually using an RNA-seq or an sRNA-seq FastQ.
Data Received - Coral RNA-seq Data from Azenta Project 30-789513166
Small RNA-seq (sRNA-seq) data was made available from the coral E5 Azenta project 30-789513166. Sample sheet:
Data Received - Coral sRNA-seq Data from Azenta Project 30-852430235
Small RNA-seq (sRNA-seq) data was made available from the coral E5 Azenta project 30-852430235. Sample sheet is below.
lncRNA Expression - P.generosa lncRNA Expression Using StringTie
After identifying lncRNA in P.generosa, Steven asked that I generate an tissue-specific expression/count matrix (GitHub Issue). Looking through the documentation for StringTie
, I decided that StringTie
would work for this. The overall approach:
Daily Bits - May 2023
20230517
lncRNA Identification - P.generosa lncRNAs using CPC2 and bedtools
After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them to the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step of lncRNA identification. To do this, I used Zach’s notebook entry on lncRNA identification for guidance. I utilized the annotated GTF generated by gffcompare
during the alignment/annotation step on 20230426. I used ‘bedtools getfasta](https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) and [
CPC2` with an aribtrary 200bp minimum length to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).
Containers - Apptainer Explorations
At some point, our HPC nodes on Mox will be retired. When that happens, we’ll likely purchase new nodes on the newest UW cluster, Klone. Additionally, the coenv
nodes are no longer available on Mox. One was decommissioned and one was “migrated” to Klone. The primary issue at hand is that the base operating system for Klone appears to be very, very basic. I’d previously attempted to build/install some bioinformatics software on Klone, but could not due to a variety of missing libraries; these libraries are available by default on Mox… Part of this isn’t surprising, as UW IT has been making a concerted effort to get users to switch to containerization - specifically using Apptainer (formerly Singularity) containers.
Transcript Alignments - P.generosa RNA-seq Alignments for lncRNA Identification Using Hisat2 StingTie and gffcompare on Mox
This is a continuation of the process for identification of lncRNAs,. I aligned FastQs which were previously trimmed earlier today to our Panopea-generosa-v1.0 genome FastA using HISAT2
. I used the HISAT2
genome index created on 20190723, which was created with options to identify exons and splice sites. The GFF used was from 20220323. StringTie
was used to identify alternative transcripts, assign expression values, and create expression tables for use with ballgown
. The job was run on Mox.
FastQ Trimming and QC - P.generosa RNA-seq Data from 20220323 on Mox
Addressing the update to this GitHub Issue regarding identifying Panopea generosa (Pacific geoduck) long non-coding RNAs (lncRNAs), I used the RNA-seq data from the Nextflow NF-Core RNAseq pipeline run on 20220323. Although that data was supposed to have been trimmed in the Nextflow NF-Core RNA-seq pipeline, the FastQC reports still show adapter contamination and some funky stuff happening at the 5’ end of the reads. So, I’ve opted to trim the “trimmed” files with fastp
, using a hard 20bp trim at the 5’ end of all reads. FastQC
and MultiQC
were run before/after trimming. Job was run on Mox.
Data Wrangling - CEABIGR C.virginica Exon Expression Table
As part of the CEABIGR project (GitHub repo), Steven asked that I generate an exon expression table (GitHub Issue) where each row is a gene and the columns are the corresponding exons, filled with their expression value. For this, I planned on using the read count from the ballgown
e_data.ctab
files as an expression value.
Daily Bits - April 2023
20230403
Data Wrangling - Append Gene Ontology Aspect to P.generosa Primary Annotation File
Steven tasked me with updating our P.generosa genome annotation file (GitHub Issue) a while back and I finally managed to get it all figured out. Although I wanted to perform most of this using the GSEAbase package (PDF), as this package is geared towards storage/retrieval of gene set data, I eventually decided to abondon this approach due to the time it was taking and my lack of familiarity/understanding of how to manipulate objects in R. Despite that, GSEAbase
was still utilized for its very simple use for identifying GOlims (IDs and Terms).
Data Received - Trimmed M.magister RNA-seq from NOAA
Sequencing Read Taxonomic Classification - P.verrucosa E5 RNA-seq Using DIAMOND BLASTx and MEGAN daa-meganizer on Mox
After some discussion with Steven at Science Hour last week regarding the handling of endosymbiont sequences in the E5 P.verrucosa RNA-seq data, Steven thought it would be interesting to run the RNA-seq reads through MEGAN6 just to see what the taxonomic breakdown looks like. We may or may not (probably not) separating reads based on taxonomy. In the meantime, we’ll still proceed with HISAT2
alignments to the respective genomes as a means to separate the endosymbiont reads from the P.verrucosa reads.
Data Wrangling - C.goreaui Genome GFF to GTF Using gffread
As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I also need to get the coral endosymbiont sequence. After talking with Danielle Becker in Hollie Putnam’s Lab at Univ. of Rhode Island, she pointed me to the Cladocopium goreaui genome from Chen et. al, 2022 available here. Access to the genome requires agreeing to some licensing provisions (primarily the requirment to cite the publication whenever the genome is used), so I will not be providing any public links to the file. In order to index the Cladocopium goreaui genome file (Cladocopium_goreaui_genome_fa
) using HISAT2
for downstream isoform analysis using StringTie
and ballgown
, I need a corresponding GTF to also identify exon/intro splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread
to do this on my computer. Process is documented in Jupyter Notebook linked below.
Transcript Identification and Alignments - P.verrucosa RNA-seq with Pver_genome_assembly_v1.0 Using HiSat2 and Stringtie on Mox
After getting the RNA-seq data trimmed, it was time to perform alignments and determine expression levels of transcripts/isoforms using with HISAT2
and StringTie
, respectively. StringTie
was set to output tables formatted for import into ballgown
. After those two analyses were complete, I ran gffcompare
, using the merged StringTie
GTF and the input GFF3. I caught this in one of Danielle Becker’s scripts and thought it might be interesting. The analsyes were run on Mox.
FastQ Trimming and QC - P.verrucosa RNA-seq Data from Danielle Becker in Hollie Putnam Lab Using fastp FastQC and MultiQC on Mox
After receiving the P.verrucosa RNA-seq data from Danielle Becker (Hollie Putnam’s Lab, Univ. of Rhode Island), I noticed that the trimmed reads didn’t appear to actually be trimmed. There was still adapter contamination (solely in R2 reads - suggesting the detect_adapter_for_pe
option had been omitted from the fastp
command?), but the reads had an average read length of 150bp - except when looking at the adapter content report!!??.
Data Wrangling - Create P.verrucosa GCA_014529365.1 Karyotype File
Steven asked that I create a karyotype file (GitHub Issue) from the NCBI P.verrucosa genome (GCA_014529365.1) in the following format:
Data Received - P.verrucosa RNA-seq and WGBS Full Data from Danielle Becker
Worked with Danielle Becker, as part of the Coral E5 project, to transfer data related to her repo, from her HPC (Univ. of Rhode Island; Andromeda) to ours (Univ. of Washington; Mox) in order to eventually transfer to Gannet so that these files are publicly accessible to all members of the Coral E5 project.
Daily Bits - February 2023
20230223
Genome Indexing - P.verrucosa v1.0 Assembly with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2
index to add to The Roberts Lab Handbook Genomic Resources.
Genome Indexing - M.capitata HIv3 Assembly with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2
index to add to The Roberts Lab Handbook Genomic Resources.
Genome Indexing - P.acuta HIv2 Assembly with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2
index to add to The Roberts Lab Handbook Genomic Resources.
Data Wrangling - P.verrucosa Genome GFF to GTF Using gffread
As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the P.verrucosa genome file (Pver_genome_assembly_v1.0.fasta
) using HISAT2
, but need a GTF file to also identify exon/intro splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread
to do this on my computer. Process is documented in Jupyter Notebook linked below.
Data Wrangling - M.capitata Genome GFF to GTF Using gffread
As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the M.capitata genome file (Montipora_capitata_HIv3.assembly.fasta
) using HISAT2
, but need a GTF file to also identify exon/intro splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread
to do this on my computer. Process is documented in Jupyter Notebook linked below.
Data Wrangling - P.acuta Genome GFF to GTF Conversion Using gffread
As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the P.acuta genome file using HISAT2
, but need a GTF file to also identify exon/intro splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread
to do this on my computer. Process is documented in Jupyter Notebook linked below.
Genome Indexing - P.verrucosa NCBI GCA_014529365.1 with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2
index to add to The Roberts Lab Handbook Genomic Resources.
Genome Indexing - M.capitata NCBI GCA_006542545.1 with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2
index to add to The Roberts Lab Handbook Genomic Resources.
SRA Data - Coral SRA BioProject PRJNA744403 Download and QC
Per this GitHub Issue, Steven wanted me to download all of the SRA data (RNA-seq and WGBS-seq) from NCBI BioProject PRJNA744403 and run QC on the data.
Monthly Goals - January 2023
- Plot CEABIGR coefficients of variation comparisons of mean gene methylation.
Daily Bits - January 2023
20230131
2022
BS-Seq Analysis - Nextflow EpiDiverse SNP Pipeline for Haws Hawaii C.gigas BAMs from Yaamini Base Config
Yaamini asked me to run the epidiverse/snp
pipeline (GitHub Issue) on her Haws Crassostrea gigas (Pacific oyster) Hawaii bisuflite sequencing BAMs for SNP identification.
BS-Seq Analysis - Nextflow EpiDiverse SNP Pipeline for Haws Hawaii C.gigas BAMs from Yaamini
Yaamini asked me to run the epidiverse/snp
pipeline (GitHub Issue) on her Haws Crassostrea gigas (Pacific oyster) Hawaii bisuflite sequencing BAMs for SNP identification.
Daily Bits - December 2022
20221227
Daily Bits - November 2022
20221121
Daily Bits - October 2022
20221031
Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene and Pseudogene Combined BED File
Working on the CEABIGR project, I was preparing to make a gene expression file to use in CIRCOS (GitHub Issue) when I realized that the Ballgown gene expression file (CSV; GitHub) had more genes than the C.virginica genes BED file we were using. After some sleuthing, I discovered that the discrepancy was caused by the lack of pseudogenes in the genes BED file I was using. Although it might not really have any impact on things, I thought it would still be prudent to have a BED file that completely matched all of the genes in the Ballgown gene expression file. Plus, having the pseudogenes might be of longterm usefulness if we we ever decide to evalute the role of long non-coding RNAs (lncRNAs) in this project.
BSseq SNP Analysis - Nextflow EpiDiverse SNP Pipeline for C.virginica CEABIGR BSseq data
Steven asked that I identify SNPs from our C.virginica CEABIGR BSseq data (GitHub Issue). So, I ran sorted, deduplicated Bismark BAMs that Steven generated through the EpiDiverse/snp Nextflow pipeline. The job was run on Mox.
Data Wrangling - Identify C.virginica Genes with Different Predominant Isoforms for CEABIGR
During today’s discussion, Yaamini recommended we generate a list of genes with different predominant isoforms between females and males, while also adding a column with a binary indicator (e.g. 0
or 1
) to mark those genes which were not different (0
) or were different (1
) between sexes. Steven had already generated files identifying predominant isoforms in each sex:
RNAseq Alignments - P.generosa Alignments and Alternative Transcript Identification Using Hisat2 and StringTie on Mox
As part of identifying long non-coding RNA (lncRNA) in Pacific geoduck(GitHub Issue), one of the first things that I wanted to do was to gather all of our geoduck RNAseq data and align it to our geoduck genome. In addition to the alignments, some of the examples I’ve been following have also utilized expression levels as one aspect of the lncRNA selection criteria, so I figured I’d get this info as well.
FastQ Trimming - Geoduck RNAseq Data Using fastp on Mox
Per this GitHub Issue, Steven asked me to identify long non-coding RNA (lncRNA) in geoduck. The first step is to aggregate all of our Panopea generosa (Pacific geoduck) RNAseq data and get it all trimmed. After that, align it to the genome, followed by Ballgown expression analysis, and then followed by a variety of selection criteria to parse out lncRNAs.
Daily Bits - September 2022
20220930
FastQ Trimming and QC - C.virginica Larval BS-seq Data from Lotterhos Lab and Part of CEABIGR Project Using fastp on Mox
We had some old Crassostrea virginica (Eastern oyster) larval/zygote BS-seq data from the Lotterhos Lab that’s part of the CEABiGR Workshop/Project (GitHub Repo) and Steven asked that I QC/trim it in this GitHub Issue.
Data Wrangling - Convert S.namaycush NCBI GFF to genes-only BED file for Use in Ballgown Analysis
In preparation for isoform identificaiton/quantification in S.namaycush RNAseq data, Ballgown will need a genes-only BED file. To generate, I used GFFutils to extract only genes from the NCBI GFF: GCF_016432855.1_SaNama_1.0_genomic.gff
. All code was documented in the following Jupyter Notebook.
Splice Site Identification - S.namaycush Liver Parasitized and Non-Parasitized SRA RNAseq Using Hisat2-Stingtie with Genome GCF_016432855.1
After previously downloading/trimming/QCing S.namaycush SRA liver RNAseq data on 20220706, Steven asked that I run through Hisat2 for splice site identification (GitHub Issue).
qPCR - Repeat of Mussel Gill Heat Stress cDNA with Ferritin Primers
My previous qPCR on these cDNA using ferritin primers (SRIDs: 1808, 1809) resulted in no amplification. This was a bit surprising and makes me suspect that I screwed up somewhere (not adding primer(s)??), so I decided to repeat the qPCR. I made fresh working primer stocks and used 1uL of cDNA for each reaction. All reactions were run in duplicate on our CFX Connect thermalcycler (BioRad) with SsoFast EVAgreen Master Mix (BioRad). See my previous post linked above for qPCR master mix calcs.
Daily Bits - August 2022
20220831
qPCR - Dorothys Mussel cDNA from 20220726
Ran qPCRs on Dorothy’s mussel gill cDNA from 20220726 using the following primers:
RNA Isolation and Quantification - Dorothy Mussel Gill Samples
Isolated RNA from a subset of Dorothy’s mussel gill samples:
BS-seq and SNP Analysis - Nextflow EpiDiverse Pipelines Trials and Tribulations
Alrighty, this notebook entry is going to have a lot to unpack, as the process to get these pipelines running and then deal with the actual data we wanted to run them with was quite involved. However, the TL;DR of this all is this:
SRA Data - S.namaycush SRA BioProject PRJNA674328 Download and QC
Per this GitHub Issue, which I accidentally forgot about for three weeks (!), Steven wanted me to download the lake trout (Salvelinus namaycush) RNAseq data from NCBI BioProject PRJNA674328 and run QC on the data.
Daily Bits - July 2022
20220727
Daily Bits - June 2022
SRA Submission - Ostrea lurida MBD BS-seq from 20160203
Submitted our Ostrea lurida (Olympia oyster) MBD BS-seq data from 20160203 to the NCBI Sequence Read Archive (SRA).
Project Summary - C.virginica CEABiGR - Female vs. Male Gonad Exposed to OA
This will be a “dynamic” notebook entry, whereby I will update this post continually as I process new samples, analyze new data, etc for this project. The hope is to make it easier to find all the work I’ve done for this without having to search my notebook to find individual notebook entries.
Data Wrangling - Create Primary P.generosa Genome Annotation File
Steven asked me to create a canonical genome annotation file (GitHub Issue). I needed/wanted to create a file containing gene IDs, SwissProt (SP) IDs, gene names, gene descriptions, and gene ontology (GO) accessions. To do so, I utilized the NCBI BLAST and DIAMOND BLAST annotations generated by our GenSas P.generosa genome annotation. Per Steven’s suggestion, I used the best match (i.e. lowest e-value
) for any given gene between the two files.
RNA Isolation - O.nerka Berdahl Brain Tissues
Server Maintenance - Fix Server Certificate Authentication Issues
We had been encounterings issues when linking to images in GitHub (e.g. notebooks, Issues/Discussions) hosted on our servers (primarily Gannet). Images always showed up as broken links and, with some work, we could see an error message related to server authentication. More recently, I also noticed that Jupyter Notebooks hosted on our servers could not be viewed in NB Viewer. Attempting to view a Jupyter Notebook hosted on one of our servers results in a 404 error, with a note regarding server certificate problems. Finally, the most annoying issue was encountered when running the shell programs wget
to retrieve files from our servers. This program always threw an error regarding our server certificates. The only way to run wget
without this error was to add the option --no-check-certificate
(which, thankfully, was a suggestion by wget
error message).
Nextflow - Trials and Tribulations of Installing and Using NF-Core RNAseq
INSTALLATION
Data Wrangling - P.generosa Genomic Feature FastA Creation
Steven wanted me to generate FastA files (GitHub Issue) for Panopea generosa (Pacific geoduck) coding sequences (CDS), genes, and mRNAs. One of the primary needs, though, was to have an ID that could be used for downstream table joining/mapping. I ended up using a combination of GFFutils and bedtools getfasta
. I took advantage of being able to create a custom name
column in BED files to generate the desired FastA description line having IDs that could identify, and map, CDS, genes, and mRNAs across FastAs and GFFs.
Differential Gene Expression - P.generosa DGE Between Tissues Using Nextlow NF-Core RNAseq Pipeline on Mox
Steven asked that I obtain relative expression values for various geoduck tissues (GitHub Issue). So, I decided to use this as an opportunity to try to use a Nextflow pipeline. There’s an RNAseq pipeline, NF-Core RNAseq which I decided to use. The pipeline appears to be ridiculously thorough (e.g. trims, removes gDNA/rRNA contamination, allows for multiple aligners to be used, quantifies/visualizes feature assignments by reads, performs differential gene expression analysis and visualization), all in one package. Sounds great, but I did have some initial problems getting things up and running. Overall, getting things set up to actually run took longer than the actual pipeline run! Oh well, it’s a learning process, so that’s not totally unexpected.
Data Analysis - C.virginica BSseq Unmapped Reads Using MEGAN6
After performing DIAMOND BLASTx and DAA “meganization” on 20220302, the next step was to import the DAA files into MEGAN6 for analyzing the resulting taxonomic assignments of the Crassostrea virginica (Eastern oyster) unmapped BSseq reads that Steven generated.
Data Analysis - C.virginica RNAseq Zymo ZR4059 Analyzed by ZymoResearch
After realizing that the Crassostrea virginica (Eastern oyster) RNAseq data had relatively low alignment rates (see this notebook entry from 20220224 for a bit more background), I contacted ZymoResearch to see if they had any insight on what might be happening. I suspected rRNA contamination. ZymoResearch was kind enough to run the RNAseq data through their pipeline and provided us. This notebook entry provides a brief overview and thoughts on the report.
Taxonomic Assignment - C.virginica BSseq Unmapped Reads Using DIAMOND BLASTx and MEGAN6 on Mox
After mapping bisulfite sequencing (BSseq) data to the Crassostrea virginica (Eastern oyster) genome, Steven noticed that there were a large number of unmapped reads. He asked that I attempt to taxonomically claissify the unmapped reads (GitHub Issue), with the idea that maybe these reads could provide additional data on an associated microbiome (GitHub Discussion).
Data Wrangling - P.generosa Genome GFF Conversion to GTF Using gffread
Steven asked in this GitHub Issue to convert our Panopea generosa (Pacific geoduck) genomic GFF to a GTF for use in the 10x Genomics Cell Ranger software. This conversion was performed using GffRead in a Jupyter Notebook.
Transcript Identification and Alignments - C.virginica RNAseq with NCBI Genome GCF_002022765.2 Using Hisat2 and Stringtie on Mox
After an additional round of trimming yesterday, I needed to identify alternative transcripts in the Crassostrea virginica (Eastern oyster) gonad RNAseq data we have. I previously used HISAT2
to index the NCBI Crassostrea virginica (Eastern oyster) genome and identify exon/splice sites on 20210720. Then, I used this genome index to run StringTie
on Mox in order to map sequencing reads to the genome/alternative isoforms.
Trimming - Additional 20bp from C.virginica Gonad RNAseq with fastp on Mox
When I previously aligned trimmed RNAseq reads to the NCBI C.virginica genome (GCF_002022765.2) on 20210726, I specifically noted that alignment rates were consistently lower for males than females. However, I let that discrepancy distract me from a the larger issue: low alignment rates. Period! This should have thrown some red flags and it eventually did after Steven asked about overall alignment rate for an alignment of this data that I performed on 20220131 in preparation for genome-guided transcriptome assembly. The overall alignment rate (in which I actually used the trimmed reads from 20210714) was ~67.6%. Realizing this was a on the low side of what one would expect, it prompted me to look into things more and I came across a few things which led me to make the decision to redo the trimming:
Data Wrangling - C.virginica lncRNA Extractions from NCBI GCF_002022765.2 Using GffRead
Continuing to work on our Crassostrea virginica (Eastern oyster) project examining the effects of OA on female and male gonads (GitHub repo), Steven tasked me with parsing out long, non-coding RNAs (GitHub Issue). To do so, I relied on the NCBI genome and associated files/annotations. I used GffRead, GFFutils, and samtools. The process was documented in the followng Jupyter Notebook:
Transcriptome Assembly - Genome-guided C.virginica Adult Gonad OA RNAseq Using Trinity on Mox
As part of this project, Steven’s asked that I identify long, non-coding RNAs (lncRNAs) (GitHub Issue) in the Crassostrea virginica (Eastern oyster) adult OA gonad RNAseq data we have. The initial step for this is to assemble transcriptome. I generated the necessary BAM alignment on 20220131. Next was to actually get the transcriptome assembled. I followed the Trinity
genome-guided procedure.
RNAseq Alignment - C.virginica Adult OA Gonad Data to GCF_002022765.2 Genome Using HISAT2 on Mox
As part of this project, Steven’s asked that I identify long, non-coding RNAs (lncRNAs) (GitHub Issue) in the Crassostrea virginica (Eastern oyster) adult OA gonad RNAseq data we have. The initial step for this is to assemble transcriptome. Since there is a published genome (NCBI RefSeq GCF_002022765.2C_virginica-3.0)](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/022/765/GCF_002022765.2_C_virginica-3.0/) for [_Crassostrea virginica (Eastern oyster), I will perform a genome-guided assembly using Trinity
. That process requires a sorted BAM file as input. In order to generate that file, I used HISAT2
. I’ve already generated the necessary HISAT2
genome index files (as of 20210720), which also identified/incorporated splice sites and exons, which the HISAT2
alignment process requires to run.
Data Wrangling - C.virginica Gonad RNAseq Transcript Counts Per Gene Per Sample Using Ballgown
As we continue to work on the analysis of impacts of OA on Crassostrea virginica (Eastern oyster) gonads via DNA methylation and RNAseq (GitHub repo), we decided to compare the number of transcripts expressed per gene per sample (GitHub Issue). As it turns out, it was quite the challenge. Ultimately, I wasn’t able to solve it myself, and turned to StackOverflow for a solution. I should’ve just done this at the beginning, as I got a response (and solution) less than five minutes after posting! Regardless, the data wrangling progress (struggle?) was documented in the following GitHub Discussion:
Project Summary - Matt George PSMFC Mytilus Byssus Project
This will be a “dynamic” notebook entry, whereby I will update this post continually as I process new samples, analyze new data, etc for this project. The hope is to make it easier to find all the work I’ve done for this without having to search my notebook to find individual notebook entries.
RNA Isolation - O.nerka Berdahl Brain Tissues
RNA Isolation - M.trossulus Gill
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing isolating RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Gill and Phenol Gland
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing isolating RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Gill and Phenol Gland
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing isolating RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Gill and Phenol Gland
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing isolating RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
2021
RNA Isolation - M.trossulus Phenol Gland and Gill
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing isolating RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Foot and Phenol Gland
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by isolating RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot” and the “PG” indicates “phenol gland” tissues):
Project Summary - O.nerka Berdahl Samples
This will be a “dynamic” notebook entry, whereby I will update this post continually as I process new samples, analyze new data, etc for this project. The hope is to make it easier to find all the work I’ve done for this without having to search my notebook to find individual notebook entries.
RNA Isolation - O.nerka Berdahl Brain Tissues
RNA Isolation - O.nerka Berdahl Tissues
Finally got around to tackling this GitHub issue regarding isolating RNA from some Oncorhynchus nerka (sockeye salmon) tissues we have from Andrew Berdahl’s lab (a UW SAFS professor) to use for RNAseq and/or qPCR. We have blood, brain, gonad, and liver samples from individual salmon from two different groups: territorial and social individuals. We’ve decided to isolate RNA from brain, gonads, and liver from two individuals within each group. All samples are preserved in RNAlater and stored @ -80oC.
Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene BED File
When working to identify differentially expressed transcripts (DETs) and genes (DEGs) for our Crassostrea virginica (Eastern oyster) RNAseq/DNA methylation comparison of changes across sex and ocean acidification conditions (https://github.com/epigeneticstoocean/2018_L18-adult-methylation), I realized that the DEG tables I was generating had excessive gene counts due to the fact that the analysis (and, in turn, the genome coordinates), were tied to transcripts. Thus, genes were counted multiple times due to the existence of multiple transcripts for a given gene, and the analysis didn’t list gene coordinate data - only transcript coordinates.
Differential Transcript Expression - C.virginica Gonad RNAseq Using Ballgown
In preparation for differential transcript analysis, I previously ran our RNAseq data through StringTie
on 20210726 to identify and quantify transcripts. Identification of differentially expressed transcripts (DETs) and genes (DEGs) will be performed using ballgown
. This notebook entry will be different than most others, as this notebook entry will simply serve as a “landing page” to access/review the analysis; as the analysis will evolve over time and won’t exist as a single computing job with a definitive endpoint.
SNP Characterization - C.bairdi v3.1 Transcriptome Assembly and Day2 DEG Pooled Samples RNAseq Data
I previously identified variants across the four Day 2 DEG pooled RNAseq samples (380822, 380823, 380824, 380825) on 20210909 within the cbai_transcriptome_v3.1
transcriptome assembly. Now, I needed to actually do some analysis on the SNPs for the revisions to the Tanner crab gene expression paper (GitHub Repo).
SNP Identification - C.bairdi Day 2 DEG Pooled Samples Using bcftools on Mox
RNAseq Alignments - C.bairdi Day 2 Infected-Uninfected Temperature Increase-Decrease RNAseq to cbai_transcriptome_v3.1.fasta with Hisat2 on Mox
Ealier today, I created the necessary Hisat2 index files for cbai_transcriptome_v3.1
. Next, I needed to actually get the alignments run. The alignments were performed using HISAT2
on Mox using the following set of trimmed FastQ files from 2020414:
Assembly Indexing - C.bairdi Transcriptome cbai_transcriptome_v3.1.fasta with Hisat2 on Mox
We recently received reviews back for the Tanner crab paper submission (“Characterization of the gene repertoire and environmentally driven expression patterns in Tanner crab (Chionoecetes bairdi)”) and one of the reviewers requested a more in-depth analysis. As part of addressing this, we’ve decided to identify SNPs withing the _Chionoecetes bairdi (Tanner crab) transcriptome used in the paper (cbai_transcriptome_v3.1
). Since the process involves aligning sequencing reads to the transcriptome, the first thing that needed to be done was to generate index files for the aligner (HISAT2
, in this particular case), so I ran HISAT2
on Mox.
Computer Servicing - APC SUA2200RM2U UPS Battery Replacement
Replaced the battery pack in the top APC SUA2200RM2U UPS in our computing “cluster” cabinet.
Computer Management - Disable Sleep and Hibernation on Raven
We’ve been having an issue with our computer Raven where it would become inaccessible after some time after a reboot. Attempts to remote in would just indicate no route to host or something like that. We realized it seemed like this was caused by a power saving setting, but changing the sleep setting in the Ubuntu GUI menu didn’t fix the issue. It also seemd like the sleep/hibernate issue was only a problem after the computer had been rebooted and no one had logged in yet…
Uninterruptible Power Supply - Battery Replacement for APS BR1000G
Replace battery in uninterruptible power supply APS BR1000G for Owl in FTR 230.
Transcript Identification and Quantification - C.virginia RNAseq With NCBI Genome GCF_002022765.2 Using StringTie on Mox
After having run HISAT2
to index and identify exons and splice sites in the NCBI Crassostrea virginica (Eastern oyster) genome (GCF_002022765.2) on 20210720, the next step was to identify and quantify transcripts from the RNAseq data using StringTie
.
Genome Annotations - Splice Site and Exon Extractions for C.virginica GCF_002022765.2 Genome Using Hisat2 on Mox
Previously performed quality trimming on the Crassostrea virginica (Eastern oyster) gonad/sperm RNAseq data on 20210714. Next, I needed to identify exons and splice sites, as well as generate a genome index using HISAT2
to be used with StringTie
downstream to identify potential alternative transcripts. This utilized the following NCBI genome files:
Trimming - C.virginica Gonad RNAseq with FastP on Mox
Needed to trim the Crassostrea virginica (Eastern oyster) gonad RNAseq data we received on 20210528.
Summary - Geoduck RNAseq Data
Per this GitHub Issue, I’ve compiled a summary table, with links, of all of our Panopea generosa (Pacific geoduck) RNAseq data as it exists in NCBI. This will be a “dynamic” notebook entry, whereby I will update this post continually as we acquire new data and/or change the information we’d like to have here.
Genome Analysis - Identification of Potential Contaminating Sequences in Panopea-generosa-v1.0 Assembly Using BlobToolKit on Mox
As part of our Panopea generosa (Pacific geoduck) genome sequencing efforts, Steven came across a tool designed to help identify if there are any contaminating sequences in your assembly. The software is BlobToolKit. The software is actually a complex pipeline of separate tools ([minimap2])https://github.com/lh3/minimap2, BLAST
, DIAMOND
BLAST, and BUSCO) which aligns sequencing reads and assigns taxonomy to the reads, as well as marking regions of the assembly with various taxonomic assignments.
FastQC-MultiQC - Yaamini’s C.virginica RNAseq and WGBS from ZymoResearch on Mox
Finally got around to running FastQC
on Yaamini’s RNAseq and WGBS sequencing data recieved on 20210528.
Data Wrangling - S.salar Gene Annotations from NCBI RefSeq GCF_000233375.1_ICSASG_v2_genomic.gff for Shelly
Shelly posted a GitHub Issue asking if I could create a file of S.salar genes with their UniProt annotations (e.g. gene name, UniProt accession, GO terms).
Data Received - Yaamini’s C.virginica WGBS and RNAseq Data from ZymoResearch
Yaamini received her sequencing data from ZymoResearch; both whole genome bisfulfite sequencing (WGBS) and RNAseq. I was tasked with downloading the data and running QC.
Assembly Comparisons - Ostrea lurida Non-scaffold Genome Assembly Comparisons Using Quast on Swoose
After generating a new Ostrea lurida genome assembly (v090) on 20210520, I decided to compare with our previous genome assemblies. Here, I compared v090 with all of our previous scaffolded assemblies using Quast. Here is a table (GitHub) which describes all of our existing assemblies (i.e. assembly name, assembly process, etc):
Assembly Assessment - Olurida_v090 Using Quast on Swoose
After running a new Ostrea lurida assembly yesterday (Olurida_v090
), I evaluated Olurida_v090
using Quast to produce some stats. This was run on my local computer with the following command:
Genome Assembly - Olurida_v090 with BGI Illumina and PacBio Hybrid Using Wengan on Mox
I was recently tasked with adding annotations for our Ostrea lurida genome assembly to NCBI. As it turns out, adding just annotation files can’t be done since the genome was initially submitted to ENA. Additionally, updating the existing ENA submission with annotations is not possible, as it requires a revocation of the existing genome assembly; requiring a brand new submission. With that being the case, I figured I’d just make a new genome submission with the annotations to NCBI. Unfortunately, there were a number of issues with our genome assembly that were going to require a fair amount of work to resolve. The primary concern was that most of the sequences are considered “low quality” by NCBI (too many and too long stretches of Ns in the sequences). Revising the assembly to make it compatible with the NCBI requirements was going to be too much, so that was abandoned.
Trimming - O.lurida BGI FastQs with FastP on Mox
After attempting to submit our Ostrea lurida (Olympia oyster) genome assembly annotations (via GFF) to NCBI, the submission process also highlighted some short comings of the Olurida_v081
assembly. When getting ready to submit the genome annotations to NCBI, I was required to calculate the genome coverage we had. NCBI suggested to calculate this simply by counting the number of bases sequenced and divide it by the genome size. Doing this resulted in an estimated coverage of ~55X coverage, yet we have significant stretches of Ns throughout the assembly. I understand why this is still technically possible, but it’s just sticking in my craw. So, I’ve decided to set up a quick assembly to see what I can come up with. Of note, the canonical assembly we’ve been using relied on the scaffolded assembly provided by BGI; we never attempted our own assembly from the raw data.
Genome Submission - Validation of Olurida_v081.fa and Annotated GFFs Prior to Submission to NCBI
Per this GitHub Issue, Steven has asked to get our Ostrea lurida (Olympia oyster) genome assembly (Olurida_v081.fa
) submitted to NCBI with annotations. The first step in the submission process is to use the NCBI table2asn_GFF
software to validate the FastA assembly, as well as the GFF annotations file. Once the software has been run, it will point out any errors which need to be corrected prior to submission.
RepeatMasker - C.gigas Rosling NCBI Genome GCA_902806645.1 on Mox
Decided to tackle this GitHub Issue about creating a transposable elements IGV track with the new Roslin C.gigas genome, since it had been sitting for a while and I have code sitting around that’s ready to roll for this type of thing.
Singularity - RStudio Server Container on Mox
Aidan recently needed to use R on a machine with more memory. Additionally, it would be ideal if he could use RStudio. So, I managed to figure out how to set up a Singularity container running rocker/rstudio.
Read Mapping - 10x-Genomics Trimmed FastQ Mapped to P.generosa v1.0 Assembly Using Minimap2 for BlobToolKit on Mox
To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run minimap2
according to the BlobToolKit “Getting Started” guide on Mox. This will map the trimmed 10x-Genomics reads from 20210401 to the Panopea-generosa-v1.0.fa assembly (FastA; 914MB).
Genome Annotation - P.generosa v1.0 Assembly Using DIAMOND BLASTx for BlobToolKit on Mox
To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run DIAMOND
BLASTx according to the BlobToolKit “Getting Started” guide on Mox.
Genome Annotation - P.generosa v1.0 Assembly Using BLASTn for BlobToolKit on Mox
To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run BLASTn
according to the BlobToolKit “Getting Started” guide on Mox.
Trimming P.generosa 10x Genomics HiC FastQs with fastp on Mox
Steven asked me to try running Blob Tool Kit to identify potential contaminating sequence in our Panopea generosa (Pacific geoduck) genome assembly (v1.0). In preparation for running Blob Tool Kit, I needed to trim the 10x Genomics FastQ data used by Phase Genomics. Files were trimmed using fastp
on Mox.
Transcriptome Annotation - Trinotate on C.bairdi Transcriptome v4.0 on Mox
Continued annotation of cbai_transcriptome_v4.0.fasta
[Trinity de novo assembly from 20210317(https://robertslab.github.io/sams-notebook/2021/03/17/Transcriptome-Assembly-C.bairdi-Transcriptome-v4.0-Using-Trinity-on-Mox.html)] using Trinotate
on Mox. This will provide a thorough annotation, including genoe ontology (GO) term assignments to each contig.
Transcriptome Annotation - DIAMOND BLASTx on C.bairdi Transcriptome v4.0 on Mox
Continued annotation of cbai_transcriptome_v4.0.fasta
[Trinity de novo assembly from 20210317(https://robertslab.github.io/sams-notebook/2021/03/17/Transcriptome-Assembly-C.bairdi-Transcriptome-v4.0-Using-Trinity-on-Mox.html)] using DIAMOND
BLASTx on Mox. This will be used as a component of Trinotate annotation downstream.
TransDecoder - C.bairdi Transcriptome v4.0 on Mox
Began annotation of cbai_transcriptome_v4.0.fasta
Trinity de novo assembly from 20210317] using TransDecoder on Mox. This will be used as a component of Trinotate annotation downstream.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v4.0 on Mox
I previously created a C.bairdi de novo transcriptome assembly v4.0 with Trinity from all our C.bairdi RNAseq reads which had BLASTx matches to the C.opilio genome and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Transcriptome Assembly - C.bairdi Transcriptome v4.0 Using Trinity on Mox
Continuing to addressing this GitHub issue, to generate an additional C.bairdi transcriptome, I finally got to the point of actually running the assembly using Trinity using the extracted reads from 20210316. Those reads were identified via BLASTx agianst the C.opilio genome proteins on 20210312. Trinty was run on Mox.
Read Extractions - C.bairdi RNAseq Reads from C.opilio BLASTx Matches with seqkit on Mox
As part of addressing this GitHub issue, to generate an additional C.bairdi transcriptome, I needed to extract the reads ID’ed via BLASTX against the C.opilio genome on 20210312. Read extractions were performed using SeqKit
on Mox.
DIAMOND BLASTx - C.bairdi RNAseq vs C.opilio Genome Proteins on Mox
We want to generate an additional Tanner crab (Chionoecetes bairdi) transcriptome, per this GitHub issue, to generate an additional C.bairdi transcriptome. This has come about due to the release of the genome of a very closely related crab species, Chionoecetes opilio (Snow crab).
Transcriptome Annotation - Trinotate Hematodinium v1.7 on Mox
Transcriptome Annotation - Trinotate Hematodinium v1.6 on Mox
Transcriptome Assembly - Hematodinium Transcriptomes v1.6 and v1.7 with Trinity on Mox
I’d previously assembled hemat_transcriptome_v1.0.fasta
on 20200122, hemat_transcriptome_v1.5.fasta
on 20200408, extracted hemat_transcriptome_v2.1.fasta
from an existing FastA on 20200605, as well as extracted hemat_transcriptome_v3.1.fasta
on 20200605.
Data Wrangling - Gene ID Extraction from P.generosa Genome GFF Using Methylation Machinery Gene IDs
Per this GitHub issue, Steven provided a list of methylation-related gene names and wanted to extract the corresponding Panopea generosa ([Pacific geoduck (Panopea generosa)](http://en.wikipedia.org/wiki/Geoduck)) gene ID from our P.generosa genome, along with corresponding BLAST
e-values.
Data Wrangling - Gene ID Extraction from P.generosa Genome GFF Using Methylation Machinery List
Per this GitHub Issue Steven asked that I take a list of gene names associated with DNA methylation and see if I could extract a list of Panopea generosa (Panopea generosa) gene IDs and corresponding BLAST e-values for each from our P.generosa genome annotation (see Genomic Resources wiki for more info).
SRA Submission - A.elegantissima ONT Fast5 from Jay Dimond
At the beginning of the month, Jay sent us his A.elegantissima ONT Fast5 data in order to help him get it submitted to NCBI Sequencing Read Archive (SRA).
Data Received - Anthopleura elegantissima - aggregating anenome - NanoPore Genome Sequence from Jay Dimond
Jay asked me to help get his A.elegantissima (aggregating anenome) NanoPore gDNA sequencing data submitted to NCBI Sequencing Read Archive (SRA). He sent a hard drive (HDD) with all the NanoPore sequencing Fast5 files. The HDD was received on 2/2/2021. Here’re are details provided in the reamde file in the Ae_ONT directory.
Samples Submitted - M.magister MBD-BSseq Libraries to Univ. of Oregon GC3F
Submitted the M.magister MBD-BSseq libraries created 20201124 using the 4nM aliquots created for the MiSeq test run on 20201202 to the Univ. of Oregon GC3F sequencing core.
2020
Transcriptome Comparisons - C.bairdi Transcriptomes Evaluations with DETONATE rsem-eval on Mox
UPDATE: I’ll lead in with the fact that this failed with an error message that I can’t figure out. This will save the reader some time. I’ve posted the problem as an Issue on the DETONATE GitHub repo, however it’s clear that this software is no longer maintained, as the repo hasn’t been updated in >3yrs; even lacking responses to Issues that are that old.
Alignments - C.bairdi RNAseq Transcriptome Alignments Using Bowtie2 on Mox
I had previously attempted to compare all of our C.bairdi transcriptome assemblies using DETONATE on 20200601, but, due to hitting time limits on Mox, failed to successfully get the analysis to complete. I realized that the limiting factor was performing FastQ alignments, so I decided to run this step independently to see if I could at least get that step resolved. DETONATE (rsem-eval) will accept BAM files as input, so I’m hoping I can power through this alignment step and then provided DETONATE (rsem-eval) with the BAM files.
Samples Received - Cockle Clam Gonad H and E Slides
Today we received the H & E-stained slides from the cockle clam gonad tissue blocks/cassettes we submitted on 20201201. Slides were added to Slide Case #5 - Rows 13 - 37 (Google Sheet).
FastQC-MultiQC - M.magister MBD-BSseq Pool Test MiSeq Run on Mox
Earlier today we received the M.magister (C.magister; Dungeness crab) MiSeq data from Mac.
Data Received - M.magister MBD-BSseq Pool Test MiSeq Run
After creating _M.magister (C.magister; Dungeness crab) MBD-BSseq libraries (on 20201124), I gave the pooled set of samples to Mac for a test sequencing run on the MiSeq on 20201202.
Alignment - C.gigas RNAseq to GCF_000297895.1_oyster_v9 Genome Using STAR on Mox
Mac was getting some weird results when mapping some single cell RNAseq data to the C.gigas mitochondrial (mt) genome that she had, so she asked for some help mapping other C.gigas RNAseq data (GitHub Issue) to the C.gigas mt genome to see if someone else would get similar results.
SRA Submission - Haws Lab C.gigas Ploidy pH WGBS
Trimming - Haws Lab C.gigas Ploidy pH WGBS 10bp 5 and 3 Prime Ends Using fastp and MultiQC on Mox
Making the assumption that the 24 C.gigas ploidy pH WGBS data we receved 20201205 will be analyzed using Bismark
, I decided to go ahead and trim the files according to Bismark
guidelines for libraries made with the ZymoResearch Pico MethylSeq Kit.
FastQC-MultiQc - C.gigas Ploidy pH WGBS Raw Sequence Data from Haws Lab on Mox
Yesterday (20201205), we received the whole genome bisulfite sequencing (WGBS) data back from ZymoResearch from the 24 C.gigas diploid/triploid subjected to two different pH treatments (received from the Haws’ Lab on 20200820 that we submitted to ZymoResearch on 20200824. As part of our standard sequencing data receipt pipeline, I needed to generate FastQC
files for each sample.
Data Received - C.gigas Diploid-Triploid pH Treatments Ctenidia WGBS from ZymoResearch
Today we received the whole genome bisulfite sequencing (WGBS) from the 24 C.gigas diploid-triploid samples subjected to different pH that were submitted 20200824. The lengthy turnaround time was due to a bad lot of reagents, which forced them Zymo to find a different manufacturer in order to generate libraries.
Trimming - Ronits C.gigas Ploidy WGBS 10bp 5 and 3 Prime Ends Using fastp and MultiQC on Mox
Steven asked me to trim (GitHub Issue) Ronit’s WGBS sequencing data we received on 20201110, according to Bismark
guidelines for libraries made with the ZymoResearch Pico MethylSeq Kit.
Sample Submission - M.magister MBD BSseq Libraries for MiSeq at NOAA
Earlier today I quantified the libraries with the Qubit in preparation for sample pooling and sequencing. Before performing a full sequencing run, Mac wanted to select a subset of the libraries based on the experimental treatments to have an equal representation of samples. She also wanted to do a quick run on the MiSeq at NOAA to evaluate how well libraries map and to make sure libraries appear to be sequencing at relatively equal levels.
Library Quantification - M.magister MBD BSseq Libraries with Qubit
After reviewing the Bionalyzer assays for the MBD BSseq libraries Mac indicated she’d like to have the libraries quantified using the Qubit.
Samples Submitted - Cockle Clam Gonad Histology Cassettes for H and E
Per this GitHub Issue, Steven asked that I submit a set of fixed cockle clam gonad tissues (currently stored in histology cassettes in 70% EtOH) for Hematoxylin and eosin stain (H&E). I submitted the following samples to the UW Pathology Research Services Laboratory (UW PRSL):
Trimming - Ronits C.gigas Ploidy WGBS Using fastp and MultiQC on Mox
Steven asked me to trim (GitHub Issue) Ronit’s WGBS sequencing data we received on 20201110, according to Bismark
guidelines for libraries made with the ZymoResearch Pico MethylSeq Kit.
Bioanalyzer - M.magister MBD BSseq Libraries
MBD BSseq library construction was completed yesterday (20201124). Next, I needed to evaluate the libraries using the Roberts Lab Bioanalyzer 2100 (Agilent) to assess library sizes, yields, and qualities (i.e. primer dimers).
MBD BSseq Library Prep - M.magister MBD-selected DNA Using Pico Methyl-Seq Kit
After finishing the final set of eight MBD selections on 20201103, I’m finally ready to make the BSseq libraries using the Pico Methyl-Seq Library Prep Kit (ZymoResearch) (PDF). I followed the manufacturer’s protocols with the following notes/changes (organized by each section in the protocol):
RNA Isolation and Quantification - P.generosa Hemocytes from Shelly
Shelly asked me to isolate RNA from some P.generosa hemocytes (GitHub Issue) that she had.
SRA Submission - Ronits C.gigas Ploidy WGBS
FastQC-MultiQc - C.gigas Ploidy WGBS Raw Sequence Data from Ronits Project on Mox
Transcriptome Assessment - Crustacean Transcripome Completeness Evaluation Using BUSCO on Mox
Grace was recently working on writing up a manuscript which did a basic comparison of our C.bairdi transcriptome (cbai_transcriptome_v3.1
) (see the Genomic Resources wiki for more deets) to two other species’ transcriptome assemblies. We wanted BUSCO evaluations as part of this comparison, but the two other species did not have BUSCO scores in their respective publications. As such, I decided to generate them myself, as BUSCO runs very quickly. The job was run on Mox.
Data Received - C.gigas Ploidy WGBS from Ronits Project via ZymoResearch
We received the data from our whole genome bisulfite sequencing (WGBS) submission to ZymoResearch on 2020820 for Ronit’s C.gigas diploid/triploid dessication/heat stress project.
Data Wrangling - MultiQC on S.salar RNAseq from fastp and HISAT2 on Mox
In Shelly’s GitHub Issue for this S.salar project, she also requested a MultiQC
report for the trimming (completed on 20201029) and the genome alignments (completed on 20201103).
Hard Drive Upgrade - Gannet Synology Server
Completed upgrading the 12 x 8TB HDDs in our server, Gannet (Synology RS3618XS), to 12 x 16TB HDDs. The process was simple, but the repair process took ~20hrs for each new drive. So, the entire process required 12 separate days of pulling out one old HDD, replacing with a new HDD, and initiating the repair process in the Synology web interface.
RNAseq Alignments - S.salar HISAT2 BAMs to GCF_000233375.1_ICSASG_v2_genomic.gtf Transcriptome Using StringTie on Mox
This is a continuation of addressing Shelly Trigg’s (regarding some Salmo salar RNAseq data) request (GitHub Issue) to trim (completed 20201029), perform genome alignment (completed on 20201103), and transcriptome alignment.
RNAseq Alignments - Trimmed S.salar RNAseq to GCF_000233375.1_ICSASG_v2_genomic.fa Using Hisat2 on Mox
This is a continuation of addressing Shelly Trigg’s (regarding some Salmo salar RNAseq data) request (GitHub Issue) to trim (completed 20201029), perform genome alignment, and transcriptome alignment.
MBD Selection - M.magister Sheared Gill gDNA 16 of 24 Samples Set 3 of 3
Click here for notebook on the first eight samples processed. Click here for the second set of eight samples processed. M.magister (Dungeness crab) gill gDNA provided by Mackenzie Gavery was previously sheared on 20201026 and three samples were subjected to additional rounds of shearing on 20201027, in preparation for methyl bidning domain (MBD) selection using the MethylMiner Kit (Invitrogen).
MBD Selection - M.magister Sheared Gill gDNA 8 of 24 Samples Set 2 of 3
Click here for notebook on the first eight samples processed. M.magister (Dungeness crab) gill gDNA provided by Mackenzie Gavery was previously sheared on 20201026 and three samples were subjected to additional rounds of shearing on 20201027, in preparation for methyl bidning domain (MBD) selection using the MethylMiner Kit (Invitrogen).
Trimming - Shelly S.salar RNAseq Using fastp and MultiQC on Mox
Shelly asked that I trim, align to a genome, and perform transcriptome alignment counts in this GitHub issue with some Salmo salar RNAseq data she had and, using a subset of the NCBI Salmo salar RefSeq genome, GCF_000233375.1. She created a subset of this genome using only sequences designated as “chromosomes.” A link to the FastA (and a link to her notebook on creating this file) are in that GitHub issue link above. The transcriptome she has provided has not been subsetted in a similar fashion; maybe I’ll do that prior to alignment.
MBD Selection - M.magister Sheared Gill gDNA 8 of 24 Samples Set 1 of 3
DNA Shearing - M.magister gDNA Additional Shearing CH05-01_21 CH07-11 and Bioanalyzer
After shearing all of the M.magister gill gDNA on 20201026, there were still three samples that still had average fragment lengths that were a bit longer than desired (~750bp, but want ~250 - 550bp):
DNA Shearing - M.magister gDNA Shearing All Samples and Bioanalyzer
I previously ran some shearing tests on 20201022 to determine how many cycles to run on the sonicator (Bioruptor 300; Diagenode) to achieve an average fragment length of ~350 - 500bp in preparation for MBD-BSseq. The determination was 70 cycles (30s ON, 30s OFF; low intensity), sonicating for 35 cycles, followed by successive rounds of 5 cycles each.
DNA Shearing - M.magister CH05-21 gDNA Full Shearing Test and Bioanalyzer
Yesterday, I did some shearing of Metacarcinus magister gill gDNA on a test sample (CH05-21) to determine how many cycles to run on the sonicator (Bioruptor 300; Diagenode) to achieve an average fragment length of ~350 - 500bp in preparation for MBD-BSseq. The determination from yesterday was 70 cycles (30s ON, 30s OFF; low intensity). That determination was made by first sonicating for 35 cycles, followed by successive rounds of 5 cycles each. I decided to repeat this, except by doing it in a single round of sonication.
DNA Shearing - M.magister gDNA Shear Testing and Bioanalyzer
Steven assigned me to do some MBD-BSseq library prep (GitHub Issue) for some Dungeness crab (Metacarcinus magister) DNA samples provided by Mackenzie Gavery. The DNA was isolated from juvenile (J6/J7 developmental stages) gill tissue. One of the first steps in MBD-BSseq is to fragment DNA to a desired size (~350 - 500bp in our case). However, we haven’t worked with Metacarcinus magister DNA previously, so I need to empirically determine sonicator (Bioruptor 300; Diagenode) settings for these samples.
Read Mapping - C.bairdi 201002558-2729-Q7 and 6129-403-26-Q7 Taxa-Specific NanoPore Reads to cbai_genome_v1.01.fasta Using Minimap2 on Mox
After extracting FastQ reads using seqtk
on 20201013 from the various taxa I had been interested in, the next thing needed doing was mapping reads to the cbai_genome_v1.01
“genome” assembly from 20200917. I found that Minimap2 will map long reads (e.g. NanoPore), in addition to short reads, so I decided to give that a rip.
Data Wrangling - C.bairdi NanoPore Reads Extractions With Seqtk on Mephisto
In my pursuit to identify which contigs/scaffolds of our “C.bairdi” genome assembly from 20200917 correspond to interesting taxa, based on taxonomic assignments produced by MEGAN6 on 20200928, I used MEGAN6 to extract taxa-specific reads from cbai_genome_v1.01
on 20201007 - the output is only available in FastA format. Since I want the original reads in FastQ format, I will use the FastA sequence IDs (from the FastA index file) and provide that to seqtk
to extract the FastQ reads for each sample and corresponding taxa.
NanoPore Reads Extractions - C.bairdi Taxonomic Reads Extractions with MEGAN6 on 201002558-2729-Q7 and 6129-403-26-Q7
After completing the taxonomic comparisons of 201002558-2729-Q7 and 6129-403-26-Q7 on 20201002, I decided to extract reads assigned to the following taxa for further exploration (primarily to identify contigs/scaffolds in our cbai_genome_v1.0.fasta (19MB).
Comparison - C.bairdi 20102558-2729 vs. 6129-403-26 NanoPore Taxonomic Assignments Using MEGAN6
After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.
Taxonomic Assignments - C.bairdi 6129-403-26-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu
After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.
Taxonomic Assignments - C.bairdi 20102558-2729-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu
After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.
Data Wrangling - C.bairdi NanoPore 6129-403-26 Quality Filtering Using NanoFilt on Mox
Last week, I ran all of our Q7-filtered C.baird NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that there were a large quantity of bases assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophylic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.
Data Wrangling - C.bairdi NanoPore 20102558-2729 Quality Filtering Using NanoFilt on Mox
Last week, I ran all of our Q7-filtered C.baird NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that there were a large quantity of bases assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophylic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.
Assembly Assessment - BUSCO C.bairdi Genome v1.01 on Mox
After creating a subset of the cbai_genome_v1.0
of contigs >100bp yesterday (subset named cbai_genome_v1.01
), I wanted to generate BUSCO scores for cbai_genome_v1.01
. This is primarily just to keep info consistent on our Genomic Resources wiki, as I don’t expect these scores to differ at all from the cbai_genome_v1.0
BUSCO scores.
Data Wrangling - Subsetting cbai_genome_v1.0 Assembly with faidx
Previously assembled cbai_genome_v1.0.fasta
with our NanoPore Q7 reads on 20200917 and noticed that there were numerous sequences that were well shorter than the expected 500bp threshold that the assembler (Flye) was supposed to spit out. I created an Issue on the Flye GitHub page to find out why. The developer responded and determined it was an issue with the assembly polisher and that sequences <500bp could be safely ignored.
SRA Submissions - NanoPore C.bairdi 20102558-2729 and 6129_403_26
Submitted our C.bairdi NanoPore sequencing data from 20200109 (Sample 20102558-2729 - uninfected EtOH-preserved muscle) and from 20200311 (Sample 6129403_26 - RNAlater-preserved _Hematodinium-infected hemolymph) to the NCBI Sequencing Read Archive(SRA).
Assembly Assessment - BUSCO C.bairdi Genome v1.0 on Mox
After using Flye to perform a de novo assembly of our Q7 filtered NanoPore sequencing data on 20200917, I decided to check the “completeness” of the assembly using BUSCO on Mox.
Data Wrangling - C.bairdi NanoPore Quality Filtering Using NanoFilt on Mox
I previously converting our C.bairdi NanoPre sequencing data from the raw Fast5 format to FastQ format for our three sets of data:
Genome Assembly - C.bairdi - cbai_v1.0 - Using All NanoPore Data With Flye on Mox
After quality filtering the C.bairdi NanoPore data earlier today, I performed a de novo assembly using Flye on Mox.
Taxonomic Assignments - C.bairdi NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on swoose
Earlier today I quality filtered (>=Q7) our C.baird NanoPore reads. One of the things I’d like to do now is to attempt to filter reads taxonomically, since the NanoPore data came from both an uninfected crab and Hematodinium-infected crab.
qPCR - Geoduck Normalizing Gene Primers 28s-v4 and EF1a-v1 Tests
On Monday (20200914), I checked a set of 28s and EF1a primer sets and determined that 28s-v4 and EF1a-v1 were probably the best of the bunch, although they all looked great. So, I needed to test these out on some individual cDNA samples to see if they might be useful as normalizing genes - should have consistent Cq values across all samples/treatments.
qPCR - Geoduck Normalizing Gene Primer Checks
Shelly ordered some new primers (designed by Sam Gurr) (GitHub Issue) to potentially use as normalizing genes for her geoduck reproduction gene expression project and asked that I test them out.
Data Wrangling - Visualization of C.bairdi NanoPore Sequencing Using NanoPlot on Mox
I previously converting our C.bairdi NanoPre sequencing data from the raw Fast5 format to FastQ format for our three sets of data:
Data Wrangling - NanoPore Fast5 Conversion to FastQ of C.bairdi 6129_403_26 on Mox with GPU Node
Time to start working with the NanoPore data that I generated back in March (???!!!). In order to proceed, I first need to convert the raw Fast5 files to FastQ. To do so, I’ll use the NanoPore program guppy
.
Data Wrangling - NanoPore Fast5 Conversion to FastQ of C.bairdi 20102558-2729 Run-02 on Mox with GPU Node
Continuing to work with the NanoPore data that I generated back in January(???!!!). In order to proceed, I first need to convert the raw Fast5 files to FastQ. To do so, I’ll use the NanoPore program guppy
. I converted the first run from this flowcell earlier today.
Data Wrangling - NanoPore Fast5 Conversion to FastQ of C.bairdi 20102558-2729 Run-01 on Mox with GPU Node
Time to start working with the NanoPore data that I generated back in January(???!!!). In order to proceed, I first need to convert the raw Fast5 files to FastQ. To do so, I’ll use the NanoPore program guppy
.
Samples Submission - Supplemental Ronits C.gigas Diploid-Triploid Ctendidia gDNA for WGBS by ZymoResearch
I re-quantified samples for Zymo whole genome bisulfite sequencing (WGBS) project 3534 earlier today due to ZymoResearch indicating there was insufficient DNA for most of the samples and submitted an additional 1ug of each sample. See table below for volume of DNA sent based on today’s (20200901) quants:
DNA Quantification - Re-quant Ronits C.gigas Diploid-Triploid Ctenidia gDNA Submitted to ZymoResearch
I received notice from ZymoResearch yesterday afternoon that the DNA we sent on 20200820 for this project (Quote 3534) had insufficient DNA for sequencing for most of the samples. This was, honestly, shocking. I had even submitted well over the minimum amount of DNA required (submitted 1.75ug - only needed 1ug). So, I’m not entirely sure what happened here.
Transcriptome Annotation - Trinotate C.bairdi v2.1 on Mox
To continue annotation of our C.bairdi v2.1 transcriptome assembly], I wanted to run Trinotate.
Transcriptome Annotation - Trinotate C.bairdi v3.1 on Mox
To continue annotation of our C.bairdi v3.1 transcriptome assembly], I wanted to run Trinotate.
Transcriptome Annotation - Trinotate Hematodinium v3.1 on Mox
Transcriptome Annotation - Trinotate Hematodinium v2.1 on Mox
TransDecoder - C.bairdi Transcriptomes v2.1 and v3.1 on Mox
To continue annotation of our C.bairdi v2.1 & v3.1 transcriptome assemblies, I needed to run TransDecoder before performing the more thorough annotation with Trinotate.
qPCR - P.generosa RPL5 and TIF3s6b v2 and v3 Normalizing Gene Assessment
After testing out the RPL5 and TIF3s6b v2 and v3 primers yesterday on pooled cDNA, we determined the primers looked good, so will go forward testing them on a set of P.generosa hemolymph cDNA made by Kaitlyn on 20200212. This will evaluate whether or not these can be utilized as normalizing genes for subsequent gene expression analyses.
Sample Submitted - C.gigas Diploid-Triploid pH Treatments Ctenidia to ZymoResearch for WGBS
Submitted 1.5ug of the 24 C.gigas ctenidia ctenidia gDNA isolated last week (20200821) to ZymoResearch for whole genome bisulfite sequencing (WGBS) to compare differences in diploid/triploids and responses to elevated pH:
qPCR - P.generosa RPL5-v2-v3 and TIF3s6b-v2-v3 Primer Tests
Shelly ordered some new primers as potential normalizing genes and asked me to test them out (GitHub Issue).
DNA Isolation and Quantification - C.gigas High-Low pH Triploid and Diploid Ctenidia
Isolated DNA from 24 of the Crassostrea gigas high/low pH triploid/diploid ctenidia samples that we received yesterday from the Haws Lab. Samples selected by Steven.
Samples Received - C.gigas High-Low pH Triploid Diploid from Maria Haws Lab
Received Crassostrea gigas tissue samples from Maria Haws’ Lab at the University of Hawaii Hilo. Tissue samples are a set of triploid and diploid subjected to different pHs. All sample info is in the sample sheet provided below.
Samples Submitted - Ronits C.gigas Diploid and Triploid Ctenidia to ZymoResearch for WGBS
Submitted 1.75ug of gDNA from 10 Crassostrea gigas ctenidia samples from Ronit’s dessication/temp/ploidy experiment to ZymoResearch for whole genome bisulfite sequencing (BSseq). They will sequence to ~30x coverage, using 150bp PE reads.
Assembly Stats - C.bairdi Transcriptomes v2.1 and v3.1 Trinity Stats on Mox
Realized that transcriptomes v2.1 and v3.1 (extracted from BLASTx-annotated FastAs from 20200605) didn’t have any associated stats.
Trimming-FastQC-MultiQC - Robertos C.gigas WGBS FastQ Data with fastp FastQC and MultiQC on Mox
Steven asked me to trim Roberto’s C.gigas whole genome bisulfite sequencing (WGBS) reads (GitHub Issue) “following his methods”. The only thing specified is trimming Illumina adaptors and then trimming 10bp from the 5’ end of reads. No mention of which software was used.
TransDecoder - Hematodinium Transcriptomes v1.6, v1.7, v2.1 and v3.1 on Mox
To continue annotation of our Hematodinium v1.6, v1.7, v2.1 & v3.1 transcriptome assemblies, I needed to run TransDecoder before performing the more thorough annotation with Trinotate.
Assembly Stats - cbaiodinium Transcriptomes v2.1 and v3.1 Trinity Stats on Mox
Working on dealing with our various cbaiodinium sp. transcriptomes and realized that transcriptomes v2.1 and v3.1 (extracted from BLASTx-annotated FastAs from 20200605) didn’t have any associated stats.
Transcriptome Assessment - BUSCO Metazoa on Hematodinium v1.6 v1.7 v2.1 and v3.1 on Mox
Transcriptome Annotation - Hematodinium Transcriptomes v1.6 v1.7 v2.1 v3.1 with DIAMOND BLASTx on Mox
Needed to annotate the Hematodinium sp. transcriptomes that I’ve assembled using DIAMOND BLASTx. This will also be used for additional downstream annotation (TransDecoder, Trinotate):
qPCR - P.generosa APLP and TIF3s8-1 with cDNA
Shelly asked me to run some qPCRs (GitHub Issue), after some of the qPCR results I got from primer tests with normalzing genes and potential gene targets.
FastQ Read Alignment and Quantification - P.generosa Water Metagenomic Libraries to MetaGeneMark Assembly with Hisat2 on Mox
Continuing working on the manuscript for this data, Emma wanted the number of reads aligned to each gene. I previously created and assembly with genes/proteins using MetaGeneMark on 20190103, but the assemby process didn’t output any sort of stastics on read counts.
Primer Design and In-Silico Testing - Geoduck Reproduction Primers
Shelly asked that I re-run the primer design pipeline that Kaitlyn had previously run to design a set of reproduction-related qPCR primers. Unfortunately, Kaitlyn’s Jupyter Notebook wasn’t backed up and she accidentally deleted it, I believe, so there’s no real record of how she designed the primers. However, I do know that she was unable to run the EMBOSS primersearch tool, which will check your primers against a set of sequences for any other matches. This is useful for confirming specificity.
qPCR - Testing P.generosa Reproduction-related Primers
Ran some qPCRs on some other primers on 20200723 and then Shelly has asked me to test some additional qPCR primers that might have acceptable melt curves and be usable as normalizing genes.
SRA Submission - P.generosa Metagenomics Data
Added our P.generosa metagenomics sequencing data to NCBI sequencing read archive (SRA).
qPCR - Testing P.generosa Reproduction-related Primers
Shelly has asked me to test some qPCR primers related to geoduck reproduction.
DNA Isolation and Quantification - C.gigas Diploid (Ronit) and Triploid (Nisbet)
Isolated some gDNA from the triploid Nisbet oysters we received on 20200218 and one of Ronit’s diploid ctenidia samples (Google Sheet) using the E.Z.N.A. Mollusc DNA Kit (Omega). See the “Results” section for sample info.
SRA Submission - Geoduck Epigenetic Ocean Acidification RNAseq
Can’t remember where it was discussed (probably lab meeting), but I created a GitHub Issue to add all of geoduck RNAseq data to NCBI Short Read Archive (SRA). Anyway, got all the remaining RNAseq data uploaded to the NCBI SRA and organized into the correct BioSamples and BioProjects.
ENA Submission - Ostrea lurida draft genome Olurida_v081.fa
Submitted our Ostrea lurida v081 genome assembly FastA to the European Nucloetide Archive.
Metagenomics - Data Extractions Using MEGAN6
Decided to finally take the time to methodically extract data from our metagenomics project so that I have the tables handy when I need them and I can easily share them with other people. Previously, I hadn’t done this due to limitations on looking at the data remotely. I finally downloaded all of the RMA6 files from 20191014 after being fed up with the remote desktop connection and upgrading the size of my hard drive (5 of the six RMA6 files are >40GB in size).
Transcriptome Annotation - C.bairdi Transcriptomes v2.1 and v3.1 Using DIAMOND BLASTx on Mox
Decided to annotate the two C.bairdi transcriptomes , cbai_transcriptome_v2.1
and cbai_transcriptome_v3.1
, generated on 20200605 using DIAMOND BLASTx on Mox.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v3.1
Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0
(RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0
(RNAseq shorthand: 2018, 2019, 2020-UW).
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v2.1
Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0
(RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0
(RNAseq shorthand: 2018, 2019, 2020-UW).
Sequence Extractions - C.bairdi Transcriptomes v2.0 and v3.0 Excluding Alveolata with MEGAN6 on Swoose
Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0
(RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0
(RNAseq shorthand: 2018, 2019, 2020-UW). Both of these transcriptomes were assembled without any taxonomic filter applied. DIAMOND BLASTx and conversion to MEGAN6 RMA6 files was performed yesterday (20200604).
Transcriptome Annotation - C.bairdi Transcriptomes v2.0 and v3.0 with DIAMOND BLASTx on Mox
Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0
(RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0
(RNAseq shorthand: 2018, 2019, 2020-UW). Both of these transcriptomes were assembled without any taxonomic filter applied.
Transcriptome Comparison - C.bairdi Transcriptomes Evaluations with DETONATE on Mox
Transcriptome Comparison - C.bairdi Transcriptomes Compared with DETONATE on Mox
We’ve produced a number of C.bairdi transcriptomes and we’re interested in doing some comparisons to try to determine which one might be “best”. I previously compared the BUSCO scores of each of these transcriptomes and now will be using the DETONATE software package to perform two different types of comparisons: compared to a reference (REF-EVAL) and determine an overall quality “score” (RSEM-EVAL). I’ll be running REF-EVAL in this notebook.
Transcriptome Annotation - Trinotate C.bairdi Transcriptome-v1.7 on Mox
After creating a de novo assembly of C.bairdi transcriptome v1.7 on 20200527, performing BLASTx annotation on 202000527, and TransDecoder for ORF identification on 20200527, I continued the annotation process by running Trinotate.
Transcriptome Comparisons - C.bairdi BUSCO Scores
Since we’ve generated a number of versions of the C.bairdi transcriptome, we’ve decided to compare them using various metrics. Here, I’ve compared the BUSCO scores generated for each transcriptome using BUSCO’s built-in plotting script. The script generates a stacked bar plot of all BUSCO short summary files that it is provided with, as well as the R code used to generate the plot.
TransDecoder - C.bairdi Transcriptome v1.7 on Mox
Need to run TransDecoder on Mox on the C.bairdi transcriptome v1.7 from 20200527.
Transcriptome Annotation - C.bairdi Transcriptome v1.7 Using DIAMOND BLASTx on Mox
As part of annotating cbai_transcriptome_v1.7.fasta from 20200527, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v1.7
I previously created a C.bairdi de novo transcriptome assembly v1.7 with Trinity from all our C.bairdi taxonomically filtered pooled RNAseq samples on 20200527 and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Transcriptome Assembly - C.bairdi All Pooled Arthropoda-only RNAseq Data with Trinity on Mox
For completeness sake, I wanted to create an additional C.bairdi transcriptome assembly that consisted of Arthropoda only sequences from just pooled RNAseq data (since I recently generated a similar assembly without taxonomically filtered reads on 20200518). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to Trinity command.
Transcriptome Annotation - Trinotate C.bairdi Transcriptome-v3.0 on Mox
After performing de novo assembly on all of our Tanner crab RNAseq data (no taxonomic filter applied, either) on 20200518, I continued the annotation process by running Trinotate.
Transcriptome Assembly - P.trituberculatus (Japanese blue crab) NCBI SRA BioProject PRJNA597187 Data with Trinity on Mox
After generating a number of C.bairdi (Tanner crab) transcriptomes, we decided we should compare them to evaluate which to help decide which one should become our “canonical” version. As part of that, the Trinity wiki offers a list of tools that one can use to check the quality of transcriptome assemblies. Some of those require a transcriptome of a related species.
SRA Library Assessment - Determine RNAseq Library Strandedness from P.trituberculatus SRA BioProject PRJNA597187
We’ve produced a number of C.bairid transcriptomes utilizing different assembly approaches (e.g. Arthropoda reads only, stranded libraries only, mixed strandedness libraries, etc) and we want to determine which of them is “best”. Trinity has a nice list of tools to assess the quality of transcriptome assemblies, but most of the tools rely on comparison to a transcriptome of a related species.
Tutorial - SRA Toolkit for Data Retrieval and Conversion to FastQ
I was looking for some crab transcriptomic data today and, unable to find any previously assembled transcriptomes, turned to the good ol’ NCBI SRA. In order to simplify retrieval and conversion of SRA data, need to use the SRA Toolkit software suite. Since I haven’t used this in many years, I figured I might as well put together a brief guide/tutorial so I can refer back to it in the future.
Transcriptome Annotation - Trinotate C.bairdi Transcriptome-v1.6 on Mox
After creating a de novo assembly of C.bairdi transcriptome v1.6 on 20200518, performing BLASTx annotation on 202000519, and TransDecoder for ORF identification on 20200519, I continued the annotation process by running Trinotate.
TransDecoder - C.bairdi Transcriptome v1.6 on Mox
Need to run TransDecoder on Mox on the C.bairdi transcriptome v1.6 from 20200518.
TransDecoder - C.bairdi Transcriptome v3.0 from 20200518 on Mox
Need to run TransDecoder on Mox on the C.bairdi transcriptome v3.0 from 20200518.
Transcriptome Annotation - C.bairdi Transcriptome v1.6 Using DIAMOND BLASTx on Mox
As part of annotating cbai_transcriptome_v1.6.fasta from 20200518, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v1.6
I previously created a C.bairdi de novo transcriptome assembly v1.6 with Trinity from all our C.bairdi taxonomically filtered RNAseq on 20200518 and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Transcriptome Annotation - C.bairdi Transcriptome v3.0 Using DIAMOND BLASTx on Mox
As part of annotating cbai_transcriptome_v3.0.fasta from 20200518, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v3.0
I previously created a C.bairdi de novo transcriptome assembly with Trinity from all our C.bairdi pooled RNAseq (not taxonomically filtered) on 20200518 and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Transcriptome Assembly - C.bairdi All Arthropoda-specific RNAseq Data with Trinity on Mox
I realized I hadn’t performed taxonomic read separation from one set of RNAseq data we had. And, since I was on a transcriptome assembly kick, I figured I’d generate another C.bairdi transcriptome that included only Arthropoda-specific sequence data from all of our RNAseq.
Data Wrangling - Arthropoda and Alveolata D26 Pool RNAseq FastQ Extractions
After using MEGAN6 to extract Arthropoda and Alveolata reads from our RNAseq data on 20200114, I had then extracted taxonomic-specific reads and aggregated each into basic Read 1 and Read 2 FastQs to simplify transcriptome assembly for C.bairdi and for Hematodinium. That was fine and all, but wasn’t fully thought through.
Transcriptome Assembly - C.bairdi All Pooled RNAseq Data Without Taxonomic Filters with Trinity on Mox
Steven asked that I assemble a transcriptome with just our pooled C.bairdi RNAseq data (not taxonomically filtered; see the FastQ list file linked in the Results section below). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to Trinity command.
Transcriptome Annotation - Trinotate C.bairdi Transcriptome v2.0 from 20200502 on Mox
After performing de novo assembly on all of our Tanner crab RNAseq data (no taxonomic filter applied, either) on 20200502 and performing BLASTx annotation on 20200508, I continued the annotation process by running Trinotate.
TransDecoder - C.bairdi Transcriptome v2.0 from 20200502 on Mox
Need to run TransDecoder on Mox on the C.bairdi transcriptome v2.0 from 20200502.
Transcriptome Annotation - C.bairdi Transcriptome v2.0 Using DIAMOND BLASTx on Mox
As part of annotating the C.bairdi v2.0 transcriptome assembly from 20200502, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi v2.0 Transcriptome
I previously created a C.bairdi de novo transcriptome assembly with Trinity using all existing, unfiltered (i.e. no taxonomic selection) RNAseq data on 20200502 and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Transcriptome Assembly - C.bairdi All RNAseq Data Without Taxonomic Filters with Trinity on Mox
Steven asked that I assemble an unfiltered (i.e. no taxonomic selection) transcriptome with all of our C.bairdi RNAseq data (see the FastQ list file linked in the Results section below). A de novo assembly was run using Trinity on Mox. It should be noted that this assembly is a mixture of stranded/non-stranded library preps.
Transcript Abundance - C.bairdi Alignment-free with Salmon Using 2020-GW Data on Mox
Clarified with Steven an approach for tackling multi-condition comparisons (see this GitHub Issue). As such, I need to have individual transcript abundances for each sample from the 2020 Genewiz RNAseq data before I can proceed. So, I ran salmon (v1.2.1) to perform an alignment-free set of transcript abundances. It’s ridiculously fast, btw…
GO to GOslim - C.bairdi Enriched GO Terms from 20200422 DEGs
After running pairwise comparisons and identify differentially expressed genes (DEGs) on 20200422 and finding enriched gene ontology terms, I decided to map the GO terms to Biological Process GOslims. Additionally, I decided to try another level of comparison (I’m not sure how valid it is), whereby I will count the number of GO terms assigned to each GOslim and then calculate the percentage of GOterms that get assigned to each of the GOslim categories. The idea being that it might help identify Biological Processes that are “favored” in a given set of DEGs. I decided to set up “fancy” pyramid plots to view a given set of GO-GOslims for each DEG comparison.
FastQC-MultiQC - Laura Spencer’s QuantSeq Data
Laura Spencer received her O.lurida QuantSeq data, so I put it through FastQC/MultiQC and put the pertinent info in the nightingales Google Sheet. I also moved the data to /owl/nightingales/O_lurida
, updated the readme file and checksums file. There were 148 individual samples, so I won’t list them all here.
Gene Expression - C.bairdi Pairwise DEG Comparisons with 2019 RNAseq using Trinity-Salmon-EdgeR on Mox
Per a Slack request, Steven asked me to take the Genewize RNAseq data (received 2020318) through edgeR. Ran the analysis using the Trinity differential expression pipeline:
RNAseq Reads Extractions - C.bairdi Taxonomic Reads Extractions with MEGAN6 on swoose
Transcript Abundance - C.bairdi Alignment-free with Salmon on Mox for Grace
Per this GitHub Issue, Grace and Steven asked if I could help by generating a transcript abundance file for Grace to use with EdgeR. To do so, I used Salmon for alignment-free transcript abundance estimates due to its speed and its incorporation into Trinity with the following files:
SRA Submission - C.bairdi RNAseq Data
Since we received the last of our RNAseq data for this project on 20200413, I submitted all of it to the NCBI Sequencing Read Archive (SRA). Data was released today and all accession numbers can be found in the table below:
TrimmingFastQCMultiQC—C.bairdi-RNAseq-FastQ-with-fastp-on-Mox
After receiving our RNAseq data from Genewiz earlier today, needed to trim and check trimmed reads with FastQC.
Taxonomic Assignments - C.bairdi RNAseq Using DIAMOND BLASTx on Mox and MEGAN6 Meganizer on swoose
After receiving/trimming the latest round of C.bairdi RNAseq data on 20200413, need to get the data ready to perform taxonomic selection of sequencing reads. To do this, I first need to run DIAMOND BLASTx, then “meganize” the output files in preparation for loading into MEGAN6, which will allow for taxonomic-specific read separation.
FastQC-MultiQC - C.bairdi Raw RNAseq from NWGSC
Yesterday, we received the last of the RNAseq data for the C.bairdi crab project from NWGSC. FastQC, followed by MultiQC was run on the raw FastQ reads on my computer (swoose).
Data Wrangling - Arthropoda and Alveolata Day and Treatment Taxonomic RNAseq FastQ Extractions
After using MEGAN6 to extract Arthropoda and Alveolata reads from our RNAseq data on 20200330, I had then extracted taxonomic-specific reads and aggregated each into basic Read 1 and Read 2 FastQs to simplify transcriptome assembly for C.bairdi and for Hematodinium. That was fine and all, but wasn’t fully thought through.
Data Received - C.bairdi RNAseq from NWGSC
Received the C.bairdi RNAseq data from UW NWGSC that Grace submitted on 20200131. Below is a table of NWGSC sample name and corresponding labels that Grace provided. The 0/1 indicates uninfected/infected, respectively.
Transcriptome Annotation - Trinotate C.bairdi MEGAN6 Taxonomic-specific Trinity Assembly on Mox
After performing de novo assembly on our Tanner crab MEGAN6 taxonomic-specific RNAseq data on 20200330 and performing BLASTx annotation on 20200408, I continued the annotation process by running Trinotate.
Transcriptome Annotation - C.bairdi MEGAN Trinity Assembly Using DIAMOND BLASTx on Mox
As part of annotating the most recent transcriptome assembly from the MEGAN6 Arthropoda taxonomic-specific reads, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Annotation - Trinotate Hematodinium MEGAN6 Taxonomic-specific Trinity Assembly on Mox
After performing de novo assembly on our Hematodinium MEGAN6 taxonomic-specific RNAseq data on 20200330 and performing BLASTx annotation on 20200331, I continued the annotation process by running Trinotate.
Transdecoder - Hematodinium MEGAN6 Taxonomic-Specific Reads Assembly from 20200330
Ran Trinity to de novo assembly on the the Alveolata MEGAN6 taxonomic-specific RNAseq data on 20120330 and now will begin annotating the transcriptome using TransDecoder on Mox.
Transdecoder - C.bairdi MEGAN6 Taxonomic-Specific Reads Assembly from 20200330
Ran Trinity to de novo assembly on the the Arthropoda MEGAN6 taxonomic-specific RNAseq data on 20120330 and now will begin annotating the transcriptome using TransDecoder on Mox.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi MEGAN Transcriptome
I previously created a C.bairdi de novo transcriptome assembly with Trinity from the MEGAN6 taxonomic-specific reads for Arthropoda on 20200330 and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Transcriptome Annotation - Hematodinium MEGAN Trinity Assembly Using DIAMOND BLASTx on Mox
As part of annotating the most recent transcriptome assembly from the MEGAN6 Hematodinium taxonomic-specific reads, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on Hematodinium MEGAN Transcriptome
I previously created a C.bairdi de novo transcriptome assembly with Trinity from the MEGAN6 taxonomic-specific reads for Alveolata on 20200331 and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Transcriptome Assembly - Hematodinium with MEGAN6 Taxonomy-specific Reads with Trinity on Mox
Ran a de novo assembly using the extracted reads classified under Alveolata from:
Transcriptome Assembly - C.bairdi with MEGAN6 Taxonomy-specific Reads with Trinity on Mox
Ran a de novo assembly using the extracted reads classified under Arthropoda from:
RNAseq Reads Extractions - C.bairdi Taxonomic Reads Extractions with MEGAN6 on swoose
Transcriptome Annotation - C.bairdi Using DIAMOND BLASTx on Mox and MEGAN6 Meganizer on swoose
After receiving/trimming the latest round of C.bairdi RNAseq data on 20200318, need to get the data ready to perform taxonomic selection of sequencing reads. To do this, I first need to run DIAMOND BLASTx, then “meganize” the output files in preparation for loading into MEGAN6, which will allow for taxonomic-specific read separation.
Trimming/FastQC/MultiQC - C.bairdi RNAseq FastQ with fastp on Mox
After receiving our RNAseq data from Genewiz earlier today, needed to run FastQC, trim, check trimmed reads with FastQC.
Data Received - C.bairdi RNAseq Data from Genewiz
We received the RNAseq data from the RNA that was sent out by Grace on 20200212.
DNA Isolation and Quantification - C.bairdi Hemocyte Pellets in RNAlater
Isolated DNA from 22 samples (see Qubit spreadsheet in “Results” below for sample IDs) using the Quick DNA/RNA Microprep Kit (ZymoResearch; PDF) according to the manufacturer’s protocol for liquids/cells in RNAlater.
NanoPore Sequencing - C.bairdi gDNA 6129_403_26
After getting high quality gDNA from Hematodinium-infected C.bairdi hemolymph on 2020210 we decided to run some of the sample on the NanoPore MinION, since the flowcells have a very short shelf life. Additionally, the results from this will also help inform us on whether this sample might worth submitting for PacBio sequencing. And, of course, this provides us with additional sequencing data to complement our previous NanoPore runs from 20200109.
qPCR - C.bairdi RNA Check for Residual gDNA
Previuosly checked existing crab RNA for residual gDNA on 20200226 and identified samples with yields that were likely too low, as well as samples with residual gDNA. For those samples, was faster/easier to just isolate more RNA and perform the in-column DNase treatment in the ZymoResearch Quick DNA/RNA Microprep Plus Kit; this keeps samples concentrated. So, I isolated more RNA on 20200306 and now need to check for residual gDNA.
Trimming/MultiQC - Methcompare Bisulfite FastQs with fastp on Mox
Steven asked me to trim a set of FastQ files, provided by Hollie Putnam, in preparation for methylation analysis using Bismark. The analysis is part of a coral project comparing DNA methylation profiles of different species, as well as comparing different sample prep protocols. There’s a dedicated GitHub repo here:
RNA Isolation and Quantification - C.bairdi RNA from Hemolymph Pellets in RNAlater
Based on qPCR results testing for residual gDNA from 20200225, a set of 24 samples were identified that required DNase treatment and/or additional RNA. I opted to just isolate more RNA from all samples, since the kit includes a DNase step and avoids diluting the existing RNA using the Turbo DNA-free Kit that we usully use. Isolated RNA using the Quick DNA/RNA Microprep Kit (ZymoResearch; PDF) according to the manufacturer’s protocol for liquids/cells in RNAlater.
Data Wrangling - Create Canonical Olurida_v081 Genes FastA
I finally had some time to tackle this GitHub Issue and create a canonical genes FastA file using the MAKER IDs, instead of the original contig IDs from our Olympia oyster genome assembly - https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa (FastA; 1.1GB).
qPCR - C.bairdi RNA Check for Residual gDNA
After deciding on a primer set to use for gDNA detection on 20200225, went ahead and ran a qPCR on most of the RNA samples described in Grace’s Google Sheet. Some samples were not run, as they had not yet been located at the time I began the qPCR.
qPCR - C.bairdi Primer Tests on gDNA
We received the primers I ordered on 20200220 and now need to test them to see if they detect gDNA. If yes, then they’re good candidates to assess the presence of residual gDNA in our RNA samples before we proceed with reverse transcription.
Primer Design - C.bairdi Primers for Checking RNA for Residual gDNA
Getting ready to run some qPCRs and first we need to confirm that our RNA is actually DNA-free. Before we can do that, we need some primers to use, so I decided to semi-arbitrarily select three different gene targets from our MEGAN6 taxonomic-specific Trinity assembly from 20200122.
DNA Isolation & Quantification - C.bairdi RNA from Samples 6212_132_9 6212_334_12 6212_485_26
Isolated DNA from three samples (see Qubit spreadsheet in “Results” below for sample IDs) using the Quick DNA/RNA Microprep Kit (ZymoResearch; PDF) according to the manufacturer’s protocol for liquids/cells in RNAlater.
RNA Isolation & Quantification - C.bairdi RNA from Samples 6212_132_9 6212_334_12 6212_485_26
We are supposed to get RNA sent out for sequencing today, but it turns out that a few of the designated samples have insufficient RNA in them. So, I’m going to attempt to isolate enough RNA from the following samples in order to have enough RNA to send to Genewiz today:
RNA Isolation & Quantification - C.bairdi RNA from Sample 6129_403_26
Since I was isolating gDNA from C.bairdi 6129_403_26 hemolymph, I figured I might as well co-isolate RNA since I was using the Quick DNA/RNA Microprep Plus Kit (ZymoResearch).
DNA Isolation & Quantification - Additional C.bairdi gDNA from Sample 6129_403_26
Earlier today I isolated gDNA from C.bairi 6129_403_26 hemolymph pellets and recovered decently intact gDNA that could be used for sequencing. However, I still need more gDNA, so will isolate that (and co-isolate RNA, since I’m going through the procedure anyway) using the rest of the sample using the Quick DNA/RNA Microprep Plus Kit (ZymoResearch).
DNA Isolation, Quantification, and Gel - C.bairdi gDNA Sample 6129_403_26
In order to do some genome sequencing on C.bairid and Hematodinium, we need hihg molecular weight gDNA. I attempted this twice before, using two different methods (Quick DNA/RNA Microprep Kit (ZymoResearch) on 20200122 and the E.Z.N.A Mollusc DNA Kit (Omega) on 20200108) using ~10yr old ethanol-preserved tissue provided by Pam Jensen. Both methods yielded highly degrade gDNA. So, I’m now attempting to get higher quality gDNA from the RNAlater-preserved hemolymph pellets from this experiment.
Transcriptome Assessment - BUSCO Metazoa on Hematodinium MEGAN Transcriptome
I previously created a Hematodinium de novo transcriptome assembly with Trinity from the MEGAN6 taxonomic-specific reads for Alveolata on 20200122 and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi MEGAN Transcriptome
I previously created a C.bairdi de novo transcriptome assembly with Trinity from the MEGAN6 taxonomic-specific reads for Arthropoda on 20200122 and decided to assess its “completeness” using BUSCO and the metazoa_odb9
database.
Gene Expression - Hematodinium MEGAN6 with Trinity and EdgeR
After completing annotation of the Hematodinium MEGAN6 taxonomic-specific Trinity assembly using Trinotate on 20200126, I performed differential gene expression analysis and gene ontology (GO) term enrichment analysis using Trinity’s scripts to run EdgeR and GOseq, respectively. The comparison listed below is the only comparison possible, as there were no reads present in the uninfected Hematodinium extractions.
Gene Expression - C.bairdi MEGAN6 with Trinity and EdgeR
After completing annotation of the C.bairdi MEGAN6 taxonomic-specific Trinity assembly using Trinotate on 20200126, I performed differential gene expression analysis and gene ontology (GO) term enrichment analysis using Trinity’s scripts to run EdgeR and GOseq, respectively, across all of the various treatment comparisons. The comparison are listed below and link to each individual SBATCH script (GitHub) used to run these on Mox.
Data Wrangling - Arthropoda and Alveolata Day and Treatment Taxonomic RNAseq FastQ Extractions
After using MEGAN6 to extract Arthropoda and Alveolata reads from our RNAseq data on 20200114, I had then extracted taxonomic-specific reads and aggregated each into basic Read 1 and Read 2 FastQs to simplify transcriptome assembly for C.bairdi and for Hematodinium. That was fine and all, but wasn’t fully thought through.
DNA Isolation and Quantification - C.bairdi Hemocyte Pellets in RNAlater
Isolated DNA from 56 samples (see Qubit spreadsheet in “Results” below for sample IDs) using the Quick DNA/RNA Microprep Kit (ZymoResearch; PDF) according to the manufacturer’s protocol for liquids/cells in RNAlater.
Transcriptome Annotation - Trinotate Hematodinium MEGAN6 Taxonomic-specific Trinity Assembly on Mox
After performing de novo assembly on our Hematodinium MEGAN6 taxonomic-specific RNAseq data on 20200122 and performing BLASTx annotation on 20200123, I continued the annotation process by running Trinotate.
Transcriptome Annotation - Trinotate C.bairdi MEGAN6 Taxonomic-specific Trinity Assembly on Mox
After performing de novo assembly on our Tanner crab MEGAN6 taxonomic-specific RNAseq data on 20200122 and performing BLASTx annotation on 20200123, I continued the annotation process by running Trinotate.
RNA Isolation and Quantification - C.bairdi Hemocyte Pellets in RNAlater
Isolated RNA from the following hemolymph pellet samples:
RNA Isolation and Quantification - C.bairdi Hemocyte Pellets in RNAlater
Isolated RNA from the following hemolymph pellet samples:
Transdecoder - Hematodinium MEGAN6 Taxonomic-Specific Reads Assembly from 20200122
Ran Trinity to de novo assembly on the the C.bairdi MEGAN6 taxonomic-specific RNAseq data on 201200122 and now will begin annotating the transcriptome using TransDecoder on Mox.
Transdecoder - C.bairdi MEGAN6 Taxonomic-Specific Reads Assembly from 20200122
Ran Trinity to de novo assembly on the the Hematodinium MEGAN6 taxonomic-specific RNAseq data on 201200122 and now will begin annotating the transcriptome using TransDecoder on Mox.
Transcriptome Annotation - Hematodinium MEGAN Trinity Assembly Using DIAMOND BLASTx on Mox
As part of annotating the transcriptome assembly from the MEGAN6 Hematodinium taxonomic-specific reads, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Annotation - C.bairdi MEGAN Trinity Assembly Using DIAMOND BLASTx on Mox
As part of annotating the transcriptome assembly from the MEGAN6 C.bairdi taxonomic-specific reads, I need to run DIAMOND BLASTx to use with Trinotate.
RNA Isolation and Quantification - C.bairdi Hemocyte Pellets in RNAlater Troubleshooting
After the failure to obtain RNA from any C.bairdi hemocytes pellets (out of 24 samples processed) on 20200117, I decided to isolate RNA from just a subset of that group to determine if I screwed something up last time or something. Also, I am testing two different preparations of the kit-supplied DNase I: one Kaitlyn prepped and a fresh preparation that I made. Admittedly, I’m not doing the “proper” testing by trying the different DNase preps on the same exact sample, but it’ll do. I just want to see if I get some RNA from these samples this time…
DNA Quality Assessment - Agarose Gel for C.bairdi 20102558-2729 gDNA from 20200122
Earlier today, I isolated gDNA from C.bairdi 20102558-2729 ethanol-preserved muscle tissue using the Quick DNA/RNA MicroPrep Plus Kit (ZymoResearch) and prepared the tissue in three different ways to see how they would compare:
Data Wrangling - Arthropoda and Alveolata Taxonomic RNAseq FastQ Extractions
After using MEGAN6 to extract Arthropoda and Alveolata reads from our RNAseq data on 20200114 (for reference, these include RNAseq data using a newly established “shorthand”: 2018, 2019), I realized that the FastA headers were incomplete and did not distinguish between paired reads. Here’s an example:
DNA Isolation - C.bairdi 20102558-2729 EtOH-preserved Tissue via Three Variations Using Quick DNA-RNA MicroPrep Kit
Previously, I isolated gDNA from a C.bairdi EtOH-preserved muscle sample (20102558-2729) on 20200108 using the E.Z.N.A. Mollusc DNA Kit (Omega). Although the yields were excellent, the DNA looked completely degraded on a gel and running that DNA on a minION flowcell yielded relatively short reads (which wasn’t terribly surprising).
Transcriptome Assembly - Hematodinium with MEGAN6 Taxonomy-specific Reads with Trinity on Mox
Ran a de novo assembly using the extracted reads classified under Alveolata from 20200122 The assembly was performed with Trinity on Mox.
Transcriptome Assembly - C.bairdi with MEGAN6 Taxonomy-specific Reads with Trinity on Mox
Ran a de novo assembly using the extracted reads classified under Arthropoda from 20200122 (for reference, these include RNAseq data using a newly established “shorthand”: 2018, 2019). The assembly was performed with Trinity on Mox.
DNA Isolation and Quantification - C.bairdi Hemolymph Pellets in RNAlater
Isolated DNA from the following 23 samples:
RNA Isolation and Quantification - C.bairdi Hemolymph Pellets in RNAlater
TL;DR - Recovered absolutely no RNA from any sample! However, I did recover DNA from each sample.
RNAseq Reads Extractions - C.bairdi Taxonomic Reads Extractions with MEGAN6 on swoose
I previously ran BLASTx and “meganized” the output DAA files on 20200103 (for reference, these include RNAseq data using a newly established “shorthand”: 2018, 2019) and now need to use MEGAN6 to bin the results into the proper taxonomies. This is accomplished using the MEGAN6 graphical user interface (GUI). This is how the process goes:
Lab Maintenance - Cluster UPS Battery Replacement
Replaced the batteries on one of the APC uninterruptable power supplies (UPS) on our local server cabinet.
NanoPore Sequencing - C.bairdi gDNA Sample 20102558-2729
I performed the initial Lambda sequencing test on 20200107 and everything went smoothly, so I’m ready to give the NanoPore (ONT) MinION a run with an actual sample!
DNA Quality Assessment - Agarose Gel and NanoDrop on C.bairdi gDNA
I isolated C.bairdi gDNA yesterday (20200108) and now want to get an idea if it’s any good (i.e. no contaminants, high molecule weight).
DNA Isolation and Quantification - C.bairdi gDNA from EtOH Preserved Tissue
I isolated gDNA from ethanol-preserved C.bairdi muscle tissue from sample 20102558-2729 (SPNO-ReferenceNO). This sample was chosen as it had 0
in the SMEAR_result
and BCS_PCR_results
columns, indicating it should be free of Hematodinium. See the sample spreadsheet linked below for more info.
NanoPore Sequencing - Initial NanoPore MinION Lambda Sequencing Test
We recently acquired a NanoPore MinION sequencer, FLO-MIN106 flow cell and the Rapid Sequencing Kit (SQK-RAD004). The NanoPore website provides a pretty thorough an user-friendly walk-through of how to begin using the system for the first time. With that said, I believe the user needs to have a registered account with NanoPore and needs to have purchased some products to have full access to the protocols they provide.
Transcriptome Annotation - C.bairdi Using DIAMOND BLASTx on Mox and MEGAN6 Meganizer
Although I previously annotated our C.bairdi transcriptome from 20191218, I realized that the assembly and annotations were combine infected/uninfected samples, possibly making separating crab/Hematodinium sequences a bit more difficult.
Reagent Prep - RNA Pico Ladder Aliquoting and Testing
Created aliquots of RNA Pico Ladder received on 20191101, per the manufacturer’s instructions. Created single-use aliquots of 1.5uL and stored at -80oC: