- Miscellaneous 909
- Geoduck Genome Sequencing 97
- Olympia Oyster Genome Sequencing 72
- Olympia oyster reciprocal transplant 67
- Tanner Crab RNAseq 55
- PROPS 36
- Computer Servicing 28
- Samples Submitted 26
- Crassostrea gigas larvae OA (2011) bisulfite sequencing 24
- LSU C.virginica Oil Spill MBD BS Sequencing 22
- 2bRAD Library Tests for Sequencing at Genewiz 22
- Genotype-by-sequencing at BGI 22
- Goals 20
- Samples Received 17
- Protein expression profiles during sexual maturation in Geoduck 14
- Lineage-specific DNA methylation patterns in developing oysters 11
- BS-seq Libraries for Sequencing at Genewiz 11
- Daily Bits 11
- Sea star RNA-seq 10
- Data Received 9
- E5 9
- Reagent Prep 8
- MBD Enrichment for Sequencing at ZymoResearch 8
- SRA Submissions 6
- SRA Submission 6
- Olympia Oyster Genome Assembly 4
- Project Summary 4
- Myostatin Interacting Proteins 3
- CEABIGR 2
- 1
- 1
- Miscellneous 1
- Tutorials 1
- Sample Submission 1
- Genome Assembly 1
- Data received 1
- Computer 1
- Servicing 1
- Monthly Goals 1
- Data Management 1
Miscellaneous
lncRNA Expression - P.generosa lncRNA Expression Using StringTie
After identifying lncRNA in P.generosa, Steven asked that I generate a tissue-specific expression/count matrix (GitHub Issue). Looking through the documentation for StringTie, I decided that StringTie would work for this. The overall approach:
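One way to picture the end product is a transcript-by-tissue table. The minimal sketch below assembles that table from per-sample StringTie GTF output. It assumes the `transcript` records carry `transcript_id` and `TPM` attributes, which StringTie writes when it estimates abundances. The sample names in the usage are hypothetical. StringTie's own documentation also describes a `prepDE.py` helper for building read-count matrices; this sketch does not reproduce it.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Parse a GTF attribute column (key "value"; ...) into a dict."""
    return dict(ATTR_RE.findall(field))


def transcript_tpm(gtf_text):
    """Return {transcript_id: TPM} from the 'transcript' records of one StringTie GTF."""
    tpm = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # skip exon lines and anything malformed
        attrs = parse_attributes(cols[8])
        tpm[attrs["transcript_id"]] = float(attrs.get("TPM", 0.0))
    return tpm


def expression_matrix(samples):
    """samples: {sample_name: gtf_text}.

    Returns (transcript_ids, sample_names, rows). Transcripts absent from a sample get 0.0.
    """
    per_sample = {name: transcript_tpm(text) for name, text in samples.items()}
    names = sorted(per_sample)
    ids = sorted({tid for tpms in per_sample.values() for tid in tpms})
    rows = [[per_sample[n].get(tid, 0.0) for n in names] for tid in ids]
    return ids, names, rows
```

Each row lines up with one transcript ID and each column with one tissue. The resulting table can be written out as a CSV for downstream use.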
lncRNA Identification - P.generosa lncRNAs using CPC2 and bedtools
After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them to the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step of lncRNA identification. To do this, I used Zach’s notebook entry on lncRNA identification for guidance. I utilized the annotated GTF generated by gffcompare during the alignment/annotation step on 20230426. I used bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) and CPC2 with an arbitrary 200bp minimum length to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).
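The final filtering step can be sketched as a pass over the CPC2 results table that keeps non-coding calls at or above the 200bp cutoff. This is a minimal sketch with these assumptions:

- CPC2 writes a tab-delimited table whose header starts with `#ID` and includes `transcript_length` and `label` columns, with labels `coding`/`noncoding`.
- The length cutoff is applied to that table. The notebook may instead have applied it earlier.

```python
import csv
import io


def noncoding_ids(cpc2_text, min_length=200):
    """Return IDs labelled 'noncoding' by CPC2 whose transcript_length >= min_length.

    Assumes a tab-delimited CPC2 result table with a '#ID' header column
    plus 'transcript_length' and 'label' columns.
    """
    reader = csv.DictReader(io.StringIO(cpc2_text), delimiter="\t")
    keep = []
    for row in reader:
        if row["label"] == "noncoding" and int(row["transcript_length"]) >= min_length:
            keep.append(row["#ID"])
    return keep
```

The returned IDs could then be used to subset the FastA produced by `bedtools getfasta`.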
Containers - Apptainer Explorations
At some point, our HPC nodes on Mox will be retired. When that happens, we’ll likely purchase new nodes on the newest UW cluster, Klone. Additionally, the coenv nodes are no longer available on Mox. One was decommissioned and one was “migrated” to Klone. The primary issue at hand is that the base operating system for Klone appears to be very, very basic. I’d previously attempted to build/install some bioinformatics software on Klone, but could not due to a variety of missing libraries; these libraries are available by default on Mox… Part of this isn’t surprising, as UW IT has been making a concerted effort to get users to switch to containerization - specifically using Apptainer (formerly Singularity) containers.
Transcript Alignments - P.generosa RNA-seq Alignments for lncRNA Identification Using Hisat2 StringTie and gffcompare on Mox
This is a continuation of the process for identification of lncRNAs. I aligned FastQs, which were trimmed earlier today, to our Panopea-generosa-v1.0 genome FastA using HISAT2. I used the HISAT2 genome index created on 20190723, which was created with options to identify exons and splice sites. The GFF used was from 20220323. StringTie was used to identify alternative transcripts, assign expression values, and create expression tables for use with ballgown. The job was run on Mox.
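The general shape of a HISAT2 → samtools → StringTie → gffcompare pass can be sketched as command lists. This is not the actual Mox job script. The sketch assumes:

- Placeholder file names derived from a sample name.
- Commonly used options only: HISAT2 `--dta`, StringTie `-G`/`--merge`, and gffcompare `-r`/`-o`.

The sketch only builds the commands; it does not run anything.

```python
def sample_commands(sample, r1, r2, index, annotation, threads=8):
    """Build (but do not run) one sample's HISAT2 -> samtools -> StringTie commands.

    All file names are placeholders derived from the sample name.
    """
    t = str(threads)
    sam, bam, gtf = f"{sample}.sam", f"{sample}.sorted.bam", f"{sample}.gtf"
    return [
        # --dta reports alignments tailored for transcript assemblers such as StringTie
        ["hisat2", "-x", index, "-1", r1, "-2", r2, "-p", t, "--dta", "-S", sam],
        ["samtools", "sort", "-@", t, "-o", bam, sam],
        # -G guides assembly with the reference annotation; novel transcripts are still reported
        ["stringtie", bam, "-p", t, "-G", annotation, "-o", gtf],
    ]


def merge_commands(gtf_list, annotation, prefix="merged"):
    """Merge per-sample assemblies, then classify merged transcripts against the reference."""
    merged = f"{prefix}.gtf"
    return [
        ["stringtie", "--merge", "-G", annotation, "-o", merged, gtf_list],
        # gffcompare assigns class codes (e.g. 'u' for intergenic/unknown) relative to -r
        ["gffcompare", "-r", annotation, "-o", prefix, merged],
    ]
```

The lists can be printed with `shlex.join()` or passed to `subprocess.run()`. A ballgown-ready pass typically re-runs StringTie with `-e -B` against the merged GTF. Novel-transcript discovery happens in the assembly step shown here, not in that re-estimation step.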
FastQ Trimming and QC - P.generosa RNA-seq Data from 20220323 on Mox
Addressing the update to this GitHub Issue regarding identifying Panopea generosa (Pacific geoduck) long non-coding RNAs (lncRNAs), I used the RNA-seq data from the Nextflow NF-Core RNAseq pipeline run on 20220323. Although that data was supposed to have been trimmed in the Nextflow NF-Core RNA-seq pipeline, the FastQC reports still show adapter contamination and some funky stuff happening at the 5’ end of the reads. So, I’ve opted to trim the “trimmed” files with fastp, using a hard 20bp trim at the 5’ end of all reads. FastQC and MultiQC were run before/after trimming. Job was run on Mox.
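In fastp, a hard 5’ trim like this is set with `--trim_front1` (and `--trim_front2` for read 2). The minimal pure-Python sketch below only illustrates what that trim does to each FastQ record. The same number of bases is removed from the sequence and from the quality string, so the two stay aligned.

```python
def trim_front(fastq_text, n=20):
    """Remove the first n bases (and their quality scores) from every FastQ record."""
    lines = fastq_text.strip("\n").split("\n")
    if len(lines) % 4:
        raise ValueError("FastQ text must contain complete 4-line records")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Slice sequence and quality identically so they stay the same length
        out.extend([header, seq[n:], plus, qual[n:]])
    return "\n".join(out) + "\n"
```

Reads shorter than `n` end up empty here. fastp would instead handle those through its length filtering.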
Data Wrangling - Append Gene Ontology Aspect to P.generosa Primary Annotation File
Steven tasked me with updating our P.generosa genome annotation file (GitHub Issue) a while back and I finally managed to get it all figured out. Although I wanted to perform most of this using the GSEABase package (PDF), as this package is geared towards storage/retrieval of gene set data, I eventually decided to abandon this approach due to the time it was taking and my lack of familiarity/understanding of how to manipulate objects in R. Despite that, GSEABase was still utilized for its very simple approach to identifying GO slims (IDs and Terms).
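The core of appending a GO aspect column is a lookup from GO accession to aspect (P/C/F). The work above was done in R with GSEABase. The Python below is a swapped-in illustration, not the code that was used. It assumes:

- The lookup is built from an OBO ontology file, where each `[Term]` stanza has `id:` and `namespace:` lines.
- Annotation rows hold semicolon-separated GO accessions in a hypothetical `go_ids` column.

```python
NAMESPACE_TO_ASPECT = {
    "biological_process": "P",
    "cellular_component": "C",
    "molecular_function": "F",
}


def aspects_from_obo(obo_text):
    """Build {GO ID: aspect letter} from the [Term] stanzas of an OBO file."""
    lookup, current = {}, None
    for line in obo_text.splitlines():
        if line.startswith("["):
            current = None  # a new stanza starts; wait for its id line
        elif line.startswith("id: GO:"):
            current = line[4:].strip()
        elif line.startswith("namespace:") and current:
            namespace = line.split(":", 1)[1].strip()
            lookup[current] = NAMESPACE_TO_ASPECT.get(namespace, "NA")
    return lookup


def append_aspects(rows, lookup, sep=";"):
    """Add a 'go_aspects' column aligned with each row's 'go_ids'; unknown IDs become 'NA'."""
    out = []
    for row in rows:
        ids = [g.strip() for g in row["go_ids"].split(sep) if g.strip()]
        out.append({**row, "go_aspects": sep.join(lookup.get(g, "NA") for g in ids)})
    return out
```

Keeping the aspects in the same order as the GO IDs lets each accession be paired with its aspect later.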
Genome Indexing - P.verrucosa v1.0 Assembly with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2 index to add to The Roberts Lab Handbook Genomic Resources.
Genome Indexing - M.capitata HIv3 Assembly with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2 index to add to The Roberts Lab Handbook Genomic Resources.
Genome Indexing - P.acuta HIv2 Assembly with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2 index to add to The Roberts Lab Handbook Genomic Resources.
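For the HISAT2 indexing entries above, splice-site-aware indexes are usually built by running HISAT2's `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` on a GTF. The results are then passed to `hisat2-build` via `--ss` and `--exon`. The minimal sketch below approximates the splice-site step only and is not a drop-in replacement for the bundled script. It assumes:

- Exon records carry a `transcript_id` attribute.
- Output uses 0-based coordinates: the last base of one exon and the first base of the next exon in the same transcript.

```python
from collections import defaultdict


def splice_sites(gtf_text):
    """Return sorted, de-duplicated (chrom, left, right, strand) tuples, one per intron."""
    exons = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, _src, feature, start, end, _score, strand, _frame, attrs = line.split("\t")[:9]
        if feature != "exon":
            continue
        tid = attrs.split('transcript_id "', 1)[1].split('"', 1)[0]
        # GTF is 1-based inclusive; convert both ends to 0-based
        exons[(chrom, strand, tid)].append((int(start) - 1, int(end) - 1))
    sites = set()
    for (chrom, strand, _tid), blocks in exons.items():
        blocks.sort()
        # Each pair of consecutive exons in a transcript defines one intron
        for (_s1, e1), (s2, _e2) in zip(blocks, blocks[1:]):
            sites.add((chrom, e1, s2, strand))
    return sorted(sites)
```

Introns shared by several isoforms are reported once, because they are collected in a set.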
Data Wrangling - P.verrucosa Genome GFF to GTF Using gffread
As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the P.verrucosa genome file (Pver_genome_assembly_v1.0.fasta) using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.
Data Wrangling - M.capitata Genome GFF to GTF Using gffread
As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the M.capitata genome file (Montipora_capitata_HIv3.assembly.fasta) using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.
Data Wrangling - P.acuta Genome GFF to GTF Conversion Using gffread
As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the P.acuta genome file using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.
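gffread does this conversion with its `-T` (GTF output) option. The minimal sketch below only illustrates the core idea of that conversion and is not gffread's behaviour. GFF3 links features through `ID`/`Parent` attributes, whereas GTF repeats `transcript_id` and `gene_id` on every line. The sketch assumes:

- Exon/CDS `Parent` values point to a single mRNA. Multi-parent records are not handled.
- Each mRNA's `Parent` points to its gene.

```python
def parse_gff3_attrs(field):
    """Parse 'key=value;key=value' GFF3 attributes into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().strip(";").split(";") if "=" in kv)


def gff3_to_gtf(gff_text):
    """Convert mRNA/transcript, exon, and CDS records from GFF3 to GTF-style lines."""
    records = [l.split("\t") for l in gff_text.splitlines() if l and not l.startswith("#")]
    # First pass: remember which gene each transcript belongs to
    tx_to_gene = {}
    for cols in records:
        if cols[2] in ("mRNA", "transcript"):
            a = parse_gff3_attrs(cols[8])
            tx_to_gene[a["ID"]] = a["Parent"]
    # Second pass: rewrite attributes in GTF style
    out = []
    for cols in records:
        feature, a = cols[2], parse_gff3_attrs(cols[8])
        if feature in ("mRNA", "transcript"):
            tid, gid, feature = a["ID"], a["Parent"], "transcript"
        elif feature in ("exon", "CDS"):
            tid = a["Parent"]
            gid = tx_to_gene.get(tid, tid)
        else:
            continue  # gene and other features are not emitted in this sketch
        attrs = f'transcript_id "{tid}"; gene_id "{gid}";'
        out.append("\t".join(cols[:2] + [feature] + cols[3:8] + [attrs]))
    return "\n".join(out)
```

A real annotation can contain features this sketch skips. For actual use, gffread remains the tool to rely on.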
Genome Indexing - P.verrucosa NCBI GCA_014529365.1 with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2 index to add to The Roberts Lab Handbook Genomic Resources.
Genome Indexing - M.capitata NCBI GCA_006542545.1 with HiSat2 on Mox
Working on this Issue regarding adding coral genomes to our Handbook (GitHub) and needed to generate a HISAT2 index to add to The Roberts Lab Handbook Genomic Resources.
SRA Data - Coral SRA BioProject PRJNA744403 Download and QC
Per this GitHub Issue, Steven wanted me to download all of the SRA data (RNA-seq and WGBS-seq) from NCBI BioProject PRJNA744403 and run QC on the data.
BS-Seq Analysis - Nextflow EpiDiverse SNP Pipeline for Haws Hawaii C.gigas BAMs from Yaamini Base Config
Yaamini asked me to run the epidiverse/snp pipeline (GitHub Issue) on her Haws Crassostrea gigas (Pacific oyster) Hawaii bisulfite sequencing BAMs for SNP identification.
BS-Seq Analysis - Nextflow EpiDiverse SNP Pipeline for Haws Hawaii C.gigas BAMs from Yaamini
Yaamini asked me to run the epidiverse/snp pipeline (GitHub Issue) on her Haws Crassostrea gigas (Pacific oyster) Hawaii bisulfite sequencing BAMs for SNP identification.
Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene and Pseudogene Combined BED File
Working on the CEABIGR project, I was preparing to make a gene expression file to use in CIRCOS (GitHub Issue) when I realized that the Ballgown gene expression file (CSV; GitHub) had more genes than the C.virginica genes BED file we were using. After some sleuthing, I discovered that the discrepancy was caused by the lack of pseudogenes in the genes BED file I was using. Although it might not really have any impact on things, I thought it would still be prudent to have a BED file that completely matched all of the genes in the Ballgown gene expression file. Plus, having the pseudogenes might be of long-term usefulness if we ever decide to evaluate the role of long non-coding RNAs (lncRNAs) in this project.
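A combined gene + pseudogene BED file amounts to keeping two feature types from the GFF and shifting starts to BED's 0-based, half-open coordinates. This is a minimal sketch, not the command actually used. It assumes:

- The NCBI GFF3 labels these records with the feature types `gene` and `pseudogene`.
- The `ID` attribute is an acceptable BED name.

```python
def features_to_bed(gff_text, feature_types=("gene", "pseudogene")):
    """Return BED6 lines for the requested GFF3 feature types, named by their ID attribute."""
    bed = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] not in feature_types:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].strip(";").split(";") if "=" in kv)
        start0 = int(cols[3]) - 1  # GFF is 1-based inclusive; BED is 0-based half-open
        bed.append("\t".join([cols[0], str(start0), cols[4], attrs.get("ID", "."), ".", cols[6]]))
    return bed
```

Passing `feature_types=("gene",)` gives a genes-only BED file, like the one used for Ballgown elsewhere in this notebook.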
BSseq SNP Analysis - Nextflow EpiDiverse SNP Pipeline for C.virginica CEABIGR BSseq data
Steven asked that I identify SNPs from our C.virginica CEABIGR BSseq data (GitHub Issue). So, I ran sorted, deduplicated Bismark BAMs that Steven generated through the EpiDiverse/snp Nextflow pipeline. The job was run on Mox.
RNAseq Alignments - P.generosa Alignments and Alternative Transcript Identification Using Hisat2 and StringTie on Mox
As part of identifying long non-coding RNA (lncRNA) in Pacific geoduck (GitHub Issue), one of the first things that I wanted to do was to gather all of our geoduck RNAseq data and align it to our geoduck genome. In addition to the alignments, some of the examples I’ve been following have also utilized expression levels as one aspect of the lncRNA selection criteria, so I figured I’d get this info as well.
FastQ Trimming - Geoduck RNAseq Data Using fastp on Mox
Per this GitHub Issue, Steven asked me to identify long non-coding RNA (lncRNA) in geoduck. The first step is to aggregate all of our Panopea generosa (Pacific geoduck) RNAseq data and get it all trimmed. After that, the plan is to align it to the genome, run a Ballgown expression analysis, and then apply a variety of selection criteria to parse out lncRNAs.
FastQ Trimming and QC - C.virginica Larval BS-seq Data from Lotterhos Lab and Part of CEABIGR Project Using fastp on Mox
We had some old Crassostrea virginica (Eastern oyster) larval/zygote BS-seq data from the Lotterhos Lab that’s part of the CEABiGR Workshop/Project (GitHub Repo) and Steven asked that I QC/trim it in this GitHub Issue.
Data Wrangling - Convert S.namaycush NCBI GFF to genes-only BED file for Use in Ballgown Analysis
In preparation for isoform identification/quantification in S.namaycush RNAseq data, Ballgown will need a genes-only BED file. To generate it, I used GFFutils to extract only genes from the NCBI GFF: GCF_016432855.1_SaNama_1.0_genomic.gff. All code was documented in the following Jupyter Notebook.
Splice Site Identification - S.namaycush Liver Parasitized and Non-Parasitized SRA RNAseq Using Hisat2-StringTie with Genome GCF_016432855.1
After previously downloading/trimming/QCing S.namaycush SRA liver RNAseq data on 20220706, Steven asked that I run through Hisat2 for splice site identification (GitHub Issue).
qPCR - Repeat of Mussel Gill Heat Stress cDNA with Ferritin Primers
My previous qPCR on these cDNA using ferritin primers (SRIDs: 1808, 1809) resulted in no amplification. This was a bit surprising and makes me suspect that I screwed up somewhere (not adding primer(s)??), so I decided to repeat the qPCR. I made fresh working primer stocks and used 1uL of cDNA for each reaction. All reactions were run in duplicate on our CFX Connect thermal cycler (BioRad) with SsoFast EvaGreen Master Mix (BioRad). See my previous post linked above for qPCR master mix calcs.
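Master mix calcs follow a simple pattern: scale each per-reaction volume by the number of reactions plus some overage. The template is added separately to each well. The per-reaction volumes below are hypothetical placeholders (a 20uL reaction built around the 1uL of cDNA mentioned above), not the values from the linked post. The 10% overage is likewise an assumption.

```python
def master_mix(n_reactions, per_reaction, overage=0.1):
    """Scale per-reaction volumes (uL) by n_reactions plus a fractional overage.

    Template (cDNA) is pipetted per well, so it is not part of this mix.
    """
    factor = n_reactions * (1 + overage)
    return {reagent: round(vol * factor, 2) for reagent, vol in per_reaction.items()}


# Hypothetical per-reaction volumes; with 1 uL cDNA per well this totals 20 uL
PER_REACTION_UL = {
    "SsoFast EvaGreen (2x)": 10.0,
    "forward primer (10 uM)": 0.5,
    "reverse primer (10 uM)": 0.5,
    "water": 8.0,
}
```

For reactions run in duplicate, `n_reactions` would be twice the number of templates, plus any no-template controls.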
qPCR - Dorothys Mussel cDNA from 20220726
Ran qPCRs on Dorothy’s mussel gill cDNA from 20220726 using the following primers:
RNA Isolation and Quantification - Dorothy Mussel Gill Samples
Isolated RNA from a subset of Dorothy’s mussel gill samples:
BS-seq and SNP Analysis - Nextflow EpiDiverse Pipelines Trials and Tribulations
Alrighty, this notebook entry is going to have a lot to unpack, as the process to get these pipelines running and then deal with the actual data we wanted to run them with was quite involved. However, the TL;DR of this all is this:
SRA Data - S.namaycush SRA BioProject PRJNA674328 Download and QC
Per this GitHub Issue, which I accidentally forgot about for three weeks (!), Steven wanted me to download the lake trout (Salvelinus namaycush) RNAseq data from NCBI BioProject PRJNA674328 and run QC on the data.
Data Wrangling - Create Primary P.generosa Genome Annotation File
Steven asked me to create a canonical genome annotation file (GitHub Issue). I needed/wanted to create a file containing gene IDs, SwissProt (SP) IDs, gene names, gene descriptions, and gene ontology (GO) accessions. To do so, I utilized the NCBI BLAST and DIAMOND BLAST annotations generated by our GenSas P.generosa genome annotation. Per Steven’s suggestion, I used the best match (i.e. lowest e-value) for any given gene between the two files.
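Picking the best match between two annotation sources reduces to keeping the lowest e-value per gene across both tables. This is a minimal sketch. It assumes each table has already been reduced to `(gene_id, subject_id, evalue)` tuples; that tuple layout is hypothetical. Ties keep whichever hit was seen first.

```python
def best_hits(*hit_tables):
    """Merge hit tables and keep the lowest-e-value hit for each gene.

    Each table is an iterable of (gene_id, subject_id, evalue) tuples.
    Returns {gene_id: (subject_id, evalue)}.
    """
    best = {}
    for table in hit_tables:
        for gene_id, subject_id, evalue in table:
            evalue = float(evalue)  # tolerate strings such as '2e-10'
            if gene_id not in best or evalue < best[gene_id][1]:
                best[gene_id] = (subject_id, evalue)
    return best
```

Genes hit by only one of the two sources are kept as-is. The merged result can then be joined to gene names, descriptions, and GO accessions by SwissProt ID.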
RNA Isolation - O.nerka Berdahl Brain Tissues
Nextflow - Trials and Tribulations of Installing and Using NF-Core RNAseq
INSTALLATION
Data Wrangling - P.generosa Genomic Feature FastA Creation
Steven wanted me to generate FastA files (GitHub Issue) for Panopea generosa (Pacific geoduck) coding sequences (CDS), genes, and mRNAs. One of the primary needs, though, was to have an ID that could be used for downstream table joining/mapping. I ended up using a combination of GFFutils and bedtools getfasta. I took advantage of being able to create a custom name column in BED files to generate the desired FastA description line having IDs that could identify, and map, CDS, genes, and mRNAs across FastAs and GFFs.
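The custom BED name column works because `bedtools getfasta -name` puts that name into the FastA description line. Depending on the bedtools version, a `::chrom:start-end` suffix may also be appended; `-nameOnly` suppresses it. This is a minimal sketch of the round trip, assuming a hypothetical `feature|parent|gene` labeling scheme rather than the one actually used. The parser drops any `::` coordinate suffix so the IDs can be joined back to GFF-derived tables.

```python
def make_name(feature_id, parent_id=None, gene_id=None):
    """Compose a hypothetical 'feature|parent|gene' label for the BED name column."""
    return "|".join(x if x else "NA" for x in (feature_id, parent_id, gene_id))


def parse_header(header):
    """Split a getfasta-style header back into its ID fields, ignoring any '::coords' suffix."""
    label = header.lstrip(">").split("::", 1)[0]
    feature_id, parent_id, gene_id = label.split("|")
    return {"feature_id": feature_id, "parent_id": parent_id, "gene_id": gene_id}
```

The separator must not occur inside any of the IDs. Otherwise the split in `parse_header` will fail.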
Differential Gene Expression - P.generosa DGE Between Tissues Using Nextflow NF-Core RNAseq Pipeline on Mox
Steven asked that I obtain relative expression values for various geoduck tissues (GitHub Issue). So, I decided to use this as an opportunity to try to use a Nextflow pipeline. There’s an RNAseq pipeline, NF-Core RNAseq, which I decided to use. The pipeline appears to be ridiculously thorough (e.g. trims, removes gDNA/rRNA contamination, allows for multiple aligners to be used, quantifies/visualizes feature assignments by reads, performs differential gene expression analysis and visualization), all in one package. Sounds great, but I did have some initial problems getting things up and running. Overall, getting things set up to actually run took longer than the actual pipeline run! Oh well, it’s a learning process, so that’s not totally unexpected.
Data Analysis - C.virginica BSseq Unmapped Reads Using MEGAN6
After performing DIAMOND BLASTx and DAA “meganization” on 20220302, the next step was to import the DAA files into MEGAN6 for analyzing the resulting taxonomic assignments of the Crassostrea virginica (Eastern oyster) unmapped BSseq reads that Steven generated.
Data Analysis - C.virginica RNAseq Zymo ZR4059 Analyzed by ZymoResearch
After realizing that the Crassostrea virginica (Eastern oyster) RNAseq data had relatively low alignment rates (see this notebook entry from 20220224 for a bit more background), I contacted ZymoResearch to see if they had any insight on what might be happening. I suspected rRNA contamination. ZymoResearch was kind enough to run the RNAseq data through their pipeline and provide us with a report. This notebook entry provides a brief overview and thoughts on the report.
Taxonomic Assignment - C.virginica BSseq Unmapped Reads Using DIAMOND BLASTx and MEGAN6 on Mox
After mapping bisulfite sequencing (BSseq) data to the Crassostrea virginica (Eastern oyster) genome, Steven noticed that there were a large number of unmapped reads. He asked that I attempt to taxonomically classify the unmapped reads (GitHub Issue), with the idea that maybe these reads could provide additional data on an associated microbiome (GitHub Discussion).
Data Wrangling - P.generosa Genome GFF Conversion to GTF Using gffread
Steven asked in this GitHub Issue to convert our Panopea generosa (Pacific geoduck) genomic GFF to a GTF for use in the 10x Genomics Cell Ranger software. This conversion was performed using GffRead in a Jupyter Notebook.
Transcript Identification and Alignments - C.virginica RNAseq with NCBI Genome GCF_002022765.2 Using Hisat2 and Stringtie on Mox
After an additional round of trimming yesterday, I needed to identify alternative transcripts in the Crassostrea virginica (Eastern oyster) gonad RNAseq data we have. I previously used HISAT2 to index the NCBI Crassostrea virginica (Eastern oyster) genome and identify exon/splice sites on 20210720. Then, I used this genome index with HISAT2 and StringTie on Mox to align the sequencing reads to the genome and identify alternative isoforms.
Trimming - Additional 20bp from C.virginica Gonad RNAseq with fastp on Mox
When I previously aligned trimmed RNAseq reads to the NCBI C.virginica genome (GCF_002022765.2) on 20210726, I specifically noted that alignment rates were consistently lower for males than females. However, I let that discrepancy distract me from the larger issue: low alignment rates. Period! This should have raised some red flags, and it eventually did after Steven asked about the overall alignment rate for an alignment of this data that I performed on 20220131 in preparation for genome-guided transcriptome assembly. The overall alignment rate (in which I actually used the trimmed reads from 20210714) was ~67.6%. Realizing this was on the low side of what one would expect prompted me to look into things more, and I came across a few things which led me to make the decision to redo the trimming:
Data Wrangling - C.virginica lncRNA Extractions from NCBI GCF_002022765.2 Using GffRead
Continuing to work on our Crassostrea virginica (Eastern oyster) project examining the effects of OA on female and male gonads (GitHub repo), Steven tasked me with parsing out long non-coding RNAs (GitHub Issue). To do so, I relied on the NCBI genome and associated files/annotations. I used GffRead, GFFutils, and samtools. The process was documented in the following Jupyter Notebook:
Transcriptome Assembly - Genome-guided C.virginica Adult Gonad OA RNAseq Using Trinity on Mox
As part of this project, Steven asked that I identify long non-coding RNAs (lncRNAs) (GitHub Issue) in the Crassostrea virginica (Eastern oyster) adult OA gonad RNAseq data we have. The initial step for this is to assemble a transcriptome. I generated the necessary BAM alignment on 20220131. Next was to actually get the transcriptome assembled. I followed the Trinity genome-guided procedure.
RNAseq Alignment - C.virginica Adult OA Gonad Data to GCF_002022765.2 Genome Using HISAT2 on Mox
As part of this project, Steven asked that I identify long non-coding RNAs (lncRNAs) (GitHub Issue) in the Crassostrea virginica (Eastern oyster) adult OA gonad RNAseq data we have. The initial step for this is to assemble a transcriptome. Since there is a published genome (NCBI RefSeq GCF_002022765.2_C_virginica-3.0; https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/022/765/GCF_002022765.2_C_virginica-3.0/) for Crassostrea virginica (Eastern oyster), I will perform a genome-guided assembly using Trinity. That process requires a sorted BAM file as input. In order to generate that file, I used HISAT2. I’ve already generated the necessary HISAT2 genome index files (as of 20210720), which also identified/incorporated the splice sites and exons that the HISAT2 alignment process requires to run.
Data Wrangling - C.virginica Gonad RNAseq Transcript Counts Per Gene Per Sample Using Ballgown
As we continue to work on the analysis of impacts of OA on Crassostrea virginica (Eastern oyster) gonads via DNA methylation and RNAseq (GitHub repo), we decided to compare the number of transcripts expressed per gene per sample (GitHub Issue). As it turns out, it was quite the challenge. Ultimately, I wasn’t able to solve it myself, and turned to StackOverflow for a solution. I should’ve just done this at the beginning, as I got a response (and solution) less than five minutes after posting! Regardless, the data wrangling progress (struggle?) was documented in the following GitHub Discussion:
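The actual wrangling was done in R with ballgown (see the GitHub Discussion). The Python below is a swapped-in illustration of the same counting idea. A transcript counts as expressed in a sample when its value exceeds a threshold. The sketch assumes:

- A transcript-level table with a `gene_id` column and one numeric expression column per sample. The column names are hypothetical.
- A threshold of 0 by default.

```python
from collections import defaultdict


def transcripts_per_gene(rows, samples, min_expr=0.0):
    """Count transcripts with expression > min_expr for each (gene, sample).

    rows: iterable of dicts holding 'gene_id' plus one numeric column per sample name.
    Returns {gene_id: {sample: count}}; genes with no expressed transcripts get zeros.
    """
    counts = defaultdict(lambda: dict.fromkeys(samples, 0))
    for row in rows:
        per_gene = counts[row["gene_id"]]  # touching the key keeps all-zero genes
        for s in samples:
            if float(row[s]) > min_expr:
                per_gene[s] += 1
    return dict(counts)
```

Keeping genes with zero expressed transcripts makes it easy to compare the output against the full gene list.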
RNA Isolation - O.nerka Berdahl Brain Tissues
RNA Isolation - M.trossulus Gill
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Gill and Phenol Gland
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Gill and Phenol Gland
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Gill and Phenol Gland
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Phenol Gland and Gill
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):
RNA Isolation - M.trossulus Foot and Phenol Gland
As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by isolating RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot” and the “PG” indicates “phenol gland” tissues):
RNA Isolation - O.nerka Berdahl Brain Tissues
RNA Isolation - O.nerka Berdahl Tissues
Finally got around to tackling this GitHub issue regarding isolating RNA from some Oncorhynchus nerka (sockeye salmon) tissues we have from Andrew Berdahl’s lab (a UW SAFS professor) to use for RNAseq and/or qPCR. We have blood, brain, gonad, and liver samples from individual salmon from two different groups: territorial and social individuals. We’ve decided to isolate RNA from brain, gonads, and liver from two individuals within each group. All samples are preserved in RNAlater and stored at -80°C.
Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene BED File
When working to identify differentially expressed transcripts (DETs) and genes (DEGs) for our Crassostrea virginica (Eastern oyster) RNAseq/DNA methylation comparison of changes across sex and ocean acidification conditions (https://github.com/epigeneticstoocean/2018_L18-adult-methylation), I realized that the DEG tables I was generating had excessive gene counts because the analysis (and, in turn, the genome coordinates) was tied to transcripts. Thus, genes were counted multiple times due to the existence of multiple transcripts for a given gene, and the analysis didn’t list gene coordinate data - only transcript coordinates.
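The double-counting problem goes away once transcript-level hits are collapsed to unique gene IDs and gene coordinates are joined in from a genes-only source. This is a minimal sketch with these assumptions:

- Transcript rows carry `transcript_id` and `gene_id` keys.
- Gene coordinates arrive as a `{gene_id: (chrom, start, end)}` dict, for example loaded from a gene BED file.

Both structures are hypothetical stand-ins for the real tables.

```python
def collapse_to_genes(transcript_rows, gene_coords):
    """Collapse transcript-level hits to one record per gene.

    Returns (gene_id, chrom, start, end, n_transcripts) tuples in first-seen order;
    genes missing from gene_coords get None coordinates.
    """
    n_tx = {}
    for row in transcript_rows:
        n_tx[row["gene_id"]] = n_tx.get(row["gene_id"], 0) + 1
    return [(gid, *gene_coords.get(gid, (None, None, None)), n) for gid, n in n_tx.items()]
```

Keeping the transcript count per gene preserves the information about how many isoforms contributed. Each gene is still listed only once.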
Differential Transcript Expression - C.virginica Gonad RNAseq Using Ballgown
In preparation for differential transcript analysis, I previously ran our RNAseq data through StringTie on 20210726 to identify and quantify transcripts. Identification of differentially expressed transcripts (DETs) and genes (DEGs) will be performed using ballgown. This notebook entry differs from most others: it simply serves as a “landing page” to access/review the analysis, since the analysis will evolve over time and won’t exist as a single computing job with a definitive endpoint.
Transcript Identification and Quantification - C.virginica RNAseq With NCBI Genome GCF_002022765.2 Using StringTie on Mox
After having run HISAT2 to index and identify exons and splice sites in the NCBI Crassostrea virginica (Eastern oyster) genome (GCF_002022765.2) on 20210720, the next step was to identify and quantify transcripts from the RNAseq data using StringTie.
Genome Annotations - Splice Site and Exon Extractions for C.virginica GCF_002022765.2 Genome Using Hisat2 on Mox
Previously performed quality trimming on the Crassostrea virginica (Eastern oyster) gonad/sperm RNAseq data on 20210714. Next, I needed to identify exons and splice sites, as well as generate a genome index using HISAT2 to be used with StringTie downstream to identify potential alternative transcripts. This utilized the following NCBI genome files:
Trimming - C.virginica Gonad RNAseq with FastP on Mox
Needed to trim the Crassostrea virginica (Eastern oyster) gonad RNAseq data we received on 20210528.
FastQC-MultiQC - Yaamini’s C.virginica RNAseq and WGBS from ZymoResearch on Mox
Finally got around to running FastQC on Yaamini’s RNAseq and WGBS sequencing data received on 20210528.
Data Wrangling - S.salar Gene Annotations from NCBI RefSeq GCF_000233375.1_ICSASG_v2_genomic.gff for Shelly
Shelly posted a GitHub Issue asking if I could create a file of S.salar genes with their UniProt annotations (e.g. gene name, UniProt accession, GO terms).
RepeatMasker - C.gigas Roslin NCBI Genome GCA_902806645.1 on Mox
Decided to tackle this GitHub Issue about creating a transposable elements IGV track with the new Roslin C.gigas genome, since it had been sitting for a while and I have code sitting around that’s ready to roll for this type of thing.
Singularity - RStudio Server Container on Mox
Aidan recently needed to use R on a machine with more memory. Additionally, it would be ideal if he could use RStudio. So, I managed to figure out how to set up a Singularity container running rocker/rstudio.
Read Mapping - 10x-Genomics Trimmed FastQ Mapped to P.generosa v1.0 Assembly Using Minimap2 for BlobToolKit on Mox
To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run minimap2 according to the BlobToolKit “Getting Started” guide on Mox. This will map the trimmed 10x-Genomics reads from 20210401 to the Panopea-generosa-v1.0.fa assembly (FastA; 914MB).
Genome Annotation - P.generosa v1.0 Assembly Using DIAMOND BLASTx for BlobToolKit on Mox
To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run DIAMOND BLASTx according to the BlobToolKit “Getting Started” guide on Mox.
Genome Annotation - P.generosa v1.0 Assembly Using BLASTn for BlobToolKit on Mox
To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run BLASTn according to the BlobToolKit “Getting Started” guide on Mox.
Trimming P.generosa 10x Genomics HiC FastQs with fastp on Mox
Steven asked me to try running BlobToolKit to identify potential contaminating sequence in our Panopea generosa (Pacific geoduck) genome assembly (v1.0). In preparation for running BlobToolKit, I needed to trim the 10x Genomics FastQ data used by Phase Genomics. Files were trimmed using fastp on Mox.
Transcriptome Annotation - Trinotate on C.bairdi Transcriptome v4.0 on Mox
Continued annotation of cbai_transcriptome_v4.0.fasta (Trinity de novo assembly from 20210317; https://robertslab.github.io/sams-notebook/2021/03/17/Transcriptome-Assembly-C.bairdi-Transcriptome-v4.0-Using-Trinity-on-Mox.html) using Trinotate on Mox. This will provide a thorough annotation, including gene ontology (GO) term assignments to each contig.
Transcriptome Annotation - DIAMOND BLASTx on C.bairdi Transcriptome v4.0 on Mox
Continued annotation of cbai_transcriptome_v4.0.fasta (Trinity de novo assembly from 20210317; https://robertslab.github.io/sams-notebook/2021/03/17/Transcriptome-Assembly-C.bairdi-Transcriptome-v4.0-Using-Trinity-on-Mox.html) using DIAMOND BLASTx on Mox. This will be used as a component of Trinotate annotation downstream.
Transcriptome Annotation - Trinotate Hematodinium v1.7 on Mox
Transcriptome Annotation - Trinotate Hematodinium v1.6 on Mox
Transcriptome Assembly - Hematodinium Transcriptomes v1.6 and v1.7 with Trinity on Mox
I’d previously assembled hemat_transcriptome_v1.0.fasta on 20200122 and hemat_transcriptome_v1.5.fasta on 20200408, extracted hemat_transcriptome_v2.1.fasta from an existing FastA on 20200605, and also extracted hemat_transcriptome_v3.1.fasta on 20200605.
Data Wrangling - Gene ID Extraction from P.generosa Genome GFF Using Methylation Machinery Gene IDs
Per this GitHub issue, Steven provided a list of methylation-related gene names and wanted to extract the corresponding Panopea generosa (Pacific geoduck; http://en.wikipedia.org/wiki/Geoduck) gene IDs from our P.generosa genome, along with corresponding BLAST e-values.
Data Wrangling - Gene ID Extraction from P.generosa Genome GFF Using Methylation Machinery List
Per this GitHub Issue, Steven asked that I take a list of gene names associated with DNA methylation and see if I could extract a list of Panopea generosa (Pacific geoduck) gene IDs and corresponding BLAST e-values for each from our P.generosa genome annotation (see Genomic Resources wiki for more info).
Data Received - Anthopleura elegantissima - aggregating anemone - NanoPore Genome Sequence from Jay Dimond
Jay asked me to help get his A.elegantissima (aggregating anemone) NanoPore gDNA sequencing data submitted to the NCBI Sequence Read Archive (SRA). He sent a hard drive (HDD) with all the NanoPore sequencing Fast5 files. The HDD was received on 2/2/2021. Here are the details provided in the readme file in the Ae_ONT directory.
Samples Submitted - M.magister MBD-BSseq Libraries to Univ. of Oregon GC3F
Submitted the M.magister MBD-BSseq libraries created 20201124 using the 4nM aliquots created for the MiSeq test run on 20201202 to the Univ. of Oregon GC3F sequencing core.
FastQC-MultiQC - M.magister MBD-BSseq Pool Test MiSeq Run on Mox
Earlier today we received the M.magister (C.magister; Dungeness crab) MiSeq data from Mac.
Data Received - M.magister MBD-BSseq Pool Test MiSeq Run
After creating M.magister (C.magister; Dungeness crab) MBD-BSseq libraries (on 20201124), I gave the pooled set of samples to Mac for a test sequencing run on the MiSeq on 20201202.
Alignment - C.gigas RNAseq to GCF_000297895.1_oyster_v9 Genome Using STAR on Mox
Mac was getting some weird results when mapping some single cell RNAseq data to the C.gigas mitochondrial (mt) genome that she had, so she asked for some help mapping other C.gigas RNAseq data (GitHub Issue) to the C.gigas mt genome to see if someone else would get similar results.
Trimming - Haws Lab C.gigas Ploidy pH WGBS 10bp 5 and 3 Prime Ends Using fastp and MultiQC on Mox
Assuming that the 24 C.gigas ploidy pH WGBS datasets we received on 20201205 will be analyzed using Bismark, I decided to go ahead and trim the files according to Bismark guidelines for libraries made with the ZymoResearch Pico MethylSeq Kit.
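As the entry title describes, the trimming removes 10bp from the 5' and 3' ends of the reads. A minimal sketch of how that might be expressed as a fastp command, built in Python, assuming paired-end gzipped FastQs; the file names are hypothetical, and the sketch only builds the argument list rather than running fastp. It uses fastp's `--trim_front1`/`--trim_tail1` (read 1) and `--trim_front2`/`--trim_tail2` (read 2) options.

```python
def fastp_trim_args(r1, r2, out_r1, out_r2, n_bases=10):
    """Build a fastp argument list that removes n_bases from the 5' and 3' ends of both reads."""
    return [
        "fastp",
        "--in1", r1,
        "--in2", r2,
        "--out1", out_r1,
        "--out2", out_r2,
        # Read 1: trim n_bases from the front (5') and the tail (3')
        "--trim_front1", str(n_bases),
        "--trim_tail1", str(n_bases),
        # Read 2: same trimming
        "--trim_front2", str(n_bases),
        "--trim_tail2", str(n_bases),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    args = fastp_trim_args("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                           "sample_R1.trimmed.fastq.gz", "sample_R2.trimmed.fastq.gz")
    print(" ".join(args))
    # To actually run it: subprocess.run(args, check=True)
```

Keeping the trim length as a parameter makes it easy to reuse the same function for other library types.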
FastQC-MultiQc - C.gigas Ploidy pH WGBS Raw Sequence Data from Haws Lab on Mox
Yesterday (20201205), we received the whole genome bisulfite sequencing (WGBS) data back from ZymoResearch for the 24 C.gigas diploid/triploid oysters subjected to two different pH treatments (received from the Haws Lab on 20200820 and submitted to ZymoResearch on 20200824). As part of our standard sequencing data receipt pipeline, I needed to generate FastQC files for each sample.
Trimming - Ronits C.gigas Ploidy WGBS 10bp 5 and 3 Prime Ends Using fastp and MultiQC on Mox
Steven asked me (GitHub Issue) to trim Ronit’s WGBS sequencing data, which we received on 20201110, according to Bismark guidelines for libraries made with the ZymoResearch Pico MethylSeq Kit.
Library Quantification - M.magister MBD BSseq Libraries with Qubit
After reviewing the Bioanalyzer assays for the MBD BSseq libraries, Mac indicated she’d like to have the libraries quantified using the Qubit.
Trimming - Ronits C.gigas Ploidy WGBS Using fastp and MultiQC on Mox
Steven asked me (GitHub Issue) to trim Ronit’s WGBS sequencing data, which we received on 20201110, according to Bismark guidelines for libraries made with the ZymoResearch Pico MethylSeq Kit.
Bioanalyzer - M.magister MBD BSseq Libraries
MBD BSseq library construction was completed yesterday (20201124). Next, I needed to evaluate the libraries using the Roberts Lab Bioanalyzer 2100 (Agilent) to assess library sizes, yields, and quality (e.g. presence of primer dimers).
MBD BSseq Library Prep - M.magister MBD-selected DNA Using Pico Methyl-Seq Kit
After finishing the final set of eight MBD selections on 20201103, I’m finally ready to make the BSseq libraries using the Pico Methyl-Seq Library Prep Kit (ZymoResearch) (PDF). I followed the manufacturer’s protocols with the following notes/changes (organized by each section in the protocol):
RNA Isolation and Quantification - P.generosa Hemocytes from Shelly
Shelly asked me to isolate RNA from some P.generosa hemocytes (GitHub Issue) that she had.
FastQC-MultiQc - C.gigas Ploidy WGBS Raw Sequence Data from Ronits Project on Mox
Transcriptome Assessment - Crustacean Transcriptome Completeness Evaluation Using BUSCO on Mox
Grace was recently writing up a manuscript that included a basic comparison of our C.bairdi transcriptome (cbai_transcriptome_v3.1) (see the Genomic Resources wiki for more deets) to two other species’ transcriptome assemblies. We wanted BUSCO evaluations as part of this comparison, but the two other species did not have BUSCO scores in their respective publications. As such, I decided to generate them myself, as BUSCO runs very quickly. The job was run on Mox.
Data Wrangling - MultiQC on S.salar RNAseq from fastp and HISAT2 on Mox
In Shelly’s GitHub Issue for this S.salar project, she also requested a MultiQC report for the trimming (completed on 20201029) and the genome alignments (completed on 20201103).
RNAseq Alignments - S.salar HISAT2 BAMs to GCF_000233375.1_ICSASG_v2_genomic.gtf Transcriptome Using StringTie on Mox
This is a continuation of addressing Shelly Trigg’s request (GitHub Issue) regarding some Salmo salar RNAseq data: trimming (completed 20201029), genome alignment (completed on 20201103), and transcriptome alignment.
RNAseq Alignments - Trimmed S.salar RNAseq to GCF_000233375.1_ICSASG_v2_genomic.fa Using Hisat2 on Mox
This is a continuation of addressing Shelly Trigg’s request (GitHub Issue) regarding some Salmo salar RNAseq data: trimming (completed 20201029), genome alignment, and transcriptome alignment.
MBD Selection - M.magister Sheared Gill gDNA 16 of 24 Samples Set 3 of 3
Click here for the notebook on the first eight samples processed. Click here for the second set of eight samples processed. M.magister (Dungeness crab) gill gDNA provided by Mackenzie Gavery was previously sheared on 20201026, and three samples were subjected to additional rounds of shearing on 20201027, in preparation for methyl binding domain (MBD) selection using the MethylMiner Kit (Invitrogen).
MBD Selection - M.magister Sheared Gill gDNA 8 of 24 Samples Set 2 of 3
Click here for the notebook on the first eight samples processed. M.magister (Dungeness crab) gill gDNA provided by Mackenzie Gavery was previously sheared on 20201026, and three samples were subjected to additional rounds of shearing on 20201027, in preparation for methyl binding domain (MBD) selection using the MethylMiner Kit (Invitrogen).
Trimming - Shelly S.salar RNAseq Using fastp and MultiQC on Mox
In this GitHub issue, Shelly asked that I trim some Salmo salar RNAseq data she had, align it to a genome, and generate transcriptome alignment counts, using a subset of the NCBI Salmo salar RefSeq genome, GCF_000233375.1. She created this subset using only sequences designated as “chromosomes.” A link to the FastA (and a link to her notebook on creating this file) is in the GitHub issue linked above. The transcriptome she provided has not been subsetted in a similar fashion; maybe I’ll do that prior to alignment.
MBD Selection - M.magister Sheared Gill gDNA 8 of 24 Samples Set 1 of 3
DNA Shearing - M.magister gDNA Additional Shearing CH05-01_21 CH07-11 and Bioanalyzer
After shearing all of the M.magister gill gDNA on 20201026, three samples still had average fragment lengths that were a bit longer than desired (~750bp, but we want ~250 - 550bp):
DNA Shearing - M.magister gDNA Shearing All Samples and Bioanalyzer
I previously ran some shearing tests on 20201022 to determine how many cycles to run on the sonicator (Bioruptor 300; Diagenode) to achieve an average fragment length of ~350 - 500bp in preparation for MBD-BSseq. The determination was 70 cycles total (30s ON, 30s OFF; low intensity), arrived at by sonicating for 35 cycles, followed by successive rounds of 5 cycles each.
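The 70-cycle total breaks down as an initial 35 cycles plus seven additional 5-cycle rounds. A small sketch of that arithmetic, with hypothetical function names; the run time counts only the ON/OFF cycles themselves and ignores any pauses between rounds.

```python
def extra_rounds(target_cycles, initial_cycles, cycles_per_round):
    """Additional sonication rounds needed after the initial round (rounded up)."""
    remaining = max(target_cycles - initial_cycles, 0)
    # Ceiling division: a partial round still counts as one more round
    return -(-remaining // cycles_per_round)


def total_minutes(cycles, on_s=30, off_s=30):
    """Run time in minutes for a number of ON/OFF cycles (pauses between rounds not included)."""
    return cycles * (on_s + off_s) / 60


print(extra_rounds(70, 35, 5))  # 7
print(total_minutes(70))        # 70.0
```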
DNA Shearing - M.magister CH05-21 gDNA Full Shearing Test and Bioanalyzer
Yesterday, I did some shearing of Metacarcinus magister gill gDNA on a test sample (CH05-21) to determine how many cycles to run on the sonicator (Bioruptor 300; Diagenode) to achieve an average fragment length of ~350 - 500bp in preparation for MBD-BSseq. The determination from yesterday was 70 cycles (30s ON, 30s OFF; low intensity). That determination was made by first sonicating for 35 cycles, followed by successive rounds of 5 cycles each. I decided to repeat this, except by doing it in a single round of sonication.
DNA Shearing - M.magister gDNA Shear Testing and Bioanalyzer
Steven assigned me to do some MBD-BSseq library prep (GitHub Issue) for some Dungeness crab (Metacarcinus magister) DNA samples provided by Mackenzie Gavery. The DNA was isolated from juvenile (J6/J7 developmental stages) gill tissue. One of the first steps in MBD-BSseq is to fragment DNA to a desired size (~350 - 500bp in our case). However, we haven’t worked with Metacarcinus magister DNA previously, so I need to empirically determine sonicator (Bioruptor 300; Diagenode) settings for these samples.
Read Mapping - C.bairdi 201002558-2729-Q7 and 6129-403-26-Q7 Taxa-Specific NanoPore Reads to cbai_genome_v1.01.fasta Using Minimap2 on Mox
After using seqtk on 20201013 to extract FastQ reads for the various taxa I was interested in, the next step was mapping those reads to the cbai_genome_v1.01 “genome” assembly from 20200917. I found that Minimap2 will map long reads (e.g. NanoPore), in addition to short reads, so I decided to give that a rip.
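A minimal sketch of a Minimap2 invocation for NanoPore reads, built in Python. It uses minimap2's `-x map-ont` preset, `-a` for SAM output, and `-t` for threads; the read file name and thread count are hypothetical, and this is not necessarily the exact command used in the entry.

```python
def minimap2_ont_args(reference_fasta, reads_fastq, threads=1):
    """Build a minimap2 argument list for mapping NanoPore reads; SAM is written to stdout."""
    return [
        "minimap2",
        "-a",             # SAM output instead of the default PAF
        "-x", "map-ont",  # preset for Oxford Nanopore reads
        "-t", str(threads),
        reference_fasta,
        reads_fastq,
    ]


if __name__ == "__main__":
    # Hypothetical read file name and thread count, for illustration only
    args = minimap2_ont_args("cbai_genome_v1.01.fasta", "taxon_reads.fastq", threads=28)
    print(" ".join(args))
    # To run it and capture the alignments:
    # with open("taxon_reads.sam", "w") as sam:
    #     subprocess.run(args, stdout=sam, check=True)
```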
Data Wrangling - C.bairdi NanoPore Reads Extractions With Seqtk on Mephisto
In my pursuit to identify which contigs/scaffolds of our “C.bairdi” genome assembly from 20200917 correspond to interesting taxa, based on taxonomic assignments produced by MEGAN6 on 20200928, I used MEGAN6 to extract taxa-specific reads from cbai_genome_v1.01 on 20201007. However, MEGAN6 only outputs these reads in FastA format. Since I want the original reads in FastQ format, I will take the sequence IDs from the FastA index file and provide them to seqtk to extract the FastQ reads for each sample and corresponding taxa.
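A minimal sketch of the ID-list step, assuming the standard tab-delimited FastA index (.fai) layout with the sequence name in the first column. The function names are hypothetical. The resulting one-ID-per-line file is the name-list format that `seqtk subseq reads.fastq ids.txt > subset.fastq` accepts (file names hypothetical).

```python
from pathlib import Path


def ids_from_fai(fai_text):
    """Return sequence IDs from the first column of a FastA index (.fai)."""
    ids = []
    for line in fai_text.splitlines():
        if line.strip():
            ids.append(line.split("\t")[0])
    return ids


def write_id_list(ids, path):
    """Write one ID per line, the name-list format used by seqtk subseq."""
    Path(path).write_text("\n".join(ids) + "\n")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    fai = Path("megan_taxon_reads.fasta.fai").read_text()
    write_id_list(ids_from_fai(fai), "megan_taxon_reads.ids.txt")
```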
NanoPore Reads Extractions - C.bairdi Taxonomic Reads Extractions with MEGAN6 on 201002558-2729-Q7 and 6129-403-26-Q7
After completing the taxonomic comparisons of 201002558-2729-Q7 and 6129-403-26-Q7 on 20201002, I decided to extract reads assigned to the following taxa for further exploration (primarily to identify contigs/scaffolds in our cbai_genome_v1.0.fasta (19MB)).
Comparison - C.bairdi 20102558-2729 vs. 6129-403-26 NanoPore Taxonomic Assignments Using MEGAN6
After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples contributed most to these taxonomic assignments.
Taxonomic Assignments - C.bairdi 6129-403-26-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu
After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples contributed most to these taxonomic assignments.
Taxonomic Assignments - C.bairdi 20102558-2729-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu
After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples contributed most to these taxonomic assignments.
Data Wrangling - C.bairdi NanoPore 6129-403-26 Quality Filtering Using NanoFilt on Mox
Last week, I ran all of our Q7-filtered C.bairdi NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that a large quantity of bases were assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophilic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.
Data Wrangling - C.bairdi NanoPore 20102558-2729 Quality Filtering Using NanoFilt on Mox
Last week, I ran all of our Q7-filtered C.bairdi NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that a large quantity of bases were assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophilic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.
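NanoFilt's `-q` filter is applied to a read's average quality. Because Phred scores are logarithmic, a reasonable way to average them (and, as I understand it, how NanoFilt/nanomath do it) is to convert each score to an error probability, average the probabilities, and convert back. A minimal sketch of that calculation, assuming Phred+33 FastQ quality strings; it is an illustration, not NanoFilt's own code.

```python
import math


def mean_read_quality(qual_string, offset=33):
    """Average Phred quality of a read, averaged in error-probability space."""
    if not qual_string:
        return 0.0
    # Convert each Phred score to an error probability: P = 10^(-Q/10)
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual_string]
    # Average the probabilities, then convert back to the Phred scale
    return -10 * math.log10(sum(probs) / len(probs))


def passes_min_quality(qual_string, min_q=7):
    """True if the read's average quality meets the threshold (e.g. a Q7 filter)."""
    return mean_read_quality(qual_string) >= min_q
```

Note that a few very low-quality bases pull this average down much more than a simple arithmetic mean of the Phred scores would.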
Assembly Assessment - BUSCO C.bairdi Genome v1.01 on Mox
After creating a subset of cbai_genome_v1.0 containing only contigs >100bp yesterday (subset named cbai_genome_v1.01), I wanted to generate BUSCO scores for cbai_genome_v1.01. This is primarily just to keep info consistent on our Genomic Resources wiki, as I don’t expect these scores to differ at all from the cbai_genome_v1.0 BUSCO scores.
Data Wrangling - Subsetting cbai_genome_v1.0 Assembly with faidx
I previously assembled cbai_genome_v1.0.fasta with our NanoPore Q7 reads on 20200917 and noticed that there were numerous sequences well below the expected 500bp minimum length that the assembler (Flye) was supposed to spit out. I created an Issue on the Flye GitHub page to find out why. The developer responded, determined it was an issue with the assembly polisher, and said that sequences <500bp could be safely ignored.
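The subsetting itself was done with faidx. As an illustration of the same length-based idea, a minimal pure-Python sketch that keeps only records longer than a chosen cutoff; the function names are hypothetical and the cutoff is a parameter (the cbai_genome_v1.01 subset is described as contigs >100bp).

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FastA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def keep_longer_than(text, min_len):
    """Return FastA text with only the records whose sequence is longer than min_len."""
    out = [f">{h}\n{s}" for h, s in parse_fasta(text) if len(s) > min_len]
    return "\n".join(out) + ("\n" if out else "")
```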
Assembly Assessment - BUSCO C.bairdi Genome v1.0 on Mox
After using Flye to perform a de novo assembly of our Q7 filtered NanoPore sequencing data on 20200917, I decided to check the “completeness” of the assembly using BUSCO on Mox.
Data Wrangling - C.bairdi NanoPore Quality Filtering Using NanoFilt on Mox
I previously converted our C.bairdi NanoPore sequencing data from the raw Fast5 format to FastQ format for our three sets of data:
Taxonomic Assignments - C.bairdi NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on swoose
Earlier today I quality filtered (>=Q7) our C.bairdi NanoPore reads. Next, I’d like to attempt to filter reads taxonomically, since the NanoPore data came from both an uninfected crab and a Hematodinium-infected crab.
qPCR - Geoduck Normalizing Gene Primers 28s-v4 and EF1a-v1 Tests
On Monday (20200914), I checked a set of 28s and EF1a primer sets and determined that 28s-v4 and EF1a-v1 were probably the best of the bunch, although they all looked great. So, I needed to test these out on some individual cDNA samples to see if they might be useful as normalizing genes, which should have consistent Cq values across all samples/treatments.
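One simple way to judge "consistent Cq values" is to summarize each primer set's Cq values across samples; a smaller spread suggests more stable expression. A minimal sketch using only the standard library, with a hypothetical function name. It doesn't implement dedicated stability algorithms such as geNorm or NormFinder, and any cutoff for "consistent" is left to the analyst.

```python
import statistics


def cq_summary(cq_values):
    """Mean, sample standard deviation and range of Cq values for one primer set."""
    if len(cq_values) < 2:
        raise ValueError("Need at least two Cq values to estimate spread")
    return {
        "mean": statistics.mean(cq_values),
        "sd": statistics.stdev(cq_values),
        "range": max(cq_values) - min(cq_values),
    }
```

Running this once per primer set (e.g. 28s-v4 and EF1a-v1) on the same samples gives directly comparable spreads.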
qPCR - Geoduck Normalizing Gene Primer Checks
Shelly ordered some new primers (designed by Sam Gurr) (GitHub Issue) to potentially use as normalizing genes for her geoduck reproduction gene expression project and asked that I test them out.
Data Wrangling - Visualization of C.bairdi NanoPore Sequencing Using NanoPlot on Mox
I previously converted our C.bairdi NanoPore sequencing data from the raw Fast5 format to FastQ format for our three sets of data:
Data Wrangling - NanoPore Fast5 Conversion to FastQ of C.bairdi 6129_403_26 on Mox with GPU Node
Time to start working with the NanoPore data that I generated back in March (???!!!). In order to proceed, I first need to convert the raw Fast5 files to FastQ. To do so, I’ll use the NanoPore program guppy.
Data Wrangling - NanoPore Fast5 Conversion to FastQ of C.bairdi 20102558-2729 Run-02 on Mox with GPU Node
Continuing to work with the NanoPore data that I generated back in January (???!!!). In order to proceed, I first need to convert the raw Fast5 files to FastQ. To do so, I’ll use the NanoPore program guppy. I converted the first run from this flowcell earlier today.
Data Wrangling - NanoPore Fast5 Conversion to FastQ of C.bairdi 20102558-2729 Run-01 on Mox with GPU Node
Time to start working with the NanoPore data that I generated back in January (???!!!). In order to proceed, I first need to convert the raw Fast5 files to FastQ. To do so, I’ll use the NanoPore program guppy.
DNA Quantification - Re-quant Ronits C.gigas Diploid-Triploid Ctenidia gDNA Submitted to ZymoResearch
I received notice from ZymoResearch yesterday afternoon that most of the samples we sent on 20200820 for this project (Quote 3534) had insufficient DNA for sequencing. This was, honestly, shocking. I had even submitted well over the minimum amount of DNA required (submitted 1.75ug - only needed 1ug). So, I’m not entirely sure what happened here.
Transcriptome Annotation - Trinotate Hematodinium v3.1 on Mox
Transcriptome Annotation - Trinotate Hematodinium v2.1 on Mox
qPCR - P.generosa RPL5 and TIF3s6b v2 and v3 Normalizing Gene Assessment
After testing out the RPL5 and TIF3s6b v2 and v3 primers yesterday on pooled cDNA, we determined the primers looked good, so will go forward testing them on a set of P.generosa hemolymph cDNA made by Kaitlyn on 20200212. This will evaluate whether or not these can be utilized as normalizing genes for subsequent gene expression analyses.
Sample Submitted - C.gigas Diploid-Triploid pH Treatments Ctenidia to ZymoResearch for WGBS
Submitted 1.5ug of the 24 C.gigas ctenidia gDNA samples isolated last week (20200821) to ZymoResearch for whole genome bisulfite sequencing (WGBS) to compare differences in diploid/triploids and responses to elevated pH:
qPCR - P.generosa RPL5-v2-v3 and TIF3s6b-v2-v3 Primer Tests
Shelly ordered some new primers as potential normalizing genes and asked me to test them out (GitHub Issue).
DNA Isolation and Quantification - C.gigas High-Low pH Triploid and Diploid Ctenidia
Isolated DNA from 24 of the Crassostrea gigas high/low pH triploid/diploid ctenidia samples that we received yesterday from the Haws Lab. Samples selected by Steven.
Samples Submitted - Ronits C.gigas Diploid and Triploid Ctenidia to ZymoResearch for WGBS
Submitted 1.75ug of gDNA from 10 Crassostrea gigas ctenidia samples from Ronit’s dessication/temp/ploidy experiment to ZymoResearch for whole genome bisulfite sequencing (BSseq). They will sequence to ~30x coverage, using 150bp PE reads.
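As a rough check on what ~30x coverage with 150bp PE reads means in read numbers: read pairs ≈ coverage × genome size ÷ (2 × 150bp). A minimal sketch with a hypothetical function name; the genome size is a parameter (use the size of the reference assembly), and the estimate ignores trimming, duplicates, read overlap, and the lower mapping efficiency typical of bisulfite data, so real requirements will be higher.

```python
def read_pairs_for_coverage(genome_size_bp, coverage, read_length=150):
    """Paired-end read pairs needed to reach a target mean coverage.

    Each pair contributes 2 * read_length bases (overlap, trimming and duplicates ignored).
    """
    bases_needed = genome_size_bp * coverage
    return bases_needed / (2 * read_length)
```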
Assembly Stats - C.bairdi Transcriptomes v2.1 and v3.1 Trinity Stats on Mox
Realized that transcriptomes v2.1 and v3.1 (extracted from BLASTx-annotated FastAs from 20200605) didn’t have any associated stats.
Trimming-FastQC-MultiQC - Robertos C.gigas WGBS FastQ Data with fastp FastQC and MultiQC on Mox
Steven asked me to trim Roberto’s C.gigas whole genome bisulfite sequencing (WGBS) reads (GitHub Issue) “following his methods”. The only things specified are trimming Illumina adapters and then trimming 10bp from the 5’ end of reads. No mention of which software was used.
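Since no software was specified, here is a tool-agnostic sketch of the second step only: removing the first 10bp (and the matching quality scores) from each FastQ record. Adapter trimming is not covered, and the function names are hypothetical. It assumes plain 4-line FastQ records.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from 4-line FastQ records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        name, seq, _plus, qual = lines[i:i + 4]
        yield name[1:], seq, qual  # drop the leading '@'


def trim_5prime(record, n=10):
    """Remove the first n bases and quality scores; reads shorter than n become empty."""
    name, seq, qual = record
    return name, seq[n:], qual[n:]
```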
TransDecoder - Hematodinium Transcriptomes v1.6, v1.7, v2.1 and v3.1 on Mox
To continue annotation of our Hematodinium v1.6, v1.7, v2.1 & v3.1 transcriptome assemblies, I needed to run TransDecoder before performing the more thorough annotation with Trinotate.
Assembly Stats - Hematodinium Transcriptomes v2.1 and v3.1 Trinity Stats on Mox
Working on dealing with our various Hematodinium sp. transcriptomes and realized that transcriptomes v2.1 and v3.1 (extracted from BLASTx-annotated FastAs from 20200605) didn’t have any associated stats.
Transcriptome Assessment - BUSCO Metazoa on Hematodinium v1.6 v1.7 v2.1 and v3.1 on Mox
Transcriptome Annotation - Hematodinium Transcriptomes v1.6 v1.7 v2.1 v3.1 with DIAMOND BLASTx on Mox
Needed to use DIAMOND BLASTx to annotate the Hematodinium sp. transcriptomes I’ve assembled. This will also be used for additional downstream annotation (TransDecoder, Trinotate):
qPCR - P.generosa APLP and TIF3s8-1 with cDNA
Shelly asked me to run some qPCRs (GitHub Issue), after some of the qPCR results I got from primer tests with normalizing genes and potential gene targets.
FastQ Read Alignment and Quantification - P.generosa Water Metagenomic Libraries to MetaGeneMark Assembly with Hisat2 on Mox
Continuing work on the manuscript for this data, Emma wanted the number of reads aligned to each gene. I previously created an assembly with genes/proteins using MetaGeneMark on 20190103, but the assembly process didn’t output any sort of statistics on read counts.
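A minimal sketch of counting reads per gene from HISAT2's SAM output, assuming each gene is its own reference sequence (so the gene name is in SAM column 3). It skips header lines plus unmapped (0x4), secondary (0x100) and supplementary (0x800) alignments, so each mapped read is counted once; for paired-end data each mate is counted separately. This is an illustration, not the method used in the entry.

```python
from collections import Counter

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def count_reads_per_reference(sam_lines):
    """Count primary, mapped alignments per reference sequence (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, ref = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only primary, mapped alignments
        counts[ref] += 1
    return counts
```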
Primer Design and In-Silico Testing - Geoduck Reproduction Primers
Shelly asked that I re-run the primer design pipeline that Kaitlyn had previously run to design a set of reproduction-related qPCR primers. Unfortunately, Kaitlyn’s Jupyter Notebook wasn’t backed up and she accidentally deleted it, I believe, so there’s no real record of how she designed the primers. However, I do know that she was unable to run the EMBOSS primersearch tool, which will check your primers against a set of sequences for any other matches. This is useful for confirming specificity.
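EMBOSS primersearch allows a user-set percentage of mismatches; as a much simpler illustration of the same specificity check, here is a sketch that reports only exact matches of a primer, on either strand, in a set of sequences. The function names are hypothetical, and because mismatched binding sites are missed, this is a rough screen rather than a substitute for primersearch.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer_sites(primer, sequences):
    """Return (seq_id, strand, 0-based start) for every exact primer match.

    sequences maps sequence IDs to sequences. Both the primer ('+') and its
    reverse complement ('-') are searched; overlapping hits are reported.
    """
    hits = []
    targets = {"+": primer.upper(), "-": revcomp(primer.upper())}
    for seq_id, seq in sequences.items():
        seq = seq.upper()
        for strand, pattern in targets.items():
            start = seq.find(pattern)
            while start != -1:
                hits.append((seq_id, strand, start))
                start = seq.find(pattern, start + 1)
    return hits
```

A primer that is specific to its target should produce hits only in the intended sequence.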
qPCR - Testing P.generosa Reproduction-related Primers
I ran qPCRs on some other primers on 20200723, and Shelly then asked me to test some additional qPCR primers that might have acceptable melt curves and be usable as normalizing genes.
SRA Submission - P.generosa Metagenomics Data
Added our P.generosa metagenomics sequencing data to the NCBI Sequence Read Archive (SRA).
qPCR - Testing P.generosa Reproduction-related Primers
Shelly has asked me to test some qPCR primers related to geoduck reproduction.
DNA Isolation and Quantification - C.gigas Diploid (Ronit) and Triploid (Nisbet)
Isolated some gDNA from the triploid Nisbet oysters we received on 20200218 and one of Ronit’s diploid ctenidia samples (Google Sheet) using the E.Z.N.A. Mollusc DNA Kit (Omega). See the “Results” section for sample info.
Metagenomics - Data Extractions Using MEGAN6
Decided to finally take the time to methodically extract data from our metagenomics project so that I have the tables handy when I need them and can easily share them with other people. Previously, I hadn’t done this due to limitations on looking at the data remotely. I finally downloaded all of the RMA6 files from 20191014 after getting fed up with the remote desktop connection and upgrading the size of my hard drive (five of the six RMA6 files are >40GB in size).
Transcriptome Annotation - C.bairdi Transcriptomes v2.1 and v3.1 Using DIAMOND BLASTx on Mox
Decided to annotate the two C.bairdi transcriptomes, cbai_transcriptome_v2.1 and cbai_transcriptome_v3.1, generated on 20200605, using DIAMOND BLASTx on Mox.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v3.1
Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0 (RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0 (RNAseq shorthand: 2018, 2019, 2020-UW).
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v2.1
Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0 (RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0 (RNAseq shorthand: 2018, 2019, 2020-UW).
Sequence Extractions - C.bairdi Transcriptomes v2.0 and v3.0 Excluding Alveolata with MEGAN6 on Swoose
Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0 (RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0 (RNAseq shorthand: 2018, 2019, 2020-UW). Both of these transcriptomes were assembled without any taxonomic filter applied. DIAMOND BLASTx and conversion to MEGAN6 RMA6 files was performed yesterday (20200604).
Transcriptome Annotation - C.bairdi Transcriptomes v2.0 and v3.0 with DIAMOND BLASTx on Mox
Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0 (RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0 (RNAseq shorthand: 2018, 2019, 2020-UW). Both of these transcriptomes were assembled without any taxonomic filter applied.
Transcriptome Comparison - C.bairdi Transcriptomes Evaluations with DETONATE on Mox
Transcriptome Comparison - C.bairdi Transcriptomes Compared with DETONATE on Mox
We’ve produced a number of C.bairdi transcriptomes and we’re interested in doing some comparisons to try to determine which one might be “best”. I previously compared the BUSCO scores of each of these transcriptomes and now will be using the DETONATE software package to perform two different types of comparisons: comparison to a reference (REF-EVAL) and determination of an overall quality “score” (RSEM-EVAL). I’ll be running REF-EVAL in this notebook.
Transcriptome Annotation - Trinotate C.bairdi Transcriptome-v1.7 on Mox
After creating a de novo assembly of C.bairdi transcriptome v1.7 on 20200527, performing BLASTx annotation on 20200527, and running TransDecoder for ORF identification on 20200527, I continued the annotation process by running Trinotate.
Transcriptome Comparisons - C.bairdi BUSCO Scores
Since we’ve generated a number of versions of the C.bairdi transcriptome, we’ve decided to compare them using various metrics. Here, I’ve compared the BUSCO scores generated for each transcriptome using BUSCO’s built-in plotting script. The script generates a stacked bar plot of all BUSCO short summary files that it is provided with, as well as the R code used to generate the plot.
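BUSCO's plotting script reads the short summary files directly. If the same numbers are also wanted in a table, a minimal sketch like the following could pull them out, assuming the one-line `C:x%[S:x%,D:x%],F:x%,M:x%,n:N` notation written to BUSCO short summary files; other layouts would need a different pattern, and the function name is hypothetical.

```python
import re

SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Return Complete/Single/Duplicated/Fragmented/Missing percentages and n."""
    m = SUMMARY_RE.search(text)
    if m is None:
        raise ValueError("No BUSCO summary line found")
    values = {k: float(v) for k, v in m.groupdict().items() if k != "n"}
    values["n"] = int(m.group("n"))  # total number of BUSCOs searched
    return values
```

Applying this to each transcriptome's short summary file gives one row per assembly for side-by-side comparison.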
TransDecoder - C.bairdi Transcriptome v1.7 on Mox
Need to run TransDecoder on Mox on the C.bairdi transcriptome v1.7 from 20200527.
Transcriptome Annotation - C.bairdi Transcriptome v1.7 Using DIAMOND BLASTx on Mox
As part of annotating cbai_transcriptome_v1.7.fasta from 20200527, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v1.7
I previously created a C.bairdi de novo transcriptome assembly v1.7 with Trinity from all our C.bairdi taxonomically filtered pooled RNAseq samples on 20200527 and decided to assess its “completeness” using BUSCO and the metazoa_odb9 database.
Transcriptome Assembly - C.bairdi All Pooled Arthropoda-only RNAseq Data with Trinity on Mox
For completeness’ sake, I wanted to create an additional C.bairdi transcriptome assembly consisting of Arthropoda-only sequences from just the pooled RNAseq data (since I recently generated a similar assembly without taxonomically filtered reads on 20200518). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to the Trinity command.
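A minimal sketch of a stranded, paired-end Trinity command built in Python. Trinity's `--SS_lib_type` option declares library strandedness; `RF` is the orientation Trinity's documentation associates with dUTP-style stranded kits, but whether it matches these libraries is an assumption. The memory and CPU defaults are hypothetical, and the function only builds the argument list.

```python
def trinity_stranded_args(left_fastqs, right_fastqs, max_memory="100G", cpu=28, lib_type="RF"):
    """Build a Trinity argument list for a stranded, paired-end de novo assembly."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--SS_lib_type", lib_type,  # strandedness; RF assumed here
        # Trinity takes comma-separated lists of read 1 / read 2 files
        "--left", ",".join(left_fastqs),
        "--right", ",".join(right_fastqs),
    ]
```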
Transcriptome Annotation - Trinotate C.bairdi Transcriptome-v3.0 on Mox
After performing de novo assembly on all of our Tanner crab RNAseq data (no taxonomic filter applied, either) on 20200518, I continued the annotation process by running Trinotate.
Transcriptome Assembly - P.trituberculatus (Japanese blue crab) NCBI SRA BioProject PRJNA597187 Data with Trinity on Mox
After generating a number of C.bairdi (Tanner crab) transcriptomes, we decided to compare them to help decide which one should become our “canonical” version. As part of that, the Trinity wiki offers a list of tools that one can use to check the quality of transcriptome assemblies. Some of those require a transcriptome of a related species.
SRA Library Assessment - Determine RNAseq Library Strandedness from P.trituberculatus SRA BioProject PRJNA597187
We’ve produced a number of C.bairdi transcriptomes utilizing different assembly approaches (e.g. Arthropoda reads only, stranded libraries only, mixed strandedness libraries, etc.) and we want to determine which of them is “best”. Trinity has a nice list of tools to assess the quality of transcriptome assemblies, but most of the tools rely on comparison to a transcriptome of a related species.
Transcriptome Annotation - Trinotate C.bairdi Transcriptome-v1.6 on Mox
After creating a de novo assembly of C.bairdi transcriptome v1.6 on 20200518, performing BLASTx annotation on 20200519, and running TransDecoder for ORF identification on 20200519, I continued the annotation process by running Trinotate.
TransDecoder - C.bairdi Transcriptome v1.6 on Mox
Need to run TransDecoder on Mox on the C.bairdi transcriptome v1.6 from 20200518.
TransDecoder - C.bairdi Transcriptome v3.0 from 20200518 on Mox
Need to run TransDecoder on Mox on the C.bairdi transcriptome v3.0 from 20200518.
Transcriptome Annotation - C.bairdi Transcriptome v1.6 Using DIAMOND BLASTx on Mox
As part of annotating cbai_transcriptome_v1.6.fasta from 20200518, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v1.6
I previously created a C.bairdi de novo transcriptome assembly v1.6 with Trinity from all our C.bairdi taxonomically filtered RNAseq on 20200518 and decided to assess its “completeness” using BUSCO and the metazoa_odb9 database.
Transcriptome Annotation - C.bairdi Transcriptome v3.0 Using DIAMOND BLASTx on Mox
As part of annotating cbai_transcriptome_v3.0.fasta from 20200518, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi Transcriptome v3.0
I previously created a C.bairdi de novo transcriptome assembly with Trinity from all our C.bairdi pooled RNAseq (not taxonomically filtered) on 20200518 and decided to assess its “completeness” using BUSCO and the metazoa_odb9 database.
Transcriptome Assembly - C.bairdi All Arthropoda-specific RNAseq Data with Trinity on Mox
I realized I hadn’t performed taxonomic read separation from one set of RNAseq data we had. And, since I was on a transcriptome assembly kick, I figured I’d generate another C.bairdi transcriptome that included only Arthropoda-specific sequence data from all of our RNAseq.
Data Wrangling - Arthropoda and Alveolata D26 Pool RNAseq FastQ Extractions
After using MEGAN6 to extract Arthropoda and Alveolata reads from our RNAseq data on 20200114, I had then extracted taxon-specific reads and aggregated each into basic Read 1 and Read 2 FastQs to simplify transcriptome assembly for C.bairdi and for Hematodinium. That was fine and all, but wasn’t fully thought through.
Transcriptome Assembly - C.bairdi All Pooled RNAseq Data Without Taxonomic Filters with Trinity on Mox
Steven asked that I assemble a transcriptome with just our pooled C.bairdi RNAseq data (not taxonomically filtered; see the FastQ list file linked in the Results section below). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to the Trinity command.
Transcriptome Annotation - Trinotate C.bairdi Transcriptome v2.0 from 20200502 on Mox
After performing de novo assembly on all of our Tanner crab RNAseq data (no taxonomic filter applied, either) on 20200502 and performing BLASTx annotation on 20200508, I continued the annotation process by running Trinotate.
TransDecoder - C.bairdi Transcriptome v2.0 from 20200502 on Mox
Need to run TransDecoder on Mox on the C.bairdi transcriptome v2.0 from 20200502.
Transcriptome Annotation - C.bairdi Transcriptome v2.0 Using DIAMOND BLASTx on Mox
As part of annotating the C.bairdi v2.0 transcriptome assembly from 20200502, I need to run DIAMOND BLASTx to use with Trinotate.
Transcriptome Assessment - BUSCO Metazoa on C.bairdi v2.0 Transcriptome
I previously created a C.bairdi de novo transcriptome assembly with Trinity using all existing, unfiltered (i.e. no taxonomic selection) RNAseq data on 20200502 and decided to assess its “completeness” using BUSCO and the metazoa_odb9 database.
Transcriptome Assembly - C.bairdi All RNAseq Data Without Taxonomic Filters with Trinity on Mox
Steven asked that I assemble an unfiltered (i.e. no taxonomic selection) transcriptome with all of our C.bairdi RNAseq data (see the FastQ list file linked in the Results section below). A de novo assembly was run using Trinity on Mox. It should be noted that this assembly is a mixture of stranded/non-stranded library preps.
GO to GOslim - C.bairdi Enriched GO Terms from 20200422 DEGs
After running pairwise comparisons to identify differentially expressed genes (DEGs) on 20200422 and finding enriched gene ontology (GO) terms, I decided to map the GO terms to Biological Process GOslims. Additionally, I decided to try another level of comparison (I’m not sure how valid it is), whereby I count the number of GO terms assigned to each GOslim and then calculate the percentage of GO terms assigned to each GOslim category. The idea is that this might help identify Biological Processes that are “favored” in a given set of DEGs. I decided to set up “fancy” pyramid plots to view a given set of GO-GOslims for each DEG comparison.
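The counting and percentage step can be sketched as follows, assuming a GO-to-GOslim mapping has already been built (a dict of GO term to a list of Biological Process GOslims); how terms were mapped isn't shown, and the function name is hypothetical. A GO term that maps to more than one GOslim counts toward each, so the percentages can add up to more than 100.

```python
from collections import Counter


def goslim_percentages(go_to_goslims):
    """Percentage of GO terms mapped to each GOslim category.

    go_to_goslims maps each GO term ID to an iterable of GOslim categories.
    """
    total_terms = len(go_to_goslims)
    if total_terms == 0:
        return {}
    counts = Counter()
    for slims in go_to_goslims.values():
        counts.update(set(slims))  # count each GOslim at most once per GO term
    return {slim: 100 * n / total_terms for slim, n in counts.items()}
```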
FastQC-MultiQC - Laura Spencer’s QuantSeq Data
Laura Spencer received her O.lurida QuantSeq data, so I put it through FastQC/MultiQC and put the pertinent info in the nightingales Google Sheet. I also moved the data to /owl/nightingales/O_lurida and updated the readme and checksums files. There were 148 individual samples, so I won’t list them all here.
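The entry doesn't say which checksum algorithm the checksums file uses. Assuming MD5, here is a minimal sketch that hashes files in chunks (so large FastQs don't have to fit in memory) and writes lines in the `<digest>  <filename>` layout that `md5sum -c` can verify when run from the same directory. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(paths):
    """Lines in md5sum's '<digest>  <filename>' layout, using bare file names."""
    return [f"{md5sum(p)}  {Path(p).name}" for p in paths]
```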
Gene Expression - C.bairdi Pairwise DEG Comparisons with 2019 RNAseq using Trinity-Salmon-EdgeR on Mox
Per a Slack request, Steven asked me to take the Genewiz RNAseq data (received 20200318) through edgeR. I ran the analysis using the Trinity differential expression pipeline: