Projects

Miscellaneous

lncRNA Identification - P.generosa lncRNAs using CPC2 and bedtools

  • ~1 min read

After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them to the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step of lncRNA identification. To do this, I used Zach’s notebook entry on lncRNA identification for guidance. I utilized the annotated GTF generated by gffcompare during the alignment/annotation step on 20230426. I used bedtools getfasta (https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) and CPC2 with an arbitrary 200bp minimum length to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).

Read More
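
As a rough illustration of the filtering idea in the lncRNA entry above (a 200bp minimum length combined with a non-coding call), here is a minimal Python sketch. It is not the notebook's actual code: the function names, the in-memory FASTA parsing, and the "noncoding" label string are assumptions made for the example.

```python
MIN_LENGTH = 200  # arbitrary minimum transcript length (bp), per the entry


def parse_fasta(text):
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    records = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the ID
            seq_id = line[1:].split()[0]
            records[seq_id] = ""
        elif seq_id is not None:
            records[seq_id] += line
    return records


def select_lncrna_candidates(records, labels, min_length=MIN_LENGTH):
    """Keep sequences that are long enough and labelled non-coding.

    `labels` maps sequence ID to a coding-potential label; the
    "noncoding" string is an assumed label value for this sketch.
    """
    return sorted(
        seq_id
        for seq_id, seq in records.items()
        if len(seq) >= min_length and labels.get(seq_id) == "noncoding"
    )
```

In practice the labels would come from the coding-potential tool's output table rather than a hand-built dictionary.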

Containers - Apptainer Explorations

  • 6 min read

At some point, our HPC nodes on Mox will be retired. When that happens, we’ll likely purchase new nodes on the newest UW cluster, Klone. Additionally, the coenv nodes are no longer available on Mox. One was decommissioned and one was “migrated” to Klone. The primary issue at hand is that the base operating system for Klone appears to be very, very basic. I’d previously attempted to build/install some bioinformatics software on Klone, but could not due to a variety of missing libraries; these libraries are available by default on Mox… Part of this isn’t surprising, as UW IT has been making a concerted effort to get users to switch to containerization - specifically using Apptainer (formerly Singularity) containers.

Read More

Transcript Alignments - P.generosa RNA-seq Alignments for lncRNA Identification Using Hisat2 StringTie and gffcompare on Mox

  • 10 min read

This is a continuation of the process for identification of lncRNAs. I aligned FastQs which were trimmed earlier today to our Panopea-generosa-v1.0 genome FastA using HISAT2. I used the HISAT2 genome index created on 20190723, which was created with options to identify exons and splice sites. The GFF used was from 20220323. StringTie was used to identify alternative transcripts, assign expression values, and create expression tables for use with ballgown. The job was run on Mox.

Read More

FastQ Trimming and QC - P.generosa RNA-seq Data from 20220323 on Mox

  • 4 min read

Addressing the update to this GitHub Issue regarding identifying Panopea generosa (Pacific geoduck) long non-coding RNAs (lncRNAs), I used the RNA-seq data from the Nextflow NF-Core RNAseq pipeline run on 20220323. Although that data was supposed to have been trimmed in the Nextflow NF-Core RNA-seq pipeline, the FastQC reports still show adapter contamination and some funky stuff happening at the 5’ end of the reads. So, I’ve opted to trim the “trimmed” files with fastp, using a hard 20bp trim at the 5’ end of all reads. FastQC and MultiQC were run before/after trimming. Job was run on Mox.

Read More
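
To make the "hard 20bp trim at the 5’ end" from the entry above concrete, here is a small Python sketch of what that operation does to each read. This is only an illustration: the actual trimming was done with fastp, and the function names and in-memory list handling below are assumptions for the example.

```python
def hard_trim_5prime(sequence, quality, n_bases=20):
    """Drop the first `n_bases` from a read and its quality string."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must be the same length")
    return sequence[n_bases:], quality[n_bases:]


def trim_fastq(lines, n_bases=20):
    """Trim every record in an iterable of FASTQ lines (4 lines per read)."""
    lines = [line.rstrip("\n") for line in lines]
    trimmed = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        seq, qual = hard_trim_5prime(seq, qual, n_bases)
        trimmed.extend([header, seq, plus, qual])
    return trimmed
```

Note that this sketch leaves reads shorter than 20bp as empty records; a real trimmer would typically also apply a minimum-length filter.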

Data Wrangling - Append Gene Ontology Aspect to P.generosa Primary Annotation File

  • 1 min read

Steven tasked me with updating our P.generosa genome annotation file (GitHub Issue) a while back and I finally managed to get it all figured out. Although I wanted to perform most of this using the GSEAbase package (PDF), as this package is geared towards storage/retrieval of gene set data, I eventually decided to abandon this approach due to the time it was taking and my lack of familiarity/understanding of how to manipulate objects in R. Despite that, GSEAbase was still utilized for its very simple use for identifying GOslims (IDs and Terms).

Read More

Data Wrangling - P.verrucosa Genome GFF to GTF Using gffread

  • ~1 min read

As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the P.verrucosa genome file (Pver_genome_assembly_v1.0.fasta) using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.

Read More

Data Wrangling - M.capitata Genome GFF to GTF Using gffread

  • ~1 min read

As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the M.capitata genome file (Montipora_capitata_HIv3.assembly.fasta) using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.

Read More

Data Wrangling - P.acuta Genome GFF to GTF Conversion Using gffread

  • ~1 min read

As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I will index the P.acuta genome file using HISAT2, but need a GTF file to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. Used gffread to do this on my computer. Process is documented in Jupyter Notebook linked below.

Read More

Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene and Pseudogene Combined BED File

  • ~1 min read

Working on the CEABIGR project, I was preparing to make a gene expression file to use in CIRCOS (GitHub Issue) when I realized that the Ballgown gene expression file (CSV; GitHub) had more genes than the C.virginica genes BED file we were using. After some sleuthing, I discovered that the discrepancy was caused by the lack of pseudogenes in the genes BED file I was using. Although it might not really have any impact on things, I thought it would still be prudent to have a BED file that completely matched all of the genes in the Ballgown gene expression file. Plus, having the pseudogenes might be of long-term usefulness if we ever decide to evaluate the role of long non-coding RNAs (lncRNAs) in this project.

Read More
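
The kind of "sleuthing" described in the BED/pseudogene entry above boils down to comparing two sets of gene IDs. Here is a minimal Python sketch of that comparison. The function names are hypothetical, and it assumes the gene ID sits in the BED name (4th) column, which may not match the actual files.

```python
def bed_name_column(bed_text):
    """Collect the 4th (name) column from BED-formatted text."""
    names = []
    for line in bed_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 4:
            names.append(fields[3])
    return names


def missing_gene_ids(expression_gene_ids, bed_gene_ids):
    """Return gene IDs present in the expression table but absent from the BED."""
    return sorted(set(expression_gene_ids) - set(bed_gene_ids))
```

Any IDs returned by `missing_gene_ids` are the features (here, presumably pseudogenes) that would need to be added to the BED file for the two to match.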

RNAseq Alignments - P.generosa Alignments and Alternative Transcript Identification Using Hisat2 and StringTie on Mox

  • 15 min read

As part of identifying long non-coding RNA (lncRNA) in Pacific geoduck (GitHub Issue), one of the first things that I wanted to do was to gather all of our geoduck RNAseq data and align it to our geoduck genome. In addition to the alignments, some of the examples I’ve been following have also utilized expression levels as one aspect of the lncRNA selection criteria, so I figured I’d get this info as well.

Read More

qPCR - Repeat of Mussel Gill Heat Stress cDNA with Ferritin Primers

  • ~1 min read

My previous qPCR on these cDNA using ferritin primers (SRIDs: 1808, 1809) resulted in no amplification. This was a bit surprising and makes me suspect that I screwed up somewhere (not adding primer(s)??), so I decided to repeat the qPCR. I made fresh working primer stocks and used 1uL of cDNA for each reaction. All reactions were run in duplicate on our CFX Connect thermal cycler (BioRad) with SsoFast EVAgreen Master Mix (BioRad). See my previous post linked above for qPCR master mix calcs.

Read More

Data Wrangling - Create Primary P.generosa Genome Annotation File

  • 1 min read

Steven asked me to create a canonical genome annotation file (GitHub Issue). I needed/wanted to create a file containing gene IDs, SwissProt (SP) IDs, gene names, gene descriptions, and gene ontology (GO) accessions. To do so, I utilized the NCBI BLAST and DIAMOND BLAST annotations generated by our GenSas P.generosa genome annotation. Per Steven’s suggestion, I used the best match (i.e. lowest e-value) for any given gene between the two files.

Read More
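
The "best match (i.e. lowest e-value)" rule from the annotation entry above can be sketched in a few lines of Python. This is an illustrative sketch only: the tuple layout and the function name are assumptions, not the format of the actual GenSAS output files.

```python
def best_hits(*hit_tables):
    """Merge hit tables, keeping the lowest e-value hit per gene.

    Each table is an iterable of (gene_id, subject_id, evalue) tuples.
    On an exact tie, the hit seen first is kept.
    """
    best = {}
    for table in hit_tables:
        for gene_id, subject_id, evalue in table:
            current = best.get(gene_id)
            if current is None or evalue < current[1]:
                best[gene_id] = (subject_id, evalue)
    return best
```

Calling `best_hits(blast_hits, diamond_hits)` returns one entry per gene seen in either table, so genes annotated by only one of the two programs are still retained.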

Data Wrangling - P.generosa Genomic Feature FastA Creation

  • 1 min read

Steven wanted me to generate FastA files (GitHub Issue) for Panopea generosa (Pacific geoduck) coding sequences (CDS), genes, and mRNAs. One of the primary needs, though, was to have an ID that could be used for downstream table joining/mapping. I ended up using a combination of GFFutils and bedtools getfasta. I took advantage of being able to create a custom name column in BED files to generate the desired FastA description line having IDs that could identify, and map, CDS, genes, and mRNAs across FastAs and GFFs.

Read More
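
To illustrate the custom BED name column idea from the FastA entry above, here is a minimal Python sketch that turns one GFF3 feature line into a BED line whose name field carries a mappable ID. The "<ID>|<feature type>" naming scheme and the function name are hypothetical; the actual description line format used in the notebook may differ.

```python
def gff_to_bed_line(gff_line):
    """Convert one GFF3 feature line to a 6-column BED line.

    GFF coordinates are 1-based and inclusive; BED is 0-based and
    half-open, so only the start is shifted by one.
    """
    fields = gff_line.rstrip("\n").split("\t")
    seqid, _source, feature, start, end, _score, strand, _phase, attrs = fields
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
    )
    # Hypothetical naming scheme: "<ID>|<feature type>"
    name = f"{attributes.get('ID', 'NA')}|{feature}"
    return "\t".join([seqid, str(int(start) - 1), end, name, ".", strand])
```

With a BED file built this way, bedtools getfasta can be asked to use the name column for the FastA header (its -name option), which is what makes the IDs joinable across FastAs and GFFs.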

Differential Gene Expression - P.generosa DGE Between Tissues Using Nextflow NF-Core RNAseq Pipeline on Mox

  • 7 min read

Steven asked that I obtain relative expression values for various geoduck tissues (GitHub Issue). So, I decided to use this as an opportunity to try to use a Nextflow pipeline. There’s an RNAseq pipeline, NF-Core RNAseq which I decided to use. The pipeline appears to be ridiculously thorough (e.g. trims, removes gDNA/rRNA contamination, allows for multiple aligners to be used, quantifies/visualizes feature assignments by reads, performs differential gene expression analysis and visualization), all in one package. Sounds great, but I did have some initial problems getting things up and running. Overall, getting things set up to actually run took longer than the actual pipeline run! Oh well, it’s a learning process, so that’s not totally unexpected.

Read More

Data Analysis - C.virginica RNAseq Zymo ZR4059 Analyzed by ZymoResearch

  • 2 min read

After realizing that the Crassostrea virginica (Eastern oyster) RNAseq data had relatively low alignment rates (see this notebook entry from 20220224 for a bit more background), I contacted ZymoResearch to see if they had any insight on what might be happening. I suspected rRNA contamination. ZymoResearch was kind enough to run the RNAseq data through their pipeline and provided us with a report. This notebook entry provides a brief overview and thoughts on the report.

Read More

Transcript Identification and Alignments - C.virginica RNAseq with NCBI Genome GCF_002022765.2 Using Hisat2 and Stringtie on Mox

  • 14 min read

After an additional round of trimming yesterday, I needed to identify alternative transcripts in the Crassostrea virginica (Eastern oyster) gonad RNAseq data we have. I previously used HISAT2 to index the NCBI Crassostrea virginica (Eastern oyster) genome and identify exon/splice sites on 20210720. Then, I used this genome index to run StringTie on Mox in order to map sequencing reads to the genome/alternative isoforms.

Read More

Trimming - Additional 20bp from C.virginica Gonad RNAseq with fastp on Mox

  • 6 min read

When I previously aligned trimmed RNAseq reads to the NCBI C.virginica genome (GCF_002022765.2) on 20210726, I specifically noted that alignment rates were consistently lower for males than females. However, I let that discrepancy distract me from the larger issue: low alignment rates. Period! This should have thrown some red flags and it eventually did after Steven asked about overall alignment rate for an alignment of this data that I performed on 20220131 in preparation for genome-guided transcriptome assembly. The overall alignment rate (in which I actually used the trimmed reads from 20210714) was ~67.6%. Realizing this was on the low side of what one would expect, I looked into things more and came across a few things which led me to make the decision to redo the trimming:

Read More

Data Wrangling - C.virginica lncRNA Extractions from NCBI GCF_002022765.2 Using GffRead

  • ~1 min read

Continuing to work on our Crassostrea virginica (Eastern oyster) project examining the effects of OA on female and male gonads (GitHub repo), Steven tasked me with parsing out long, non-coding RNAs (GitHub Issue). To do so, I relied on the NCBI genome and associated files/annotations. I used GffRead, GFFutils, and samtools. The process was documented in the following Jupyter Notebook:

Read More

Transcriptome Assembly - Genome-guided C.virginica Adult Gonad OA RNAseq Using Trinity on Mox

  • 4 min read

As part of this project, Steven’s asked that I identify long, non-coding RNAs (lncRNAs) (GitHub Issue) in the Crassostrea virginica (Eastern oyster) adult OA gonad RNAseq data we have. The initial step for this is to assemble a transcriptome. I generated the necessary BAM alignment on 20220131. Next was to actually get the transcriptome assembled. I followed the Trinity genome-guided procedure.

Read More

RNAseq Alignment - C.virginica Adult OA Gonad Data to GCF_002022765.2 Genome Using HISAT2 on Mox

  • 5 min read

As part of this project, Steven’s asked that I identify long, non-coding RNAs (lncRNAs) (GitHub Issue) in the Crassostrea virginica (Eastern oyster) adult OA gonad RNAseq data we have. The initial step for this is to assemble a transcriptome. Since there is a published genome (NCBI RefSeq GCF_002022765.2_C_virginica-3.0; https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/022/765/GCF_002022765.2_C_virginica-3.0/) for Crassostrea virginica (Eastern oyster), I will perform a genome-guided assembly using Trinity. That process requires a sorted BAM file as input. In order to generate that file, I used HISAT2. I’ve already generated the necessary HISAT2 genome index files (as of 20210720), which also identified/incorporated splice sites and exons, which the HISAT2 alignment process requires to run.

Read More

Data Wrangling - C.virginica Gonad RNAseq Transcript Counts Per Gene Per Sample Using Ballgown

  • ~1 min read

As we continue to work on the analysis of impacts of OA on Crassostrea virginica (Eastern oyster) gonads via DNA methylation and RNAseq (GitHub repo), we decided to compare the number of transcripts expressed per gene per sample (GitHub Issue). As it turns out, it was quite the challenge. Ultimately, I wasn’t able to solve it myself, and turned to StackOverflow for a solution. I should’ve just done this at the beginning, as I got a response (and solution) less than five minutes after posting! Regardless, the data wrangling progress (struggle?) was documented in the following GitHub Discussion:

Read More
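
For readers wondering what "number of transcripts expressed per gene per sample" in the entry above amounts to, here is a small Python sketch of the counting logic. It is not the StackOverflow solution or the Ballgown/R code actually used; the row layout, function name, and the "greater than zero" definition of "expressed" are all assumptions for illustration.

```python
from collections import defaultdict


def expressed_transcripts_per_gene(rows, threshold=0.0):
    """Count transcripts with expression above `threshold`.

    `rows` is an iterable of (gene_id, transcript_id, {sample: value}).
    Returns {gene_id: {sample: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for gene_id, _transcript_id, expression in rows:
        for sample, value in expression.items():
            # Adding 0 still creates the key, so samples with no
            # expressed transcripts appear with a count of 0
            counts[gene_id][sample] += 1 if value > threshold else 0
    return {gene: dict(samples) for gene, samples in counts.items()}
```

The `threshold` argument makes it easy to try a stricter cutoff than zero if low-level expression should not count.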

RNA Isolation - M.trossulus Gill and Phenol Gland

  • 2 min read

As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):

Read More

RNA Isolation - M.trossulus Gill and Phenol Gland

  • 1 min read

As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):

Read More

RNA Isolation - M.trossulus Gill and Phenol Gland

  • 2 min read

As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):

Read More

RNA Isolation - M.trossulus Phenol Gland and Gill

  • 2 min read

As part of a mussel project that Matt George has with the Pacific States Marine Fisheries Commission (PSMFC), I’m helping by continuing to isolate RNA from a relatively large number of samples. The samples are listed/described in this GitHub Issue. Today, I isolated RNA from the following samples (the “F” indicates “foot”, “PG” indicates “phenol gland”, and “G” indicates “gill” tissues):

Read More

RNA Isolation - O.nerka Berdahl Tissues

  • 3 min read

Finally got around to tackling this GitHub issue regarding isolating RNA from some Oncorhynchus nerka (sockeye salmon) tissues we have from Andrew Berdahl’s lab (a UW SAFS professor) to use for RNAseq and/or qPCR. We have blood, brain, gonad, and liver samples from individual salmon from two different groups: territorial and social individuals. We’ve decided to isolate RNA from brain, gonads, and liver from two individuals within each group. All samples are preserved in RNAlater and stored @ -80°C.

Read More

Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene BED File

  • 1 min read

When working to identify differentially expressed transcripts (DETs) and genes (DEGs) for our Crassostrea virginica (Eastern oyster) RNAseq/DNA methylation comparison of changes across sex and ocean acidification conditions (https://github.com/epigeneticstoocean/2018_L18-adult-methylation), I realized that the DEG tables I was generating had excessive gene counts due to the fact that the analysis (and, in turn, the genome coordinates), were tied to transcripts. Thus, genes were counted multiple times due to the existence of multiple transcripts for a given gene, and the analysis didn’t list gene coordinate data - only transcript coordinates.

Read More

Differential Transcript Expression - C.virginica Gonad RNAseq Using Ballgown

  • 1 min read

In preparation for differential transcript analysis, I previously ran our RNAseq data through StringTie on 20210726 to identify and quantify transcripts. Identification of differentially expressed transcripts (DETs) and genes (DEGs) will be performed using ballgown. This notebook entry will be different than most others, as it will simply serve as a “landing page” to access/review the analysis, since the analysis will evolve over time and won’t exist as a single computing job with a definitive endpoint.

Read More

Read Mapping - 10x-Genomics Trimmed FastQ Mapped to P.generosa v1.0 Assembly Using Minimap2 for BlobToolKit on Mox

  • 2 min read

To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run minimap2 according to the BlobToolKit “Getting Started” guide on Mox. This will map the trimmed 10x-Genomics reads from 20210401 to the Panopea-generosa-v1.0.fa assembly (FastA; 914MB).

Read More

FastQC-MultiQC - C.gigas Ploidy pH WGBS Raw Sequence Data from Haws Lab on Mox

  • 2 min read

Yesterday (20201205), we received the whole genome bisulfite sequencing (WGBS) data back from ZymoResearch from the 24 C.gigas diploid/triploid samples subjected to two different pH treatments (received from the Haws Lab on 20200820 and submitted to ZymoResearch on 20200824). As part of our standard sequencing data receipt pipeline, I needed to generate FastQC files for each sample.

Read More

Transcriptome Assessment - Crustacean Transcriptome Completeness Evaluation Using BUSCO on Mox

  • 4 min read

Grace was recently working on writing up a manuscript which did a basic comparison of our C.bairdi transcriptome (cbai_transcriptome_v3.1) (see the Genomic Resources wiki for more deets) to two other species’ transcriptome assemblies. We wanted BUSCO evaluations as part of this comparison, but the two other species did not have BUSCO scores in their respective publications. As such, I decided to generate them myself, as BUSCO runs very quickly. The job was run on Mox.

Read More

MBD Selection - M.magister Sheared Gill gDNA 16 of 24 Samples Set 3 of 3

  • 1 min read

Click here for notebook on the first eight samples processed. Click here for the second set of eight samples processed. M.magister (Dungeness crab) gill gDNA provided by Mackenzie Gavery was previously sheared on 20201026 and three samples were subjected to additional rounds of shearing on 20201027, in preparation for methyl binding domain (MBD) selection using the MethylMiner Kit (Invitrogen).

Read More

Trimming - Shelly S.salar RNAseq Using fastp and MultiQC on Mox

  • 3 min read

In this GitHub issue, Shelly asked that I trim, align to a genome, and perform transcriptome alignment counts with some Salmo salar RNAseq data she had, using a subset of the NCBI Salmo salar RefSeq genome, GCF_000233375.1. She created a subset of this genome using only sequences designated as “chromosomes.” A link to the FastA (and a link to her notebook on creating this file) are in that GitHub issue link above. The transcriptome she has provided has not been subsetted in a similar fashion; maybe I’ll do that prior to alignment.

Read More

DNA Shearing - M.magister CH05-21 gDNA Full Shearing Test and Bioanalyzer

  • 2 min read

Yesterday, I did some shearing of Metacarcinus magister gill gDNA on a test sample (CH05-21) to determine how many cycles to run on the sonicator (Bioruptor 300; Diagenode) to achieve an average fragment length of ~350 - 500bp in preparation for MBD-BSseq. The determination from yesterday was 70 cycles (30s ON, 30s OFF; low intensity). That determination was made by first sonicating for 35 cycles, followed by successive rounds of 5 cycles each. I decided to repeat this, except by doing it in a single round of sonication.

Read More

DNA Shearing - M.magister gDNA Shear Testing and Bioanalyzer

  • 1 min read

Steven assigned me to do some MBD-BSseq library prep (GitHub Issue) for some Dungeness crab (Metacarcinus magister) DNA samples provided by Mackenzie Gavery. The DNA was isolated from juvenile (J6/J7 developmental stages) gill tissue. One of the first steps in MBD-BSseq is to fragment DNA to a desired size (~350 - 500bp in our case). However, we haven’t worked with Metacarcinus magister DNA previously, so I need to empirically determine sonicator (Bioruptor 300; Diagenode) settings for these samples.

Read More

Read Mapping - C.bairdi 201002558-2729-Q7 and 6129-403-26-Q7 Taxa-Specific NanoPore Reads to cbai_genome_v1.01.fasta Using Minimap2 on Mox

  • 2 min read

After extracting FastQ reads using seqtk on 20201013 from the various taxa I had been interested in, the next thing needed doing was mapping reads to the cbai_genome_v1.01 “genome” assembly from 20200917. I found that Minimap2 will map long reads (e.g. NanoPore), in addition to short reads, so I decided to give that a rip.

Read More

Data Wrangling - C.bairdi NanoPore Reads Extractions With Seqtk on Mephisto

  • 1 min read

In my pursuit to identify which contigs/scaffolds of our C.bairdi genome assembly from 20200917 correspond to interesting taxa, based on taxonomic assignments produced by MEGAN6 on 20200928, I used MEGAN6 to extract taxa-specific reads from cbai_genome_v1.01 on 20201007 - the output is only available in FastA format. Since I want the original reads in FastQ format, I will use the FastA sequence IDs (from the FastA index file) and provide that to seqtk to extract the FastQ reads for each sample and corresponding taxa.

Read More
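
The ID-based extraction described in the seqtk entry above (take sequence IDs from a FastA index, then pull the matching FastQ records) can be sketched in plain Python as follows. This only illustrates the logic; the actual extraction was done with seqtk, and the function names and in-memory handling here are assumptions.

```python
def ids_from_fai(fai_text):
    """Return the set of sequence IDs (first column) from a .fai index."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def extract_fastq_records(fastq_lines, wanted_ids):
    """Yield 4-line FASTQ records whose read ID is in `wanted_ids`."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        read_id = record[0][1:].split()[0]  # drop "@" and any description
        if read_id in wanted_ids:
            yield record
```

Matching on the first whitespace-delimited token of the header matters here, since FastA and FastQ headers for the same read often carry different trailing descriptions.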

Taxonomic Assignments - C.bairdi 6129-403-26-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu

  • 3 min read

After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.

Read More

Taxonomic Assignments - C.bairdi 20102558-2729-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu

  • 3 min read

After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.

Read More

Data Wrangling - C.bairdi NanoPore 6129-403-26 Quality Filtering Using NanoFilt on Mox

  • 2 min read

Last week, I ran all of our Q7-filtered C.bairdi NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that there were a large quantity of bases assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophilic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.

Read More

Data Wrangling - C.bairdi NanoPore 20102558-2729 Quality Filtering Using NanoFilt on Mox

  • 2 min read

Last week, I ran all of our Q7-filtered C.bairdi NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that there were a large quantity of bases assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophilic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.

Read More

Data Wrangling - Subsetting cbai_genome_v1.0 Assembly with faidx

  • 1 min read

Previously assembled cbai_genome_v1.0.fasta with our NanoPore Q7 reads on 20200917 and noticed that there were numerous sequences that were much shorter than the expected 500bp threshold that the assembler (Flye) was supposed to spit out. I created an Issue on the Flye GitHub page to find out why. The developer responded and determined it was an issue with the assembly polisher and that sequences <500bp could be safely ignored.

Read More
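
The subsetting in the faidx entry above hinges on picking out sequences at or above the 500bp threshold. Since a FastA index (.fai) stores each sequence name and length in its first two columns, the selection step can be sketched in Python like this. The function name is hypothetical and this is not the notebook's actual command.

```python
def sequences_at_least(fai_text, min_length=500):
    """Return IDs from a .fai index whose length (column 2) is >= min_length."""
    keep = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        if int(length) >= min_length:
            keep.append(name)
    return keep
```

The resulting list of names is what would then be handed to an indexed-FastA tool to write out the subsetted assembly.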

DNA Quantification - Re-quant Ronit’s C.gigas Diploid-Triploid Ctenidia gDNA Submitted to ZymoResearch

  • 1 min read

I received notice from ZymoResearch yesterday afternoon that the DNA we sent on 20200820 for this project (Quote 3534) had insufficient DNA for sequencing for most of the samples. This was, honestly, shocking. I had even submitted well over the minimum amount of DNA required (submitted 1.75ug - only needed 1ug). So, I’m not entirely sure what happened here.

Read More

Primer Design and In-Silico Testing - Geoduck Reproduction Primers

  • 1 min read

Shelly asked that I re-run the primer design pipeline that Kaitlyn had previously run to design a set of reproduction-related qPCR primers. Unfortunately, Kaitlyn’s Jupyter Notebook wasn’t backed up and she accidentally deleted it, I believe, so there’s no real record of how she designed the primers. However, I do know that she was unable to run the EMBOSS primersearch tool, which will check your primers against a set of sequences for any other matches. This is useful for confirming specificity.

Read More

Metagenomics - Data Extractions Using MEGAN6

  • 1 min read

Decided to finally take the time to methodically extract data from our metagenomics project so that I have the tables handy when I need them and I can easily share them with other people. Previously, I hadn’t done this due to limitations on looking at the data remotely. I finally downloaded all of the RMA6 files from 20191014 after being fed up with the remote desktop connection and upgrading the size of my hard drive (5 of the six RMA6 files are >40GB in size).

Read More

Sequence Extractions - C.bairdi Transcriptomes v2.0 and v3.0 Excluding Alveolata with MEGAN6 on Swoose

  • ~1 min read

Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0 (RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0 (RNAseq shorthand: 2018, 2019, 2020-UW). Both of these transcriptomes were assembled without any taxonomic filter applied. DIAMOND BLASTx and conversion to MEGAN6 RMA6 files was performed yesterday (20200604).

Read More

Transcriptome Comparison - C.bairdi Transcriptomes Compared with DETONATE on Mox

  • 4 min read

We’ve produced a number of C.bairdi transcriptomes and we’re interested in doing some comparisons to try to determine which one might be “best”. I previously compared the BUSCO scores of each of these transcriptomes and now will be using the DETONATE software package to perform two different types of comparisons: compared to a reference (REF-EVAL) and determine an overall quality “score” (RSEM-EVAL). I’ll be running REF-EVAL in this notebook.

Read More

Transcriptome Assembly - C.bairdi All Pooled Arthropoda-only RNAseq Data with Trinity on Mox

  • 2 min read

For completeness’ sake, I wanted to create an additional C.bairdi transcriptome assembly that consisted of Arthropoda-only sequences from just pooled RNAseq data (since I recently generated a similar assembly without taxonomically filtered reads on 20200518). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to the Trinity command.

Read More

Transcriptome Assembly - P.trituberculatus (Japanese blue crab) NCBI SRA BioProject PRJNA597187 Data with Trinity on Mox

  • 3 min read

After generating a number of C.bairdi (Tanner crab) transcriptomes, we decided we should compare them to help decide which one should become our “canonical” version. As part of that, the Trinity wiki offers a list of tools that one can use to check the quality of transcriptome assemblies. Some of those require a transcriptome of a related species.

Read More

SRA Library Assessment - Determine RNAseq Library Strandedness from P.trituberculatus SRA BioProject PRJNA597187

  • 3 min read

We’ve produced a number of C.bairdi transcriptomes utilizing different assembly approaches (e.g. Arthropoda reads only, stranded libraries only, mixed strandedness libraries, etc) and we want to determine which of them is “best”. Trinity has a nice list of tools to assess the quality of transcriptome assemblies, but most of the tools rely on comparison to a transcriptome of a related species.

Read More

Transcriptome Assembly - C.bairdi All Pooled RNAseq Data Without Taxonomic Filters with Trinity on Mox

  • 2 min read

Steven asked that I assemble a transcriptome with just our pooled C.bairdi RNAseq data (not taxonomically filtered; see the FastQ list file linked in the Results section below). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to the Trinity command.

Read More

GO to GOslim - C.bairdi Enriched GO Terms from 20200422 DEGs

  • 6 min read

After running pairwise comparisons and identifying differentially expressed genes (DEGs) on 20200422 and finding enriched gene ontology terms, I decided to map the GO terms to Biological Process GOslims. Additionally, I decided to try another level of comparison (I’m not sure how valid it is), whereby I will count the number of GO terms assigned to each GOslim and then calculate the percentage of GO terms that get assigned to each of the GOslim categories. The idea being that it might help identify Biological Processes that are “favored” in a given set of DEGs. I decided to set up “fancy” pyramid plots to view a given set of GO-GOslims for each DEG comparison.

Read More
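
The counting step described in the GOslim entry above (count GO terms per GOslim, then convert to percentages) might look something like the following Python sketch. The function name and input layout are hypothetical, and the choice of denominator (total GO-to-GOslim assignments rather than the number of unique GO terms) is an assumption; the notebook may have calculated this differently.

```python
from collections import Counter


def goslim_percentages(go_to_goslim):
    """Count GO terms per GOslim and express each as a percentage.

    `go_to_goslim` maps a GO ID to a list of GOslim terms it falls under.
    Percentages are relative to the total number of GO-to-GOslim assignments.
    """
    counts = Counter(
        slim for slims in go_to_goslim.values() for slim in slims
    )
    total = sum(counts.values())
    return {
        slim: (n, round(100 * n / total, 1)) for slim, n in counts.items()
    }
```

Each GOslim maps to a (count, percentage) pair, which is the sort of table that could feed a pyramid plot for a given DEG comparison.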