Sam’s Notebook

University of Washington - Fishery Sciences - Roberts Lab

Posts - Page 2 of 145

Transcript Alignments - P.generosa RNA-seq Alignments for lncRNA Identification Using Hisat2 StringTie and gffcompare on Mox

  • 10 min read

This is a continuation of the process for identifying lncRNAs. I aligned the FastQs that were trimmed earlier today to our Panopea-generosa-v1.0 genome FastA using HISAT2. I used the HISAT2 genome index created on 20190723, which was built with options to identify exons and splice sites. The GFF used was from 20220323. StringTie was used to identify alternative transcripts, assign expression values, and create expression tables for use with ballgown. The job was run on Mox.

Read More

FastQ Trimming and QC - P.generosa RNA-seq Data from 20220323 on Mox

  • 4 min read

Addressing the update to this GitHub Issue regarding identifying Panopea generosa (Pacific geoduck) long non-coding RNAs (lncRNAs), I used the RNA-seq data from the Nextflow NF-Core RNAseq pipeline run on 20220323. Although that data was supposed to have been trimmed in the Nextflow NF-Core RNA-seq pipeline, the FastQC reports still show adapter contamination and some funky stuff happening at the 5’ end of the reads. So, I’ve opted to trim the “trimmed” files with fastp, using a hard 20bp trim at the 5’ end of all reads. FastQC and MultiQC were run before/after trimming. Job was run on Mox.
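A minimal sketch of how a hard 5’ trim like this might be expressed as a fastp call, written as a small Python helper that only assembles the command. The file names are hypothetical, and enabling `--detect_adapter_for_pe` is an assumption left off by default; the actual job script may have used different options.

```python
import shlex
from typing import List


def build_fastp_command(
    r1_in: str,
    r2_in: str,
    r1_out: str,
    r2_out: str,
    trim_5p: int = 20,
    detect_adapter_for_pe: bool = False,
) -> List[str]:
    """Assemble a paired-end fastp command with a hard trim at the 5' end.

    Nothing is executed here; the returned list can be passed to
    subprocess.run() on a machine where fastp is installed.
    """
    cmd = [
        "fastp",
        "--in1", r1_in,
        "--in2", r2_in,
        "--out1", r1_out,
        "--out2", r2_out,
        # Remove a fixed number of bases from the start of R1 and R2 reads.
        "--trim_front1", str(trim_5p),
        "--trim_front2", str(trim_5p),
    ]
    if detect_adapter_for_pe:
        # Assumption: adapter auto-detection for paired-end reads is wanted.
        cmd.append("--detect_adapter_for_pe")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = build_fastp_command(
        "sample_R1.fastq.gz",
        "sample_R2.fastq.gz",
        "sample_R1.fastp-trim.fastq.gz",
        "sample_R2.fastp-trim.fastq.gz",
    )
    print(shlex.join(example))
```

Building the command as a list keeps each argument separate, so file names with spaces do not need manual quoting.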

Read More

Data Wrangling - Append Gene Ontology Aspect to P.generosa Primary Annotation File

  • 1 min read

Steven tasked me with updating our P.generosa genome annotation file (GitHub Issue) a while back and I finally managed to get it all figured out. Although I wanted to perform most of this using the GSEABase package (PDF), as this package is geared towards storage/retrieval of gene set data, I eventually decided to abandon this approach due to the time it was taking and my lack of familiarity with/understanding of how to manipulate objects in R. Despite that, GSEABase was still used because it makes identifying GOslims (IDs and Terms) very simple.

Read More

Sequencing Read Taxonomic Classification - P.verrucosa E5 RNA-seq Using DIAMOND BLASTx and MEGAN daa-meganizer on Mox

  • 5 min read

After some discussion with Steven at Science Hour last week regarding the handling of endosymbiont sequences in the E5 P.verrucosa RNA-seq data, Steven thought it would be interesting to run the RNA-seq reads through MEGAN6 just to see what the taxonomic breakdown looks like. We may or may not (probably not) separate reads based on taxonomy. In the meantime, we’ll still proceed with HISAT2 alignments to the respective genomes as a means to separate the endosymbiont reads from the P.verrucosa reads.

Read More

Data Wrangling - C.goreaui Genome GFF to GTF Using gffread

  • ~1 min read

As part of getting these three coral species genome files (GitHub Issue) added to our Lab Handbook Genomic Resources page, I also need to get the coral endosymbiont sequence. Danielle Becker in Hollie Putnam’s Lab at Univ. of Rhode Island pointed me to the Cladocopium goreaui genome from Chen et al., 2022 available here. Access to the genome requires agreeing to some licensing provisions (primarily the requirement to cite the publication whenever the genome is used), so I will not be providing any public links to the file. In order to index the Cladocopium goreaui genome file (Cladocopium_goreaui_genome_fa) using HISAT2 for downstream isoform analysis using StringTie and ballgown, I need a corresponding GTF to also identify exon/intron splice sites. Since a GTF file is not available, but a GFF file is, I needed to convert the GFF to GTF. I used gffread to do this on my computer. The process is documented in the Jupyter Notebook linked below.
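For reference, a GFF to GTF conversion with gffread generally comes down to its `-T` (write GTF) and `-o` (output file) options. The sketch below only assembles that command in Python; the input file name is hypothetical, and the linked notebook remains the record of what was actually run.

```python
import shlex
from pathlib import Path
from typing import List


def build_gffread_command(gff_path: str) -> List[str]:
    """Assemble a gffread command that converts a GFF file to GTF.

    The output name is derived by swapping the input's extension for .gtf.
    Nothing is executed here; pass the list to subprocess.run() where
    gffread is installed.
    """
    gtf_path = str(Path(gff_path).with_suffix(".gtf"))
    # -T selects GTF output; -o names the output file.
    return ["gffread", gff_path, "-T", "-o", gtf_path]


if __name__ == "__main__":
    # Hypothetical input name, for illustration only.
    print(shlex.join(build_gffread_command("endosymbiont_genes.gff3")))
```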

Read More

Transcript Identification and Alignments - P.verrucosa RNA-seq with Pver_genome_assembly_v1.0 Using HiSat2 and Stringtie on Mox

  • 19 min read

After getting the RNA-seq data trimmed, it was time to perform alignments and determine expression levels of transcripts/isoforms using HISAT2 and StringTie, respectively. StringTie was set to output tables formatted for import into ballgown. After those two analyses were complete, I ran gffcompare, using the merged StringTie GTF and the input GFF3. I spotted the gffcompare step in one of Danielle Becker’s scripts and thought it might be interesting. The analyses were run on Mox.

Read More

FastQ Trimming and QC - P.verrucosa RNA-seq Data from Danielle Becker in Hollie Putnam Lab Using fastp FastQC and MultiQC on Mox

  • 5 min read

After receiving the P.verrucosa RNA-seq data from Danielle Becker (Hollie Putnam’s Lab, Univ. of Rhode Island), I noticed that the trimmed reads didn’t appear to actually be trimmed. There was still adapter contamination (solely in R2 reads - suggesting the detect_adapter_for_pe option had been omitted from the fastp command?), but the reads had an average read length of 150bp - except when looking at the adapter content report!!??

Read More