Sam’s Notebook

University of Washington - Fishery Sciences - Roberts Lab

Posts

Data Wrangling - C.virginica NCBI GCF_002022765.2 GFF to Gene and Pseudogene Combined BED File

  • ~1 min read

Working on the CEABIGR project, I was preparing to make a gene expression file to use in CIRCOS (GitHub Issue) when I realized that the Ballgown gene expression file (CSV; GitHub) had more genes than the C.virginica genes BED file we were using. After some sleuthing, I discovered that the discrepancy was caused by the lack of pseudogenes in the genes BED file I was using. Although it might not really have any impact on things, I thought it would still be prudent to have a BED file that completely matched all of the genes in the Ballgown gene expression file. Plus, having the pseudogenes might be useful in the long term if we ever decide to evaluate the role of long non-coding RNAs (lncRNAs) in this project.
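As a rough illustration of the kind of conversion involved (not necessarily how the full post does it), here is a minimal Python sketch that pulls `gene` and `pseudogene` features out of GFF lines and emits BED-style rows. The function name `gff_to_bed` and the choice of the `Name` attribute (falling back to `ID`) for the BED name column are assumptions made for this example. The one detail that is not an assumption is the coordinate shift: GFF starts are 1-based and inclusive, while BED starts are 0-based, so the start decreases by one and the end is unchanged.

```python
def gff_to_bed(gff_lines, feature_types=("gene", "pseudogene")):
    """Yield BED rows (chrom, start, end, name, score, strand) for selected GFF features.

    Hypothetical helper for illustration; feature_types matches GFF column 3.
    """
    for line in gff_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GFF record has exactly nine tab-separated columns.
        if len(fields) != 9 or fields[2] not in feature_types:
            continue
        chrom, _, _, start, end, _, strand, _, attributes = fields
        # Parse "key=value" pairs from the attributes column.
        attrs = dict(
            pair.split("=", 1) for pair in attributes.split(";") if "=" in pair
        )
        # Assumption: use Name if present, otherwise ID, otherwise ".".
        name = attrs.get("Name", attrs.get("ID", "."))
        # GFF is 1-based inclusive; BED is 0-based half-open, so only the start shifts.
        yield (chrom, int(start) - 1, int(end), name, ".", strand)
```

Passing an open GFF file handle as `gff_lines` and writing each row out tab-separated would give a combined gene and pseudogene BED file. Restricting `feature_types` to `("gene",)` would reproduce a genes-only BED file, which is the kind of file that came up short against the Ballgown gene list.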

Read More

Data Wrangling - Identify C.virginica Genes with Different Predominant Isoforms for CEABIGR

  • ~1 min read

During today’s discussion, Yaamini recommended we generate a list of genes with different predominant isoforms between females and males, while also adding a column with a binary indicator to mark whether the predominant isoform of each gene was the same (0) or different (1) between sexes. Steven had already generated files identifying predominant isoforms in each sex:

Read More

RNAseq Alignments - P.generosa Alignments and Alternative Transcript Identification Using Hisat2 and StringTie on Mox

  • 15 min read

As part of identifying long non-coding RNAs (lncRNAs) in Pacific geoduck (GitHub Issue), one of the first things that I wanted to do was to gather all of our geoduck RNAseq data and align it to our geoduck genome. In addition to the alignments, some of the examples I’ve been following have also used expression levels as one of the lncRNA selection criteria, so I figured I’d get this info as well.

Read More