Sam’s Notebook

University of Washington - Fishery Sciences - Roberts Lab

Posts

Data Wrangling - Create Primary P.generosa Genome Annotation File

  • 1 min read

Steven asked me to create a canonical genome annotation file (GitHub Issue). I needed/wanted to create a file containing gene IDs, SwissProt (SP) IDs, gene names, gene descriptions, and gene ontology (GO) accessions. To do so, I utilized the NCBI BLAST and DIAMOND BLAST annotations generated by our GenSas P.generosa genome annotation. Per Steven’s suggestion, I used the best match (i.e. lowest e-value) for any given gene between the two files.

Read More

Server Maintenance - Fix Server Certificate Authentication Issues

  • 2 min read

We had been encounterings issues when linking to images in GitHub (e.g. notebooks, Issues/Discussions) hosted on our servers (primarily Gannet). Images always showed up as broken links and, with some work, we could see an error message related to server authentication. More recently, I also noticed that Jupyter Notebooks hosted on our servers could not be viewed in NB Viewer. Attempting to view a Jupyter Notebook hosted on one of our servers results in a 404 error, with a note regarding server certificate problems. Finally, the most annoying issue was encountered when running the shell programs wget to retrieve files from our servers. This program always threw an error regarding our server certificates. The only way to run wget without this error was to add the option --no-check-certificate (which, thankfully, was a suggestion by wget error message).

Read More

Data Wrangling - P.generosa Genomic Feature FastA Creation

  • 1 min read

Steven wanted me to generate FastA files (GitHub Issue) for Panopea generosa (Pacific geoduck) coding sequences (CDS), genes, and mRNAs. One of the primary needs, though, was to have an ID that could be used for downstream table joining/mapping. I ended up using a combination of GFFutils and bedtools getfasta. I took advantage of being able to create a custom name column in BED files to generate the desired FastA description line having IDs that could identify, and map, CDS, genes, and mRNAs across FastAs and GFFs.

Read More

Differential Gene Expression - P.generosa DGE Between Tissues Using Nextlow NF-Core RNAseq Pipeline on Mox

  • 7 min read

Steven asked that I obtain relative expression values for various geoduck tissues (GitHub Issue). So, I decided to use this as an opportunity to try to use a Nextflow pipeline. There’s an RNAseq pipeline, NF-Core RNAseq which I decided to use. The pipeline appears to be ridiculously thorough (e.g. trims, removes gDNA/rRNA contamination, allows for multiple aligners to be used, quantifies/visualizes feature assignments by reads, performs differential gene expression analysis and visualization), all in one package. Sounds great, but I did have some initial problems getting things up and running. Overall, getting things set up to actually run took longer than the actual pipeline run! Oh well, it’s a learning process, so that’s not totally unexpected.

Read More

Data Analysis - C.virginica RNAseq Zymo ZR4059 Analyzed by ZymoResearch

  • 2 min read

After realizing that the Crassostrea virginica (Eastern oyster) RNAseq data had relatively low alignment rates (see this notebook entry from 20220224 for a bit more background), I contacted ZymoResearch to see if they had any insight on what might be happening. I suspected rRNA contamination. ZymoResearch was kind enough to run the RNAseq data through their pipeline and provided us. This notebook entry provides a brief overview and thoughts on the report.

Read More