Posts
Project Summary - C.virginica CEABiGR - Female vs. Male Gonad Exposed to OA
This will be a “dynamic” notebook entry, whereby I will update this post continually as I process new samples, analyze new data, etc., for this project. The hope is to make it easier to find all the work I’ve done for this without having to search my notebook to find individual notebook entries.
Data Wrangling - Create Primary P.generosa Genome Annotation File
Steven asked me to create a canonical genome annotation file (GitHub Issue). I needed/wanted to create a file containing gene IDs, SwissProt (SP) IDs, gene names, gene descriptions, and gene ontology (GO) accessions. To do so, I utilized the NCBI BLAST and DIAMOND BLAST annotations generated by our GenSas P.generosa genome annotation. Per Steven’s suggestion, I used the best match (i.e. lowest e-value) for any given gene between the two files.
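The "best match" selection described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the post: it assumes both annotation files are in standard BLAST tabular format (outfmt 6), with the gene ID in column 1, the subject ID in column 2, and the e-value in column 11. The function name `best_hits` and the example records are hypothetical.

```python
import csv
import io

# Assumed column layout: BLAST/DIAMOND tabular output (outfmt 6), where
# column 1 is the query (gene) ID, column 2 the subject ID, and column 11
# the e-value. The real annotation files may be laid out differently.
QUERY_COL, SUBJECT_COL, EVALUE_COL = 0, 1, 10


def best_hits(*handles):
    """Return {gene_id: (subject_id, evalue)}, keeping the lowest e-value per gene."""
    best = {}
    for handle in handles:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and comment lines.
            if not row or row[0].startswith("#"):
                continue
            gene = row[QUERY_COL]
            subject = row[SUBJECT_COL]
            evalue = float(row[EVALUE_COL])
            # A strictly lower e-value wins; ties keep the first hit seen.
            if gene not in best or evalue < best[gene][1]:
                best[gene] = (subject, evalue)
    return best


# Hypothetical example records, not real project data.
ncbi_blast = io.StringIO(
    "gene1\tsp|P12345|ABC_HUMAN\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "gene2\tsp|Q99999|XYZ_MOUSE\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-10\t90\n"
)
diamond_blast = io.StringIO(
    "gene1\tsp|P54321|DEF_HUMAN\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "gene2\tsp|Q88888|UVW_MOUSE\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t150\n"
)

merged = best_hits(ncbi_blast, diamond_blast)
for gene, (subject, evalue) in sorted(merged.items()):
    print(gene, subject, evalue)
```

Keeping a single dictionary keyed by gene ID means the two files can be read in any order, and the same function also handles multiple hits per gene within one file.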
RNA Isolation - O.nerka Berdahl Brain Tissues
Server Maintenance - Fix Server Certificate Authentication Issues
We had been encountering issues when linking to images in GitHub (e.g. notebooks, Issues/Discussions) hosted on our servers (primarily Gannet). Images always showed up as broken links and, with some work, we could see an error message related to server authentication. More recently, I also noticed that Jupyter Notebooks hosted on our servers could not be viewed in NB Viewer. Attempting to view a Jupyter Notebook hosted on one of our servers results in a 404 error, with a note regarding server certificate problems. Finally, the most annoying issue was encountered when running the shell program wget to retrieve files from our servers. This program always threw an error regarding our server certificates. The only way to run wget without this error was to add the option --no-check-certificate (which, thankfully, was suggested by the wget error message).
Nextflow - Trials and Tribulations of Installing and Using NF-Core RNAseq
INSTALLATION
Data Wrangling - P.generosa Genomic Feature FastA Creation
Steven wanted me to generate FastA files (GitHub Issue) for Panopea generosa (Pacific geoduck) coding sequences (CDS), genes, and mRNAs. One of the primary needs, though, was to have an ID that could be used for downstream table joining/mapping. I ended up using a combination of GFFutils and bedtools getfasta. I took advantage of being able to create a custom name column in BED files to generate the desired FastA description line having IDs that could identify, and map, CDS, genes, and mRNAs across FastAs and GFFs.
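As a rough illustration of the custom name column idea, the sketch below converts GFF3 feature lines into BED lines whose fourth (name) column carries the feature's ID attribute. It is not the GFFutils-based workflow from the post: the function `gff_to_bed`, the choice of the ID attribute as the name, and the example record are all assumptions made for illustration.

```python
def gff_to_bed(gff_lines, feature_type):
    """Convert GFF3 lines of one feature type to BED lines with an ID-based name column."""
    bed_lines = []
    for line in gff_lines:
        # Skip blank lines and GFF comments/directives.
        if not line.strip() or line.startswith("#"):
            continue
        (seqid, _source, ftype, start, end,
         _score, strand, _phase, attrs) = line.rstrip("\n").split("\t")
        if ftype != feature_type:
            continue
        # Parse "key=value;key=value" attributes into a dictionary.
        attributes = dict(
            field.split("=", 1) for field in attrs.split(";") if "=" in field
        )
        # Assumption: the GFF ID attribute is the identifier wanted downstream.
        name = attributes.get("ID", ".")
        # GFF starts are 1-based inclusive; BED starts are 0-based,
        # so subtract 1 from the start and leave the end unchanged.
        bed_start = str(int(start) - 1)
        bed_lines.append("\t".join([seqid, bed_start, end, name, ".", strand]))
    return bed_lines


# Hypothetical example record, not taken from the actual genome annotation.
example_gff = [
    "##gff-version 3",
    "Scaffold_01\tGenSAS\tgene\t101\t500\t.\t+\t.\tID=gene-0001;Name=gene-0001",
    "Scaffold_01\tGenSAS\tmRNA\t101\t500\t.\t+\t.\tID=mrna-0001;Parent=gene-0001",
]

gene_bed = gff_to_bed(example_gff, "gene")
print("\n".join(gene_bed))
```

A BED file written this way could then be given to bedtools getfasta, whose -name option uses the name column when building FastA headers (the exact header format depends on the bedtools version). Since the same IDs appear in the GFF, the resulting FastA records can be joined back to the annotation.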
Differential Gene Expression - P.generosa DGE Between Tissues Using Nextflow NF-Core RNAseq Pipeline on Mox
Steven asked that I obtain relative expression values for various geoduck tissues (GitHub Issue). So, I decided to use this as an opportunity to try a Nextflow pipeline. There’s an RNAseq pipeline, NF-Core RNAseq, which I decided to use. The pipeline appears to be ridiculously thorough (e.g. trims, removes gDNA/rRNA contamination, allows for multiple aligners to be used, quantifies/visualizes feature assignments by reads, performs differential gene expression analysis and visualization), all in one package. Sounds great, but I did have some initial problems getting things up and running. Overall, getting things set up took longer than the actual pipeline run! Oh well, it’s a learning process, so that’s not totally unexpected.
Data Analysis - C.virginica BSseq Unmapped Reads Using MEGAN6
After performing DIAMOND BLASTx and DAA “meganization” on 20220302, the next step was to import the DAA files into MEGAN6 for analyzing the resulting taxonomic assignments of the Crassostrea virginica (Eastern oyster) unmapped BSseq reads that Steven generated.
Data Analysis - C.virginica RNAseq Zymo ZR4059 Analyzed by ZymoResearch
After realizing that the Crassostrea virginica (Eastern oyster) RNAseq data had relatively low alignment rates (see this notebook entry from 20220224 for a bit more background), I contacted ZymoResearch to see if they had any insight on what might be happening. I suspected rRNA contamination. ZymoResearch was kind enough to run the RNAseq data through their pipeline and provided us with a report. This notebook entry provides a brief overview and thoughts on the report.
Taxonomic Assignment - C.virginica BSseq Unmapped Reads Using DIAMOND BLASTx and MEGAN6 on Mox
After mapping bisulfite sequencing (BSseq) data to the Crassostrea virginica (Eastern oyster) genome, Steven noticed that there were a large number of unmapped reads. He asked that I attempt to taxonomically classify the unmapped reads (GitHub Issue), with the idea that maybe these reads could provide additional data on an associated microbiome (GitHub Discussion).