Projects

Miscellaneous

RNAseq Alignments - C.virginica Gonad Data to GCF_002022765.2 Genome Using StringTie on Mox

  • 11 min read

As part of identifying alternative transcripts in the Crassostrea virginica (Eastern oyster) gonad RNAseq data we have, I previously used HISAT2 to index the NCBI Crassostrea virginica (Eastern oyster) genome and identify exon/splice sites on 20210720. Then, I used this genome index to run StringTie on Mox in order to map sequencing reads to the genome/alternative isoforms.

Read More

Read Mapping - 10x-Genomics Trimmed FastQ Mapped to P.generosa v1.0 Assembly Using Minimap2 for BlobToolKit on Mox

  • 2 min read

To continue towards getting our Panopea generosa (Pacific geoduck) genome assembly (v1.0) analyzed with BlobToolKit, per this GitHub Issue, I’ve decided to run each aspect of the pipeline manually, as I continue to have issues utilizing the automatic pipeline. As such, I’ve run minimap2 according to the BlobToolKit “Getting Started” guide on Mox. This will map the trimmed 10x-Genomics reads from 20210401 to the Panopea-generosa-v1.0.fa assembly (FastA; 914MB).

Read More

FastQC-MultiQc - C.gigas Ploidy pH WGBS Raw Sequence Data from Haws Lab on Mox

  • 2 min read

Yesterday (20201205), we received the whole genome bisulfite sequencing (WGBS) data back from ZymoResearch for the 24 C.gigas diploid/triploid samples subjected to two different pH treatments (received from the Haws’ Lab on 20200820 and submitted to ZymoResearch on 20200824). As part of our standard sequencing data receipt pipeline, I needed to generate FastQC files for each sample.

Read More

Transcriptome Assessment - Crustacean Transcriptome Completeness Evaluation Using BUSCO on Mox

  • 4 min read

Grace was recently working on writing up a manuscript which did a basic comparison of our C.bairdi transcriptome (cbai_transcriptome_v3.1) (see the Genomic Resources wiki for more deets) to two other species’ transcriptome assemblies. We wanted BUSCO evaluations as part of this comparison, but the two other species did not have BUSCO scores in their respective publications. As such, I decided to generate them myself, as BUSCO runs very quickly. The job was run on Mox.

Read More

MBD Selection - M.magister Sheared Gill gDNA 16 of 24 Samples Set 3 of 3

  • 1 min read

Click here for the notebook on the first eight samples processed. Click here for the second set of eight samples processed. M.magister (Dungeness crab) gill gDNA provided by Mackenzie Gavery was previously sheared on 20201026 and three samples were subjected to additional rounds of shearing on 20201027, in preparation for methyl binding domain (MBD) selection using the MethylMiner Kit (Invitrogen).

Read More

Trimming - Shelly S.salar RNAseq Using fastp and MultiQC on Mox

  • 3 min read

In this GitHub issue, Shelly asked that I trim some Salmo salar RNAseq data she had, align it to a genome, and perform transcriptome alignment counts, using a subset of the NCBI Salmo salar RefSeq genome, GCF_000233375.1. She created a subset of this genome using only sequences designated as “chromosomes.” Links to the FastA and to her notebook on creating this file are in the GitHub issue linked above. The transcriptome she has provided has not been subsetted in a similar fashion; maybe I’ll do that prior to alignment.

Read More

DNA Shearing - M.magister CH05-21 gDNA Full Shearing Test and Bioanalyzer

  • 2 min read

Yesterday, I did some shearing of Metacarcinus magister gill gDNA on a test sample (CH05-21) to determine how many cycles to run on the sonicator (Bioruptor 300; Diagenode) to achieve an average fragment length of ~350 - 500bp in preparation for MBD-BSseq. The determination from yesterday was 70 cycles (30s ON, 30s OFF; low intensity). That determination was made by first sonicating for 35 cycles, followed by successive rounds of 5 cycles each. I decided to repeat this, except by doing it in a single round of sonication.

Read More

DNA Shearing - M.magister gDNA Shear Testing and Bioanalyzer

  • 1 min read

Steven assigned me to do some MBD-BSseq library prep (GitHub Issue) for some Dungeness crab (Metacarcinus magister) DNA samples provided by Mackenzie Gavery. The DNA was isolated from juvenile (J6/J7 developmental stages) gill tissue. One of the first steps in MBD-BSseq is to fragment DNA to a desired size (~350 - 500bp in our case). However, we haven’t worked with Metacarcinus magister DNA previously, so I need to empirically determine sonicator (Bioruptor 300; Diagenode) settings for these samples.

Read More

Read Mapping - C.bairdi 201002558-2729-Q7 and 6129-403-26-Q7 Taxa-Specific NanoPore Reads to cbai_genome_v1.01.fasta Using Minimap2 on Mox

  • 2 min read

After extracting FastQ reads using seqtk on 20201013 from the various taxa I had been interested in, the next thing needed doing was mapping reads to the cbai_genome_v1.01 “genome” assembly from 20200917. I found that Minimap2 will map long reads (e.g. NanoPore), in addition to short reads, so I decided to give that a rip.

Read More

Data Wrangling - C.bairdi NanoPore Reads Extractions With Seqtk on Mephisto

  • 1 min read

In my pursuit to identify which contigs/scaffolds of our C.bairdi “genome” assembly from 20200917 correspond to interesting taxa, based on taxonomic assignments produced by MEGAN6 on 20200928, I used MEGAN6 to extract taxa-specific reads from cbai_genome_v1.01 on 20201007 - the output is only available in FastA format. Since I want the original reads in FastQ format, I will take the FastA sequence IDs (from the FastA index file) and provide them to seqtk to extract the FastQ reads for each sample and corresponding taxa.
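
The ID-based extraction described above can be sketched in plain Python. This is not seqtk itself, just a minimal illustration of the same idea: read the sequence IDs from the first column of a FastA index (.fai) file, then keep only the FastQ records whose IDs match. The function names and the tiny in-memory example data are hypothetical, and the sketch assumes simple four-line FastQ records.

```python
import io


def read_fai_ids(fai_handle):
    """Return the set of sequence IDs (first tab-separated column) in a FastA index."""
    return {line.split("\t", 1)[0].strip() for line in fai_handle if line.strip()}


def filter_fastq(fastq_handle, wanted_ids):
    """Yield four-line FastQ records whose read ID is in wanted_ids."""
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = fastq_handle.readline()
        plus = fastq_handle.readline()
        qual = fastq_handle.readline()
        # The read ID is the header text after "@", up to the first whitespace.
        read_id = header[1:].split()[0]
        if read_id in wanted_ids:
            yield header + seq + plus + qual


# Hypothetical example data standing in for the real .fai and FastQ files.
example_fai = io.StringIO("read1\t10\t7\t10\t11\nread3\t8\t30\t8\t9\n")
example_fastq = io.StringIO(
    "@read1 some description\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
    "@read3\nACGTACGT\n+\nIIIIIIII\n"
)

wanted = read_fai_ids(example_fai)
extracted = list(filter_fastq(example_fastq, wanted))
print("".join(extracted), end="")
```

For real data, the in-memory handles would be replaced with opened files (and gzip handling added if the FastQ files are compressed).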

Read More

Taxonomic Assignments - C.bairdi 6129-403-26-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu

  • 3 min read

After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.

Read More

Taxonomic Assignments - C.bairdi 20102558-2729-Q7 NanoPore Reads Using DIAMOND BLASTx on Mox and MEGAN6 daa2rma on emu

  • 3 min read

After noticing that the initial MEGAN6 taxonomic assignments for our combined C.bairdi NanoPore data from 20200917 revealed a high number of bases assigned to E.canceri and Aquifex sp., I decided to explore the taxonomic breakdown of just the individual samples to see which of the samples was contributing to these taxonomic assignments most.

Read More

Data Wrangling - C.bairdi NanoPore 6129-403-26 Quality Filtering Using NanoFilt on Mox

  • 2 min read

Last week, I ran all of our Q7-filtered C.bairdi NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that there was a large quantity of bases assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophilic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.

Read More

Data Wrangling - C.bairdi NanoPore 20102558-2729 Quality Filtering Using NanoFilt on Mox

  • 2 min read

Last week, I ran all of our Q7-filtered C.bairdi NanoPore reads through MEGAN6 to evaluate the taxonomic breakdown (on 20200917) and noticed that there was a large quantity of bases assigned to E.canceri (a known microsporidian agent of infection in crabs) and Aquifex sp. (a genus of thermophilic bacteria), in addition to the expected Arthropoda assignments. Notably, Alveolata assignments were remarkably low.

Read More

Data Wrangling - Subsetting cbai_genome_v1.0 Assembly with faidx

  • 1 min read

I previously assembled cbai_genome_v1.0.fasta with our NanoPore Q7 reads on 20200917 and noticed that there were numerous sequences that were much shorter than the expected 500bp threshold that the assembler (Flye) was supposed to spit out. I created an Issue on the Flye GitHub page to find out why. The developer responded and determined it was an issue with the assembly polisher and that sequences <500bp could be safely ignored.
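
As a rough illustration of the length-based subsetting, the sketch below picks out the names of sequences that are at least 500bp long from the lines of a FastA index (.fai) file, where the first column is the sequence name and the second is its length. The function name and the example contig names are hypothetical; the resulting list of names is the kind of input one would then hand to a faidx-style tool to pull the sequences themselves.

```python
def names_at_least(fai_lines, min_length=500):
    """Return names of sequences whose indexed length is >= min_length."""
    keep = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if length >= min_length:
            keep.append(name)
    return keep


# Hypothetical index lines: name, length, offset, line bases, line width.
example_index = [
    "contig_1\t1200\t10\t60\t61\n",
    "contig_2\t350\t1240\t60\t61\n",
    "contig_3\t500\t1610\t60\t61\n",
]

kept_names = names_at_least(example_index)
print("\n".join(kept_names))
```

Sequences of exactly 500bp are kept here, on the reading that only sequences shorter than 500bp are to be ignored.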

Read More

DNA Quantification - Re-quant Ronit’s C.gigas Diploid-Triploid Ctenidia gDNA Submitted to ZymoResearch

  • 1 min read

I received notice from ZymoResearch yesterday afternoon that the DNA we sent on 20200820 for this project (Quote 3534) had insufficient DNA for sequencing for most of the samples. This was, honestly, shocking. I had even submitted well over the minimum amount of DNA required (submitted 1.75ug - only needed 1ug). So, I’m not entirely sure what happened here.

Read More

Primer Design and In-Silico Testing - Geoduck Reproduction Primers

  • 1 min read

Shelly asked that I re-run the primer design pipeline that Kaitlyn had previously run to design a set of reproduction-related qPCR primers. Unfortunately, Kaitlyn’s Jupyter Notebook wasn’t backed up and she accidentally deleted it, I believe, so there’s no real record of how she designed the primers. However, I do know that she was unable to run the EMBOSS primersearch tool, which will check your primers against a set of sequences for any other matches. This is useful for confirming specificity.

Read More

Metagenomics - Data Extractions Using MEGAN6

  • 1 min read

Decided to finally take the time to methodically extract data from our metagenomics project so that I have the tables handy when I need them and I can easily share them with other people. Previously, I hadn’t done this due to limitations on looking at the data remotely. I finally downloaded all of the RMA6 files from 20191014 after being fed up with the remote desktop connection and upgrading the size of my hard drive (five of the six RMA6 files are >40GB in size).

Read More

Sequence Extractions - C.bairdi Transcriptomes v2.0 and v3.0 Excluding Alveolata with MEGAN6 on Swoose

  • ~1 min read

Continuing to try to identify the best C.bairdi transcriptome, we decided to extract all non-dinoflagellate sequences from cbai_transcriptome_v2.0 (RNAseq shorthand: 2018, 2019, 2020-GW, 2020-UW) and cbai_transcriptome_v3.0 (RNAseq shorthand: 2018, 2019, 2020-UW). Both of these transcriptomes were assembled without any taxonomic filter applied. DIAMOND BLASTx and conversion to MEGAN6 RMA6 files was performed yesterday (20200604).

Read More

Transcriptome Comparison - C.bairdi Transcriptomes Compared with DETONATE on Mox

  • 4 min read

We’ve produced a number of C.bairdi transcriptomes and we’re interested in doing some comparisons to try to determine which one might be “best”. I previously compared the BUSCO scores of each of these transcriptomes and now will be using the DETONATE software package to perform two different types of comparisons: compared to a reference (REF-EVAL) and determine an overall quality “score” (RSEM-EVAL). I’ll be running REF-EVAL in this notebook.

Read More

Transcriptome Assembly - C.bairdi All Pooled Arthropoda-only RNAseq Data with Trinity on Mox

  • 2 min read

For completeness’ sake, I wanted to create an additional C.bairdi transcriptome assembly that consisted of Arthropoda-only sequences from just pooled RNAseq data (since I recently generated a similar assembly without taxonomically filtered reads on 20200518). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to the Trinity command.

Read More

Transcriptome Assembly - P.trituberculatus (Japanese blue crab) NCBI SRA BioProject PRJNA597187 Data with Trinity on Mox

  • 3 min read

After generating a number of C.bairdi (Tanner crab) transcriptomes, we decided we should compare them to help decide which one should become our “canonical” version. As part of that, the Trinity wiki offers a list of tools that one can use to check the quality of transcriptome assemblies. Some of those require a transcriptome of a related species.

Read More

SRA Library Assessment - Determine RNAseq Library Strandedness from P.trituberculatus SRA BioProject PRJNA597187

  • 3 min read

We’ve produced a number of C.bairdi transcriptomes utilizing different assembly approaches (e.g. Arthropoda reads only, stranded libraries only, mixed strandedness libraries, etc.) and we want to determine which of them is “best”. Trinity has a nice list of tools to assess the quality of transcriptome assemblies, but most of the tools rely on comparison to a transcriptome of a related species.

Read More

Transcriptome Assembly - C.bairdi All Pooled RNAseq Data Without Taxonomic Filters with Trinity on Mox

  • 2 min read

Steven asked that I assemble a transcriptome with just our pooled C.bairdi RNAseq data (not taxonomically filtered; see the FastQ list file linked in the Results section below). This constitutes samples we have designated: 2018, 2019, 2020-UW. A de novo assembly was run using Trinity on Mox. Since all pooled RNAseq libraries were stranded, I added this option to the Trinity command.

Read More

GO to GOslim - C.bairdi Enriched GO Terms from 20200422 DEGs

  • 6 min read

After running pairwise comparisons and identifying differentially expressed genes (DEGs) on 20200422 and finding enriched gene ontology terms, I decided to map the GO terms to Biological Process GOslims. Additionally, I decided to try another level of comparison (I’m not sure how valid it is), whereby I will count the number of GO terms assigned to each GOslim and then calculate the percentage of GO terms that get assigned to each of the GOslim categories. The idea being that it might help identify Biological Processes that are “favored” in a given set of DEGs. I decided to set up “fancy” pyramid plots to view a given set of GO-GOslims for each DEG comparison.
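
The counting step described above could look something like the following minimal sketch. It assumes a mapping from each GO term to the set of GOslim categories it falls under, counts the GO terms per GOslim, and expresses each count as a percentage of all GO terms in the set. The function name, the placeholder term and slim labels, and the choice of denominator (total GO terms, so percentages can sum to more than 100 when a term maps to several GOslims) are all assumptions for illustration.

```python
from collections import Counter


def goslim_percentages(go_to_goslims):
    """Map each GOslim to (GO term count, percent of all GO terms in the set)."""
    total_terms = len(go_to_goslims)
    if total_terms == 0:
        return {}
    counts = Counter()
    for slims in go_to_goslims.values():
        # Using a set means each GO term is counted once per GOslim.
        counts.update(set(slims))
    return {
        slim: (n, 100.0 * n / total_terms)
        for slim, n in sorted(counts.items())
    }


# Placeholder labels, not real GO or GOslim identifiers.
example_mapping = {
    "term_a": {"slim_x"},
    "term_b": {"slim_x", "slim_y"},
    "term_c": {"slim_y"},
    "term_d": {"slim_x"},
}

summary = goslim_percentages(example_mapping)
for slim, (count, percent) in summary.items():
    print(f"{slim}\t{count}\t{percent:.1f}")
```

Running this once per DEG comparison would give the per-GOslim counts and percentages that feed a pyramid-style plot.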

Read More

NanoPore Sequencing - C.bairdi gDNA 6129_403_26

  • 1 min read

After getting high quality gDNA from Hematodinium-infected C.bairdi hemolymph on 2020210, we decided to run some of the sample on the NanoPore MinION, since the flowcells have a very short shelf life. Additionally, the results from this will help inform us on whether this sample might be worth submitting for PacBio sequencing. And, of course, this provides us with additional sequencing data to complement our previous NanoPore runs from 20200109.

Read More

Trimming/MultiQC - Methcompare Bisulfite FastQs with fastp on Mox

  • 3 min read

Steven asked me to trim a set of FastQ files, provided by Hollie Putnam, in preparation for methylation analysis using Bismark. The analysis is part of a coral project comparing DNA methylation profiles of different species, as well as comparing different sample prep protocols. There’s a dedicated GitHub repo here:

Read More

DNA Isolation, Quantification, and Gel - C.bairdi gDNA Sample 6129_403_26

  • 1 min read

In order to do some genome sequencing on C.bairdi and Hematodinium, we need high molecular weight gDNA. I attempted this twice before, using two different methods (Quick DNA/RNA Microprep Kit (ZymoResearch) on 20200122 and the E.Z.N.A Mollusc DNA Kit (Omega) on 20200108) using ~10yr old ethanol-preserved tissue provided by Pam Jensen. Both methods yielded highly degraded gDNA. So, I’m now attempting to get higher quality gDNA from the RNAlater-preserved hemolymph pellets from this experiment.

Read More

Gene Expression - Hematodinium MEGAN6 with Trinity and EdgeR

  • 2 min read

After completing annotation of the Hematodinium MEGAN6 taxonomic-specific Trinity assembly using Trinotate on 20200126, I performed differential gene expression analysis and gene ontology (GO) term enrichment analysis using Trinity’s scripts to run EdgeR and GOseq, respectively. The comparison listed below is the only comparison possible, as there were no reads present in the uninfected Hematodinium extractions.

Read More

NanoPore Sequencing - Initial NanoPore MinION Lambda Sequencing Test

  • 1 min read

We recently acquired a NanoPore MinION sequencer, FLO-MIN106 flow cell and the Rapid Sequencing Kit (SQK-RAD004). The NanoPore website provides a pretty thorough and user-friendly walk-through of how to begin using the system for the first time. With that said, I believe the user needs to have a registered account with NanoPore and needs to have purchased some products to have full access to the protocols they provide.

Read More

Genome Comparison - Pgenerosa_v074 vs Pgenerosa_v070 with MUMmer Promer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein level comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs S.glomerata NCBI with MUMmer Promer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein level comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs M.yessoensis NCBI with MUMmer Promer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein level comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs H.sapiens NCBI with MUMmer Promer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein level comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs C.virginica NCBI with MUMmer Promer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein level comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs C.gigas NCBI with MUMmer Promer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer 3.23 (specifically, promer for protein level comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs Pgenerosa_v070 with MUMmer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer (v4) (specifically, nucmer for nucleotide comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs Pgenerosa_v074 with MUMmer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer (v4) (specifically, nucmer for nucleotide comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs S.glomerata NCBI with MUMmer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer (v4) (specifically, nucmer for nucleotide comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs M.yessoensis NCBI with MUMmer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer (v4) (specifically, nucmer for nucleotide comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs H.sapiens NCBI with MUMmer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer (v4) (specifically, nucmer for nucleotide comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs C.gigas NCBI with MUMmer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer (v4) (specifically, nucmer for nucleotide comparisons). This software is specifically designed to do this type of comparison.

Read More

Genome Comparison - Pgenerosa_v074 vs C.virginica NCBI with MUMmer on Mox

  • 3 min read

In continuing to further improve our geoduck genome annotation, I’m attempting to figure out why Scaffold 1 of our assembly doesn’t have any annotations. As part of that I’ve decided to perform a series of genome comparisons and see how they match up, with an emphasis on Scaffold 1, using MUMmer (v4) (specifically, nucmer for nucleotide comparisons). This software is specifically designed to do this type of comparison.

Read More

Data Summary - P.generosa Transcriptome Assemblies Stats

  • 1 min read

In our continuing quest to wrangle the geoduck transcriptome assemblies we have, I was tasked with compiling assembly stats for our various assemblies. The table below provides an overview of some stats for each of our assemblies. Links within the table go to the notebook entries for the various methods from which the data was gathered. In general:

Read More

FastQC-MultiQC - Additional C.gigas WGBS Sequencing Data from Genewiz Received 20190501

  • ~1 min read

Earlier today, we received the additional C.gigas sequencing data from Genewiz. Wanted to run through FastQC again and get an updated report for each data set. Admittedly, it probably won’t look much different from the initial FastQC run on 20190415, due to the fact that the additional sequencing was simply appended to the previous data. Since FastQC examines a subset of the data in each file, I’d fully expect the FastQC report to look the same. However, we’ll have a greater number of sequences in each file. This should, in turn, increase the number of reads retained after quality trimming.

Read More