We’re working on getting the metagenomics sequencing project written up as a manuscript and Steven asked me to provide an overview of the taxonomic makeup of our metagenome assembly in this GitHub Issue.
I previously assembled all of the sequencing data into a single assembly (i.e. did not assemble by experimental treatment):
Subsequently, I ran some gene prediction software to refine the assembly into a more conservative representation, in hopes of getting a more realistic view of biologically relevant DNA (i.e. analyzing sequenced DNA that actually has putative functions, as opposed to random eDNA that may have been floating around in the water):
To get taxonomic info, I took the MetaGeneMark proteins FASTA file and ran BLASTp against the NCBI SwissProt database (v5) to get taxonomic IDs. See this Jupyter Notebook (GitHub):
This was followed up by using Krona to plot the data in an interactive fashion, according to NCBI taxonomic ID abundance (see Results below).
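To make "taxonomic ID abundance" concrete, here's a minimal Python sketch of the tallying idea. It is not the code from the notebook. It assumes tab-separated BLAST output with the query ID in the first column and the taxonomic ID in the second, keeps only the first hit listed for each query, and uses made-up example rows. The function name `count_taxids` and the gene names are hypothetical.

```python
import csv
import io
from collections import Counter


def count_taxids(blast_rows, query_col=0, taxid_col=1):
    """Count how many distinct query proteins map to each taxonomic ID.

    Only the first row seen for each query is counted. This assumes the
    hits for a query are listed best-first, as in BLAST tabular output.
    """
    seen_queries = set()
    counts = Counter()
    for row in blast_rows:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query = row[query_col]
        if query in seen_queries:
            continue
        seen_queries.add(query)
        # A taxonomic ID field may hold several IDs separated by ';'.
        # Keeping only the first one is a simplifying assumption.
        taxid = row[taxid_col].split(";")[0]
        counts[taxid] += 1
    return counts


# Hypothetical example rows: query ID <tab> taxonomic ID.
example = "gene_1\t9606\ngene_1\t10090\ngene_2\t9606\ngene_3\t6565;6550\n"
rows = csv.reader(io.StringIO(example), delimiter="\t")
taxid_counts = count_taxids(rows)

# Print one "taxid <tab> count" line per taxonomic ID, most abundant first.
for taxid, n in taxid_counts.most_common():
    print(f"{taxid}\t{n}")
```

A per-taxon count table like this is the kind of summary a Krona plot displays as nested wedges. The exact input format the plotting step expects may differ from what's shown here.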
Here’s how the sample names break down:
|Sample|Developmental Stage (days post-fertilization)|pH Treatment|
Interactive Krona plots (HTML):
As a brief overview, the initial Megahit assembly generated:
- 2,276,153 contigs.
- 3,296,610 genes.
BLASTp resulted in:
- 1,346,325 SwissProt matches
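For a quick sense of scale, the counts above can be turned into a match rate. This is just arithmetic on the reported numbers, and it assumes each SwissProt match corresponds to a distinct predicted gene (i.e. one reported hit per query), which isn't confirmed here.

```python
# Counts reported above.
genes = 3_296_610
swissprot_matches = 1_346_325

# Assumption: one match per gene, so this is the fraction of genes with a hit.
match_fraction = swissprot_matches / genes
print(f"{match_fraction:.1%} of predicted genes had a SwissProt match")
```

Under that assumption, roughly 41% of the predicted genes had a SwissProt match.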
The Krona plot provides a pretty nice way to view the breakdown of the data and, as such, I won’t provide a written summary of how it all shakes out.
Next, for curiosity’s sake, I’ll run BLASTn and see how things compare.