We’re working on getting the metagenomics sequencing project written up as a manuscript and Steven asked me to provide an overview of the taxonomic makeup of our metagenome assembly in this GitHub Issue.
I previously assembled all of the sequencing data into a single assembly (i.e. I did not assemble by experimental treatment):
Subsequently, I ran some gene prediction software to refine the assembly into a more conservative representation, in hopes of getting a more realistic view of biologically relevant DNA (i.e. analyzing sequenced DNA that actually has putative functions, as opposed to random eDNA that may have been floating around in the water):
For getting taxonomic info, I took the MetaGeneMark proteins FastA file and ran BLASTp against the NCBI SwissProt database (v5) to get taxonomic IDs. See this Jupyter Notebook (GitHub):
This was followed up by using Krona to plot the data in an interactive fashion, according to NCBI taxonomic ID abundance (see Results below).
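The step from BLASTp hits to per-taxon abundance can be sketched roughly as below. This is a minimal illustration, not the code from the notebook: it assumes the BLASTp output is tab-delimited with the subject taxonomy ID in the third column (the column layout, the `tally_taxids` function name, and the example rows are all hypothetical).

```python
from collections import Counter


def tally_taxids(lines, taxid_col=2):
    """Count BLAST hits per NCBI taxonomy ID.

    `lines` is an iterable of tab-delimited BLAST output lines.
    `taxid_col` is the 0-based column holding the taxonomy ID
    (an assumption; adjust it to match the real output format).
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= taxid_col:
            continue
        # BLAST can list several IDs separated by ';'; keep only the first
        taxid = fields[taxid_col].split(";")[0]
        if taxid and taxid != "N/A":
            counts[taxid] += 1
    return counts


# Made-up rows, for illustration only: query, subject, taxonomy ID, e-value
example = [
    "gene_1\tsp|P00001|PROT_A\t9606\t1e-50",
    "gene_2\tsp|P00002|PROT_B\t562\t1e-30",
    "gene_3\tsp|P00003|PROT_C\t562;83333\t1e-20",
]

for taxid, n in sorted(tally_taxids(example).items()):
    print(f"{taxid}\t{n}")
```

A table of taxonomy IDs and counts like this is the kind of abundance summary a Krona plot is built from; the exact input format Krona expects should be checked against its own documentation.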
Here’s how the sample names break down:
Sample | Developmental Stage (days post-fertilization) | pH Treatment |
---|---|---|
MG1 | 13 | 8.2 |
MG2 | 17 | 8.2 |
MG3 | 6 | 7.1 |
MG5 | 10 | 8.2 |
MG6 | 13 | 7.1 |
MG7 | 17 | 7.1 |
RESULTS
Output folder:
Interactive Krona plots (HTML):
20190321_metagenomics_pgen_blastp/krona_megahit_MGM_blastp.html
As a brief overview, the initial Megahit assembly generated:
- 2,276,153 contigs.
MetaGeneMark predicted:
- 3,296,610 genes.
BLASTp resulted in:
- 1,346,325 SwissProt matches
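For a quick sense of scale, the three counts above can be turned into ratios. This is only a back-of-the-envelope sketch; treating each SwissProt match as belonging to a distinct predicted gene is an assumption (it holds only if at most one hit was kept per query).

```python
# Counts reported above
contigs = 2_276_153   # Megahit contigs
genes = 3_296_610     # MetaGeneMark predicted genes
matches = 1_346_325   # BLASTp SwissProt matches

# Average number of predicted genes per assembled contig
genes_per_contig = genes / contigs

# Share of predicted genes with a SwissProt match
# (assumes one match per gene, which may not be the case)
match_fraction = matches / genes

print(f"{genes_per_contig:.2f} predicted genes per contig")
print(f"{match_fraction:.1%} of predicted genes with a SwissProt match")
```

Under that assumption, roughly 1.45 genes were predicted per contig and about 40.8% of predicted genes had a SwissProt match.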
The Krona plot provides a pretty nice way to view the breakdown of the data and, as such, I won’t provide a written summary of how it all shakes out.
Next, for curiosity’s sake, I’ll run BLASTn and see how things compare.