We’re working on writing up the metagenomics sequencing project as a manuscript, and Steven asked me to provide an overview of the taxonomic makeup of our metagenome assembly in this GitHub Issue.

I previously assembled all of the sequencing data into a single assembly (i.e. I did not assemble by experimental treatment):

Megahit assembly notebook.

Subsequently, I ran gene prediction software to refine the assembly into a more conservative representation, in hopes of getting a more realistic view of biologically relevant DNA (i.e. analyzing sequenced DNA that actually has putative functions, as opposed to random eDNA that may have been floating around in the water):

MetaGeneMark gene prediction notebook

To get taxonomic info, I took the MetaGeneMark proteins FastA file and ran BLASTp against the NCBI SwissProt database (v5) to obtain taxonomic IDs. See this Jupyter Notebook (GitHub):

20190321_swoose_metagnomics_pgen_blastp_ncbi-sp-v5-db.ipynb

This was followed up by using Krona to plot the data in an interactive fashion, according to NCBI taxonomic ID abundance (see Results below).
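
As a rough illustration of what "taxonomic ID abundance" means here, the sketch below tallies tax IDs from BLAST tabular output. It is not the code from the notebook above. It assumes a tab-delimited file with the query ID in the first column and the tax ID (BLAST's staxids field) in the second, and it assumes the first hit listed for each query is the one to keep. The function name and the example rows are made up for demonstration.

```python
from collections import Counter


def tally_taxids(blast_lines, query_col=0, taxid_col=1):
    """Count tax IDs, using only the first-listed hit for each query.

    Column positions are assumptions; adjust them to match the
    actual BLAST output format.
    """
    seen_queries = set()
    counts = Counter()
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query = fields[query_col]
        if query in seen_queries:
            continue  # ignore additional hits for a query already counted
        seen_queries.add(query)
        # staxids may hold several semicolon-separated IDs; keep the first
        taxid = fields[taxid_col].split(";")[0]
        counts[taxid] += 1
    return counts


# Invented example rows: query ID, tax ID(s), bit score
example = [
    "gene_1\t6565\t250\n",
    "gene_1\t9606\t120\n",
    "gene_2\t6565;6550\t310\n",
    "gene_3\t562\t88\n",
]
counts = tally_taxids(example)
for taxid, n in counts.most_common():
    print(f"{taxid}\t{n}")
```

Keeping a single hit per query means each predicted protein contributes to the tally only once. Whether that matches how the Krona input was actually prepared is not stated here.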

Here’s how the sample names break down:

Sample	Developmental Stage (days post-fertilization)	pH Treatment
MG1	13	8.2
MG2	17	8.2
MG3	6	7.1
MG5	10	8.2
MG6	13	7.1
MG7	17	7.1

RESULTS

Output folder:

20190321_metagenomics_pgen_blastp/
Interactive Krona plots (HTML):
20190321_metagenomics_pgen_blastp/krona_megahit_MGM_blastp.html

As a brief overview, the initial Megahit assembly generated:

2,276,153 contigs.

MetaGeneMark predicted:

3,296,610 genes.

BLASTp resulted in:

1,346,325 SwissProt matches
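
To put those three numbers in proportion, here is a quick back-of-the-envelope calculation. The variable names are mine. The second ratio can only be read as "fraction of predicted genes with a SwissProt hit" if each match corresponds to a distinct predicted gene, which is an assumption and not something verified here.

```python
# Counts reported above
contigs = 2_276_153
predicted_genes = 3_296_610
swissprot_matches = 1_346_325

# Average number of predicted genes per assembled contig
genes_per_contig = predicted_genes / contigs

# SwissProt matches relative to predicted genes
# (assumes one match per gene; see the note above)
match_fraction = swissprot_matches / predicted_genes

print(f"Predicted genes per contig: {genes_per_contig:.2f}")
print(f"SwissProt matches per predicted gene: {match_fraction:.1%}")
```

That works out to roughly 1.45 predicted genes per contig and, under the one-match-per-gene assumption, about 41% of predicted genes with a SwissProt match.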

The Krona plot provides a pretty nice way to view the breakdown of the data and, as such, I won’t provide a written summary of how it all shakes out.

Next, for curiosity’s sake, I’ll run BLASTn and see how things compare.