Metagenomics - P.generosa Water Sample Assembly Comparisons with Quast

Continuing work on the metagenomics project: Emma shared her “co-assembly”, so I figured it would be quick and easy to compare hers with mine and get a feel for how similar or different they are. Last week I ran a similar comparison of our individual water sample assemblies. Those results showed that my assemblies had:

  • significantly larger “largest contigs” (10 - 50x larger than Emma’s)

  • larger N50 values (~2x larger than Emma’s)

  • greater total length in bp (~1.5x more than Emma’s)
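
For reference, here is a minimal sketch of how those three metrics can be computed directly from a FASTA file. The `fasta_stats` function name and the tiny demo file are hypothetical and just for illustration. This is not how Quast computes them internally, and it does not apply a minimum contig length filter. N50 is taken here as the length of the contig at which the running total of lengths, sorted longest first, reaches at least half of the total assembly length.

```shell
# Hypothetical helper: summarize contig lengths in a FASTA file.
# Prints number of contigs, total length, largest contig, and N50.
fasta_stats() {
  awk '
    # Header line: emit the length of the previous contig, then reset.
    /^>/ { if (len > 0) print len; len = 0; next }
    # Sequence line: strip whitespace and accumulate the length.
    { gsub(/[ \t\r]/, ""); len += length($0) }
    # End of file: emit the last contig.
    END { if (len > 0) print len }
  ' "$1" |
  sort -rn |
  awk '
    { lens[NR] = $1; total += $1 }
    END {
      if (NR == 0) { print "contigs=0 total=0 largest=0 N50=0"; exit }
      half = total / 2
      # Walk contigs from longest to shortest until half the total is covered.
      for (i = 1; i <= NR; i++) {
        running += lens[i]
        if (running >= half) { n50 = lens[i]; break }
      }
      printf "contigs=%d total=%d largest=%d N50=%d\n", NR, total, lens[1], n50
    }
  '
}

# Demo on a made-up four-contig FASTA (lengths 10, 8, 4, 2).
demo=$(mktemp)
printf '>c1\nAAAAAAAAAA\n>c2\nCCCCCC\nCC\n>c3\nGGGG\n>c4\nTT\n' > "$demo"
fasta_stats "$demo"   # prints: contigs=4 total=24 largest=10 N50=8
rm -f "$demo"
```

Pointing the function at a real assembly FASTA should give numbers in the same ballpark as a Quast report, though they may differ slightly because Quast drops contigs shorter than its minimum contig length.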

So, I ran Quast on my computer (swoose - Ubuntu 16.04LTS) with the following input FastAs:

python \
/home/sam/programs/quast-5.0.2/quast.py \
--threads=20 \
--min-contig=100 \
--labels=ets,sjw \
/home/sam/data/metagenomics/P_generosa/emma_assemblies/contigs.fa \
/home/sam/data/metagenomics/P_generosa/final.contigs.fa

Here’s how the sample names break down:

Sample   Developmental Stage (days post-fertilization)   pH Treatment
MG1      13                                              8.2
MG2      17                                              8.2
MG3      6                                               7.1
MG5      10                                              8.2
MG6      13                                              7.1
MG7      17                                              7.1

RESULTS

Output folder:

Quast report (HTML):

Screen cap of assembly comparison report

Well, these results are very strange. The thing that immediately jumps out at me is how “small” Emma’s assembly is. My assembly has more than 5x as many bases as hers (2.2 Gbp vs. 412 Mbp), an enormous disparity. I’ll talk to Emma and try to get explicit details on how she constructed her assembly.
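
As a quick arithmetic check on that disparity, here is a small sketch using the rounded totals quoted above (2.2 Gbp and 412 Mbp). The variable names are just for illustration, and the exact ratio would shift slightly with the unrounded values from the Quast report.

```shell
# Total assembly lengths as quoted in the text, expressed in Mbp.
sjw_mbp=2200   # 2.2 Gbp (rounded)
ets_mbp=412    # 412 Mbp

# Fold difference between the two assemblies, to one decimal place.
fold=$(awk -v a="$sjw_mbp" -v b="$ets_mbp" 'BEGIN { printf "%.1f", a / b }')
echo "sjw/ets total length ratio: ${fold}x"   # prints: sjw/ets total length ratio: 5.3x
```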