Metagenomics Annotation - P.generosa Water Samples with MEGAN6

After running DIAMOND BLASTx and MEGANIZER on these samples on 20190925 to assess taxonomy info, I began the analyses/visualization of this data with MEGAN6.

Initially, I tried to import all of the Meganized DAA files at once, but ran into problems.

After running for an entire day, my hard drive ran out of space! Turns out, each of the MEGAN6 files generated is going to be HUGE:

Screencap showing 178GB MEGAN6 file size for a single sample

Since each sample is going to generate a 178GB (or larger?) file, I tried running this analysis directly on Gannet, as my computer’s HDD won’t have enough space.
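A quick back-of-the-envelope check makes the storage problem obvious. The sketch below assumes every one of the six samples produces a file about the size of the one in the screencap (178GB); that's just an estimate, and the real files could be larger. The function names are made up for illustration.

```python
import shutil

N_SAMPLES = 6        # MG1, MG2, MG3, MG5, MG6, MG7
GB_PER_SAMPLE = 178  # assumption: every sample matches the one in the screencap


def required_gb(n_samples=N_SAMPLES, gb_per_sample=GB_PER_SAMPLE):
    """Rough total disk space needed if every sample yields a similar file."""
    return n_samples * gb_per_sample


def has_room(path=".", needed_gb=None):
    """Return True if the volume holding `path` has at least `needed_gb` free."""
    if needed_gb is None:
        needed_gb = required_gb()
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


if __name__ == "__main__":
    print(f"Estimated space needed: ~{required_gb()} GB")
    print(f"Enough free space here? {has_room('.')}")
```

Running something like `has_room()` against the target volume before kicking off a multi-day import would have flagged the problem up front.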

Trying to process this on Gannet also failed, for multiple reasons. Firstly, we seemed to be having stability problems with Gannet during this analysis. Gannet frequently shut off, interrupting the analysis. This may have been due to bad batteries in the UPS that Gannet was connected to. Secondly, apart from the interruptions, I also encountered a couple of I/O errors (i.e. data transfer issues).

Both of these were particularly annoying because the analysis took multiple days to run!

However, these issues may have ended up being serendipitous, as they led me to read deeper into the MEGAN community forum and get a better understanding of how MEGAN works, particularly how it handles paired-end sequencing data.

As such, I determined that only one pair of Meganized DAA files should be imported and processed at a time. Each pair took ~2 - 3 days to generate the expected RMA6 file. The RMA6 files were generally very large (~40GB); however, the MG3 RMA6 file was significantly smaller than the others (~24GB):
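To keep track of which DAA files belong together, the pairing can be sketched in a few lines of Python. The file naming here (`MG1_R1.daa` / `MG1_R2.daa`) is purely hypothetical and is not the actual naming of these files; the pattern would need adjusting to match the real names.

```python
import re
from collections import defaultdict

# HYPOTHETICAL naming scheme: <sample>_R1.daa and <sample>_R2.daa
DAA_PATTERN = re.compile(r"^(?P<sample>MG\d+)_R(?P<read>[12])\.daa$")


def pair_daa_files(filenames):
    """Group DAA files by sample, keeping only samples with both reads."""
    by_sample = defaultdict(dict)
    for name in filenames:
        match = DAA_PATTERN.match(name)
        if match:
            by_sample[match["sample"]][match["read"]] = name

    pairs = {}
    for sample in sorted(by_sample):
        reads = by_sample[sample]
        if set(reads) == {"1", "2"}:
            pairs[sample] = (reads["1"], reads["2"])
    return pairs


if __name__ == "__main__":
    example = ["MG1_R1.daa", "MG1_R2.daa", "MG2_R1.daa", "notes.txt"]
    for sample, (r1, r2) in pair_daa_files(example).items():
        print(sample, r1, r2)
```

Each resulting pair would then be imported into MEGAN6 on its own, one pair per run.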

screencap of RMA6 file sizes

RMA6 files were generated using the “Import from BLAST” option, selecting each pair of DAA files, and applying the default Naive LCA settings. I used the following mapping files:

Taxonomy: prot_acc2tax-Jul2019X1.abin

EggNog: acc2eggnog-Jul2019X.abin

InterPro2GO: acc2interpro-Jul2019X.abin

SEED: acc2seed-May2015XX.abin

Here’s how the sample names break down:

Sample | Developmental Stage (days post-fertilization) | pH Treatment
MG1    | 13                                            | 8.2
MG2    | 17                                            | 8.2
MG3    | 6                                             | 7.1
MG5    | 10                                            | 8.2
MG6    | 13                                            | 7.1
MG7    | 17                                            | 7.1
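For later comparisons, it may be handy to have this sample info in a form that can be grouped by treatment or stage. Here's a minimal sketch; the values come straight from the table above, but the dictionary layout and the `group_by` helper are just one way to organize it.

```python
from collections import defaultdict

# Sample metadata transcribed from the table above.
SAMPLES = {
    "MG1": {"days_post_fertilization": 13, "pH": 8.2},
    "MG2": {"days_post_fertilization": 17, "pH": 8.2},
    "MG3": {"days_post_fertilization": 6, "pH": 7.1},
    "MG5": {"days_post_fertilization": 10, "pH": 8.2},
    "MG6": {"days_post_fertilization": 13, "pH": 7.1},
    "MG7": {"days_post_fertilization": 17, "pH": 7.1},
}


def group_by(key, samples=SAMPLES):
    """Group sample names by a metadata field (e.g. 'pH')."""
    groups = defaultdict(list)
    for name, meta in samples.items():
        groups[meta[key]].append(name)
    return dict(groups)


if __name__ == "__main__":
    for ph, names in group_by("pH").items():
        print(f"pH {ph}: {', '.join(names)}")
```

Grouping by pH gives two sets of three samples each. Note that only days 13 and 17 are represented in both treatments.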


Output folder:

RMA6 files (NOTE: these are very large files!):

MEGAN6 “Import from BLAST” log file:

The process generates this type of visualization after an RMA6 file is created:

Screencap of example RMA6 visualization

Next up is to figure out how to compare samples/groups. Since I have separate RMA6 files for each sample, I believe that I can import all the RMA6 files at one time and then use the software to group/compare them.