After running DIAMOND BLASTx and MEGANIZER on these samples on 20190925 to assess taxonomy info, I began the analyses/visualization of this data with MEGAN6.
Initially, I tried to import all of the Meganized DAA files, but ran into problems.
After running for an entire day, my hard drive ran out of space! It turns out that each of the MEGAN6 files generated is going to be HUGE:
Since each sample was going to generate a 178GB (or larger?) file, I tried running this analysis directly on Gannet, as my computer’s HDD wouldn’t have enough space.
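In hindsight, a quick free-space check before starting the import would have caught this. Here is a minimal sketch of that check, assuming six samples at roughly 178GB each (the file size seen above, which may be an underestimate). The function name `has_room` and the 10% safety margin are my own illustrative choices, not anything from MEGAN6.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes, as disk sizes are usually reported


def has_room(path, n_files, gb_per_file, safety=1.1):
    """Return (ok, needed_gb, free_gb) for writing n_files under path.

    safety is a hypothetical fudge factor to leave some headroom.
    """
    needed = n_files * gb_per_file * safety
    free = shutil.disk_usage(path).free / GB
    return free >= needed, needed, free


# Six samples (MG1, MG2, MG3, MG5, MG6, MG7) at an assumed ~178GB each.
ok, needed, free = has_room(".", n_files=6, gb_per_file=178)
status = "OK" if ok else "not enough space"
print(f"Need ~{needed:.0f} GB, {free:.0f} GB free: {status}")
```

Pointing `path` at the intended output directory (a local drive or a mounted share) reports the free space on that volume.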
Trying to process this on Gannet also failed, for multiple reasons. Firstly, we seemed to be having stability problems with Gannet during this analysis. Gannet frequently shut off, interrupting the analysis. This may have been due to bad batteries in the UPS that Gannet was connected to. Secondly, apart from the interruptions, I also encountered a couple of I/O errors (i.e. data transfer issues).
Both of these were particularly annoying because the analysis took multiple days to run!
However, these issues may have ended up being serendipitous, as they led me to read deeper into the MEGAN community forum and get a better understanding of how MEGAN works, particularly how it handles paired-end sequencing data.
As such, I determined that only one pair of Meganized DAA files should be imported and processed at a time. Each pair took ~2 - 3 days to generate the expected RMA6 file. Each RMA6 file was generally very large (~40GB); however, the MG3 RMA6 file is significantly smaller than the others (~24GB):
RMA6 files were generated using the “Import from BLAST” option, selecting each pair of DAA files and applying the default Naive LCA settings. I used the following mapping files:
- Taxonomy: prot_acc2tax-Jul2019X1.abin
- EggNog: acc2eggnog-Jul2019X.abin
- InterPro2GO: acc2interpro-Jul2019X.abin
- SEED: acc2seed-May2015XX.abin
Here’s how the sample names break down:
Sample | Developmental Stage (days post-fertilization) | pH Treatment |
---|---|---|
MG1 | 13 | 8.2 |
MG2 | 17 | 8.2 |
MG3 | 6 | 7.1 |
MG5 | 10 | 8.2 |
MG6 | 13 | 7.1 |
MG7 | 17 | 7.1 |
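For later comparisons, it may help to have the table above in a form that can be grouped programmatically. The following is just a sketch: the values come straight from the table, while the dictionary layout, the key names, and the `group_by` helper are my own illustrative choices and have nothing to do with how MEGAN6 handles sample metadata.

```python
from collections import defaultdict

# Sample metadata transcribed from the table above.
SAMPLES = {
    "MG1": {"days_post_fertilization": 13, "pH": 8.2},
    "MG2": {"days_post_fertilization": 17, "pH": 8.2},
    "MG3": {"days_post_fertilization": 6, "pH": 7.1},
    "MG5": {"days_post_fertilization": 10, "pH": 8.2},
    "MG6": {"days_post_fertilization": 13, "pH": 7.1},
    "MG7": {"days_post_fertilization": 17, "pH": 7.1},
}


def group_by(samples, key):
    """Group sample names by the value of one metadata field."""
    groups = defaultdict(list)
    for name, meta in sorted(samples.items()):
        groups[meta[key]].append(name)
    return dict(groups)


by_ph = group_by(SAMPLES, "pH")
by_stage = group_by(SAMPLES, "days_post_fertilization")

print("By pH treatment:", by_ph)
print("By developmental stage:", by_stage)
```

Grouping this way makes it clear that only days 13 and 17 have a sample from both pH treatments, which matters when deciding which comparisons are possible.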
RESULTS
Output folder:
RMA6 files (NOTE: these are very large files!):
MEGAN6 “Import from BLAST” log file:
The process generates this type of visualization after an RMA6 file is created:
Next up is to figure out how to compare samples/groups. Since I have separate RMA6 files for each sample, I believe that I can import all the RMA6 files at one time and then use the software to group/compare them.