UPDATE 20220121: THIS IS AN INCOMPLETE POST AND IS ONLY POSTED FOR POSTERITY.
Due to time constraints and a limit on how much effort I wanted to put into this project, I did not proceed with a complete analysis using Anvi’o. However, a fair amount of work went into using this program, so I’ve decided to post what I had originally written up, despite it being incomplete. It may serve as a useful reference for someone else in the lab in the future. Since this stage of analysis requires a graphical user interface, it was performed in a dedicated conda environment for Anvi’o on swoose, not on Mox. Original Anvi’o database generation was performed on 20190401.
Check the initial data binning:
anvi-interactive \
--profile-db PROFILE.db \
--contigs-db contigs.db \
--collection-name CONCOCT
That generates this:
Table representation of initial data binning:
anvi-summarize \
--pan-or-profile-db PROFILE.db \
--contigs-db contigs.db \
--collection-name CONCOCT \
--output-dir MERGED_SUMMARY
This command generates an index.html file (see Results section below for link) and takes ~10 minutes to complete. Here’s the portion showing the binning completion/redundancy info (there’s much, much more data present in that file):
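For a quick command-line look at the same completion/redundancy numbers, something like the following sketch could be used to list bins whose redundancy exceeds a threshold. It assumes the summary directory contains a tab-delimited file named bins_summary.txt with header columns called bins and percent_redundancy; I have not confirmed those names against this particular Anvi’o version, so check the header first. The function name flag_redundant_bins is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list bins whose redundancy exceeds a threshold.
# ASSUMPTION: the summary file is tab-delimited, with a header row that
# contains columns named "bins" and "percent_redundancy".

flag_redundant_bins() {
  local summary_file="$1"
  local max_redundancy="${2:-10}"   # default threshold of 10% is arbitrary

  awk -F'\t' -v max="${max_redundancy}" '
    NR == 1 {
      # Locate the columns by header name rather than by position.
      for (i = 1; i <= NF; i++) {
        if ($i == "bins") bin_col = i
        if ($i == "percent_redundancy") red_col = i
      }
      if (!bin_col || !red_col) {
        print "Expected columns not found in header" > "/dev/stderr"
        exit 1
      }
      next
    }
    # Force numeric comparison on both sides.
    ($red_col + 0) > (max + 0) { print $bin_col }
  ' "${summary_file}"
}

# Example usage (the file path is an assumption):
# flag_redundant_bins MERGED_SUMMARY/bins_summary.txt 10
```

Looking the columns up by name keeps the sketch from silently reading the wrong field if the column order differs between versions.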
Refine the bins:
anvi-refine \
--profile-db PROFILE.db \
--contigs-db contigs.db \
--collection-name CONCOCT \
--bin-id Bin_75
In the screencap below, the region of the dendrogram marked as “Bin_75_1” shows a drastic difference in coverage in the MG7 track at this particular split. Additionally, looking at the quick stats shown for this newly identified bin (in the window pane to the left), one can see that Completion is now 99% and Redundancy is only 1.4%; a marked improvement on the automatic binning.
There does appear to be a problem with the binning data, though. Many bins show Completion/Redundancy values of 0.00%; however, clicking on those values in the summary table reveals that they are incorrect:
Here’s an example of what happens when refining Bin 3:
Ah, as it turns out, this has been fixed in the “master” commit in Anvi’o (thanks to the Anvi’o devs for their fast responses to my questions on their Slack channel!!).
Here’s how the upgrade process went.
Clone the current version of Anvi’o’s GitHub repo:
git clone --recursive https://github.com/merenlab/anvio.git
Create an Anaconda environment for Anvi’o, using Python version 3.6.
Then, activate the newly created Anaconda environment.
Within the cloned Anvi’o repo, use Python to execute the setup.py file.
Finally, use pip to complete the Anvi’o installation.
conda create --yes --name anvio python=3.6
conda activate anvio
python setup.py
pip install --editable .
Now, we should be able to run anvi-interactive on the database again and get updated stats for each bin:
conda activate anvio
(anvio) sam@swoose:~/analyses/20190619_anvio$ ~/programs/anvio_git_master_bfbcbb3/bin/anvi-interactive --profile-db PROFILE.db --contigs-db contigs.db --collection-name CONCOCT
Config Error: The database at 'contigs.db' is outdated (its version is v12, but your anvi'o
installation only knows how to deal with v13). You can migrate your database
without losing any data using the program `anvi-migrate-db`.
So, I ran the anvi-migrate-db command as recommended:
~/programs/anvio_git_master_bfbcbb3/bin/anvi-migrate-db contigs.db
This message popped up:
* The contigs database is now 13. Unfortunatly this update removed all single-copy
core gene HMMs from your contigs database :( We are very sorry about this, but
we only did it to be able to offer you nicer things. It is best if you re-run
`anvi-run-hmms` program from scratch. Doing that will not remove any 'non-
default' HMM profiles you may have added in this contigs database, so you have
nothing to worry about.
So, I ran the anvi-run-hmms command on the database to complete the upgrade process:
~/programs/anvio_git_master_bfbcbb3/bin/anvi-run-hmms -c contigs.db --num-threads 23
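Since the migration modifies the database in place (and, in this case, dropped the single-copy core gene HMMs), it would probably have been wise to copy the database aside first. Here is a minimal sketch of that idea. The backup_db helper and its naming scheme are my own invention, not part of Anvi’o.

```shell
#!/usr/bin/env bash
# Sketch: copy a database file aside before an in-place migration.
# The helper name and the ".YYYYMMDD.bak" suffix are illustrative only.

backup_db() {
  local db="$1"
  local backup
  backup="${db}.$(date +%Y%m%d).bak"

  if [ ! -f "${db}" ]; then
    echo "No such database: ${db}" >&2
    return 1
  fi

  # -p preserves timestamps and permissions on the copy.
  # Print the backup path on success so callers can record it.
  cp -p "${db}" "${backup}" && echo "${backup}"
}

# Example usage with the commands from this post:
# backup_db contigs.db \
#   && anvi-migrate-db contigs.db \
#   && anvi-run-hmms -c contigs.db --num-threads 23
```

Chaining with && means the migration only runs if the copy succeeded.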
After upgrading, here’s how things look:
Additionally, the Anvi’o refine interface provides an informative pop-up when clicking on the Completion info in the left pane, explaining in user-friendly lingo what your Completion/Redundancy data might suggest:
Clearly, there’s a lot of manual binning that needs to take place in order to refine the bins with excessive Completion/Redundancy info…
Here are some examples of how to perform further binning (refinement) of the initial bins. Basically, visually examine the coverage plots (the black histograms), along with the tree in the center of the plot. Select regions where there is a noticeable difference in the magnitude of the histograms and where there are distinct branches in the tree. Anvi’o practically has these already identified, but for some reason doesn’t bin them separately, sometimes even when there are high Completion and low Redundancy scores.
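Because each bin has to be refined in its own interactive session, it can help to generate the list of anvi-refine commands up front and work through them one at a time. The sketch below only prints the commands; it does not run them. The print_refine_commands helper is hypothetical, and the database filenames are the ones used earlier in this post.

```shell
#!/usr/bin/env bash
# Sketch: print one anvi-refine command per bin ID read from stdin.
# Nothing is executed; the output is meant to be copied and run manually.

print_refine_commands() {
  local collection="${1:-CONCOCT}"
  local bin_id

  while IFS= read -r bin_id; do
    # Skip blank lines in the input.
    if [ -z "${bin_id}" ]; then
      continue
    fi
    printf 'anvi-refine --profile-db PROFILE.db --contigs-db contigs.db --collection-name %s --bin-id %s\n' \
      "${collection}" "${bin_id}"
  done
}

# Example usage:
# printf 'Bin_3\nBin_75\n' | print_refine_commands CONCOCT
```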