Saw this tweet this morning and thought this would be good to try out for our Olympia oyster genome assemblies, as it will handle “hybrid” assemblies (i.e. short-reads and long reads):
The latest verison of the MaSuRCA assembler supports @nanopore reads and it is super easy to use. Check it out! https://t.co/ByKUCpAoIf pic.twitter.com/KulviJhmMh
— Darrin Schultz (@conchoecia) October 18, 2017
Additionally, I was excited by the “…super easy to use.” part in the description. As it turns out, that part of the Tweet is totally untrue. Here are some of the things that make it difficult to use:
No pre-install readme file. Without readme file there are no instructions/info on:
Necessary dependencies
Install command(s)
Initial install attempt failed with error message. Message suggests trying:
- BOOST_ROOT=install ./install.sh
No post-install readme file. How do I even get started without any documentation??!!
I managed to track down the guide for this, but didn’t get it via searching the internet. I noticed that the link for the software in the original Tweet had a parent directory, so I navigated there and spotted this:
Quick start guide (PDF): ftp://ftp.genome.umd.edu/pub/MaSuRCA/MaSuRCA_QuickStartGuide.pdf
Although not a big deal, this quick start guide is for the previous version (v.3.2.2).
So, is this where we get to the “super easy to use” part? Uh, no. Check it out:
- Modify a config file (luckily, a template is created during install)
* Illumina paired-end (PE) reads: Aribtrary two letter prefix, mean read length, and read length standard deviation need to be supplied (!!!)
* Illumina mate-paired (MP) reads: Called "JUMP" in config file and needs the same type of info supplied as PE reads.
* PacBio reads: Need to be in a single FASTA file (ugh)!
* A bunch of other stuff that I can probably leave alone.
Run the
masurca
script (located here:MaSuRCA-3.2.3/bin/masurca
). This will generate a new script (calledassemble.sh
).Run the
assemble.sh
script that was created in the previous step.
Although not specifically related to the MaSuRCA install, I did encounter problems trying to install this on our Mox (hyak) computing node.
Build node fail (ironically, this is the specific type of node that’s supposed to be used for compiling software):
(http://owl.fish.washington.edu/Athaliana/20171019_mox_build_masurca_fail.png)
OK, so I decided to try compiling it using the login node (which is not what the login node is supposed to be used for):
Login node fail:
(http://owl.fish.washington.edu/Athaliana/20171019_mox_login_masurca_fail.png)
I really didn’t want to have to put together an SBATCH script just to compile this software (which compiled without issue, except for that initial BOOST error thingy, on my local Ubuntu 16.04 LTS system), so I just tried running an interactive node and it worked!
Now, on to trying to actually run this thing…