Since we’re preparing a manuscript that relies on BGI’s manipulation/handling of the genotype-by-sequencing data, I attempted to could reproduce the demultiplexing steps that BGI used in order to perform the SNP/genotyping on these samples.
The key word in the above sentence is “attempted.” Ugh, what a massive waste of time it turned out to be. I’ve contacted BGI to get some help on this.
In the meantime, here’s a brief (actually, not as brief as I’d like) rundown of my struggles.
The demultiplexing software that BGI used is something called “iTools” which is bundled in this GitHub repo: Resqtools
To demutliplex, they ran a script called: split.sh
The script seems fairly straightforward. Here is what it contains:
<code>iTools Fqtools splitpool \
-InFq1 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz \
-InFq2 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz \
-Index index.lst \
-Flag enzyme.txt \
-MisMatch \
-OutDir split
</code>
It tells the iTools program to use the Fqtools tool “splitpool” to operate on a pair of gzipped FASTQ files. It also utilizes an index file (index.lst) which contains all the barcodes needed to identify, and separate, the individual samples that were combined prior to sequencing.
The first bump in the road is the -Flag enzyme.txt
portion of the code. BGI did not provide me with this file. I recently requested them to send me it (or its contents, since I suspected it was only a single line text file). They sent me the contents of the file:
<code>CAGC
CTGC</code>
The next problem is neither of those two sequences are the recognition site for the enzyme that was (supposedly) used: ApeKI. The recognition site for ApeKI is: GCWGC
Regardless, I decided to see if I could reproduce the demultiplexing using the info they’d provided me.
I cloned the Resqtools repo, changed into the Reseqtools/iTools directory and typed make
.
This resulted in an error informing me that it could not find boost/spirit/core.hpp
I tracked down the Boost library junk, downloaded the newest version and untarred it in /usr/local/bin.
Tried to run make
in the Reseqtools/iTools directory and got the same error. Realized iTools might not be searching the system $PATH
(this turned out to be correct), so I moved the contents of the Boost folder to the iTools, ran make
and got the same error. Turns out, the newest version of Boost doesn’t have that core.hpp
file any more. Looking at the iTools documentation, iTools was built around Boost 1.44. OMG…
Downloaded Boost 1.44 and went through the same steps as above. This eliminated the missing core.hpp
error!
But, of course, led to another error. The error:
<code>"Threading support unavaliable: it has been explicitly disabled with BOOST_DISABLE_THREADS"</code>
That was related to something with newer versions of the GCC compiler (this is, essentially, built into the computer; it’s not worth trying to install/use old versions of GCC) trying to work with old versions of Boost. Found a patch for a config file here: libstdcpp3.hpp.patch
I made the appropriate edits to the file as shown in that link and ran make
and it almost worked!
The current error is:
<code>./src/Variants/soapsv-v1.02/include.h:15:16: fatal error: gd.h: No such file or directory</code>
I gave up and contacted BGI to see if they can get me a functional version of iTools…