In an attempt to figure out what’s going on with the Illumina data we recently received for these samples, I BLASTed the 400ppm data set that had previously been de-novo assembled by Steven: EmmaBS400.fa (dead link - owl.fish.washington.edu/halfshell/EmmaBS400.fa).
I also created a nucleotide BLAST database (DB) from the Crassostrea_gigas.GCA_000297895.1.24.fa
Jupyter (IPython) Notebook: 20150429_Gigas_larvae_OA_BLASTn.ipynb
Notebook Viewer: 20150429_Gigas_larvae_OA_BLASTn
Results:
The results are not great.
All query contigs successfully BLAST to sequences in the C.gigas Ensembl BLAST DB. However, only 33 of the sequences (out of ~37,000) have an e-value of 0.0. The next best e-value for any matches is 0.001. For the uninitiated, that value is not very good, especially when you’re BLASTing against the same exact species DB.
Will BLASTn the C.gigas contigs against the entire GenBank nt (all nucleotides) to see what the taxonomy breakdown looks like of these sequences.