We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed.
So, we wanted to get a quick idea of how good our geoduck assemblies are by performing some quick alignments using Bowtie2.
Used the following assemblies as references:
sn_ph_01 : SuperNova assembly of 10x Genomics data
sparse_03 : SparseAssembler assembly of BGI and Illumina project data
pga_02 : Hi-C assembly of Phase Genomics data
The analysis is documented in a Jupyter Notebook.
Jupyter Notebook (GitHub):
NOTE: Due to the large amount of stdout from the first genome index command, the notebook does not render well on GitHub. I recommend downloading the notebook and opening it in a locally installed version of Jupyter.
Here’s a brief overview of the process:
Generate Bowtie2 indexes for each of the genome assemblies.
Map 1,000,000 reads from the following Illumina NovaSeq FastQ files:
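The two steps above can be sketched roughly as follows. This is a minimal illustration, not the commands from the notebook: the FastA paths, thread count, output file names, and the use of unpaired (`-U`) input are all assumptions. The helpers only build the command lists; `run_logged` is a hypothetical wrapper that sends stderr to a `*.err` file.

```python
import subprocess

# Assembly IDs come from this post; the FastA paths are hypothetical placeholders.
ASSEMBLIES = {
    "sn_ph_01": "assemblies/sn_ph_01.fasta",
    "sparse_03": "assemblies/sparse_03.fasta",
    "pga_02": "assemblies/pga_02.fasta",
}


def index_cmd(assembly_id, fasta, threads=4):
    """Build a bowtie2-build command; the index basename is the assembly ID."""
    return ["bowtie2-build", "--threads", str(threads), fasta, assembly_id]


def align_cmd(assembly_id, fastq_files, n_reads=1_000_000, threads=4):
    """Build a bowtie2 command that stops after the first n_reads reads (-u)."""
    return [
        "bowtie2",
        "-x", assembly_id,              # index basename from index_cmd()
        "-U", ",".join(fastq_files),    # assumption: reads treated as unpaired
        "-u", str(n_reads),             # align only the first n_reads reads
        "-p", str(threads),
        "-S", f"{assembly_id}.sam",     # hypothetical output name
    ]


def run_logged(cmd, err_path):
    """Run a command, writing its stderr (Bowtie2's summary) to err_path."""
    with open(err_path, "w") as err:
        subprocess.run(cmd, stderr=err, check=True)
```

With Bowtie2 installed, something like `run_logged(align_cmd("sn_ph_01", ["reads.fastq.gz"]), "sn_ph_01.err")` would run one alignment, where `reads.fastq.gz` stands in for the actual NovaSeq files.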
Results:
Bowtie2 Genome Indexes:
Bowtie2 sn_ph_01 alignment folder:
Bowtie2 sparse_03 alignment folder:
Bowtie2 pga_02 alignment folder:
MAPPING SUMMARY TABLE
_All mapping data was pulled from the respective *.err file in the Bowtie2 alignment folders._
| sequence_ID | Assembler | Alignment Rate (%) |
|---|---|---|
| sn_ph_01 | SuperNova (10x) | 79.89 |
| sparse_03 | SparseAssembler | 85.83 |
| pga_02 | Hi-C (Phase Genomics) | 79.90 |
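For reference, Bowtie2 ends its stderr summary with a line of the form `NN.NN% overall alignment rate`, so the rates above can be pulled from the `*.err` files programmatically. A minimal sketch, assuming the summary text has already been read into a string (the function name is hypothetical):

```python
import re

# Matches the final line of Bowtie2's alignment summary.
RATE_RE = re.compile(r"([\d.]+)% overall alignment rate")


def overall_alignment_rate(err_text):
    """Return the overall alignment rate (percent) from Bowtie2 stderr text."""
    match = RATE_RE.search(err_text)
    if match is None:
        raise ValueError("no overall alignment rate line found")
    return float(match.group(1))


# Sample line in Bowtie2's summary format, using the sn_ph_01 value reported above.
print(overall_alignment_rate("79.89% overall alignment rate\n"))
```

Raising an error when the line is missing, rather than returning a default, makes a failed or truncated alignment obvious.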
Mapping efficiency is similar for all assemblies (roughly 80-86%), with sparse_03 mapping slightly better than the other two. After speaking with Steven, we’ve decided to begin exploring genome annotation pipelines.