Read Mapping - Mapping Illumina Data to Geoduck Genome Assemblies with Bowtie2

We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed.

So, we wanted to get a quick idea of how good our geoduck assemblies are by running some quick alignments with Bowtie2.

Used the following assemblies as references:

sn_ph_01 : SuperNova assembly of 10x Genomics data
sparse_03 : SparseAssembler assembly of BGI and Illumina project data
pga_02 : Hi-C assembly of Phase Genomics data

The analysis is documented in a Jupyter Notebook.

Jupyter Notebook (GitHub):

20180508_roadrunner_geoduck_bowtie2_genome_mapping.ipynb

NOTE: Due to the large amount of stdout from the first genome index command, the notebook does not render well on GitHub. I recommend downloading the notebook and opening it in a locally installed version of Jupyter.

Here’s a brief overview of the process:

Generate Bowtie2 indexes for each of the genome assemblies.
Map 1,000,000 reads from the following Illumina NovaSeq FastQ files:
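
The two steps above can be sketched roughly as follows. This is not the code from the notebook, just a minimal illustration of how the commands might be assembled. The file names (`sn_ph_01.fasta`, `reads.fastq.gz`), the output paths, the thread count, and the choice of unpaired input (`-U`) are all assumptions made for the example. The flags themselves (`bowtie2-build --threads`, `bowtie2 -x`, `-U`, `--upto`, `-p`, `-S`) are standard Bowtie2 options, with `--upto` stopping after the first N reads.

```python
import shlex


def build_index_cmd(fasta, index_base, threads=1):
    """Return the bowtie2-build command that indexes one assembly FASTA."""
    return ["bowtie2-build", "--threads", str(threads), fasta, index_base]


def build_align_cmd(index_base, fastq, sam_out, max_reads=1_000_000, threads=1):
    """Return a bowtie2 command that aligns only the first `max_reads` reads.

    Assumes unpaired reads (-U); paired-end data would use -1/-2 instead.
    """
    return [
        "bowtie2",
        "-x", index_base,             # index prefix produced by bowtie2-build
        "-U", fastq,                  # unpaired reads (assumption)
        "--upto", str(max_reads),     # stop after this many reads
        "-p", str(threads),
        "-S", sam_out,                # SAM output file
    ]


# Hypothetical file names, for illustration only.
assemblies = {
    "sn_ph_01": "sn_ph_01.fasta",
    "sparse_03": "sparse_03.fasta",
    "pga_02": "pga_02.fasta",
}

for name, fasta in assemblies.items():
    index_cmd = build_index_cmd(fasta, f"indexes/{name}", threads=4)
    align_cmd = build_align_cmd(
        f"indexes/{name}", "reads.fastq.gz", f"{name}.sam", threads=4
    )
    # Print rather than run, so the sketch works without Bowtie2 installed.
    print(shlex.join(index_cmd))
    print(shlex.join(align_cmd) + f" 2> {name}.err")
```

Bowtie2 writes its alignment summary to stderr, so redirecting stderr to a `*.err` file, as in the printed command, is one way to capture it.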

Results:

Bowtie2 Genome Indexes:

20180508_geoduck_assemblies_bowtie2_indexes/

Bowtie2 sn_ph_01 alignment folder:

20180508_geoduck_mapping_nova_to_10x/

Bowtie2 sparse_03 alignment folder:

20180508_geoduck_mapping_nova_to_sparse/

Bowtie2 pga_02 alignment folder:

20180508_geoduck_mapping_nova_to_Hi-C/

MAPPING SUMMARY TABLE

_All mapping data was pulled from the respective *.err file in the Bowtie2 alignment folders._

sequence_ID	Assembler	Alignment Rate (%)
sn_ph_01	SuperNova (10x)	79.89
sparse_03	SparseAssembler	85.83
pga_02	Hi-C (Phase Genomics)	79.90
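
Pulling the rate out of each `*.err` file can be scripted. The sketch below relies on the fact that Bowtie2's stderr summary ends with a line of the form `NN.NN% overall alignment rate`. The function name and the sample text are hypothetical, and the sample numbers are made up for illustration. They are not taken from the actual runs.

```python
import re

# Matches the final line of Bowtie2's alignment summary.
RATE_RE = re.compile(
    r"^\s*([0-9]+(?:\.[0-9]+)?)% overall alignment rate\s*$", re.MULTILINE
)


def overall_alignment_rate(stderr_text):
    """Return the overall alignment rate (percent) from Bowtie2 stderr text."""
    match = RATE_RE.search(stderr_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


# Made-up example of an unpaired-read summary; not real project data.
SAMPLE_ERR = """\
1000 reads; of these:
  1000 (100.00%) were unpaired; of these:
    250 (25.00%) aligned 0 times
    600 (60.00%) aligned exactly 1 time
    150 (15.00%) aligned >1 times
75.00% overall alignment rate
"""

print(overall_alignment_rate(SAMPLE_ERR))
```

For a real run, the text would come from reading the `*.err` file in the corresponding alignment folder and passing its contents to the function.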

Alignment rates are similar across the three assemblies (roughly 80% to 86%), with sparse_03 slightly higher than the other two. After speaking with Steven, we’ve decided to begin exploring genome annotation pipelines.