We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed.
I used the following assemblies as references for read mapping:

- sn_ph_01 : SuperNova assembly of 10x Genomics data
- sparse_03 : SparseAssembler assembly of BGI and Illumina project data
- pga_02 : Hi-C assembly of Phase Genomics data
The analysis is documented in a Jupyter Notebook.
Jupyter Notebook (GitHub):
NOTE: Due to the large amount of stdout from the first genome index command, the notebook does not render well on GitHub. I recommend downloading the notebook and opening it in a locally installed version of Jupyter.
Here’s a brief overview of the process:
1. Generate Bowtie2 indexes for each of the genome assemblies.
2. Map 1,000,000 reads from the following Illumina NovaSeq FastQ files:
Bowtie2 Genome Indexes:
Bowtie2 sn_ph_01 alignment folder:
Bowtie2 sparse_03 alignment folder:
Bowtie2 pga_02 alignment folder:
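The two steps in the overview (index each assembly, then map a 1,000,000-read subset) can be sketched roughly as follows. This is a minimal sketch, not the exact commands from the notebook: the FASTA and FastQ file names are hypothetical placeholders, and the use of Bowtie2's `-u` option to stop after the first 1,000,000 reads is an assumption about how the subset was taken. The helper functions only build the command lines, so nothing is run until you pass them to `subprocess.run` yourself.

```python
import shlex

# Assembly names come from the list above; the FASTA file names are hypothetical.
ASSEMBLIES = ("sn_ph_01", "sparse_03", "pga_02")


def bowtie2_build_cmd(fasta, index_base):
    """Command to build a Bowtie2 index from an assembly FASTA."""
    return ["bowtie2-build", fasta, index_base]


def bowtie2_map_cmd(index_base, reads_1, reads_2, sam_out,
                    max_reads=1_000_000, threads=4):
    """Command to map the first `max_reads` reads (or read pairs) to an index.

    Assumption: `-u` is used to limit the number of reads; the original
    notebook may have subset the FastQ files differently.
    """
    cmd = ["bowtie2", "-x", index_base]
    if reads_2 is None:
        cmd += ["-U", reads_1]                 # unpaired reads
    else:
        cmd += ["-1", reads_1, "-2", reads_2]  # paired-end reads
    cmd += ["-u", str(max_reads), "-p", str(threads), "-S", sam_out]
    return cmd


for name in ASSEMBLIES:
    # Hypothetical input names; substitute the real FASTA and FastQ paths.
    build = bowtie2_build_cmd(f"{name}.fasta", name)
    mapping = bowtie2_map_cmd(name, "reads_R1.fastq.gz", "reads_R2.fastq.gz",
                              f"{name}.sam")
    print(shlex.join(build))
    # Bowtie2 writes its alignment summary to stderr, so redirect it to a file.
    print(shlex.join(mapping) + f" 2> {name}.err")
```

Redirecting stderr to a per-assembly `.err` file keeps the alignment summary that Bowtie2 prints at the end of each run.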
MAPPING SUMMARY TABLE
All mapping data were pulled from the respective `*.err` files in the Bowtie2 alignment folders.
| Assembly | Method | Mapping efficiency (%) |
|---|---|---|
| [pga_02](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) | Hi-C (Phase Genomics) | 79.90 |
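For anyone wanting to pull the mapping figures out of the `.err` files programmatically rather than by eye, here is a minimal sketch. It relies on the fact that Bowtie2 ends its stderr summary with a line of the form `NN.NN% overall alignment rate`. The function name and the example text are illustrative only; the example is not copied from the actual `.err` files.

```python
import re

# Bowtie2 finishes its stderr summary with e.g. "79.90% overall alignment rate".
RATE_RE = re.compile(r"^\s*([\d.]+)% overall alignment rate", re.MULTILINE)


def overall_alignment_rate(err_text):
    """Return the overall alignment rate (as a percentage) from Bowtie2 stderr text."""
    match = RATE_RE.search(err_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


# Illustrative input only: a shortened summary, not the real file contents.
example_err = "1000000 reads; of these:\n79.90% overall alignment rate\n"
print(overall_alignment_rate(example_err))
```

To use it on a real run, read the file first, for example with `pathlib.Path("pga_02.err").read_text()` (a hypothetical file name), and pass the text to the function.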
Mapping efficiency is similar for all assemblies. After speaking with Steven, we’ve decided we’ll begin exploring genome annotation pipelines.