We would like to get an assembly of the geoduck NovaSeq data that Illumina provided us with.
Steven previously ran the raw data through FASTQC and there was a significant amount of adapter contamination (up to 44% in some libraries) present (see his FASTQC report here).
So, I trimmed the reads using Trim Galore and re-ran FASTQC on the trimmed files.
This required two rounds of trimming using the “auto-detect” feature of Trim Galore.
Round 1: remove NovaSeq adapters
Round 2: remove standard Illumina adapters
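The two rounds above can be sketched roughly as follows. This is a minimal illustration, not the exact commands from the notebook: the file names and output directories are hypothetical, the data is assumed to be paired-end, and the round 2 input names assume Trim Galore's usual `_val_1`/`_val_2` output naming. No adapter option is passed, so Trim Galore falls back to its adapter auto-detection.

```python
import shlex
import subprocess
from pathlib import Path


def build_trim_command(fastq_files, out_dir, paired=True):
    """Build a Trim Galore command that relies on adapter auto-detection.

    No adapter option is given, so Trim Galore chooses the adapter itself.
    """
    cmd = ["trim_galore"]
    if paired:
        cmd.append("--paired")
    cmd += ["--output_dir", str(out_dir)]
    cmd += [str(f) for f in fastq_files]
    return cmd


def run_round(fastq_files, out_dir, paired=True):
    """Run one trimming round; requires trim_galore on the PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_trim_command(fastq_files, out_dir, paired), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    round1_inputs = ["sample_R1.fastq.gz", "sample_R2.fastq.gz"]
    print(shlex.join(build_trim_command(round1_inputs, "trim_round_01")))

    # Round 2 takes the trimmed files written by round 1 as its input
    # (names assume Trim Galore's usual paired-end output naming).
    round2_inputs = [
        "trim_round_01/sample_R1_val_1.fq.gz",
        "trim_round_01/sample_R2_val_2.fq.gz",
    ]
    print(shlex.join(build_trim_command(round2_inputs, "trim_round_02")))
```

Building the command separately from running it makes it easy to print and inspect each round before committing to a long trimming job.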
See Jupyter notebook below for the gritty details.
Results:
All data for this NovaSeq assembly project can be found here: https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.
Round 1 Trim Galore reports: [20180125_trim_galore_reports/](https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180125_trim_galore_reports/)
Round 1 FASTQC: 20180129_trimmed_multiqc_fastqc_01
Round 1 FASTQC MultiQC overview: 20180129_trimmed_multiqc_fastqc_01/multiqc_report.html
Round 2 Trim Galore reports: 20180125_geoduck_novaseq/20180205_trim_galore_reports/
Round 2 FASTQC: 20180205_trimmed_fastqc_02/
Round 2 FASTQC MultiQC overview: 20180205_trimmed_multiqc_fastqc_02/multiqc_report.html
The astute observer might notice that the “Per Base Sequence Content” module reports a “Fail” for all samples. Per the FASTQC help, this is likely expected (NovaSeq libraries are prepared using transposases) and shouldn’t have any downstream impact on analyses.
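One way to confirm that a module fails across every sample, rather than eyeballing each report, is to tally the `summary.txt` file that FASTQC writes for each input (tab-separated status, module name, file name). The sketch below is only an illustration: the function names are mine and the sample data is made up.

```python
from collections import Counter


def parse_fastqc_summary(text):
    """Return {module_name: status} from the contents of one FASTQC summary.txt."""
    statuses = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is: STATUS <tab> module name <tab> input file name
        status, module, _filename = line.split("\t")
        statuses[module] = status
    return statuses


def count_module_status(summaries, module):
    """Count PASS/WARN/FAIL for one module across many summary.txt contents."""
    return Counter(
        parse_fastqc_summary(text).get(module, "MISSING") for text in summaries
    )


if __name__ == "__main__":
    # Made-up summary contents for two hypothetical samples.
    example_summaries = [
        "PASS\tBasic Statistics\tsample_A.fq.gz\n"
        "FAIL\tPer base sequence content\tsample_A.fq.gz\n",
        "PASS\tBasic Statistics\tsample_B.fq.gz\n"
        "FAIL\tPer base sequence content\tsample_B.fq.gz\n",
    ]
    print(count_module_status(example_summaries, "Per base sequence content"))
```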
Jupyter Notebook (GitHub): 20180125_roadrunner_trimming_geoduck_novaseq.ipynb