Adapter Trimming and FASTQC - Illumina Geoduck NovaSeq Data

We would like to get an assembly of the geoduck NovaSeq data that Illumina provided us with.

Steven previously ran the raw data through FASTQC, and there was a significant amount of adapter contamination present (up to 44% in some libraries). See his FASTQC report here (DEAD LINK - owl.fish.washington.edu/halfshell/bu-alanine-wd/17-09-15b/multiqc_report.html).

So, I trimmed the reads using Trim Galore and re-ran FASTQC on the trimmed files.

This required two rounds of trimming using the “auto-detect” feature of Trim Galore.

Round 1: remove NovaSeq adapters
Round 2: remove standard Illumina adapters
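For anyone who wants the general shape of the two rounds without opening the notebook, here is a minimal Python sketch. It only builds the command lines; it does not run anything. The file names and directory names are hypothetical, and the `_val_1`/`_val_2` output naming is my understanding of how Trim Galore names paired-end output, so check it against your own output before relying on it. Leaving out any adapter option is what lets Trim Galore auto-detect the adapter in each round. The exact commands I used are in the Jupyter notebook.

```python
from pathlib import Path

FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def strip_fastq_suffix(name):
    """Remove a trailing FASTQ extension from a file name, if present."""
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def validated_name(read_path, mate):
    """Guess the paired-end output name (assumed: <base>_val_<mate>.fq[.gz])."""
    name = Path(read_path).name
    ext = ".fq.gz" if name.endswith(".gz") else ".fq"
    return f"{strip_fastq_suffix(name)}_val_{mate}{ext}"


def trim_galore_cmd(read1, read2, out_dir):
    """Build a paired-end command with no adapter option, so auto-detect is used."""
    return ["trim_galore", "--paired", "--output_dir", str(out_dir),
            str(read1), str(read2)]


def two_round_commands(read1, read2, work_dir):
    """Round 2 takes the trimmed output of round 1 as its input."""
    work_dir = Path(work_dir)
    round1_dir = work_dir / "round1"  # hypothetical directory names
    round2_dir = work_dir / "round2"
    round1 = trim_galore_cmd(read1, read2, round1_dir)
    round2 = trim_galore_cmd(round1_dir / validated_name(read1, 1),
                             round1_dir / validated_name(read2, 2),
                             round2_dir)
    return [round1, round2]


# Hypothetical input files, for illustration only.
for cmd in two_round_commands("geoduck_R1.fastq.gz", "geoduck_R2.fastq.gz", "trimmed"):
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.run` once the paths point at real files.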

See Jupyter notebook below for the gritty details.

Results:

All data for this NovaSeq assembly project can be found here: https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.

Round 1 Trim Galore reports:

20180125_trim_galore_reports/

Round 1 FASTQC:

20180129_trimmed_multiqc_fastqc_01

Round 1 FASTQC MultiQC overview:

20180129_trimmed_multiqc_fastqc_01/multiqc_report.html

Round 2 Trim Galore reports:

20180205_trim_galore_reports/

Round 2 FASTQC:

20180205_trimmed_fastqc_02/

Round 2 FASTQC MultiQC overview:

20180205_trimmed_multiqc_fastqc_02/multiqc_report.html

The astute observer might notice that the “Per Base Sequence Content” module generates a “Fail” warning for all samples. Per the FASTQC help, this is likely expected (due to the fact that NovaSeq libraries are prepared using transposases) and doesn’t have any downstream impact on analyses.
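If you would rather confirm which modules failed without clicking through every report, something like the following Python sketch could tally the failures. It assumes the tab-separated `summary.txt` layout that FASTQC writes inside each output folder (status, module name, file name). The example lines are made up for illustration and are not taken from this data set.

```python
from collections import defaultdict


def failed_modules(summary_text):
    """Map each module name to the list of files flagged FAIL for it."""
    fails = defaultdict(list)
    for line in summary_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, filename = line.split("\t")
        if status == "FAIL":
            fails[module].append(filename)
    return dict(fails)


# Made-up summary.txt content, for illustration only.
example = (
    "PASS\tBasic Statistics\tsample_A_val_1.fq.gz\n"
    "FAIL\tPer base sequence content\tsample_A_val_1.fq.gz\n"
    "PASS\tAdapter Content\tsample_A_val_1.fq.gz\n"
)

for module, files in failed_modules(example).items():
    print(f"{module}: {len(files)} file(s) failed")
```

Concatenating the `summary.txt` files from all samples and passing the result to `failed_modules` would show at a glance whether “Per base sequence content” is the only module failing everywhere.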

Jupyter Notebook (GitHub): 20180125_roadrunner_trimming_geoduck_novaseq.ipynb