We would like to assemble the geoduck NovaSeq data that Illumina provided to us.
Steven previously ran the raw data through FASTQC, which showed a significant amount of adapter contamination (up to 44% in some libraries); see his FASTQC report here (DEAD LINK - owl.fish.washington.edu/halfshell/bu-alanine-wd/17-09-15b/multiqc_report.html).
So, I trimmed the reads using Trim Galore and re-ran FASTQC on the trimmed reads.
This required two rounds of trimming using the “auto-detect” feature of Trim Galore.
Round 1: remove NovaSeq adapters
Round 2: remove standard Illumina adapters
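The two rounds can be sketched roughly as follows. This is a minimal illustration, not the commands from the notebook: the file names and directory names are hypothetical, and it assumes gzipped paired-end input and that Trim Galore names its paired-end output `<basename>_val_1.fq.gz` / `<basename>_val_2.fq.gz`. No adapter option is passed, so each round relies on Trim Galore's adapter auto-detection. The sketch only builds the command lines; it does not run them.

```python
from pathlib import Path


def trim_galore_cmd(read1, read2, out_dir):
    """Build a paired-end Trim Galore command that relies on adapter auto-detection."""
    # No adapter option is given, so Trim Galore auto-detects the adapter.
    return ["trim_galore", "--paired", "--output_dir", str(out_dir),
            str(read1), str(read2)]


def validated_name(fastq, mate):
    """Guess the paired-end output name for an input FASTQ.

    Assumption: output is named <basename>_val_<mate>.fq.gz for gzipped input.
    """
    name = Path(fastq).name
    for suffix in (".fastq.gz", ".fq.gz"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return f"{name}_val_{mate}.fq.gz"


def two_round_plan(read1, read2, round1_dir, round2_dir):
    """Return the commands for round 1 and round 2 of trimming."""
    round1 = trim_galore_cmd(read1, read2, round1_dir)
    # Round 2 takes the round 1 output as its input and auto-detects again.
    r1_out = Path(round1_dir) / validated_name(read1, 1)
    r2_out = Path(round1_dir) / validated_name(read2, 2)
    round2 = trim_galore_cmd(r1_out, r2_out, round2_dir)
    return [round1, round2]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in two_round_plan("geoduck_R1.fastq.gz", "geoduck_R2.fastq.gz",
                              "round1", "round2"):
        print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` to actually run the trimming, provided `trim_galore` is on the `PATH`.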
See the Jupyter notebook below for the nitty-gritty details.
Results:
All data for this NovaSeq assembly project can be found here: https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.
Round 1 Trim Galore reports:
Round 1 FASTQC:
Round 1 FASTQC MultiQC overview:
Round 2 Trim Galore reports:
Round 2 FASTQC:
Round 2 FASTQC MultiQC overview:
Astute observers might notice that the “Per Base Sequence Content” module generates a “Fail” for all samples. Per the FASTQC help, this is likely expected (because NovaSeq libraries are prepared using transposases) and shouldn’t adversely affect downstream analyses.
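To confirm that the failure really is uniform across samples, one option is to tally that module's status from each FASTQC `summary.txt` file. The sketch below is an illustration only: it assumes each line of `summary.txt` holds three tab-separated fields (status, module name, file name), and the sample file names in the example are made up.

```python
from collections import Counter


def module_status(summary_text, module="Per base sequence content"):
    """Return the status (PASS/WARN/FAIL) of one module from summary.txt text."""
    for line in summary_text.splitlines():
        parts = line.split("\t")
        # Assumed layout: status <TAB> module name <TAB> file name
        if len(parts) >= 2 and parts[1] == module:
            return parts[0]
    return None  # module not present in this summary


def tally_statuses(summary_texts, module="Per base sequence content"):
    """Count how many samples have each status for the given module."""
    return Counter(module_status(text, module) for text in summary_texts)


if __name__ == "__main__":
    # Made-up summaries for two hypothetical samples.
    example = [
        "PASS\tBasic Statistics\tsample_A.fq.gz\n"
        "FAIL\tPer base sequence content\tsample_A.fq.gz\n",
        "PASS\tBasic Statistics\tsample_B.fq.gz\n"
        "FAIL\tPer base sequence content\tsample_B.fq.gz\n",
    ]
    print(tally_statuses(example))
```

If every sample reports the same status for this module, that is consistent with a library-prep effect rather than a problem with individual samples.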
Jupyter Notebook (GitHub): 20180125_roadrunner_trimming_geoduck_novaseq.ipynb