I previously ran this data through the Bismark pipeline and followed up with a MethylKit analysis, which revealed an extremely low number of differentially methylated loci (DML). That seemed odd.
Steven and I met to compare our variations on the analysis and decided to try a few tweaks to evaluate how each one affects the results.
I did the following tasks:
Looked at original sequence data quality with FastQC.
Summarized FastQC analysis with MultiQC.
Trimmed data using TrimGalore!, removing 10bp from the 5’ end of reads (the Bismark docs recommend 8bp).
Summarized trimming stats with MultiQC.
Looked at trimmed sequence quality with FastQC.
Summarized trimmed FastQC analysis with MultiQC.
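As a rough sketch of the steps above (not the actual SBATCH script), the commands could be assembled roughly like this. The file names, output directories, thread count, and the single-end vs. paired-end switch are all assumptions for illustration. The helper function names are hypothetical, and the code only builds and prints the command lines rather than running them.

```python
import shlex
from typing import List


def fastqc_cmd(fastqs: List[str], out_dir: str, threads: int = 1) -> List[str]:
    """FastQC on a set of FastQ files. FastQC expects out_dir to exist already."""
    return ["fastqc", "--outdir", out_dir, "--threads", str(threads), *fastqs]


def multiqc_cmd(search_dir: str, out_dir: str) -> List[str]:
    """MultiQC summary of whatever tool output it finds in search_dir."""
    return ["multiqc", search_dir, "--outdir", out_dir]


def trim_galore_cmd(fastqs: List[str], out_dir: str,
                    clip_5prime: int = 10, paired: bool = False) -> List[str]:
    """Trim Galore! with a fixed number of bases clipped from the 5' end.

    clip_5prime=10 matches the trimming described above. Whether the
    reads are paired is an assumption left to the caller.
    """
    cmd = ["trim_galore", "--output_dir", out_dir, "--clip_R1", str(clip_5prime)]
    if paired:
        cmd += ["--paired", "--clip_R2", str(clip_5prime)]
    return cmd + list(fastqs)


if __name__ == "__main__":
    raw = ["sample_01.fastq.gz"]  # hypothetical input file
    steps = [
        fastqc_cmd(raw, "raw_fastqc"),           # raw read quality
        multiqc_cmd("raw_fastqc", "raw_fastqc"),  # summarize raw FastQC
        trim_galore_cmd(raw, "trimmed"),          # clip 10bp from the 5' end
        multiqc_cmd("trimmed", "trimmed"),        # summarize trimming stats
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

The trimmed-read FastQC and MultiQC round would reuse `fastqc_cmd` and `multiqc_cmd` on the files Trim Galore! writes to its output directory.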
The analysis was run on Mox, the Univ. of Washington High Performance Computing (HPC) cluster.
The Mox SBATCH submission script has all the details on how the analyses were conducted:
RESULTS
Output folder:
Raw sequence FastQC output folder:
Raw sequence MultiQC report (HTML):
TrimGalore! output folder (trimmed FastQ files are here):
Trimming MultiQC report (HTML):
Trimmed FastQC output folder:
Trimmed MultiQC report (HTML):