Trimming - Andy Dittman Preliminary RNA-seq Data Using Fastp FastQC MultiQC on Hyak
2026
fastp
fastqc
multiqc
hyak
RNA-seq
trimming
Author
Sam White
Published
May 20, 2026
INTRO
After running an initial FastQC on the raw reads, I thought it would be prudent to run the reads through adapter and quality trimming. I ran fastp to trim the reads (removing adapters and low-quality bases, the first 15 bp from the 5’ end of each read, and poly-G and poly-X tails, which includes poly-A), and then ran FastQC and MultiQC on the trimmed reads to assess their quality. See the CODE section below for specifics.
The trimming and quality assessment were run on Hyak using our Apptainer image: srlab-R4.4-bioinformatics-container-c3d3116.sif.
Trimming with fastp went quickly and smoothly, and the resulting trimmed FastQs were of good quality, as assessed by FastQC and MultiQC. After trimming, the per base sequence content is much more consistent across the read length.
I’ll recommend that Andy proceed with having the remainder of the samples sequenced.
Code is below.
CODE
VARIABLES
# DIRECTORIES
raw_reads_dir <- "/mmfs1/gscratch/scrubbed/samwhite/data/dittman_grc_rnaseq_1/"
output_dir <- "/mmfs1/gscratch/scrubbed/samwhite/outputs/20260520--Fastp-FastQC-MultiQC-dittman-grc-rnaseq"

# FILES
fastq_pattern <- "*.fastq.gz"
R1_fastq_pattern <- "*_R1_*.fastq.gz"
R2_fastq_pattern <- "*_R2_*.fastq.gz"
trimmed_fastq_pattern <- "*fastp-trim.fq.gz"

# SETTINGS
threads <- "50"

# Export these as environment variables for bash chunks.
Sys.setenv(
  fastq_pattern = fastq_pattern,
  raw_reads_dir = raw_reads_dir,
  R1_fastq_pattern = R1_fastq_pattern,
  R2_fastq_pattern = R2_fastq_pattern,
  trimmed_fastq_pattern = trimmed_fastq_pattern,
  output_dir = output_dir,
  threads = threads
)
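Since the bash chunks depend on these exported variables, a quick check at the top of a bash chunk can catch an unset or empty value before any tool runs. The following is a minimal sketch and not part of the original workflow; `check_required_vars` is a hypothetical helper name.

```bash
# Hypothetical helper: report any required variables that are unset or empty.
check_required_vars() {
  local missing=()
  local var
  for var in "$@"; do
    # ${!var:-} is indirect expansion: the value of the variable named by $var,
    # or an empty string if that variable is unset.
    if [[ -z "${!var:-}" ]]; then
      missing+=("${var}")
    fi
  done

  if (( ${#missing[@]} > 0 )); then
    echo "Missing variables: ${missing[*]}" >&2
    return 1
  fi
  return 0
}

# Possible usage at the top of a bash chunk:
# check_required_vars raw_reads_dir output_dir R1_fastq_pattern \
#   R2_fastq_pattern trimmed_fastq_pattern threads || exit 1
```

The function only reads the variables, so it can be called before each stage without side effects.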
Fastp Trimming
fastp [@chen2023] is set to auto-detect adapters for paired-end reads (--detect_adapter_for_pe) and to trim the first 15 bp from each read (--trim_front1 15 and --trim_front2 15). In our past experience, these first 15 bp tend to be less consistent than the remainder of the read.
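To make the effect of front trimming concrete, here is a toy illustration using plain bash substring expansion on a made-up read. It is only a sketch of the idea (the first 15 bases and their quality scores are dropped together); it is not how fastp is implemented, and the read and quality strings are invented for illustration.

```bash
trim_front=15

# Made-up read: 15 placeholder bases followed by 10 "real" bases,
# with a quality string of the same total length.
prefix_bases=$(printf 'N%.0s' {1..15})
prefix_quals=$(printf '#%.0s' {1..15})
seq="${prefix_bases}ACGTACGTAC"
qual="${prefix_quals}$(printf 'I%.0s' {1..10})"

# Drop the first ${trim_front} characters from both the bases and the qualities,
# so the two strings stay the same length.
trimmed_seq="${seq:${trim_front}}"
trimmed_qual="${qual:${trim_front}}"

echo "before: ${#seq} bases"
echo "after:  ${#trimmed_seq} bases (${trimmed_seq})"
```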
# Make output directories, if it doesn't exist
mkdir --parents "${output_dir}"

# Change to raw reads directory
cd "${raw_reads_dir}"

# Create arrays of fastq R1 files and sample names
for fastq in ${R1_fastq_pattern}
do
  fastq_array_R1+=("${fastq}")
  R1_names_array+=("$(echo "${fastq}" | awk -F"_" -v OFS="_" '{print $1, $2, $3, $4, $5, $6}')")
done

# Create array of fastq R2 files
for fastq in ${R2_fastq_pattern}
do
  fastq_array_R2+=("${fastq}")
  R2_names_array+=("$(echo "${fastq}" | awk -F"_" -v OFS="_" '{print $1, $2, $3, $4, $5, $6}')")
done

# Create list of fastq files used in analysis
# Create MD5 checksum for reference
if [ ! -f "${output_dir}"/raw-fastq-checksums.md5 ]; then
  for fastq in *.gz
  do
    md5sum ${fastq} >> "${output_dir}"/raw-fastq-checksums.md5
  done
fi

# Run fastp on files
# Adds JSON report output for downstream usage by MultiQC
for index in "${!fastq_array_R1[@]}"
do
  R1_sample_name=$(echo "${R1_names_array[index]}")
  R2_sample_name=$(echo "${R2_names_array[index]}")
  fastp \
  --in1 ${fastq_array_R1[index]} \
  --in2 ${fastq_array_R2[index]} \
  --detect_adapter_for_pe \
  --trim_poly_g \
  --trim_poly_x \
  --trim_front1 15 \
  --trim_front2 15 \
  --thread ${threads} \
  --html "${output_dir}"/"${R1_sample_name}".fastp-trim.report.html \
  --json "${output_dir}"/"${R1_sample_name}".fastp-trim.report.json \
  --out1 "${output_dir}"/"${R1_sample_name}"_R1_001.fastp-trim.fq.gz \
  --out2 "${output_dir}"/"${R2_sample_name}"_R2_001.fastp-trim.fq.gz \
  2>> "${output_dir}"/fastp.stderr

  # Generate md5 checksums for newly trimmed files
  cd "${output_dir}"
  md5sum "${R1_sample_name}"_R1_001.fastp-trim.fq.gz > "${R1_sample_name}"_R1_001.fastp-trim.fq.gz.md5
  md5sum "${R2_sample_name}"_R2_001.fastp-trim.fq.gz > "${R2_sample_name}"_R2_001.fastp-trim.fq.gz.md5
  cd -
done
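The R1 and R2 arrays above are built independently and then paired by index, so the pairing depends on both globs returning files in matching order. A small sanity check along these lines could confirm that before trimming. This is a sketch only: the file names are hypothetical placeholders, and `sample_name` is an illustrative wrapper around the same awk call used in the chunk.

```bash
# Derive a sample name from the first six underscore-delimited fields,
# mirroring the awk call used in the fastp chunk.
sample_name() {
  echo "$1" | awk -F"_" -v OFS="_" '{print $1, $2, $3, $4, $5, $6}'
}

# Hypothetical file names, for illustration only.
r1_files=("A_B_C_D_E_F_R1_001.fastq.gz" "G_H_I_J_K_L_R1_001.fastq.gz")
r2_files=("A_B_C_D_E_F_R2_001.fastq.gz" "G_H_I_J_K_L_R2_001.fastq.gz")

pairs_ok=true

# The two arrays must be the same length to be paired by index.
if (( ${#r1_files[@]} != ${#r2_files[@]} )); then
  echo "Different numbers of R1 and R2 files" >&2
  pairs_ok=false
fi

# Each R1 file and the R2 file at the same index must share a sample name.
for index in "${!r1_files[@]}"; do
  r1_name=$(sample_name "${r1_files[index]}")
  r2_name=$(sample_name "${r2_files[index]}")
  if [[ "${r1_name}" != "${r2_name}" ]]; then
    echo "Mismatched pair: ${r1_files[index]} / ${r2_files[index]}" >&2
    pairs_ok=false
  fi
done

echo "pairs_ok=${pairs_ok}"
```

In a real run, the two placeholder arrays would be replaced by the arrays populated from the R1 and R2 glob patterns.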
FASTQC/MULTIQC
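MultiQC is run with --cl-config "sp: { fastp: { fn: '*report.json' } }" to update the search pattern for its fastp module, so that the fastp JSON reports written during trimming (*.fastp-trim.report.json) are picked up.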
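The same search-pattern override could also live in a MultiQC config file instead of on the command line. The sketch below writes such a file; the file name and temporary directory are assumptions for illustration, and the MultiQC call is left as a comment so the snippet runs without MultiQC installed.

```bash
# Hypothetical location for the config file, for illustration only.
config_dir=$(mktemp -d)
config_file="${config_dir}/multiqc_config.yaml"

# YAML equivalent of: --cl-config "sp: { fastp: { fn: '*report.json' } }"
cat > "${config_file}" <<'EOF'
sp:
  fastp:
    fn: "*report.json"
EOF

cat "${config_file}"

# MultiQC could then be pointed at the file with its --config option:
# multiqc "${output_dir}" --config "${config_file}" --interactive -o "${output_dir}"
```

Keeping the override in a file makes it easier to reuse across notebooks, at the cost of one more file to track.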
# Make output directory if it doesn't exist
mkdir --parents "${output_dir}"

############ RUN FASTQC ############

# Create array of trimmed FastQs
trimmed_fastqs_array=(${output_dir}/${trimmed_fastq_pattern})

# Pass array contents to new variable as space-delimited list
trimmed_fastqc_list=$(echo "${trimmed_fastqs_array[*]}")

echo "Beginning FastQC on trimmed reads..."
echo ""

# Run FastQC
### NOTE: Do NOT quote trimmed_fastqc_list
fastqc \
--threads ${threads} \
--outdir ${output_dir} \
--quiet \
${trimmed_fastqc_list}

echo "FastQC on trimmed reads complete!"
echo ""

############ END FASTQC ############

############ RUN MULTIQC ############
echo "Beginning MultiQC on trimmed FastQC..."
echo ""

multiqc ${output_dir} \
--cl-config "sp: { fastp: { fn: '*report.json' } }" \
--interactive \
-o ${output_dir}

echo ""
echo "MultiQC on trimmed FastQs complete."
echo ""

############ END MULTIQC ############

echo "Removing FastQC zip files."
echo ""
rm ${output_dir}/*.zip
echo "FastQC zip files removed."
echo ""
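The "Do NOT quote" note above relies on word splitting to turn the space-delimited list back into separate file arguments, which works as long as no path contains whitespace. As an alternative sketch (not what was run here), the array can be passed directly with "${array[@]}", which keeps each file as one argument regardless of spaces. The temporary files and the `count_args` helper below are made up purely to demonstrate the difference.

```bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Create two empty placeholder files, one with a space in its name.
tmp_dir=$(mktemp -d)
touch "${tmp_dir}/sample one.fastp-trim.fq.gz" "${tmp_dir}/sample_two.fastp-trim.fq.gz"

trimmed_fastqs_array=("${tmp_dir}"/*fastp-trim.fq.gz)
trimmed_fastqc_list=$(echo "${trimmed_fastqs_array[*]}")

# Unquoted list: word splitting breaks the name containing a space in two.
unquoted_count=$(count_args ${trimmed_fastqc_list})

# Quoted array expansion: one argument per file.
array_count=$(count_args "${trimmed_fastqs_array[@]}")

echo "unquoted list: ${unquoted_count} arguments"
echo "quoted array:  ${array_count} arguments"

rm -r "${tmp_dir}"
```

With the array form, the FastQC call would end in "${trimmed_fastqs_array[@]}" rather than the unquoted list variable.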