Finally got around to running FastQC on Yaamini’s RNAseq and WGBS sequencing data recieved on 20210528.

SBATCH script (GitHub):

0210629_cvir_fastqc_yaamini_rnaseq-wgbs.sh

#!/bin/bash
## Job Name
#SBATCH --job-name=20210629_cvir_fastqc_yaamini_rnaseq-wgbs
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=5-00:00:00
## Memory per node
#SBATCH --mem=200G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20210629_cvir_fastqc_yaamini_rnaseq-wgbs


### FastQC assessment of raw sequencing from Ronit's ploidy WGBS.


###################################################################################
# These variables need to be set by user

# FastQC output directory
output_dir=$(pwd)

# Set number of CPUs to use
threads=40

# Input/output files
checksums=fastq_checksums.md5
fastq_list=fastq_list.txt
raw_reads_dir=/gscratch/srlab/sam/data/C_gigas/wgbs

# Paths to programs
fastqc=/gscratch/srlab/programs/fastqc_v0.11.9/fastqc
multiqc=/gscratch/srlab/programs/anaconda3/bin/multiqc


# Programs associative array
declare -A programs_array
programs_array=(
[fastqc]="${fastqc}" \
[multiqc]="${multiqc}"
)

###################################################################################

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Sync raw FastQ files to working directory
rsync --archive --verbose \
/gscratch/scrubbed/samwhite/data/C_virginica/RNAseq/*.fastq.gz .

rsync --archive --verbose \
/gscratch/scrubbed/samwhite/data/C_virginica/DNAseq/*.fq.gz .

# Populate array with FastQ files
fastq_array=(*.f*q.gz)

# Pass array contents to new variable
fastqc_list=$(echo "${fastq_array[*]}")

# Run FastQC
# NOTE: Do NOT quote ${fastqc_list}
${programs_array[fastqc]} \
--threads ${threads} \
--outdir ${output_dir} \
${fastqc_list}


# Create list of fastq files used in analysis
echo "${fastqc_list}" | tr " " "\n" >> ${fastq_list}

# Generate checksums for reference
while read -r line
do

    # Generate MD5 checksums for each input FastQ file
    echo "Generating MD5 checksum for ${line}."
    md5sum "${line}" >> "${checksums}"
    echo "Completed: MD5 checksum for ${line}."
    echo ""

    # Remove fastq files from working directory
    echo "Removing ${line} from directory"
    rm "${line}"
    echo "Removed ${line} from directory"
    echo ""
done < ${fastq_list}

# Run MultiQC
${programs_array[multiqc]} .


# Capture program options
for program in "${!programs_array[@]}"
do
    {
  echo "Program options for ${program}: "
    echo ""
  # Handle samtools help menus
  if [[ "${program}" == "samtools_index" ]] \
  || [[ "${program}" == "samtools_sort" ]] \
  || [[ "${program}" == "samtools_view" ]]
  then
    ${programs_array[$program]}
  fi
    ${programs_array[$program]} -h
    echo ""
    echo ""
    echo "----------------------------------------------"
    echo ""
    echo ""
} &>> program_options.log || true

  # If MultiQC is in programs_array, copy the config file to this directory.
  if [[ "${program}" == "multiqc" ]]; then
    cp --preserve ~/.multiqc_config.yaml .
  fi
done


# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

RESULTS

Runtime was ~3.75hrs:

Output folder:

20210629_cvir_fastqc_yaamini_rnaseq-wgbs/
- MultiQC Report (HTML; open in browser)
  - 20210629_cvir_fastqc_yaamini_rnaseq-wgbs/multiqc_report.html

When viewing the MultiQC report, remember that there are two different sequencing projects here: RNAseq and WGBS. This explains why there appear to be such drastic differences between two “groups” of files in the report; because there are!

Will get this info added to Nightingales (Google Sheet).

RESULTS

MultiQC Report (HTML; open in browser)