Clarified with Steven an approach for tackling multi-condition comparisons (see this GitHub Issue). That approach requires individual transcript abundances for each sample from the 2020 Genewiz RNAseq data, so I ran salmon (v1.2.1) to generate alignment-free transcript abundance estimates for every sample. It’s ridiculously fast, btw…
This was run on Mox and used the C.bairdi-specific reads that were extracted using MEGAN6 on 20200330.
SBATCH script (GitHub):
#!/bin/bash
## Job Name
#SBATCH --job-name=cbai_DEG_basic
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20200429_cbai_salmon_2020GW_transcript_abundances
# Script to generate a set of transcript abundances for all C.bairdi Genewiz 2020 data.
#
# C.bairdi-specific reads were extracted with MEGAN6:
# https://robertslab.github.io/sams-notebook/posts/2020/2020-03-30-RNAseq-Reads-Extractions---C.bairdi-Taxonomic-Reads-Extractions-with-MEGAN6-on-swoose/
#
# Transcriptome was produced here: https://robertslab.github.io/sams-notebook/posts/2020/2020-03-30-Transcriptome-Assembly---C.bairdi-with-MEGAN6-Taxonomy-specific-Reads-with-Trinity-on-Mox/
# Transcriptome is the same as: cbai_transcriptome_v1.5.fasta
#
# Salmon index generated during a previous gene expression analysis:
# https://robertslab.github.io/sams-notebook/posts/2020/2020-04-22-Gene-Expression---C.bairdi-Pairwise-DEG-Comparisons-with-2019-RNAseq-using-Trinity-Salmon-EdgeR-on-Mox/
###################################################################################################################
# BEGIN USER SETTINGS
# Programs array
declare -A programs_array
programs_array=([salmon]="/gscratch/srlab/programs/salmon-1.2.1_linux_x86_64/bin/salmon")
## Designate input files and locations
fastq_dir="/gscratch/srlab/sam/data/C_bairdi/RNAseq/"
salmon_index="/gscratch/srlab/sam/data/C_bairdi/transcriptomes/20200408.C_bairdi.megan.Trinity.fasta.salmon.idx"
# Set number of CPU threads
# Salmon default is 56 threads - so not needed
# threads=28
# END USER SETTINGS
####################################################################################################################
# Exit script if any command fails
set -e
# Load Python Mox module for Python module availability
module load intel-python3_2017
# Document programs in PATH (primarily for program version ID)
{
  date
  echo ""
  echo "System PATH for $SLURM_JOB_ID"
  echo ""
  printf "%0.s-" {1..10}
  # printf does not end the line, so add a newline before listing PATH entries
  echo ""
  echo "${PATH}" | tr : \\n
} >> system_path.log
# Capture working directory
#wd="$(pwd)"
# Capture program options
## NOTE: This particular instance is specific to salmon!
for program in "${!programs_array[@]}"
do
  {
    echo "Program options for ${programs_array[$program]}: "
    echo ""
    "${programs_array[$program]}" quant --help
    echo ""
    "${programs_array[$program]}" quant --help-reads
    echo ""
    echo "----------------------------------------------"
    echo ""
    echo ""
  } &>> program_options.log || true
done
# Populate array with FastQ files
reads_array=("${fastq_dir}"*.fq)
# Loop through read pairs
# Increment by 2 to process next pair of FastQ files
for (( i=0; i<${#reads_array[@]} ; i+=2 ))
do
  # Create list of FastQ files used
  {
    echo "${reads_array[i]}"
    echo "${reads_array[i+1]}"
  } >> fastq-list.txt
  # Strip path and save just sample number
  # Expects the R1 file name to be like:
  # 20200413.C_bairdi.359.D12.infected.ambient.megan_R1.fq
  # Will pull out '359'
  sample=$(echo "${reads_array[i]##*/}" | awk -F"." '{print $3}')
  # Run salmon
  # Library type (stranded or not) is set to auto (A)
  "${programs_array[salmon]}" quant \
    --index "${salmon_index}" \
    --libType A \
    --validateMappings \
    --output "${sample}"_quant \
    -1 "${reads_array[i]}" \
    -2 "${reads_array[i+1]}"
done
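The loop pairs files purely by position. It assumes the glob returns each `_R1` file immediately followed by its `_R2` mate, so a missing or misnamed file would silently shift every later pair. The sketch below is a hypothetical pre-flight check (`check_pairs` is not part of the original script) that walks the FastQ directory the same way. It confirms that each pair differs only by the `_R1.fq`/`_R2.fq` suffix (assumed from the example filename in the script) and prints the sample ID the `awk` call would extract, all without running salmon.

```shell
# Hypothetical dry-run helper: mirrors the salmon loop's pairing logic.
# Assumes mates end in _R1.fq / _R2.fq and the sample ID is the 3rd
# dot-delimited field of the file name (e.g. 20200413.C_bairdi.359....).
check_pairs() {
  local fastq_dir="$1"
  local reads=("${fastq_dir%/}"/*.fq)
  local i r1 r2 sample
  # Positional pairing only works with an even number of files
  if (( ${#reads[@]} % 2 != 0 )); then
    echo "ERROR: odd number of FastQ files (${#reads[@]})" >&2
    return 1
  fi
  for (( i=0; i<${#reads[@]}; i+=2 )); do
    r1="${reads[i]}"
    r2="${reads[i+1]}"
    # The R2 mate should be the R1 name with only the read tag swapped
    if [[ "${r1/_R1.fq/_R2.fq}" != "${r2}" ]]; then
      echo "ERROR: ${r1##*/} is not paired with ${r2##*/}" >&2
      return 1
    fi
    # Same extraction as the script: third dot-delimited field
    sample=$(awk -F"." '{print $3}' <<< "${r1##*/}")
    printf '%s\t%s\t%s\n' "${sample}" "${r1##*/}" "${r2##*/}"
  done
}
```

For example, `check_pairs /gscratch/srlab/sam/data/C_bairdi/RNAseq/` would list one sample ID and its two mates per line. A non-zero return flags an odd file count or a mismatched pair before salmon is ever called.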
RESULTS
Extremely fast, ~3 minutes:
Output folder:
Here are links to the individual quant files:
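The multi-condition comparison will need these per-sample results combined into a single table. Below is a minimal sketch. It assumes the standard salmon `quant.sf` columns (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`) and the `<sample>_quant` directory names created by the script. `merge_quants` is a hypothetical helper, not something used in this analysis.

```shell
# Hypothetical helper: merge one quant.sf column across samples into a
# transcript-by-sample, tab-separated matrix.
# Usage: merge_quants <column> <sample>_quant [<sample>_quant ...]
#   column 4 = TPM, column 5 = NumReads (salmon quant.sf layout, assumed)
# Transcript order is taken from the first file; values are matched by name.
merge_quants() {
  local col="$1"; shift
  local files=() dir
  for dir in "$@"; do
    files+=("${dir%/}/quant.sf")
  done
  awk -v c="$col" '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 {
      # Header line of a new file: derive sample name from <sample>_quant dir
      n_files++
      name = FILENAME
      sub(/\/quant\.sf$/, "", name)
      sub(/.*\//, "", name)
      sub(/_quant$/, "", name)
      sample[n_files] = name
      next
    }
    {
      # Record transcript order from the first file only
      if (n_files == 1) order[++n_tx] = $1
      val[$1, n_files] = $c
    }
    END {
      printf "Name"
      for (f = 1; f <= n_files; f++) printf "%s%s", OFS, sample[f]
      printf "\n"
      for (t = 1; t <= n_tx; t++) {
        printf "%s", order[t]
        for (f = 1; f <= n_files; f++) printf "%s%s", OFS, val[order[t], f]
        printf "\n"
      }
    }' "${files[@]}"
}
```

For example, running `merge_quants 5 *_quant > numreads_matrix.tsv` from the output folder (hypothetical output file name) would build a NumReads matrix with samples in glob order, and `merge_quants 4 *_quant` would build a TPM matrix. Trinity's `abundance_estimates_to_matrix.pl` (with `--est_method salmon`) is an established alternative for building the same kind of matrix.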