We’ve produced a number of C.bairdi transcriptomes and we’re interested in doing some comparisons to try to determine which one might be “best”. I previously compared the BUSCO scores of each of these transcriptomes and now will be using the DETONATE software package to perform two different types of comparisons: compared to a reference (REF-EVAL) and determine an overall quality “score” (RSEM-EVAL). I’ll be running REF-EVAL in this notebook.
A link to the paper is here and explains both:
I opted to just “quickly” run through REF-EVAL, as it only requires (minimally) transcriptome FastAs for performing assembly comparisons; although it probably provides more accurate comparisons if you generate a “true assembly” first. However, this can’t always be done if using a publicly available transcriptome which doesn’t have the corresponding FastQ files used for the assembly.
Since this is designed to compare a transcriptome assembly to a “reference” transcriptome of a related species, I did a couple of things:
Downloaded Japanese blue crab (Portunus trituberculatus) transcriptome from NCBI (accession: GFFJ01.1).
Downloaded green crab (Carcinus maenas) transcriptome from NCBI (accession: GBXE01.1).
Downloaded and assembled Portunus trituberculatus NCBI SRA RNAseq data.
I then compared all of our assemblies to each other, to the three “reference” sequences, and the three “reference” sequences to our C.bairdi assemblies (i.e. using our assemblies as the “reference” in the comparison).
This job was run on Mox.
SBATCH script (GitHub):
#!/bin/bash
## Job Name
#SBATCH --job-name=cbai_detonate_transcriptome_comparisons
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=8-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20200529_cbai_detonate_transcriptome_comparisons
###################################################################################
# These variables need to be set by user
# Array of the various comparisons to evaluate
# Each condition in each comparison should be separated by a "-"
transcriptomes_array=(
cbai_transcriptome_v1.0.fa \
cbai_transcriptome_v1.5.fa \
cbai_transcriptome_v1.6.fa \
cbai_transcriptome_v1.7.fa \
cbai_transcriptome_v2.0.fa \
cbai_transcriptome_v3.0.fa \
20200526.P_trituberculatus.Trinity.fa \
GFFJ01.1.fa \
GBXE01.1.fa
)
###################################################################################
# Exit script if any command fails
set -e
# Load Python Mox module for Python module availability
module load intel-python3_2017
# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log
threads=28
#programs
pblat="/gscratch/srlab/programs/pblat-2.1/pblat"
detonate="/gscratch/srlab/programs/detonate-1.11/ref-eval/ref-eval"
# Determine length of transcriptomes array
transcriptomes_array_length=${#transcriptomes_array[@]}
# Loop through each comparison
for (( i=0; i < transcriptomes_array_length; i++ ))
do
transcriptome1="${transcriptomes_array[$i]}"
# Capture FastA checksums for verification
echo "Generating checksum for ${transcriptome1}"
md5sum "${transcriptome1}" >> fasta.checksums.md5
echo "Finished generating checksum for ${transcriptome1}"
echo ""
for (( j=0; j < transcriptomes_array_length; j++ ))
do
# Don't run comparison of the same transcriptome
if [[ "${transcriptomes_array[$j]}" != "${transcriptomes_array[$i]}" ]]; then
transcriptome2="${transcriptomes_array[$j]}"
comparison1="${transcriptome1}-vs-${transcriptome2}"
comparison2="${transcriptome2}-vs-${transcriptome1}"
# Check if pblat output files are not present
if [[ ! -f "${comparison1}.psl" ]] && [[ -f "${comparison2}".psl ]]; then
# Run pblat
echo "Starting pblat: ${comparison1}"
${pblat} -minIdentity=80 -threads=${threads} "${transcriptome2}" "${transcriptome1}" "${comparison1}".psl
echo "Finished pblat: ${comparison1}"
echo ""
echo "Starting pblat: ${comparison2}"
${pblat} -minIdentity=80 -threads=${threads} "${transcriptome1}" "${transcriptome2}" "${comparison2}".psl
echo "Finished pblat: ${comparison2}"
echo ""
fi
# Check if DETONATE output file exists
if [[ ! -f "${comparison1}.scores.txt" ]]; then
# Run ref-eval, unweighted scores only
echo "Running DETONATE with ${transcriptome1} and ${transcriptome2}."
${detonate} \
--scores=nucl,pair,contig \
--weighted=no \
--A-seqs "${transcriptome1}" \
--B-seqs "${transcriptome2}" \
--A-to-B "${comparison1}".psl \
--B-to-A "${comparison2}".psl \
| tee "${comparison1}.scores.txt"
echo "Finished DETONATE with ${transcriptome1} and ${transcriptome2}."
echo ""
fi
fi
done
done
RESULTS
Took a bit over 3 days to complete:
NOTE: There are two jobs associated with this because I had to modify SBATCH script after initial run to account for missed reference comparisons.
Well, I ran this and I am completely unsure what it all means. Hah! And, ugh! It was easy enough to setup and run, so it didn’t require too much effort. However, even after reading the paper, it’s still not clear if this has any real value; particularly since the paper focuses on evaluating de novo assemblies. The evaluation to a reference transcriptome seems like it’s an aside. However, the two sets of software are packaged together as DETONATE, so I figured I’d run them both.
All the files are linked below for anyone curious enough to spend time trying to figure out if these results mean anything…
Output folder:
Reference: cbai_transcriptome_v1.0
cbai_transcriptome_v1.5.fa-vs-cbai_transcriptome_v1.0.fa.scores.txt
cbai_transcriptome_v1.6.fa-vs-cbai_transcriptome_v1.0.fa.scores.txt
cbai_transcriptome_v1.7.fa-vs-cbai_transcriptome_v1.0.fa.scores.txt
cbai_transcriptome_v2.0.fa-vs-cbai_transcriptome_v1.0.fa.scores.txt
cbai_transcriptome_v3.0.fa-vs-cbai_transcriptome_v1.0.fa.scores.txt
Reference: cbai_transcriptome_v1.5
cbai_transcriptome_v1.0.fa-vs-cbai_transcriptome_v1.5.fa.scores.txt
cbai_transcriptome_v1.6.fa-vs-cbai_transcriptome_v1.5.fa.scores.txt
cbai_transcriptome_v1.7.fa-vs-cbai_transcriptome_v1.5.fa.scores.txt
cbai_transcriptome_v2.0.fa-vs-cbai_transcriptome_v1.5.fa.scores.txt
cbai_transcriptome_v3.0.fa-vs-cbai_transcriptome_v1.5.fa.scores.txt
Reference: cbai_transcriptome_v1.6
cbai_transcriptome_v1.0.fa-vs-cbai_transcriptome_v1.6.fa.scores.txt
cbai_transcriptome_v1.5.fa-vs-cbai_transcriptome_v1.6.fa.scores.txt
cbai_transcriptome_v1.7.fa-vs-cbai_transcriptome_v1.6.fa.scores.txt
cbai_transcriptome_v2.0.fa-vs-cbai_transcriptome_v1.6.fa.scores.txt
cbai_transcriptome_v3.0.fa-vs-cbai_transcriptome_v1.6.fa.scores.txt
Reference: cbai_transcriptome_v1.7
cbai_transcriptome_v1.0.fa-vs-cbai_transcriptome_v1.7.fa.scores.txt
cbai_transcriptome_v1.5.fa-vs-cbai_transcriptome_v1.7.fa.scores.txt
cbai_transcriptome_v1.6.fa-vs-cbai_transcriptome_v1.7.fa.scores.txt
cbai_transcriptome_v2.0.fa-vs-cbai_transcriptome_v1.7.fa.scores.txt
cbai_transcriptome_v3.0.fa-vs-cbai_transcriptome_v1.7.fa.scores.txt
Reference: cbai_transcriptome_v2.0
cbai_transcriptome_v1.0.fa-vs-cbai_transcriptome_v2.0.fa.scores.txt
cbai_transcriptome_v1.5.fa-vs-cbai_transcriptome_v2.0.fa.scores.txt
cbai_transcriptome_v1.6.fa-vs-cbai_transcriptome_v2.0.fa.scores.txt
cbai_transcriptome_v1.7.fa-vs-cbai_transcriptome_v2.0.fa.scores.txt
cbai_transcriptome_v3.0.fa-vs-cbai_transcriptome_v2.0.fa.scores.txt
Reference: cbai_transcriptome_v3.0
cbai_transcriptome_v1.0.fa-vs-cbai_transcriptome_v3.0.fa.scores.txt
cbai_transcriptome_v1.5.fa-vs-cbai_transcriptome_v3.0.fa.scores.txt
cbai_transcriptome_v1.6.fa-vs-cbai_transcriptome_v3.0.fa.scores.txt
cbai_transcriptome_v1.7.fa-vs-cbai_transcriptome_v3.0.fa.scores.txt
cbai_transcriptome_v2.0.fa-vs-cbai_transcriptome_v3.0.fa.scores.txt
Reference: 20200526.P_trituberculatus
GBXE01.1.fa-vs-20200526.P_trituberculatus.Trinity.fa.scores.txt
GFFJ01.1.fa-vs-20200526.P_trituberculatus.Trinity.fa.scores.txt
cbai_transcriptome_v1.0.fa-vs-20200526.P_trituberculatus.Trinity.fa.scores.txt
cbai_transcriptome_v1.5.fa-vs-20200526.P_trituberculatus.Trinity.fa.scores.txt
cbai_transcriptome_v1.6.fa-vs-20200526.P_trituberculatus.Trinity.fa.scores.txt
cbai_transcriptome_v1.7.fa-vs-20200526.P_trituberculatus.Trinity.fa.scores.txt
cbai_transcriptome_v2.0.fa-vs-20200526.P_trituberculatus.Trinity.fa.scores.txt
cbai_transcriptome_v3.0.fa-vs-20200526.P_trituberculatus.Trinity.fa.scores.txt