As part of annotating cbai_transcriptome_v1.6.fasta from 20200518, I needed to run DIAMOND BLASTx to generate output for use with Trinotate.
Ran DIAMOND BLASTx against the UniProt/SwissProt database (downloaded 20200123) on Mox.
SBATCH script (GitHub):
#!/bin/bash
## Job Name
#SBATCH --job-name=cbai_blastx_DIAMOND
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-00:00:00
## Memory per node
#SBATCH --mem=120G
## Turn on e-mail notification
#SBATCH --mail-type=ALL
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20200519_cbai_diamond_blastx_transcriptome_v1.6
### BLASTx of Trinity de novo assembly of all C.bairdi RNAseq data, Arthropoda only:
### cbai_transcriptome_v1.6.fasta
### Includes "descriptor_1" short-hand of: 2020-GW, 2020-UW, 2019, 2018.
# Exit script if any command fails
set -e
# Load Python Mox module for Python module availability
module load intel-python3_2017
# SegFault fix?
# Document programs in PATH (primarily for program version ID)
{
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log
# Program paths
# DIAMOND UniProt database
# Trinity assembly (FastA)
# Strip leading path and extensions
no_path="${fasta##*/}"
no_ext="${no_path%.*}"
# Run DIAMOND with blastx
# Output format 6 produces a standard BLAST tab-delimited file
"${diamond}" blastx \
--db "${dmnd}" \
--query "${fasta}" \
--out "${no_ext}".blastx.outfmt6 \
--outfmt 6 \
--evalue 1e-4 \
--max-target-seqs 1 \
--block-size 15.0 \
--index-chunks 4
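The output file name in the script comes from two Bash parameter expansions applied to the FastA path. Here is a minimal sketch of how they behave. The path below is hypothetical and only stands in for the real assembly location, which is not shown above.

```shell
#!/bin/bash
# Hypothetical FastA path, for illustration only (not the real location)
fasta="/path/to/cbai_transcriptome_v1.6.fasta"

# "##*/" removes the longest prefix ending in "/", leaving just the file name
no_path="${fasta##*/}"

# "%.*" removes the shortest suffix starting with ".", i.e. only the last extension
no_ext="${no_path%.*}"

echo "${no_path}"
echo "${no_ext}.blastx.outfmt6"
```

Because `%.*` only strips the final extension, the `v1.6` in the assembly name survives intact in the output file name.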
As usual, runtime was ridiculously fast: 12 seconds
Output folder:
BLASTx output (outfmt6; text; 1.9MB):
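For a quick sanity check of a file like this, the rough sketch below counts total hit lines and unique query sequences. It assumes the default 12-column tab-delimited outfmt 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `summarize_outfmt6` is hypothetical, and the sample rows are made up purely to exercise the code; they are not results from this run.

```shell
#!/bin/bash
# Hypothetical helper: summarize a 12-column, tab-delimited outfmt6 file.
summarize_outfmt6() {
  local blast_out="$1"
  awk -F'\t' '
    # Column 1 is the query sequence ID (qseqid)
    { hits++; queries[$1] = 1 }
    END {
      n = 0
      for (q in queries) n++
      printf "total_hits=%d\nunique_queries=%d\n", hits, n
    }
  ' "${blast_out}"
}

# Made-up sample rows (NOT real results): 3 hit lines, 2 distinct queries
sample="$(mktemp)"
{
  printf 'TRINITY_DN1_c0_g1_i1\tsp|X00001|FAKE1\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-50\t200\n'
  printf 'TRINITY_DN1_c0_g1_i1\tsp|X00001|FAKE1\t80.0\t50\t10\t0\t400\t549\t120\t169\t1e-10\t60\n'
  printf 'TRINITY_DN2_c0_g1_i1\tsp|X00002|FAKE2\t90.0\t80\t8\t0\t1\t240\t5\t84\t1e-40\t150\n'
} > "${sample}"

summary="$(summarize_outfmt6 "${sample}")"
echo "${summary}"
rm -f "${sample}"
```

On the real output, the function would be called with the path to the `.blastx.outfmt6` file; comparing the two counts shows whether any query has more than one reported line.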