Genome Annotation - P.generosa v1.0 Assembly Using BLASTn for BlobToolKit on Mox

To continue working toward analyzing our Panopea generosa (Pacific geoduck) genome assembly (v1.0) with BlobToolKit (per this GitHub Issue), I’ve decided to run each step of the pipeline manually, since I continue to have issues with the automated pipeline. As a first step, I ran BLASTn on Mox following the BlobToolKit “Getting Started” guide.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=20210415_pgen_blastn-nt_Panopea-generosa-v1.0
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-00:00:00
## Memory per node
#SBATCH --mem=120G
## Turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20210415_pgen_blastn-nt_Panopea-generosa-v1.0


### BLASTn of P.generosa genome assembly Panopea-generosa-v1.0.fa
### against NCBI nt database.
### In preparation for use in BlobTools2


###################################################################################
# These variables need to be set by user

# Set number of CPUs to use
threads=40

# Input/output files
fasta="/gscratch/srlab/sam/data/P_generosa/genomes/Panopea-generosa-v1.0.fa"
blast_db="/gscratch/srlab/blastdbs/20210401_ncbi_nt/nt"

# Programs
blastn="/gscratch/srlab/programs/ncbi-blast-2.10.1+/bin/blastn"


# Programs associative array
declare -A programs_array
programs_array=(
[blastn]="${blastn}"
)


###################################################################################

# Exit script if any command fails
set -e


# Run BLASTn with custom format/settings for use in blobtools2
"${programs_array[blastn]}" \
-db "${blast_db}" \
-query "${fasta}" \
-outfmt "6 qseqid staxids bitscore std" \
-max_target_seqs 10 \
-max_hsps 1 \
-evalue 1e-25 \
-num_threads "${threads}" \
-out Panopea-generosa-v1.0_blobtools2_blast.out


###################################################################################

# Capture program options
echo "Logging program options..."
for program in "${!programs_array[@]}"
do
  {
    echo "Program options for ${program}: "
    echo ""
    # Handle samtools help menus
    if [[ "${program}" == "samtools_index" ]] \
    || [[ "${program}" == "samtools_sort" ]] \
    || [[ "${program}" == "samtools_view" ]]
    then
      "${programs_array[$program]}"

    # Handle DIAMOND BLAST menu
    elif [[ "${program}" == "diamond" ]]; then
      "${programs_array[$program]}" help

    # Handle NCBI BLASTx/BLASTn menus
    elif [[ "${program}" == "blastx" ]] \
    || [[ "${program}" == "blastn" ]]; then
      "${programs_array[$program]}" -help

    # All other programs
    else
      "${programs_array[$program]}" -h
    fi
    echo ""
    echo ""
    echo "----------------------------------------------"
    echo ""
    echo ""
  } &>> program_options.log || true

  # If MultiQC is in programs_array, copy the config file to this directory.
  if [[ "${program}" == "multiqc" ]]; then
    cp --preserve ~/.multiqc_config.yaml multiqc_config.yaml
  fi
done

# Document programs in PATH (primarily for program version ID)
{
  date
  echo ""
  echo "System PATH for $SLURM_JOB_ID"
  echo ""
  printf "%0.s-" {1..10}
  echo ""
  echo "${PATH}" | tr : \\n
} >> system_path.log

echo "Finished logging system PATH"
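
Because `-outfmt "6 qseqid staxids bitscore std"` asks for three named fields followed by the 12 standard tabular columns (query and subject IDs, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), every line of the output should have 15 tab-separated fields. Below is a minimal sketch for sanity-checking that before import; the function name `check_blast_cols` is hypothetical and not part of BLAST+ or BlobToolKit.

```shell
# Hypothetical helper (not part of BLAST+ or BlobToolKit): verify that a BLAST
# tabular file made with -outfmt "6 qseqid staxids bitscore std" has exactly
# 15 tab-separated columns on every line, and report how many distinct query
# sequences have at least one hit. Returns non-zero if any line is malformed.
check_blast_cols() {
  local blast_out="$1"
  awk -F'\t' '
    NF != 15 { bad++ }          # count lines with the wrong column count
    { queries[$1] = 1 }         # column 1 is qseqid
    END {
      n = 0
      for (q in queries) n++
      printf "lines=%d malformed=%d queries_with_hits=%d\n", NR, bad + 0, n
      exit (bad > 0)
    }
  ' "${blast_out}"
}
```

Usage, from the job's working directory: `check_blast_cols Panopea-generosa-v1.0_blobtools2_blast.out`. It prints a one-line summary and returns a non-zero status if any line has the wrong number of columns.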

RESULTS

Runtime wasn’t too bad; just a bit over 6 hrs:

BLASTn runtime
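The `--time` line in the script uses Slurm’s days-hours:minutes:seconds format, and `sacct -j <jobid> --format=JobID,Elapsed` reports elapsed time in a similar `[D-]HH:MM:SS` form. To compare the two, a small helper like the one below converts either to seconds. It is a hypothetical sketch that deliberately accepts only the `[D-]HH:MM:SS` form (Slurm also accepts shorter `--time` forms, which it does not handle).

```shell
# Hypothetical helper: convert a Slurm duration in [D-]HH:MM:SS form
# (e.g. "10-00:00:00" from --time, or "06:12:34" from sacct's Elapsed field)
# into a number of seconds. Other Slurm time forms are rejected.
slurm_time_to_seconds() {
  local t="$1" days=0 h m s
  # Split off an optional "D-" day prefix.
  if [[ "${t}" == *-* ]]; then
    days="${t%%-*}"
    t="${t#*-}"
  fi
  # Split the remaining HH:MM:SS into its three fields.
  local -a parts
  IFS=: read -r -a parts <<< "${t}"
  if [[ "${#parts[@]}" -ne 3 ]]; then
    echo "unsupported time format: $1" >&2
    return 1
  fi
  h="${parts[0]}"; m="${parts[1]}"; s="${parts[2]}"
  # Force base 10 so values with leading zeros (e.g. "08") are not read as octal.
  echo $(( 10#${days} * 86400 + 10#${h} * 3600 + 10#${m} * 60 + 10#${s} ))
}
```

With it, the 10-day request (`slurm_time_to_seconds 10-00:00:00`) comes to 864000 s, so a run of a bit over 6 hrs (about 21,600 s) used roughly 2.5% of the requested walltime.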

Output folder:

Once the DIAMOND BLASTx and minimap2 alignments are complete, I’ll import these results into the BlobToolKit viewer.
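Before the hits go into BlobToolKit, it can help to get a rough sense of which taxa dominate. The sketch below assumes the BLASTn output format used above (qseqid in column 1, staxids in column 2, bitscore in column 3): for each query it keeps the single highest-bitscore hit and counts queries per `staxids` value. The function `top_hit_taxa` is hypothetical; BlobToolKit performs its own taxonomic assignment when hits are imported, and this is not that algorithm.

```shell
# Hypothetical helper: for each query (column 1), keep the hit with the highest
# bitscore (column 3; the first one wins on ties) and tally how many queries have
# each best-hit staxids value (column 2, which may be a ";"-separated list).
# Prints "count<TAB>staxids", most frequent first.
top_hit_taxa() {
  local blast_out="$1"
  awk -F'\t' '
    !($1 in best) || $3 + 0 > best[$1] { best[$1] = $3 + 0; taxid[$1] = $2 }
    END {
      for (q in taxid) count[taxid[q]]++
      for (t in count) print count[t] "\t" t
    }
  ' "${blast_out}" | sort -k1,1nr -k2,2
}
```

For example, `top_hit_taxa Panopea-generosa-v1.0_blobtools2_blast.out | head` would list the ten most frequent best-hit taxIDs as count/taxID pairs.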