Continuation of genome annotation of the Olympia oyster genome. Determined initial gene models using MAKER with two rounds of SNAP, relabeled with more user-friendly names, and then performed protein-level annotations using BLASTp. Next, I’m going to run InterProScan5 (IPS5) to help functionally characterize the O.lurida proteins ID’d by MAKER. Once this is complete, I’ll use MAKER to incorporate the IPS5 and BLASTp results into a much more neatly (i.e. human-readable) annotated genome!
THE IPS5 analysis was performed on Mox with the following SBATCH script:
#!/bin/bash
## Job Name
#SBATCH --job-name=interproscan
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=15-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190108_oly_maker_interproscan
# Load Python Mox module for Python module availability
module load intel-python3_2017
# Load Open MPI module for parallel, multi-node processing
module load icc_19-ompi_3.1.2
# SegFault fix?
export THREADS_DAEMON_MODEL=1
# Document programs in PATH (primarily for program version ID)
date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log
# Variables
interproscan=/gscratch/srlab/programs/interproscan-5.31-70.0/interproscan.sh
maker_prot_fasta=/gscratch/scrubbed/samwhite/outputs/20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.maker.proteins.renamed.fasta
# Run InterProScan 5
## disable-precalc since this requires external database access (which Mox does not allow)
${interproscan} \
--input ${maker_prot_fasta} \
--goterms \
--output-file-base 20190108_oly_maker_proteins_ips \
--disable-precalc
RESULTS
Took a decent amount of time to run (~6.5hrs). Requesting both nodes didn’t make a difference because the program was unable to see two nodes, so only utilized a single node for processing. However, I did modify the config file to request much more than the default number of threads (cores) and set the maximum memory to 120GB.
Output folder:
Tab-separated file (24MB):
GFF file (60MB):
XML file (150MB):
InterProScan Properties file (i.e. config file):
OK, so I should be able to use the TSV file, along with the previous BLASTp annotations to make a nice, functionally annotated genome! Will get this next step running on Mox!