library(reticulate)
Sam White
March 5, 2026
Steven asked me to test out his workflow-annotation pipeline (GitHub) after Hannah encountered an issue yesterday and we implemented a potential fix (GitHub Issue).
The pipeline was designed to take nucleotide sequences all the way through gene ontology annotation.
After some back-and-forth troubleshooting, I resorted to asking the Gemini 3 Flash (preview) gAI agent. It suggested that the lack of a User-Agent in the request header was causing the 403 permission denied error that Hannah (and I) encountered.
I implemented the suggested changes and tested them.
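As a rough illustration of the kind of change involved, here is a minimal sketch of attaching an explicit User-Agent header to a request. It uses only the Python standard library and does not contact any server. The URL, the User-Agent string, and the `build_request` helper are placeholders for illustration; they are not taken from the pipeline, and the actual fix in the repository may look different.

```python
import urllib.request

# Placeholder values for illustration only; neither comes from the pipeline.
EXAMPLE_URL = "https://example.org/resource.tsv"
USER_AGENT = "workflow-annotation-test/0.1"


def build_request(url, user_agent=USER_AGENT):
    """Return a urllib Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request(EXAMPLE_URL)
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen()` would send the request with that header. With the `requests` library, the equivalent is passing `headers={"User-Agent": ...}` to `requests.get()`.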
As input, I used a subset of a FastA file provided by Hannah (see code below).
Be sure to deactivate any current Python/Conda environments!
deactivate
and/or
conda deactivate
The code below sets the Conda paths and environment used throughout this notebook.
This was run on my laptop, so the paths are specific to my system. Adjust as needed for your system.
# CONDA
conda_env_name <- c("/home/sam/programs/miniforge3/envs/sr320-workflow-annotation")
conda_path <- c("/home/sam/programs/miniforge3/bin/conda")
# WORKING DIRECTORY
working_dir <- c("/home/sam/gitrepos/RobertsLab/sams-notebook/posts/2026/2026-03-05-Software-Testing---Stevens-workflow-annotation-Pipeline/")
Sys.setenv(working_dir = working_dir)
If this is successful, the first line of output should show that the Python being used is the one in your workflow-annotation conda environment path.
E.g.
python: /home/sam/programs/miniforge3/envs/sr320-workflow-annotation/bin/python
python: /home/sam/programs/miniforge3/envs/sr320-workflow-annotation/bin/python
libpython: /home/sam/programs/miniforge3/envs/sr320-workflow-annotation/lib/libpython3.11.so
pythonhome: /home/sam/programs/miniforge3/envs/sr320-workflow-annotation:/home/sam/programs/miniforge3/envs/sr320-workflow-annotation
version: 3.11.15 | packaged by conda-forge | (main, Mar 5 2026, 16:45:40) [GCC 14.3.0]
numpy: /home/sam/programs/miniforge3/envs/sr320-workflow-annotation/lib/python3.11/site-packages/numpy
numpy_version: 2.4.2
NOTE: Python version was forced by use_python() function
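The same check can be done programmatically rather than by eye. The sketch below tests whether an interpreter path sits inside a given Conda environment directory. The `python_in_env` helper is hypothetical, and the paths are the ones from the example output above, so adjust them for your system.

```python
import sys
from pathlib import Path


def python_in_env(python_path, env_prefix):
    """Return True if python_path is located somewhere inside env_prefix."""
    return Path(env_prefix) in Path(python_path).parents


# Paths copied from the example output above; adjust for your system.
ENV_PREFIX = "/home/sam/programs/miniforge3/envs/sr320-workflow-annotation"
EXAMPLE_PYTHON = ENV_PREFIX + "/bin/python"

# Check the example path from the output.
print(python_in_env(EXAMPLE_PYTHON, ENV_PREFIX))
# Check the interpreter that is running this code.
print(python_in_env(sys.executable, ENV_PREFIX))
```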
The full FastA was not utilized. I created a subset for quick testing.
cd "/home/sam/gitrepos/RobertsLab/sams-notebook/posts/2026/2026-03-05-Software-Testing---Stevens-workflow-annotation-Pipeline/"
git clone git@github.com:kubu4/workflow-annotation.git
cd workflow-annotation
wget --quiet --continue https://raw.githubusercontent.com/hannahnowers/Seastar-capstone/refs/heads/main/data/derm-protein.fa
>g7941.t1
MQYLPFRGVLCVFWGFLLIETFRPANAGDFALVADLRNGTIYAGSIGQSLADIAPLPLTG
VIRPLAVEYDPVEKMVYWTDVNSLPSPKITRAHVNGSGQMTLVDQLHLPDGLALDVESRL
VYWTDGVLGYIGRTRMNGTGARETIVVGLDQPRAIITDSGFIYWTDWGNSSRIERAGLDG
SNRTTLITGNLVWPNGLFKDGNNLYWCDAKLDKIERSDLLGNNREIVIDLTSYPQIHPFD
LAVYDEYIYWTDWGYTTLIRVHTSGRGEQNYGPSVFQQSGGLHIQKEPNYCNSSPCQNGA
ICIDVINGFSCICPSEHQGITRSENPSGSGPCVNGGTCTTIPGGFTCQCPAGYDGPTCTI
EFIVKKRDREIAGKCKRLH*
>g7942.t1
MVVQVTKTSRDYGKINPELLQNLLDERLRDTFQTEDYPNLLYETYESITTSVLDEICPVT
TRVRTVKPRLPWYDNTIQEERRIPRRLERNWRKSRLDTDYDAFLTQENNV*
>g7943.t1
MKSTSSAILCLFGLFGLGYCGPCETLDPCENGAQCLDFDSIPGYFFCYCPYTFYGTRCEN
RNAACDDNPCMFGGTCLVFNERYECECPSGIYGNHCQANGCANDPCMNGGTCWPFGFSYT
CICSPGYRGENCDE*
>g7944.t1
MATIPIDTCKRHASTKSAETVGVKHLRRRIQTTGPLTIADYMREVLTNPLTGYYMNKDVF
GNKGDFITSPEISQMFGELIALWIIHEWTQLGQMTPLQVVELGPGRGTLADDMLRVFKQF
QHITGNSLSLHLVEVSPKMSQLQEEKLTGQQQSKTDMDQPSVSQDSPAGRCEGDTVAERG
SLSSAYKTSISKTGMPVSWYQSLKEVPKGVSCFIAHEFFDALPIHQFKKTEKGWREVLVD
VDSDDAGSNHLRFVLSPAATPASTVYPQGSDSRGQIEVCPEGGVIVQEMAHRISEHGGMS
LIVDYGHDGTKTDTLRGFKKHKLHDVLCEPGTADLTADVDFSYFRQVVQGKVSTHGPITQ
ESFLKAMGIETRLKMLLKSATGEQRNSLITGCRMLTDPAQMGERFKFFAMMPLNQGTESG
QAKARTPAGFPSVC*
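The command used to create the subset is not shown here, so the following is only one possible way to do it: a minimal sketch that keeps the first N records of a FastA file. The `read_fasta` and `subset_fasta` helpers are hypothetical, and the two example records are copied from the output above.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FastA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def subset_fasta(lines, n):
    """Return the first n records as a list of (header, sequence) tuples."""
    return list(islice(read_fasta(lines), n))


EXAMPLE = """>g7942.t1
MVVQVTKTSRDYGKINPELLQNLLDERLRDTFQTEDYPNLLYETYESITTSVLDEICPVT
TRVRTVKPRLPWYDNTIQEERRIPRRLERNWRKSRLDTDYDAFLTQENNV*
>g7943.t1
MKSTSSAILCLFGLFGLGYCGPCETLDPCENGAQCLDFDSIPGYFFCYCPYTFYGTRCEN
RNAACDDNPCMFGGTCLVFNERYECECPSGIYGNHCQANGCANDPCMNGGTCWPFGFSYT
CICSPGYRGENCDE*
"""

first = subset_fasta(EXAMPLE.splitlines(), 1)
for header, sequence in first:
    print(f">{header}\n{sequence}")
```

To subset a real file, pass an open file handle instead of `EXAMPLE.splitlines()`, since the reader only needs an iterable of lines.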
cd "/home/sam/gitrepos/RobertsLab/sams-notebook/posts/2026/2026-03-05-Software-Testing---Stevens-workflow-annotation-Pipeline/workflow-annotation" && \
export PATH="/home/sam/programs/miniforge3/envs/sr320-workflow-annotation/bin:$PATH" && \
./blast2slim.sh \
-i subset-derm-protein.fa \
--diamond \
--protein \
> blast_run.log 2>&1
[INFO] Output directory: output/run_20260310_164133
[INFO] Running DIAMOND BLASTP...
diamond v2.1.23.177 (C) Max Planck Society for the Advancement of Science, Benjamin J. Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 40
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: output/run_20260310_164133
#Target sequences to report alignments for: 1
Opening the database... [0.101s]
Database: blastdb/uniprot_sprot.dmnd (type: Diamond database, sequences: 574627, letters: 208482574)
Block size = 2000000000
Opening the input file... [0s]
Opening the output file... [0s]
Loading query sequences... [0s]
Masking queries... [0.014s]
Algorithm: Double-indexed
Building query histograms... [0.006s]
Seeking in database... [0s]
Loading reference sequences... [0.565s]
Masking reference... [3.266s]
Initializing temporary storage... [0s]
Building reference histograms... [2.46s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array... [0.838s]
Building query seed array... [0.007s]
Computing hash join... [0.055s]
Masking low complexity seeds... [0.009s]
Searching alignments... [0.006s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
Building reference seed array... [1.104s]
Building query seed array... [0.008s]
Computing hash join... [0.056s]
Masking low complexity seeds... [0.006s]
Searching alignments... [0.005s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4.
Building reference seed array... [1.153s]
Building query seed array... [0.009s]
Computing hash join... [0.058s]
Masking low complexity seeds... [0.003s]
Searching alignments... [0.004s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4.
Building reference seed array... [0.817s]
Building query seed array... [0.006s]
Computing hash join... [0.056s]
Masking low complexity seeds... [0.003s]
Searching alignments... [0.005s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4.
Building reference seed array... [0.859s]
Building query seed array... [0.009s]
Computing hash join... [0.054s]
Masking low complexity seeds... [0.003s]
Searching alignments... [0.005s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4.
Building reference seed array... [1.106s]
Building query seed array... [0.006s]
Computing hash join... [0.049s]
Masking low complexity seeds... [0.004s]
Searching alignments... [0.003s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4.
Building reference seed array... [1.038s]
Building query seed array... [0.007s]
Computing hash join... [0.05s]
Masking low complexity seeds... [0.006s]
Searching alignments... [0.007s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4.
Building reference seed array... [0.786s]
Building query seed array... [0.007s]
Computing hash join... [0.055s]
Masking low complexity seeds... [0.005s]
Searching alignments... [0.011s]
Deallocating memory... [0s]
Deallocating buffers... [0.036s]
Clearing query masking... [0s]
Computing alignments... [0.098s]
Deallocating reference... [0.017s]
Loading reference sequences... [0s]
Deallocating buffers... [0s]
Deallocating queries... [0s]
Loading query sequences... [0s]
Closing the input file... [0s]
Closing the output file... [0s]
Closing the database... [0.005s]
Cleaning up... [0s]
Total time = 14.878s
Reported 2 pairwise alignments, 2 HSPs.
2 queries aligned.
deps ok
output/run_20260310_164133/go-basic.obo: fmt(1.2) rel(2026-01-23) 42,036 Terms; optional_attrs(relationship)
output/run_20260310_164133/goslim_generic.obo: fmt(1.2) rel(go/2026-01-23/subsets/goslim_generic.owl) 207 Terms; optional_attrs(relationship)
[DONE] Wrote:
- output/run_20260310_164133/annotation_full_go.tsv
- output/run_20260310_164133/annotation_with_goslim.tsv
[INFO] Summary stats: 2/2 BLAST hits, 2 GO annotations, 2 GO-Slim mappings
[INFO] Generated GO-Slim chart: goslim_chart.png
[INFO] Generated summary report: summary.md
[OK] All done. See output/run_20260310_164133/annotation_with_goslim.tsv and output/run_20260310_164133/summary.md
output/run_20260306_113912:
total 31M
-rw-rw-r-- 1 sam sam 339 Mar 6 11:39 subset-derm-protein.blast.tsv
-rw-rw-r-- 1 sam sam 9.0K Mar 6 11:39 postprocess_uniprot_go.py
-rw-rw-r-- 1 sam sam 2.8K Mar 6 11:39 annotation_full_go.tsv
-rw-rw-r-- 1 sam sam 31M Mar 6 11:39 go-basic.obo
-rw-rw-r-- 1 sam sam 122K Mar 6 11:39 goslim_generic.obo
-rw-rw-r-- 1 sam sam 3.2K Mar 6 11:39 annotation_with_goslim.tsv
-rw-rw-r-- 1 sam sam 296 Mar 6 11:39 summary_stats.json
-rw-rw-r-- 1 sam sam 51K Mar 6 11:39 goslim_chart.png
-rw-rw-r-- 1 sam sam 1.2K Mar 6 11:39 summary.md
output/run_20260306_150323:
total 31M
-rw-rw-r-- 1 sam sam 339 Mar 6 15:03 subset-derm-protein.blast.tsv
-rw-rw-r-- 1 sam sam 9.0K Mar 6 15:03 postprocess_uniprot_go.py
-rw-rw-r-- 1 sam sam 2.8K Mar 6 15:03 annotation_full_go.tsv
-rw-rw-r-- 1 sam sam 31M Mar 6 15:03 go-basic.obo
-rw-rw-r-- 1 sam sam 122K Mar 6 15:03 goslim_generic.obo
-rw-rw-r-- 1 sam sam 3.2K Mar 6 15:03 annotation_with_goslim.tsv
-rw-rw-r-- 1 sam sam 296 Mar 6 15:03 summary_stats.json
-rw-rw-r-- 1 sam sam 51K Mar 6 15:03 goslim_chart.png
-rw-rw-r-- 1 sam sam 1.2K Mar 6 15:03 summary.md
output/run_20260310_164133:
total 31M
-rw-rw-r-- 1 sam sam 339 Mar 10 16:41 subset-derm-protein.blast.tsv
-rw-rw-r-- 1 sam sam 9.0K Mar 10 16:41 postprocess_uniprot_go.py
-rw-rw-r-- 1 sam sam 2.8K Mar 10 16:41 annotation_full_go.tsv
-rw-rw-r-- 1 sam sam 31M Mar 10 16:41 go-basic.obo
-rw-rw-r-- 1 sam sam 122K Mar 10 16:41 goslim_generic.obo
-rw-rw-r-- 1 sam sam 3.2K Mar 10 16:41 annotation_with_goslim.tsv
-rw-rw-r-- 1 sam sam 296 Mar 10 16:41 summary_stats.json
-rw-rw-r-- 1 sam sam 51K Mar 10 16:41 goslim_chart.png
-rw-rw-r-- 1 sam sam 1.2K Mar 10 16:41 summary.md
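The three run directories list the same file names and sizes. Matching sizes do not guarantee matching contents, so one way to confirm that repeated runs really agree is to compare file hashes. The sketch below is an assumption-laden illustration, not part of the pipeline: `sha256_of` and `compare_runs` are hypothetical helpers, and the demo uses throwaway directories in place of the real `output/run_*` folders.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_runs(dir_a, dir_b):
    """Map each file name present in both directories to True if contents match."""
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    shared = sorted(
        {p.name for p in dir_a.iterdir() if p.is_file()}
        & {p.name for p in dir_b.iterdir() if p.is_file()}
    )
    return {name: sha256_of(dir_a / name) == sha256_of(dir_b / name) for name in shared}


# Self-contained demo: throwaway directories stand in for two run folders.
with tempfile.TemporaryDirectory() as tmp:
    run_a, run_b = Path(tmp, "run_a"), Path(tmp, "run_b")
    for run in (run_a, run_b):
        run.mkdir()
        (run / "annotation_full_go.tsv").write_text("query\taccession\n")
    # Files that record timestamps are expected to differ between runs.
    (run_a / "summary.md").write_text("Start time: 11:39:12\n")
    (run_b / "summary.md").write_text("Start time: 15:03:23\n")
    result = compare_runs(run_a, run_b)
    print(result)
```

On the real output, files such as `summary.md` record start and end times, so they would be expected to differ even when the annotation tables are identical.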
==> run_20260306_113912/annotation_with_goslim.tsv <==
query accession id reviewed protein_name organism pident length evalue bitscore title go_ids go_bp go_cc go_mf goslim_ids goslim_names
g7941.t1 C0HL13 LRP2_PIG Low-density lipoprotein receptor-related protein 2 (LRP-2) (Glycoprotein 330) (gp330) (Megalin) Sus scrofa (Pig) 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1 GO:0001843; GO:0003139; GO:0003148; GO:0003223; GO:0005509; GO:0005905; GO:0006898; GO:0007605; GO:0008584; GO:0009897; GO:0016324; GO:0017124; GO:0030001; GO:0030424; GO:0030425; GO:0030514; GO:0031526; GO:0031904; GO:0043235; GO:0050769; GO:0051087; GO:0060068; GO:0060982; GO:0061156; GO:0070447; GO:0140058; GO:1904447; GO:1905167 coronary artery morphogenesis [GO:0060982]; folate import across plasma membrane [GO:1904447]; male gonad development [GO:0008584]; metal ion transport [GO:0030001]; negative regulation of BMP signaling pathway [GO:0030514]; neural tube closure [GO:0001843]; neuron projection arborization [GO:0140058]; outflow tract septum morphogenesis [GO:0003148]; positive regulation of lysosomal protein catabolic process [GO:1905167]; positive regulation of neurogenesis [GO:0050769]; positive regulation of oligodendrocyte progenitor proliferation [GO:0070447]; pulmonary artery morphogenesis [GO:0061156]; receptor-mediated endocytosis [GO:0006898]; secondary heart field specification [GO:0003139]; sensory perception of sound [GO:0007605]; vagina development [GO:0060068]; ventricular compact myocardium morphogenesis [GO:0003223] apical plasma membrane [GO:0016324]; axon [GO:0030424]; brush border membrane [GO:0031526]; clathrin-coated pit [GO:0005905]; dendrite [GO:0030425]; endosome lumen [GO:0031904]; external side of plasma membrane [GO:0009897]; receptor complex [GO:0043235] calcium ion binding [GO:0005509]; protein-folding chaperone binding [GO:0051087]; SH3 domain binding [GO:0017124] GO:0016192; GO:0022414; GO:0048856; GO:0050877; GO:0055085 vesicle-mediated transport; reproductive process; anatomical structure development; nervous system process; transmembrane transport
g7944.t1 Q7L592 NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial (EC 2.1.1.320) (NADH dehydrogenase [ubiquinone] complex I, assembly factor 7) (Protein midA homolog) Homo sapiens (Human) 51.8 415 8.529999999999999e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1 GO:0005615; GO:0005739; GO:0005759; GO:0008168; GO:0019899; GO:0019918; GO:0032981; GO:0035243 mitochondrial respiratory chain complex I assembly [GO:0032981]; peptidyl-arginine methylation, to symmetrical-dimethyl arginine [GO:0019918] extracellular space [GO:0005615]; mitochondrial matrix [GO:0005759]; mitochondrion [GO:0005739] enzyme binding [GO:0019899]; methyltransferase activity [GO:0008168]; protein-arginine omega-N symmetric methyltransferase activity [GO:0035243] GO:0003824; GO:0005615; GO:0005739; GO:0016740; GO:0043226; GO:0065003; GO:0140096 catalytic activity; extracellular space; mitochondrion; transferase activity; organelle; protein-containing complex assembly; catalytic activity, acting on a protein
==> run_20260306_113912/postprocess_uniprot_go.py <==
import sys, os, time, json, math
import pandas as pd
import requests
from pathlib import Path
BLAST_TSV = sys.argv[1]
OUTDIR = sys.argv[2]
os.makedirs(OUTDIR, exist_ok=True)
hits = pd.read_csv(BLAST_TSV, sep='\t', header=None,
==> run_20260306_113912/summary_stats.json <==
{"total_sequences": 2, "blast_hits": 2, "go_matches": 2, "goslim_matches": 2, "goslim_counts": {"vesicle-mediated transport": 1, "reproductive process": 1, "anatomical structure development": 1, "nervous system process": 1, "transmembrane transport": 1, "protein-containing complex assembly": 1}}
==> run_20260306_113912/subset-derm-protein.blast.tsv <==
g7941.t1 sp|C0HL13|LRP2_PIG 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1
g7944.t1 sp|Q7L592|NDUF7_HUMAN 51.8 415 8.53e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1
==> run_20260306_113912/summary.md <==
# Annotation Summary Report
## Job Information
- **Input file**: subset-derm-protein.fa
- **Start time**: 2026-03-06 11:39:12
- **End time**: 2026-03-06 11:39:39
- **Duration**: 0h 0m 27s
- **CPUs used**: 40
- **Tool**: DIAMOND BLASTP (protein)
==> run_20260306_113912/goslim_generic.obo <==
format-version: 1.2
data-version: go/releases/2026-01-23/subsets/goslim_generic.owl
subsetdef: chebi_ph7_3 "Rhea list of ChEBI terms representing the major species at pH 7.3."
subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_obsoletion_candidate "Terms planned for obsoletion"
subsetdef: goslim_agr "AGR slim"
subsetdef: goslim_aspergillus "Aspergillus GO slim"
subsetdef: goslim_candida "Candida GO slim"
subsetdef: goslim_chembl "ChEMBL protein targets summary"
subsetdef: goslim_drosophila "Drosophila GO slim"
==> run_20260306_113912/go-basic.obo <==
format-version: 1.2
data-version: releases/2026-01-23
subsetdef: chebi_ph7_3 "Rhea list of ChEBI terms representing the major species at pH 7.3."
subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_obsoletion_candidate "Terms planned for obsoletion"
subsetdef: goslim_agr "AGR slim"
subsetdef: goslim_aspergillus "Aspergillus GO slim"
subsetdef: goslim_candida "Candida GO slim"
subsetdef: goslim_chembl "ChEMBL protein targets summary"
subsetdef: goslim_drosophila "Drosophila GO slim"
==> run_20260306_113912/annotation_full_go.tsv <==
query accession pident length evalue bitscore title id Reviewed protein_name organism go_ids go_bp go_cc go_mf
g7941.t1 C0HL13 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1 LRP2_PIG reviewed Low-density lipoprotein receptor-related protein 2 (LRP-2) (Glycoprotein 330) (gp330) (Megalin) Sus scrofa (Pig) GO:0001843; GO:0003139; GO:0003148; GO:0003223; GO:0005509; GO:0005905; GO:0006898; GO:0007605; GO:0008584; GO:0009897; GO:0016324; GO:0017124; GO:0030001; GO:0030424; GO:0030425; GO:0030514; GO:0031526; GO:0031904; GO:0043235; GO:0050769; GO:0051087; GO:0060068; GO:0060982; GO:0061156; GO:0070447; GO:0140058; GO:1904447; GO:1905167 coronary artery morphogenesis [GO:0060982]; folate import across plasma membrane [GO:1904447]; male gonad development [GO:0008584]; metal ion transport [GO:0030001]; negative regulation of BMP signaling pathway [GO:0030514]; neural tube closure [GO:0001843]; neuron projection arborization [GO:0140058]; outflow tract septum morphogenesis [GO:0003148]; positive regulation of lysosomal protein catabolic process [GO:1905167]; positive regulation of neurogenesis [GO:0050769]; positive regulation of oligodendrocyte progenitor proliferation [GO:0070447]; pulmonary artery morphogenesis [GO:0061156]; receptor-mediated endocytosis [GO:0006898]; secondary heart field specification [GO:0003139]; sensory perception of sound [GO:0007605]; vagina development [GO:0060068]; ventricular compact myocardium morphogenesis [GO:0003223] apical plasma membrane [GO:0016324]; axon [GO:0030424]; brush border membrane [GO:0031526]; clathrin-coated pit [GO:0005905]; dendrite [GO:0030425]; endosome lumen [GO:0031904]; external side of plasma membrane [GO:0009897]; receptor complex [GO:0043235] calcium ion binding [GO:0005509]; protein-folding chaperone binding [GO:0051087]; SH3 domain binding [GO:0017124]
g7944.t1 Q7L592 51.8 415 8.529999999999999e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1 NDUF7_HUMAN reviewed Protein arginine methyltransferase NDUFAF7, mitochondrial (EC 2.1.1.320) (NADH dehydrogenase [ubiquinone] complex I, assembly factor 7) (Protein midA homolog) Homo sapiens (Human) GO:0005615; GO:0005739; GO:0005759; GO:0008168; GO:0019899; GO:0019918; GO:0032981; GO:0035243 mitochondrial respiratory chain complex I assembly [GO:0032981]; peptidyl-arginine methylation, to symmetrical-dimethyl arginine [GO:0019918] extracellular space [GO:0005615]; mitochondrial matrix [GO:0005759]; mitochondrion [GO:0005739] enzyme binding [GO:0019899]; methyltransferase activity [GO:0008168]; protein-arginine omega-N symmetric methyltransferase activity [GO:0035243]
==> run_20260306_150323/annotation_with_goslim.tsv <==
query accession id reviewed protein_name organism pident length evalue bitscore title go_ids go_bp go_cc go_mf goslim_ids goslim_names
g7941.t1 C0HL13 LRP2_PIG Low-density lipoprotein receptor-related protein 2 (LRP-2) (Glycoprotein 330) (gp330) (Megalin) Sus scrofa (Pig) 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1 GO:0001843; GO:0003139; GO:0003148; GO:0003223; GO:0005509; GO:0005905; GO:0006898; GO:0007605; GO:0008584; GO:0009897; GO:0016324; GO:0017124; GO:0030001; GO:0030424; GO:0030425; GO:0030514; GO:0031526; GO:0031904; GO:0043235; GO:0050769; GO:0051087; GO:0060068; GO:0060982; GO:0061156; GO:0070447; GO:0140058; GO:1904447; GO:1905167 coronary artery morphogenesis [GO:0060982]; folate import across plasma membrane [GO:1904447]; male gonad development [GO:0008584]; metal ion transport [GO:0030001]; negative regulation of BMP signaling pathway [GO:0030514]; neural tube closure [GO:0001843]; neuron projection arborization [GO:0140058]; outflow tract septum morphogenesis [GO:0003148]; positive regulation of lysosomal protein catabolic process [GO:1905167]; positive regulation of neurogenesis [GO:0050769]; positive regulation of oligodendrocyte progenitor proliferation [GO:0070447]; pulmonary artery morphogenesis [GO:0061156]; receptor-mediated endocytosis [GO:0006898]; secondary heart field specification [GO:0003139]; sensory perception of sound [GO:0007605]; vagina development [GO:0060068]; ventricular compact myocardium morphogenesis [GO:0003223] apical plasma membrane [GO:0016324]; axon [GO:0030424]; brush border membrane [GO:0031526]; clathrin-coated pit [GO:0005905]; dendrite [GO:0030425]; endosome lumen [GO:0031904]; external side of plasma membrane [GO:0009897]; receptor complex [GO:0043235] calcium ion binding [GO:0005509]; protein-folding chaperone binding [GO:0051087]; SH3 domain binding [GO:0017124] GO:0016192; GO:0022414; GO:0048856; GO:0050877; GO:0055085 vesicle-mediated transport; reproductive process; anatomical structure development; nervous system process; transmembrane transport
g7944.t1 Q7L592 NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial (EC 2.1.1.320) (NADH dehydrogenase [ubiquinone] complex I, assembly factor 7) (Protein midA homolog) Homo sapiens (Human) 51.8 415 8.529999999999999e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1 GO:0005615; GO:0005739; GO:0005759; GO:0008168; GO:0019899; GO:0019918; GO:0032981; GO:0035243 mitochondrial respiratory chain complex I assembly [GO:0032981]; peptidyl-arginine methylation, to symmetrical-dimethyl arginine [GO:0019918] extracellular space [GO:0005615]; mitochondrial matrix [GO:0005759]; mitochondrion [GO:0005739] enzyme binding [GO:0019899]; methyltransferase activity [GO:0008168]; protein-arginine omega-N symmetric methyltransferase activity [GO:0035243] GO:0003824; GO:0005615; GO:0005739; GO:0016740; GO:0043226; GO:0065003; GO:0140096 catalytic activity; extracellular space; mitochondrion; transferase activity; organelle; protein-containing complex assembly; catalytic activity, acting on a protein
==> run_20260306_150323/postprocess_uniprot_go.py <==
import sys, os, time, json, math
import pandas as pd
import requests
from pathlib import Path
BLAST_TSV = sys.argv[1]
OUTDIR = sys.argv[2]
os.makedirs(OUTDIR, exist_ok=True)
hits = pd.read_csv(BLAST_TSV, sep='\t', header=None,
==> run_20260306_150323/summary_stats.json <==
{"total_sequences": 2, "blast_hits": 2, "go_matches": 2, "goslim_matches": 2, "goslim_counts": {"vesicle-mediated transport": 1, "reproductive process": 1, "anatomical structure development": 1, "nervous system process": 1, "transmembrane transport": 1, "protein-containing complex assembly": 1}}
==> run_20260306_150323/subset-derm-protein.blast.tsv <==
g7941.t1 sp|C0HL13|LRP2_PIG 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1
g7944.t1 sp|Q7L592|NDUF7_HUMAN 51.8 415 8.53e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1
==> run_20260306_150323/summary.md <==
# Annotation Summary Report
## Job Information
- **Input file**: subset-derm-protein.fa
- **Start time**: 2026-03-06 15:03:23
- **End time**: 2026-03-06 15:03:42
- **Duration**: 0h 0m 19s
- **CPUs used**: 40
- **Tool**: DIAMOND BLASTP (protein)
==> run_20260306_150323/goslim_generic.obo <==
format-version: 1.2
data-version: go/releases/2026-01-23/subsets/goslim_generic.owl
subsetdef: chebi_ph7_3 "Rhea list of ChEBI terms representing the major species at pH 7.3."
subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_obsoletion_candidate "Terms planned for obsoletion"
subsetdef: goslim_agr "AGR slim"
subsetdef: goslim_aspergillus "Aspergillus GO slim"
subsetdef: goslim_candida "Candida GO slim"
subsetdef: goslim_chembl "ChEMBL protein targets summary"
subsetdef: goslim_drosophila "Drosophila GO slim"
==> run_20260306_150323/go-basic.obo <==
format-version: 1.2
data-version: releases/2026-01-23
subsetdef: chebi_ph7_3 "Rhea list of ChEBI terms representing the major species at pH 7.3."
subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_obsoletion_candidate "Terms planned for obsoletion"
subsetdef: goslim_agr "AGR slim"
subsetdef: goslim_aspergillus "Aspergillus GO slim"
subsetdef: goslim_candida "Candida GO slim"
subsetdef: goslim_chembl "ChEMBL protein targets summary"
subsetdef: goslim_drosophila "Drosophila GO slim"
==> run_20260306_150323/annotation_full_go.tsv <==
query accession pident length evalue bitscore title id Reviewed protein_name organism go_ids go_bp go_cc go_mf
g7941.t1 C0HL13 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1 LRP2_PIG reviewed Low-density lipoprotein receptor-related protein 2 (LRP-2) (Glycoprotein 330) (gp330) (Megalin) Sus scrofa (Pig) GO:0001843; GO:0003139; GO:0003148; GO:0003223; GO:0005509; GO:0005905; GO:0006898; GO:0007605; GO:0008584; GO:0009897; GO:0016324; GO:0017124; GO:0030001; GO:0030424; GO:0030425; GO:0030514; GO:0031526; GO:0031904; GO:0043235; GO:0050769; GO:0051087; GO:0060068; GO:0060982; GO:0061156; GO:0070447; GO:0140058; GO:1904447; GO:1905167 coronary artery morphogenesis [GO:0060982]; folate import across plasma membrane [GO:1904447]; male gonad development [GO:0008584]; metal ion transport [GO:0030001]; negative regulation of BMP signaling pathway [GO:0030514]; neural tube closure [GO:0001843]; neuron projection arborization [GO:0140058]; outflow tract septum morphogenesis [GO:0003148]; positive regulation of lysosomal protein catabolic process [GO:1905167]; positive regulation of neurogenesis [GO:0050769]; positive regulation of oligodendrocyte progenitor proliferation [GO:0070447]; pulmonary artery morphogenesis [GO:0061156]; receptor-mediated endocytosis [GO:0006898]; secondary heart field specification [GO:0003139]; sensory perception of sound [GO:0007605]; vagina development [GO:0060068]; ventricular compact myocardium morphogenesis [GO:0003223] apical plasma membrane [GO:0016324]; axon [GO:0030424]; brush border membrane [GO:0031526]; clathrin-coated pit [GO:0005905]; dendrite [GO:0030425]; endosome lumen [GO:0031904]; external side of plasma membrane [GO:0009897]; receptor complex [GO:0043235] calcium ion binding [GO:0005509]; protein-folding chaperone binding [GO:0051087]; SH3 domain binding [GO:0017124]
g7944.t1 Q7L592 51.8 415 8.529999999999999e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1 NDUF7_HUMAN reviewed Protein arginine methyltransferase NDUFAF7, mitochondrial (EC 2.1.1.320) (NADH dehydrogenase [ubiquinone] complex I, assembly factor 7) (Protein midA homolog) Homo sapiens (Human) GO:0005615; GO:0005739; GO:0005759; GO:0008168; GO:0019899; GO:0019918; GO:0032981; GO:0035243 mitochondrial respiratory chain complex I assembly [GO:0032981]; peptidyl-arginine methylation, to symmetrical-dimethyl arginine [GO:0019918] extracellular space [GO:0005615]; mitochondrial matrix [GO:0005759]; mitochondrion [GO:0005739] enzyme binding [GO:0019899]; methyltransferase activity [GO:0008168]; protein-arginine omega-N symmetric methyltransferase activity [GO:0035243]
==> run_20260310_164133/annotation_with_goslim.tsv <==
query accession id reviewed protein_name organism pident length evalue bitscore title go_ids go_bp go_cc go_mf goslim_ids goslim_names
g7941.t1 C0HL13 LRP2_PIG Low-density lipoprotein receptor-related protein 2 (LRP-2) (Glycoprotein 330) (gp330) (Megalin) Sus scrofa (Pig) 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1 GO:0001843; GO:0003139; GO:0003148; GO:0003223; GO:0005509; GO:0005905; GO:0006898; GO:0007605; GO:0008584; GO:0009897; GO:0016324; GO:0017124; GO:0030001; GO:0030424; GO:0030425; GO:0030514; GO:0031526; GO:0031904; GO:0043235; GO:0050769; GO:0051087; GO:0060068; GO:0060982; GO:0061156; GO:0070447; GO:0140058; GO:1904447; GO:1905167 coronary artery morphogenesis [GO:0060982]; folate import across plasma membrane [GO:1904447]; male gonad development [GO:0008584]; metal ion transport [GO:0030001]; negative regulation of BMP signaling pathway [GO:0030514]; neural tube closure [GO:0001843]; neuron projection arborization [GO:0140058]; outflow tract septum morphogenesis [GO:0003148]; positive regulation of lysosomal protein catabolic process [GO:1905167]; positive regulation of neurogenesis [GO:0050769]; positive regulation of oligodendrocyte progenitor proliferation [GO:0070447]; pulmonary artery morphogenesis [GO:0061156]; receptor-mediated endocytosis [GO:0006898]; secondary heart field specification [GO:0003139]; sensory perception of sound [GO:0007605]; vagina development [GO:0060068]; ventricular compact myocardium morphogenesis [GO:0003223] apical plasma membrane [GO:0016324]; axon [GO:0030424]; brush border membrane [GO:0031526]; clathrin-coated pit [GO:0005905]; dendrite [GO:0030425]; endosome lumen [GO:0031904]; external side of plasma membrane [GO:0009897]; receptor complex [GO:0043235] calcium ion binding [GO:0005509]; protein-folding chaperone binding [GO:0051087]; SH3 domain binding [GO:0017124] GO:0016192; GO:0022414; GO:0048856; GO:0050877; GO:0055085 vesicle-mediated transport; reproductive process; anatomical structure development; nervous system process; transmembrane transport
g7944.t1 Q7L592 NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial (EC 2.1.1.320) (NADH dehydrogenase [ubiquinone] complex I, assembly factor 7) (Protein midA homolog) Homo sapiens (Human) 51.8 415 8.529999999999999e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1 GO:0005615; GO:0005739; GO:0005759; GO:0008168; GO:0019899; GO:0019918; GO:0032981; GO:0035243 mitochondrial respiratory chain complex I assembly [GO:0032981]; peptidyl-arginine methylation, to symmetrical-dimethyl arginine [GO:0019918] extracellular space [GO:0005615]; mitochondrial matrix [GO:0005759]; mitochondrion [GO:0005739] enzyme binding [GO:0019899]; methyltransferase activity [GO:0008168]; protein-arginine omega-N symmetric methyltransferase activity [GO:0035243] GO:0003824; GO:0005615; GO:0005739; GO:0016740; GO:0043226; GO:0065003; GO:0140096 catalytic activity; extracellular space; mitochondrion; transferase activity; organelle; protein-containing complex assembly; catalytic activity, acting on a protein
==> run_20260310_164133/postprocess_uniprot_go.py <==
import sys, os, time, json, math
import pandas as pd
import requests
from pathlib import Path
BLAST_TSV = sys.argv[1]
OUTDIR = sys.argv[2]
os.makedirs(OUTDIR, exist_ok=True)
hits = pd.read_csv(BLAST_TSV, sep='\t', header=None,
==> run_20260310_164133/summary_stats.json <==
{"total_sequences": 2, "blast_hits": 2, "go_matches": 2, "goslim_matches": 2, "goslim_counts": {"vesicle-mediated transport": 1, "reproductive process": 1, "anatomical structure development": 1, "nervous system process": 1, "transmembrane transport": 1, "protein-containing complex assembly": 1}}
==> run_20260310_164133/subset-derm-protein.blast.tsv <==
g7941.t1 sp|C0HL13|LRP2_PIG 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1
g7944.t1 sp|Q7L592|NDUF7_HUMAN 51.8 415 8.53e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1
==> run_20260310_164133/summary.md <==
# Annotation Summary Report
## Job Information
- **Input file**: subset-derm-protein.fa
- **Start time**: 2026-03-10 16:41:33
- **End time**: 2026-03-10 16:41:54
- **Duration**: 0h 0m 21s
- **CPUs used**: 40
- **Tool**: DIAMOND BLASTP (protein)
==> run_20260310_164133/goslim_generic.obo <==
format-version: 1.2
data-version: go/releases/2026-01-23/subsets/goslim_generic.owl
subsetdef: chebi_ph7_3 "Rhea list of ChEBI terms representing the major species at pH 7.3."
subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_obsoletion_candidate "Terms planned for obsoletion"
subsetdef: goslim_agr "AGR slim"
subsetdef: goslim_aspergillus "Aspergillus GO slim"
subsetdef: goslim_candida "Candida GO slim"
subsetdef: goslim_chembl "ChEMBL protein targets summary"
subsetdef: goslim_drosophila "Drosophila GO slim"
==> run_20260310_164133/go-basic.obo <==
format-version: 1.2
data-version: releases/2026-01-23
subsetdef: chebi_ph7_3 "Rhea list of ChEBI terms representing the major species at pH 7.3."
subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_obsoletion_candidate "Terms planned for obsoletion"
subsetdef: goslim_agr "AGR slim"
subsetdef: goslim_aspergillus "Aspergillus GO slim"
subsetdef: goslim_candida "Candida GO slim"
subsetdef: goslim_chembl "ChEMBL protein targets summary"
subsetdef: goslim_drosophila "Drosophila GO slim"
==> run_20260310_164133/annotation_full_go.tsv <==
query accession pident length evalue bitscore title id Reviewed protein_name organism go_ids go_bp go_cc go_mf
g7941.t1 C0HL13 28.9 294 2.34e-31 129 sp|C0HL13|LRP2_PIG Low-density lipoprotein receptor-related protein 2 OS=Sus scrofa OX=9823 GN=LRP2 PE=1 SV=1 LRP2_PIG reviewed Low-density lipoprotein receptor-related protein 2 (LRP-2) (Glycoprotein 330) (gp330) (Megalin) Sus scrofa (Pig) GO:0001843; GO:0003139; GO:0003148; GO:0003223; GO:0005509; GO:0005905; GO:0006898; GO:0007605; GO:0008584; GO:0009897; GO:0016324; GO:0017124; GO:0030001; GO:0030424; GO:0030425; GO:0030514; GO:0031526; GO:0031904; GO:0043235; GO:0050769; GO:0051087; GO:0060068; GO:0060982; GO:0061156; GO:0070447; GO:0140058; GO:1904447; GO:1905167 coronary artery morphogenesis [GO:0060982]; folate import across plasma membrane [GO:1904447]; male gonad development [GO:0008584]; metal ion transport [GO:0030001]; negative regulation of BMP signaling pathway [GO:0030514]; neural tube closure [GO:0001843]; neuron projection arborization [GO:0140058]; outflow tract septum morphogenesis [GO:0003148]; positive regulation of lysosomal protein catabolic process [GO:1905167]; positive regulation of neurogenesis [GO:0050769]; positive regulation of oligodendrocyte progenitor proliferation [GO:0070447]; pulmonary artery morphogenesis [GO:0061156]; receptor-mediated endocytosis [GO:0006898]; secondary heart field specification [GO:0003139]; sensory perception of sound [GO:0007605]; vagina development [GO:0060068]; ventricular compact myocardium morphogenesis [GO:0003223] apical plasma membrane [GO:0016324]; axon [GO:0030424]; brush border membrane [GO:0031526]; clathrin-coated pit [GO:0005905]; dendrite [GO:0030425]; endosome lumen [GO:0031904]; external side of plasma membrane [GO:0009897]; receptor complex [GO:0043235] calcium ion binding [GO:0005509]; protein-folding chaperone binding [GO:0051087]; SH3 domain binding [GO:0017124]
g7944.t1 Q7L592 51.8 415 8.529999999999999e-141 412 sp|Q7L592|NDUF7_HUMAN Protein arginine methyltransferase NDUFAF7, mitochondrial OS=Homo sapiens OX=9606 GN=NDUFAF7 PE=1 SV=1 NDUF7_HUMAN reviewed Protein arginine methyltransferase NDUFAF7, mitochondrial (EC 2.1.1.320) (NADH dehydrogenase [ubiquinone] complex I, assembly factor 7) (Protein midA homolog) Homo sapiens (Human) GO:0005615; GO:0005739; GO:0005759; GO:0008168; GO:0019899; GO:0019918; GO:0032981; GO:0035243 mitochondrial respiratory chain complex I assembly [GO:0032981]; peptidyl-arginine methylation, to symmetrical-dimethyl arginine [GO:0019918] extracellular space [GO:0005615]; mitochondrial matrix [GO:0005759]; mitochondrion [GO:0005739] enzyme binding [GO:0019899]; methyltransferase activity [GO:0008168]; protein-arginine omega-N symmetric methyltransferase activity [GO:0035243]
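Looks like the fix worked! The pipeline ran to completion, and `summary_stats.json` shows that both test sequences received BLAST hits, GO matches, and GO-slim matches.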
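For future test runs, the same sanity check could be scripted instead of eyeballed. The following is only a minimal sketch, not part of Steven's pipeline: the helper name `annotation_complete` is hypothetical, and the key names are taken from the `summary_stats.json` output shown above. It assumes a tiny test input where every sequence is expected to get a hit; on real data, many sequences will legitimately have no BLAST or GO match.

```python
import json


def annotation_complete(stats: dict) -> bool:
    """Return True when every input sequence has a BLAST hit, a GO match, and a GO-slim match.

    Hypothetical helper for small test inputs only; the keys mirror those
    seen in the pipeline's summary_stats.json.
    """
    total = stats["total_sequences"]
    return total > 0 and all(
        stats[key] == total
        for key in ("blast_hits", "go_matches", "goslim_matches")
    )


# Values copied from the summary_stats.json shown above (goslim_counts omitted for brevity).
stats = json.loads(
    '{"total_sequences": 2, "blast_hits": 2, "go_matches": 2, "goslim_matches": 2}'
)
print(annotation_complete(stats))
```

To check an actual run, the dictionary could instead be loaded from the file, e.g. `json.load(open("run_20260310_164133/summary_stats.json"))`.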