I performed the assembly on Hyak (Klone), UW’s high-performance computing cluster. I used our Apptainer (Singularity) container to run the job (srlab-R4.4-bioinformatics-container-fca8ae6.sif), which includes Flye version 2.9.6.
Below is the rendered markdown from 13.1-genome-assembly-lean.Rmd. The run took ~7 days.
PRIMARY OUTPUTS
FastA (3.9GB): https://gannet.fish.washington.edu/gitrepos/project-lake-trout/output/13.1-genome-assembly-lean/snam-lean-pb-flye-assembly.fasta
FastA index: https://gannet.fish.washington.edu/gitrepos/project-lake-trout/output/13.1-genome-assembly-lean/snam-lean-pb-flye-assembly.fasta.fai
Once I’ve also assembled this genome using hifiasm, I’ll compare the two assemblies using QUAST, which will provide a more comprehensive set of assembly statistics and metrics.
13.1-genome-assembly-lean
Sam White 2026-03-26
1 BACKGROUND
Use Flye (GitHub) (Kolmogorov et al. 2019) to assemble PacBio reads for the S. namaycush lean ecotype.
library(reticulate)
knitr::opts_chunk$set(
  echo = TRUE,        # Display code chunks
  eval = FALSE,       # Do not evaluate code chunks by default
  results = "hold",   # Hold outputs and show them after the full code chunk
  warning = FALSE,    # Hide warnings
  collapse = FALSE,   # Keep code and output in separate blocks
  message = FALSE,    # Hide messages
  comment = "##"      # Prefix output lines with '##' so output is visually distinct
)
## python: /srlab/programs/miniforge3-24.7.1-0/envs/flye-2.9.6-env/bin/python
## libpython: /srlab/programs/miniforge3-24.7.1-0/envs/flye-2.9.6-env/lib/libpython3.12.so
## pythonhome: /srlab/programs/miniforge3-24.7.1-0/envs/flye-2.9.6-env:/srlab/programs/miniforge3-24.7.1-0/envs/flye-2.9.6-env
## version: 3.12.13 | packaged by conda-forge | (main, Mar 5 2026, 16:50:00) [GCC 14.3.0]
## numpy: [NOT FOUND]
##
## NOTE: Python version was forced by use_python() function
# Make output directory, if it doesn't exist
mkdir --parents "${output_dir}"

# Create an array of .fastq.gz files
fastqs=(${data_dir}/*.fastq.gz)

# Print the result (optional, for verification)
## newline-delimited format
echo "List of FastQs used for assembly:"
printf "%s\n" "${fastqs[@]}" \
| tee "${output_dir}/input_fastqs.txt"

# Run Flye assembly
flye \
--pacbio-hifi "${fastqs[@]}" \
--genome-size "${genome_size}" \
--out-dir "${output_dir}" \
--threads "${threads}" \
> "${output_dir}"/pb-lean-assembly.log 2>&1
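If `${data_dir}` holds no matching files, the glob above is passed through as a literal string and Flye is handed a path that does not exist. A minimal sketch of a pre-flight check follows; the function name `has_fastqs` and the throwaway demo directory are hypothetical and not part of the original notebook.

```shell
# Hypothetical helper: succeed only if a directory contains
# at least one *.fastq.gz file.
has_fastqs() {
  for f in "$1"/*.fastq.gz; do
    # An unmatched glob stays literal, so confirm the path really exists.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Demo on a temporary directory with an empty placeholder file.
demo_dir="$(mktemp -d)"
has_fastqs "${demo_dir}" || echo "no FastQs yet"
touch "${demo_dir}/reads.fastq.gz"
has_fastqs "${demo_dir}" && echo "FastQs found"
```

In the notebook, a call such as `has_fastqs "${data_dir}" || exit 1` placed before the array is built would stop the job before any cluster time is spent.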
3 OUTPUTS
3.1 List files
ls -ltrh "${output_dir}"
## total 8.2G
## -rw-r--r-- 1 samwhite all 326 Mar 26 11:02 input_fastqs.txt
## drwxr-xr-x 2 samwhite all 512 Mar 30 07:44 00-assembly
## drwxr-xr-x 2 samwhite all 8.0K Mar 30 12:12 10-consensus
## drwxr-xr-x 2 samwhite all 512 Mar 31 08:22 20-repeat
## drwxr-xr-x 2 samwhite all 512 Mar 31 08:33 30-contigger
## drwxr-xr-x 2 samwhite all 8.0K Mar 31 17:25 40-polishing
## -rw-r--r-- 1 samwhite all 92 Mar 31 17:25 params.json
## -rw-r--r-- 1 samwhite all 53M Mar 31 17:25 assembly_graph.gv
## -rw-r--r-- 1 samwhite all 4.0G Mar 31 17:25 assembly_graph.gfa
## -rw-r--r-- 1 samwhite all 3.9G Mar 31 17:25 assembly.fasta
## -rw-r--r-- 1 samwhite all 4.8M Mar 31 17:25 assembly_info.txt
## -rw-r--r-- 1 samwhite all 3.2K Mar 31 17:25 pb-lean-assembly.log
## -rw-r--r-- 1 samwhite all 338M Mar 31 17:25 flye.log
## -rw-r--r-- 1 samwhite all 3.9M Apr 2 10:05 assembly.fasta.fai
3.2 Rename files
The Flye output files have generic “assembly*” names.
Rename them to reflect the specific species, ecotype, data type, and assembly method.
cd "${output_dir}"

for file in assembly*
do
  mv "${file}" "${prefix}-${file}"
done
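Before renaming in place, it can help to print the planned renames first. The sketch below is a dry run only and moves nothing; the function name `preview_renames` is hypothetical, and the prefix `snam-lean-pb-flye` is inferred from the published FastA file name rather than taken from the notebook's variable definitions.

```shell
# Hypothetical helper: print "old -> new" for each generic Flye output
# in the current directory, without renaming anything.
preview_renames() {
  for file in assembly*; do
    # Skip the literal pattern when no files match.
    [ -e "${file}" ] || continue
    printf '%s -> %s-%s\n' "${file}" "$1" "${file}"
  done
}

# Demo in a temporary directory; the subshell keeps the caller's
# working directory unchanged.
demo_dir="$(mktemp -d)"
(
  cd "${demo_dir}" &&
  touch assembly.fasta assembly_info.txt &&
  preview_renames "snam-lean-pb-flye"
)
```

Once the printed plan looks right, the same loop with `mv` in place of `printf` performs the rename.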
## [2026-03-31 17:25:45] root: INFO: Assembly statistics:
##
## Total length: 4046369763
## Fragments: 112562
## Fragments N50: 74514
## Largest frg: 2170611
## Scaffolds: 0
## Mean coverage: 15
##
## [2026-03-31 17:25:45] root: INFO: Final assembly: /mmfs1/gscratch/scrubbed/samwhite/gitrepos/RobertsLab/project-lake-trout/output/13.1-genome-assembly-lean/assembly.fasta
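The totals reported in the Flye log can be cross-checked against the FastA index, since the second column of a `.fai` file (the format written by `samtools faidx`) is the sequence length. The sketch below is one way to do that, not the method used in the notebook; the function name `fai_stats` and the four-contig demo index are hypothetical, and the offsets in the demo are placeholders.

```shell
# Hypothetical helper: summarize contig lengths from a .fai index
# (column 1 = sequence name, column 2 = sequence length in bases).
fai_stats() {
  cut -f2 "$1" | sort -nr | awk '
    { len[NR] = $1; total += $1 }
    END {
      # N50: length of the contig at which the running sum of
      # descending lengths first reaches half of the total.
      for (i = 1; i <= NR; i++) {
        running += len[i]
        if (running * 2 >= total) { n50 = len[i]; break }
      }
      # %.0f avoids 32-bit integer limits in some awk implementations.
      printf "total=%.0f contigs=%d n50=%.0f largest=%.0f\n", total, NR, n50, len[1]
    }'
}

# Demo with a made-up index of four contigs (10, 20, 30, 40 bases).
demo_fai="$(mktemp)"
printf 'c1\t10\t4\t60\t61\nc2\t20\t19\t60\t61\nc3\t30\t44\t60\t61\nc4\t40\t79\t60\t61\n' > "${demo_fai}"
fai_stats "${demo_fai}"
# prints: total=100 contigs=4 n50=30 largest=40
```

Run against the real index, the `total`, `contigs`, `n50`, and `largest` values should correspond to the "Total length", "Fragments", "Fragments N50", and "Largest frg" lines in the log above.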
Kolmogorov, Mikhail, Jeffrey Yuan, Yu Lin, and Pavel A. Pevzner. 2019. “Assembly of Long, Error-Prone Reads Using Repeat Graphs.” Nature Biotechnology 37 (5): 540–46. https://doi.org/10.1038/s41587-019-0072-8.