I performed the assembly on Hyak (Klone), UW’s high-performance computing cluster. I used our Apptainer (Singularity) container to run the job (srlab-R4.4-bioinformatics-container-fca8ae6.sif), which includes Flye version 2.9.6.
Below is the rendered markdown from 13.1-genome-assembly-siscowet.Rmd. The run took ~7 days.
PRIMARY OUTPUTS
FastA (4.1GB): https://gannet.fish.washington.edu/gitrepos/project-lake-trout/output/13.1-genome-assembly-siscowet/snam-siscowet-pb-flye-assembly.fasta
FastA index: https://gannet.fish.washington.edu/gitrepos/project-lake-trout/output/13.1-genome-assembly-siscowet/snam-siscowet-pb-flye-assembly.fasta.fai
Once I’ve also assembled this genome using hifiasm, I’ll compare the two assemblies using QUAST, which will provide a more comprehensive set of assembly statistics and metrics.
1 BACKGROUND
Use Flye (GitHub) (Kolmogorov et al. 2019) to assemble PacBio reads for the S. namaycush siscowet ecotype.
## python: /srlab/programs/miniforge3-24.7.1-0/envs/flye-2.9.6-env/bin/python
## libpython: /srlab/programs/miniforge3-24.7.1-0/envs/flye-2.9.6-env/lib/libpython3.12.so
## pythonhome: /srlab/programs/miniforge3-24.7.1-0/envs/flye-2.9.6-env:/srlab/programs/miniforge3-24.7.1-0/envs/flye-2.9.6-env
## version: 3.12.13 | packaged by conda-forge | (main, Mar 5 2026, 16:50:00) [GCC 14.3.0]
## numpy: [NOT FOUND]
##
## NOTE: Python version was forced by use_python() function
3 FLYE
# Make output directory, if it doesn't exist
mkdir --parents "${output_dir}"

# Create an array of .fastq.gz files
fastqs=(${data_dir}/*.fastq.gz)

# Print the result (optional, for verification)
## newline-delimited format
echo "List of FastQs used for assembly:"
printf "%s\n" "${fastqs[@]}" \
| tee "${output_dir}/input_fastqs.txt"

# Run Flye assembly
flye \
--pacbio-hifi "${fastqs[@]}" \
--genome-size "${genome_size}" \
--out-dir "${output_dir}" \
--threads "${threads}" \
> "${output_dir}"/pb-siscowet-assembly.log 2>&1
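One caveat with the FastQ glob: if nothing matches, the shell leaves the pattern as a literal string, and Flye would be handed a path that doesn't exist. Below is a minimal sketch of a guard that could run before the assembly. The function name `count_fastqs` is hypothetical and not part of the original notebook.

```shell
# Hypothetical helper (not part of the original notebook).
# Prints the number of .fastq.gz files in a directory;
# returns non-zero if there are none.
# The body runs in a subshell so its variables don't leak.
count_fastqs() (
  dir="$1"
  count=0
  for f in "${dir}"/*.fastq.gz; do
    # An unmatched glob stays as a literal pattern, so confirm the path exists.
    [ -e "${f}" ] || continue
    count=$((count + 1))
  done
  echo "${count}"
  [ "${count}" -gt 0 ]
)
```

Something like `count_fastqs "${data_dir}" > /dev/null || exit 1` ahead of the `flye` call would stop the job early instead of failing inside Flye.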
4 OUTPUTS
4.1 List files
ls -ltrh "${output_dir}"
## total 8.9G
## -rw-r--r-- 1 samwhite all 342 Apr 6 08:08 input_fastqs.txt
## drwxr-xr-x 2 samwhite all 512 Apr 8 11:51 00-assembly
## drwxr-xr-x 2 samwhite all 8.0K Apr 8 15:08 10-consensus
## drwxr-xr-x 2 samwhite all 512 Apr 9 02:23 20-repeat
## drwxr-xr-x 2 samwhite all 512 Apr 9 02:40 30-contigger
## drwxr-xr-x 2 samwhite all 8.0K Apr 9 10:29 40-polishing
## -rw-r--r-- 1 samwhite all 92 Apr 9 10:29 params.json
## -rw-r--r-- 1 samwhite all 55M Apr 9 10:29 assembly_graph.gv
## -rw-r--r-- 1 samwhite all 4.3G Apr 9 10:30 assembly_graph.gfa
## -rw-r--r-- 1 samwhite all 4.2G Apr 9 10:30 assembly.fasta
## -rw-r--r-- 1 samwhite all 4.8M Apr 9 10:30 assembly_info.txt
## -rw-r--r-- 1 samwhite all 3.2K Apr 9 10:30 pb-siscowet-assembly.log
## -rw-r--r-- 1 samwhite all 482M Apr 9 10:30 flye.log
4.2 Rename files
Flye's output files have generic “assembly*” names.
Rename them with a prefix reflecting species-ecotype-data-assembly_method.
cd "${output_dir}"

for file in assembly*
do
  mv "${file}" ${prefix}-${file}
done
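For reference, a slightly more defensive version of the rename step could look like the sketch below. It quotes the destination and skips the loop body when the glob matches nothing. The function name `prefix_assembly_files` is hypothetical. The prefix value `snam-siscowet-pb-flye` is an assumption inferred from the output file names listed under PRIMARY OUTPUTS.

```shell
# Hypothetical helper (not part of the original notebook).
# Adds "<prefix>-" to every file matching assembly* in a directory.
# The body runs in a subshell, so the caller's working directory is unchanged.
prefix_assembly_files() (
  dir="$1"
  prefix="$2"
  cd "${dir}" || exit 1
  for file in assembly*; do
    # If the glob matched nothing, the literal pattern is skipped here.
    [ -e "${file}" ] || continue
    mv -- "${file}" "${prefix}-${file}"
  done
)

# Example call (prefix value is an assumption, see above):
# prefix_assembly_files "${output_dir}" "snam-siscowet-pb-flye"
```

With that prefix, `assembly.fasta` would become `snam-siscowet-pb-flye-assembly.fasta`. Files that don't start with `assembly`, such as `flye.log`, are left alone.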
## [2026-04-09 10:30:36] root: INFO: Assembly statistics:
##
## Total length: 4336778941
## Fragments: 110158
## Fragments N50: 78564
## Largest frg: 2753427
## Scaffolds: 0
## Mean coverage: 21
##
## [2026-04-09 10:30:36] root: INFO: Final assembly: /mmfs1/gscratch/scrubbed/samwhite/gitrepos/RobertsLab/project-lake-trout/output/13.1-genome-assembly-siscowet/assembly.fasta
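As a quick cross-check of the N50 that Flye reports, the value can be recomputed from the FastA index, since column 2 of a `.fai` file holds each sequence's length. This is a minimal sketch using the common definition of N50 (the length of the sequence at which the running total, sorted longest first, reaches half the assembly length). The function name `n50_from_fai` is hypothetical, and Flye's own calculation may differ slightly at the boundary.

```shell
# Hypothetical helper (not part of the original notebook).
# Computes N50 from a FASTA index (.fai), where column 2 is sequence length.
n50_from_fai() {
  cut -f 2 "$1" \
    | sort -rn \
    | awk '{ len[NR] = $1; total += $1 }
           END {
             half = total / 2
             # Walk lengths from longest to shortest until half the total is covered.
             for (i = 1; i <= NR; i++) {
               running += len[i]
               if (running >= half) { print len[i]; exit }
             }
           }'
}

# Example call against the index from this assembly:
# n50_from_fai snam-siscowet-pb-flye-assembly.fasta.fai
```

If the result differs substantially from the "Fragments N50" in the Flye log, that would be worth investigating before comparing assemblies.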
Kolmogorov, Mikhail, Jeffrey Yuan, Yu Lin, and Pavel A. Pevzner. 2019. “Assembly of Long, Error-Prone Reads Using Repeat Graphs.” Nature Biotechnology 37 (5): 540–46. https://doi.org/10.1038/s41587-019-0072-8.