Steven requested a subset of Pgenerosa_v070.fa (2.1GB) in GitHub Issue #705. In that issue, it was determined that a significant portion of the sequencing data assembled by Phase Genomics clustered in “scaffolds” 1 - 18. As such, Steven asked for a subset containing just those 18 scaffolds.
This was done using the samtools faidx program.
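For anyone who wants to see what this kind of subsetting amounts to, here is a minimal Python sketch. It is not the method used here (that was samtools faidx, in the notebook). It just illustrates the idea of keeping only the FastA records whose names are in a wanted list. The scaffold names and the `subset_fasta` helper are made up for illustration; the real headers in Pgenerosa_v070.fa may look different.

```python
from typing import Iterable, Iterator, Set


def subset_fasta(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield only the FastA records whose sequence name is in `wanted`."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Sequence name = header text up to the first whitespace.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line


# Hypothetical scaffold names (assumption); real headers may differ.
wanted_names = {f"scaffold{i}" for i in range(1, 19)}

# Tiny in-memory FastA standing in for the real 2.1GB file.
demo = [
    ">scaffold1 demo\n", "ACGT\n",
    ">scaffold19\n", "GGGG\n",
    ">scaffold18\n", "TTAA\n", "CC\n",
]
subset = list(subset_fasta(demo, wanted_names))
```

On a real file, the same function could be fed an open file handle instead of the `demo` list. Note that this sketch keeps records in file order with their full header lines, which may differ from how samtools faidx writes its output.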
The process is documented in the following Jupyter Notebook (GitHub):
RESULTS
Output folder:
FastA (914MB):
FastA Index: 20190625_pgen_v070_scaffold_subsetting/Pgenerosa_v070.18.fa.fai