INTRO
Hollie Putnam's lab sequenced the P. tuahiniensis genome using PacBio Revio long-read sequencing, as part of Genohub Project 6470522. She asked that we also download the data and host it on our server(s) to ensure additional backups were available (GitHub Issue).
MATERIALS & METHODS
Data was transferred from the Genohub Amazon S3 bucket using rclone (website). Rclone was first configured for Amazon S3 with the bucket ID and secret key provided by Genohub. Then, I created an alias in the rclone configuration with the same bucket ID.
Rclone was run on Owl in the intended destination directory (`/volume1/web/nightingales/P_tuahiniensis/genohub6470522/`):
rclone copy \
--checksum \
--progress \
--drive-shared-with-me \
genohub:genohub6470522 .
Since no checksums were provided, I generated MD5 checksums for any future data transfers:
$ for file in *.bam*; do md5sum ${file} | tee --append ${file}.md5; done
04596b06637ec92de0cb432c835a4fa8 m84100_251021_203206_s3.hifi_reads.bam
2a237da20370845fb87aa711b00cd945 m84100_251021_203206_s3.hifi_reads.bam.pbi
RESULTS
Data is available on Owl:
https://owl.fish.washington.edu/nightingales/P_tuahiniensis/genohub6470522/
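Anyone who copies these files elsewhere can check them against the `.md5` files generated above. The following is a minimal sketch, assuming GNU `md5sum` is available and that each `.md5` file sits in the same directory as the file it describes. The function name `verify_md5s` is hypothetical and only used for illustration.

```shell
#!/usr/bin/env bash
# Verify every .md5 file in a directory against the data file it names.
# NOTE: verify_md5s is a hypothetical helper name, not part of the original workflow.
verify_md5s() {
  local dir="${1:-.}"
  local status=0
  local checksum_file
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd "${dir}" || exit 1
    # Make the loop a no-op when no .md5 files are present.
    shopt -s nullglob
    for checksum_file in *.md5; do
      # md5sum --check prints OK/FAILED per file and returns non-zero on a mismatch.
      md5sum --check "${checksum_file}" || status=1
    done
    exit "${status}"
  )
}
```

Calling `verify_md5s /path/to/downloaded/files` should print one `OK` or `FAILED` line per data file and return a non-zero exit status if any checksum does not match.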
The BAM files generated by PacBio can be converted to FastQ files using the bam2fastq program in the pbtk (GitHub repo) package.
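As a rough sketch of that conversion, assuming pbtk is installed and `bam2fastq` is on the `PATH`, something like the following could be used. The function name `convert_hifi_bam` is hypothetical, and the output prefix is simply the BAM filename with the `.bam` extension removed, which is my own choice rather than anything required by the tool.

```shell
#!/usr/bin/env bash
# Convert a PacBio HiFi BAM file to FastQ with bam2fastq (from pbtk).
# NOTE: convert_hifi_bam is a hypothetical helper name used for illustration.
convert_hifi_bam() {
  local bam="$1"
  # Output prefix: the input path without its .bam extension (an assumption).
  local prefix="${bam%.bam}"

  if ! command -v bam2fastq >/dev/null 2>&1; then
    echo "bam2fastq not found; install pbtk first" >&2
    return 127
  fi

  # bam2fastq expects the PacBio index (.pbi) to sit alongside the BAM file.
  if [[ ! -f "${bam}.pbi" ]]; then
    echo "Missing index: ${bam}.pbi" >&2
    return 1
  fi

  # -o sets the output prefix; bam2fastq adds the FastQ extension itself.
  bam2fastq -o "${prefix}" "${bam}"
}
```

For the file above, this would be called as `convert_hifi_bam m84100_251021_203206_s3.hifi_reads.bam`, run from the directory containing both the BAM file and its `.pbi` index.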