After working with all of this Olympia oyster genome sequencing data, I remembered that we had an old, singular PacBio SMRT cell file (from June 2013). This file didn’t seem to be included in any recent assemblies of Sean’s or mine. This is most likely because we have it in the PacBio H5 format and not in FASTQ.
I installed PacBio’s pbh5tools on my computer (swoose), converted the file and moved it to owl/nightingales/O_lurida
<code>python bash5tools.py /mnt/owl/nightingales/O_lurida/m130619_081336_42134_c100525122550000001823081109281326_s1_p0.bas.h5 --outType fastq </code>
I generated an MD5 checksum and appended to the checksums.md5
file in /owl/nightingales/O_lurida using the following command:
<code>md5sum m130619_081336_42134_c100525122550000001823081109281326_s1_p0.fastq | awk '{print $2 " = " $1}' >> checksums.md5</code>
The command above pipes the output to awk to format the output to match the existing format of the checksums.md5
file (i.e. filename = hash).
I’ve also updated our [Nightingales spreadsheet (Google Sheet)(https://docs.google.com/spreadsheets/d/1_XqIOPVHSBVGscnjzDSWUeRL7HUHXfaHxVzec-I-8Xk/edit?usp=sharing) to reflect this.
Will generate updated PacBio assemblies with Canu and/or Racon.