Data Wrangling - Splitting BAM by Size for Upload to OSF

We’re in the process of organizing files for a manuscript dealing with the geoduck genome assembly/annotation we’ve done. As part of that, we need to upload the StringTie BAM file that was used with GenSAS for the Pgenerosa_v074 annotation to the Open Science Framework (OSF) repository for this project. Unfortunately, at 73GB, the file far exceeds OSF's individual file size limit (5GB). So, I split it into 5GB chunks. See the following notebook for deets:

Jupyter Notebook (GitHub):


  1. Use the Bash split command to split the file into chunks of the desired size.

  2. Reassemble chunks into full size BAM using the Bash cat command.

  3. Run md5sum on original BAM and reassembled BAM to confirm the two files are the same.
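For anyone who doesn't want to dig through the notebook, the three steps above can be sketched roughly as follows. This is a minimal sketch, not a copy of the notebook's commands: the function names, the chunk prefix, and the file names are all placeholders I made up for illustration. One thing worth knowing is that GNU split treats `5G` as 5 x 1024^3 bytes and `5GB` as 5 x 1000^3 bytes, so pick whichever suffix keeps each chunk under the upload limit.

```shell
#!/usr/bin/env bash
# Sketch only: function names, prefixes, and file names are hypothetical placeholders.

# Step 1: split a file into fixed-size chunks named <prefix>aa, <prefix>ab, ...
split_into_chunks() {
  local input="$1" chunk_size="$2" prefix="$3"
  split -b "$chunk_size" "$input" "$prefix"
}

# Step 2: concatenate the chunks back into one file.
# The glob expands in sorted order, which matches split's aa, ab, ac... suffixes.
# The output name must NOT start with the chunk prefix, or the glob would match it too.
reassemble_chunks() {
  local prefix="$1" output="$2"
  cat "${prefix}"* > "$output"
}

# Step 3: succeed (exit status 0) only if both files have the same MD5 checksum.
same_md5() {
  local sum_a sum_b
  sum_a=$(md5sum < "$1" | cut -d' ' -f1)
  sum_b=$(md5sum < "$2" | cut -d' ' -f1)
  [ "$sum_a" = "$sum_b" ]
}
```

With a placeholder file called `original.bam`, usage would look something like `split_into_chunks original.bam 5GB original.bam.part_`, then `reassemble_chunks original.bam.part_ reassembled.bam`, then `same_md5 original.bam reassembled.bam && echo "checksums match"`.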


Output folder:

I'll upload the split files to the OSF repository.