Continuing to organize files for a manuscript on our geoduck genome assembly/annotation, we decided to rename both the files and the scaffolds to make the naming consistent and a bit easier to read (for both humans and computers).
Currently, most of the GFF and BED files are named something like:
A couple of other files (like the assembly FastA) have names like this:
The scaffolds within each of the files are named like so:
We want the filenames to look like this:
We want the scaffold names to look like this:
I processed all of the necessary files and documented the process in the following Jupyter Notebook (GitHub):
Uploaded the files to the Gannet folder (linked above) and to the Open Science Framework repository for this project.
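
For anyone wanting to do a similar scaffold renaming, here is a minimal Python sketch of the general idea. It is not the code from the notebook. The old and new name patterns (`old_scaffold_7` and `Scaffold_07`) and all of the function names are made up for illustration, so swap in the real patterns. It assumes scaffold names appear as the sequence ID on FastA header lines and in the first column of tab-delimited GFF/BED records.

```python
import re

# Hypothetical old-name pattern; replace with the real scaffold naming scheme.
OLD_NAME = re.compile(r"^old_scaffold_(\d+)$")


def rename_scaffold(name):
    """Map a hypothetical old name like 'old_scaffold_7' to 'Scaffold_07'."""
    match = OLD_NAME.match(name)
    if match is None:
        return name  # leave unrecognized names untouched
    return f"Scaffold_{int(match.group(1)):02d}"


def rename_fasta_line(line):
    """Rename the sequence ID on a FastA header; pass sequence lines through."""
    if not line.startswith(">"):
        return line
    body = line[1:].rstrip("\n")
    seq_id, sep, description = body.partition(" ")
    return ">" + rename_scaffold(seq_id) + sep + description + "\n"


def rename_tabular_line(line):
    """Rename column 1 of a GFF/BED record; pass comments and blanks through."""
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    fields[0] = rename_scaffold(fields[0])
    return "\t".join(fields) + "\n"


def rename_file(in_path, out_path, line_fn):
    """Stream a file through one of the line functions above."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(line_fn(line))
```

Usage would look something like `rename_file("in.gff", "out.gff", rename_tabular_line)` (file names hypothetical). Writing to a new file instead of editing in place keeps the originals around for comparison. Note that this sketch leaves GFF directive lines such as `##sequence-region` unchanged, so those would need separate handling if they carry scaffold names.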