Continuing to organize files for a manuscript on the geoduck genome assembly/annotation we’ve done, we decided to rename both the files and the scaffolds, to make the naming consistent and a bit easier to read (for both humans and computers).
Currently, most of the GFF and BED files are named something like:
- Panopea-generosa-vv0.74.a4.rRNA.gff3
A couple of other files (like the assembly FastA) have names like this:
- Pgenerosa_v074.fa
The scaffolds within each of the files are named like so:
- PGA_scaffold18__69_contigs__length_27737463
We want the filenames to look like this:
- Panopea-generosa-v1.0
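As a rough illustration of the filename change (not necessarily how it was done in the notebook), here is a minimal Python sketch. It assumes the new name is formed by swapping the old version prefix for `Panopea-generosa-v1.0` and leaving the rest of the filename untouched; the function name `new_filename` and the prefix list are hypothetical.

```python
# Minimal sketch: swap an old version prefix for the new one.
# ASSUMPTION: everything after the old prefix (e.g. ".a4.rRNA.gff3") is kept as-is.
NEW_PREFIX = "Panopea-generosa-v1.0"
OLD_PREFIXES = ("Panopea-generosa-vv0.74", "Pgenerosa_v074")  # taken from the examples above


def new_filename(old_name: str) -> str:
    """Return the filename with a known old prefix replaced by NEW_PREFIX."""
    for prefix in OLD_PREFIXES:
        if old_name.startswith(prefix):
            return NEW_PREFIX + old_name[len(prefix):]
    # Unrecognized names are returned unchanged.
    return old_name


print(new_filename("Panopea-generosa-vv0.74.a4.rRNA.gff3"))
print(new_filename("Pgenerosa_v074.fa"))
```

To rename files on disk, the result could be passed to `pathlib.Path.rename()`, ideally after a dry run that only prints the old and new names.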
We want the scaffold names to look like this:
- Scaffold_01
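The scaffold renaming can be sketched with a regular expression. This is only an illustration under two assumptions: that the scaffold number is kept (so `PGA_scaffold18...` becomes `Scaffold_18`) and that it is zero-padded to two digits. The function name `rename_scaffolds` is hypothetical, and the GFF-style line in the example is made up.

```python
import re

# Matches old-style names such as PGA_scaffold18__69_contigs__length_27737463
# and captures the scaffold number.
OLD_SCAFFOLD = re.compile(r"PGA_scaffold(\d+)__\d+_contigs__length_\d+")


def rename_scaffolds(line: str) -> str:
    """Replace every old-style scaffold name in a line with Scaffold_NN.

    ASSUMPTION: the original scaffold number is preserved and zero-padded
    to two digits.
    """
    return OLD_SCAFFOLD.sub(lambda m: f"Scaffold_{int(m.group(1)):02d}", line)


# FastA header line
print(rename_scaffolds(">PGA_scaffold18__69_contigs__length_27737463"))

# Hypothetical tab-delimited GFF-style line; only the scaffold name changes
print(rename_scaffolds("PGA_scaffold1__2_contigs__length_100\tsource\trRNA\t10\t20"))
```

Because the substitution works line by line, the same function could be applied to FastA, GFF, and BED files alike by reading each file, rewriting each line, and writing the result to a new file.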
I processed all of the necessary files and documented the process in the following Jupyter Notebook (GitHub):
RESULTS
Output folder:
Uploaded the files to the Gannet folder (linked above) and to the Open Science Framework repository for this project.