Data Wrangling - Rename Pgenerosa_v074 Files and Scaffolds

Continuing to organizing files for a manuscript dealing with the geoduck genome assembly/annotation we’ve done, we decided to rename the files as well as rename the scaffolds, to make the naming consistent and a bit easier to read (both for humans and computers).

Currently, most of the GFF and BED files are named something like:

  • Panopea-generosa-vv0.74.a4.rRNA.gff3

A couple of other files (like the assembly FastA) have names like this:

  • Pgenerosa_v074.fa

The scaffolds within each of the files are named like so:

  • PGA_scaffold18__69_contigs__length_27737463

We want the filenames to look like this:

  • Panopea-generosa-v1.0

We want the scaffold names to look like this:

  • Scaffold_01

I processed all of the necessary files and documented in the following Jupyter Notebook (GitHub):


RESULTS

Output folder:

Uploaded files to Gannet folder (linked above) and to the Open Science Foundation repository for this project.