Working on the CEABIGR project, I was preparing to make a gene expression file to use in CIRCOS (GitHub Issue) when I realized that the Ballgown gene expression file (CSV; GitHub) had more genes than the C.virginica genes BED file we were using. After some sleuthing, I discovered that the discrepancy was caused by the lack of pseudogenes in the genes BED file I was using. Although it might not really have any impact on things, I thought it would still be prudent to have a BED file that completely matched all of the genes in the Ballgown gene expression file. Plus, having the pseudogenes might be of long-term usefulness if we ever decide to evaluate the role of long non-coding RNAs (lncRNAs) in this project.

So, I created a new BED file containing genes and pseudogenes.

It’s all documented in the following Jupyter Notebook:

GitHub: 20220926_cvir_gff-to-bed-genes_and_pseudogenes.ipynb
NBviewer: 20220926_cvir_gff-to-bed-genes_and_pseudogenes.ipynb
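
For a rough idea of what this kind of conversion involves, here is a minimal Python sketch. It is not the code from the notebook. It assumes a standard nine-column GFF3 file, assumes the BED name should come from the `Name` attribute (falling back to `ID`), and writes a placeholder score of `0`. The function names and the example records are made up for illustration.

```python
def parse_attributes(field):
    """Split a GFF3 attributes column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff_to_bed(gff_lines, feature_types=("gene", "pseudogene")):
    """Yield BED6 tuples for GFF3 rows whose type is in feature_types."""
    for line in gff_lines:
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip malformed rows and feature types we are not interested in.
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        attrs = parse_attributes(cols[8])
        # Assumption: use Name if present, otherwise ID, otherwise ".".
        name = attrs.get("Name") or attrs.get("ID", ".")
        # GFF3 coordinates are 1-based and inclusive; BED is 0-based and
        # half-open, so only the start needs to be shifted.
        start = int(cols[3]) - 1
        end = int(cols[4])
        # Score "0" is a placeholder; strand is carried over unchanged.
        yield (cols[0], start, end, name, "0", cols[6])


# Made-up records, purely for illustration.
example = [
    "##gff-version 3\n",
    "chr1\tGnomon\tgene\t101\t200\t.\t+\t.\tID=gene-A;Name=A\n",
    "chr1\tGnomon\tmRNA\t101\t200\t.\t+\t.\tID=rna-A;Parent=gene-A\n",
    "chr1\tGnomon\tpseudogene\t301\t400\t.\t-\t.\tID=gene-B;Name=B\n",
]

for record in gff_to_bed(example):
    print("\t".join(map(str, record)))
```

Passing `feature_types=("gene",)` would reproduce a genes-only BED, which is the kind of file that came up short against the Ballgown gene list.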

RESULTS

Alrighty, we now have a BED file with gene names that match all of the genes in the Ballgown gene expression file!
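
A quick way to sanity-check a claim like this is to compare the two sets of gene names directly. The sketch below is only an illustration under a few assumptions: the gene IDs sit in the first column of the CSV, the CSV has a header row, the BED name is in the fourth column, and both files use the same naming scheme. The function names and sample data are hypothetical.

```python
import csv
import io


def bed_names(bed_handle):
    """Collect the name column (4th field) from a tab-delimited BED file."""
    names = set()
    for line in bed_handle:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.rstrip("\n").split("\t")[3])
    return names


def csv_gene_ids(csv_handle, column=0):
    """Collect gene IDs from one column of a CSV, skipping the header row."""
    reader = csv.reader(csv_handle)
    next(reader, None)  # assumption: first row is a header
    return {row[column] for row in reader if row}


def compare(bed_handle, csv_handle):
    """Return (IDs missing from the BED, names in the BED but not the CSV)."""
    bed = bed_names(bed_handle)
    expr = csv_gene_ids(csv_handle)
    return sorted(expr - bed), sorted(bed - expr)


# Made-up data, purely for illustration.
bed_text = "chr1\t100\t200\tA\t0\t+\nchr1\t300\t400\tB\t0\t-\n"
csv_text = "gene_id,sample1\nA,1.5\nB,0.0\nC,2.0\n"

missing, extra = compare(io.StringIO(bed_text), io.StringIO(csv_text))
print("In expression file but not BED:", missing)
print("In BED but not expression file:", extra)
```

With real files, the `io.StringIO` objects would be replaced by open file handles. Two empty lists would mean the gene sets match exactly.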

Output folder:

20220926-cvir-gff-to-bed-genes_and_pseudogenes/

BED file
- 20220926-cvir-gff-to-bed-genes_and_pseudogenes/20220926-cvir-GCF_002022765.2-genes-and-pseudogenes.bed (1.9MB)