In preparation for a paper we’re writing, we needed some summary stats on the various genome assembly feature sequences. I determined the max/min, mean, and median sequence lengths for all the GFF feature files we currently have for Pgenerosa_v070, Pgenerosa_v070_top18_scaffolds, and Pgenerosa_v074. This info will be compiled in to a table for the manuscript. See our Genomic Resources wiki for more info on GFFs:
- Genomic Resources Wiki (GitHub)
Calculations were performed using Python in a Jupyter Notebook.
Jupyter Notebook (GitHub):
RESULTS
I’ve copied/pasted (and, reordered) the summary data for each of the GFFs that were analyzed, for quick reference. Will get this compiled in to a table of some sort for people to use for the manuscript.
Pgenerosa_v070_top18_scaffolds.CDS.gff
-------------------------
mean 177.728604
min 1.000000
median 123.000000
max 9851.000000
Pgenerosa_v070_top18_scaffolds.exon.gff
-------------------------
mean 219.024085
min 3.000000
median 131.000000
max 16912.000000
Pgenerosa_v070_top18_scaffolds.five_prime_UTR.gff
-------------------------
mean 148.558072
min 1.000000
median 89.000000
max 16912.000000
Pgenerosa_v070_top18_scaffolds.gene.gff
-------------------------
mean 8696.297953
min 180.000000
median 4621.000000
max 139826.000000
Pgenerosa_v070_top18_scaffolds.mRNA.gff
-------------------------
mean 8696.297953
min 180.000000
median 4621.000000
max 139826.000000
Pgenerosa_v070_top18_scaffolds.three_prime_UTR.gff
-------------------------
mean 481.02338
min 1.00000
median 221.00000
max 8355.00000
Pgenerosa_v074.CDS.gff
-------------------------
mean 193.248049
min 1.000000
median 126.000000
max 11772.000000
Pgenerosa_v074.exon.gff
-------------------------
mean 210.612431
min 3.000000
median 130.000000
max 11772.000000
Pgenerosa_v074.five_prime_UTR.gff
-------------------------
mean 62.226496
min 1.000000
median 41.000000
max 1259.000000
Pgenerosa_v074.gene.gff
-------------------------
mean 12170.897196
min 192.000000
median 5552.000000
max 175905.000000
Pgenerosa_v074.mRNA.gff
-------------------------
mean 12170.897196
min 192.000000
median 5552.000000
max 175905.000000
Pgenerosa_v074.three_prime_UTR.gff
-------------------------
mean 516.604585
min 1.000000
median 251.000000
max 4080.000000
Pgenerosa_v070.CDS.gff
-------------------------
mean 194.539909
min 1.000000
median 132.000000
max 11048.000000
Pgenerosa_v070.exon.gff
-------------------------
mean 217.971132
min 3.000000
median 139.000000
max 16912.000000
Pgenerosa_v070.gene.gff
-------------------------
mean 3768.232394
min 177.000000
median 1474.000000
max 139826.000000
Pgenerosa_v070.mRNA.gff
-------------------------
mean 3768.232394
min 177.000000
median 1474.000000
max 139826.000000