In the final GFF from our GenSAS Pgenerosa_v074.a4 annotation, we noticed that there were no repeat motifs/sequences identified on Scaffold 01. The remaining scaffolds all had repeat motifs present on them, so something seemed amiss (see this GitHub Issue for more info).
I ended up contacting GenSAS and it turned out there was a bug on their end that led to this issue:
Taein Lee Nov 26, 2019, 7:27 PM (8 days ago) to me, jhumann
Hi Sam,
Thank you so much for your report. There was a bug and it has been fixed. Your gff3 files has been re-generated.
-Taein From: gensas-admin on behalf of sam white Sent: Tuesday, November 26, 2019 3:45 PM To: gensas-admin; jhumann; taein_lee Subject: [Website feedback] Merged GFF missing repeats on only one chromosome
Sam (https://www.gensas.org/user/1671) sent a message using the contact form at https://www.gensas.org/gensas-contact.
Hi,
I generated a merged GFF after I “published” my annotation. I included RepeatModeler features in the merged GFF.
My genome has 18 chromosomes. All of them except one chromosome (name: PGA_scaffold1__77_contigs__length_89643857) has the expected repeats annotations present.
I looked at the individual RepeatMasker and RepeatModeler jobs, and both of those GFFs identified repeats on PGA_scaffold1__77_contigs__length_89643857.
Would you happen to have any ideas on why PGA_scaffold1__77_contigs__length_89643857 isn’t showing any repeat features in the merged GFF?>
This is for my project Pgenerosa_v074.
Thanks for any insight!
Sam
So, now that I have the updated, final GFF, I want to re-run the GFF splitting into separate feature files, as well as counts and sequence length stats for all features (including repeats).
Everything is documented in this Jupyter Notebook (GitHub):
RESULTS
Output folder:
I’ve copied/pasted the summary data for each of the GFFs that were analyzed, for quick reference.
I’ll double-check numbers and update the manuscript as needed. Also, all files will be uploaded to the OSF repository for this paper
Feature | Count |
---|---|
CDS | 236960 |
exon | 236960 |
gene | 34947 |
mRNA | 38326 |
repeat_region | 1676544 |
rRNA | 8 |
tRNA | 16889 |
Panopea-generosa-v1.0.a4.gene.gff3
-------------------------
mean 10811.04461
min 166.00000
median 4464.00000
max 283066.00000
Panopea-generosa-v1.0.a4.mRNA.gff3
-------------------------
mean 12903.649559
min 166.000000
median 5453.000000
max 283066.000000
Panopea-generosa-v1.0.a4.tRNA.gff3
-------------------------
mean 74.807745
min 53.000000
median 75.000000
max 314.000000
Panopea-generosa-v1.0.a4.rRNA.gff3
-------------------------
mean 117.125
min 108.000
median 115.000
max 138.000
Panopea-generosa-v1.0.a4.repeat_region.gff3
-------------------------
mean 212.244974
min 6.000000
median 149.000000
max 10981.000000
Panopea-generosa-v1.0.a4.exon.gff3
-------------------------
mean 201.476988
min 3.000000
median 133.000000
max 13221.000000
Panopea-generosa-v1.0.a4.CDS.gff3
-------------------------
mean 201.476988
min 3.000000
median 133.000000
max 13221.000000
Panopea-generosa-v1.0.a4.repeats.LTR.gff3
-------------------------
percent 0.25
sum 2315583.00
mean 711.83
min 11.00
median 323.00
max 6541.00
Panopea-generosa-v1.0.a4.repeats.RC.gff3
-------------------------
percent 0.03
sum 258182.00
mean 429.59
min 13.00
median 464.00
max 674.00
Panopea-generosa-v1.0.a4.repeats.Simple_repeat.gff3
-------------------------
percent 0.55
sum 5138701.00
mean 258.71
min 6.00
median 124.00
max 5981.00
Panopea-generosa-v1.0.a4.repeats.DNA.gff3
-------------------------
percent 1.01
sum 9497156.00
mean 409.48
min 11.00
median 248.00
max 7012.00
Panopea-generosa-v1.0.a4.repeats.SINE.gff3
-------------------------
percent 0.72
sum 6737909.00
mean 156.23
min 11.00
median 165.00
max 934.00
Panopea-generosa-v1.0.a4.repeats.LINE.gff3
-------------------------
percent 3.19
sum 30035624.00
mean 395.53
min 11.00
median 226.00
max 6604.00
Panopea-generosa-v1.0.a4.repeats.Unknown.gff3
-------------------------
percent 32.04
sum 3.018520e+08
mean 1.998300e+02
min 1.100000e+01
median 1.450000e+02
max 1.098100e+04
-------------------------
Repeats composition of genome (percent): 37.79