After annotating Pgenerosa_v074 on 20190701, we noticed a large discrepancy between the number of transcripts MAKER identified in it and the number identified in Pgenerosa_v070. As a reminder, Pgenerosa_v074 is a subset of Pgenerosa_v070 containing only the 18 longest scaffolds. So, we decided to do a quick comparison of the annotations present in these 18 scaffolds in Pgenerosa_v070 and Pgenerosa_v074.
I used grep to pull out the features on the same 18 scaffolds that make up the Pgenerosa_v074 assembly from the 20190228 Pgenerosa_v070 annotated GFF, and then counted the number of features in this newly subsetted GFF. It's all documented in the Jupyter Notebook below.
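For anyone wanting to reproduce the subsetting outside the notebook, here is a minimal Python sketch of the same idea. It is not the code from the notebook: it keeps only GFF lines whose first column (seqid) is one of the chosen scaffold names, then tallies column 3 (feature type). The scaffold names and toy GFF lines are hypothetical placeholders; swap in the real Pgenerosa_v074 scaffold IDs and the 20190228 Pgenerosa_v070 GFF.

```python
from collections import Counter
from typing import Iterable


def subset_gff(lines: Iterable[str], scaffolds: set) -> list:
    """Keep only feature lines whose seqid (column 1) is exactly one of `scaffolds`."""
    kept = []
    for line in lines:
        # Skip blank lines, comments, and directives such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        seqid = line.split("\t", 1)[0]
        if seqid in scaffolds:
            kept.append(line.rstrip("\n"))
    return kept


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Tally column 3 (feature type, e.g. gene/mRNA/exon) across GFF3 feature lines."""
    counts = Counter()
    for line in gff_lines:
        fields = line.split("\t")
        if len(fields) >= 9:  # a complete GFF3 feature line has 9 columns
            counts[fields[2]] += 1
    return counts


# Hypothetical toy GFF; "scaffold_10" shares a prefix with "scaffold_1".
toy_gff = [
    "##gff-version 3\n",
    "scaffold_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "scaffold_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1\n",
    "scaffold_10\tmaker\tgene\t50\t400\t.\t-\t.\tID=gene2\n",
]

subset = subset_gff(toy_gff, {"scaffold_1"})
print(dict(count_feature_types(subset)))  # -> {'gene': 1, 'mRNA': 1}
```

Matching the first column exactly, rather than grepping the whole line for a scaffold name, avoids a short ID like "scaffold_1" also pulling in lines from "scaffold_10". With grep, anchoring the pattern to the start of the line and the following tab gets the same effect.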
Jupyter Notebook (GitHub):
Well, we definitely see a difference between the annotations of those top 18 scaffolds in the two assemblies (~6-fold). However, there is still a huge difference between the full Pgenerosa_v070 annotation and the top 18 scaffolds from the Pgenerosa_v070 annotation.
I’ll be performing some RNAseq alignments to these various assemblies. That should provide us with some evidence that we can use to support/refute some of the annotations that are present/absent.