After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them to the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step of lncRNA identification. To do this, I used Zach’s notebook entry on lncRNA identification for guidance. I utilized the annotated GTF generated by gffcompare during the alignment/annotation step on 20230426. I used [`bedtools getfasta`](https://bedtools.readthedocs.io/en/latest/content/tools/getfasta/) and `CPC2` with an arbitrary 200bp minimum length to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).
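As a rough illustration of the filtering logic described above (not the code from the notebook itself), here is a minimal Python sketch that keeps transcripts labeled noncoding with a length of at least 200bp. It assumes a tab-delimited, CPC2-style results table with a header row containing `transcript_length` and `label` columns and the transcript ID in the first column. The function name `select_lncrna_ids` and the example rows are hypothetical.

```python
import csv
import io

MIN_LENGTH = 200  # arbitrary minimum transcript length (bp), as in the text


def select_lncrna_ids(cpc2_table, min_length=MIN_LENGTH):
    """Return IDs of transcripts labeled noncoding and >= min_length bp.

    Assumption: tab-delimited input with a header row that includes
    'transcript_length' and 'label' columns; the ID is the first column.
    """
    reader = csv.reader(cpc2_table, delimiter="\t")
    # Strip a leading '#' from header names (e.g. '#ID' becomes 'ID')
    header = [name.lstrip("#") for name in next(reader)]
    length_col = header.index("transcript_length")
    label_col = header.index("label")

    ids = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if row[label_col] == "noncoding" and int(row[length_col]) >= min_length:
            ids.append(row[0])
    return ids


# Made-up example rows for illustration only; these are not real results
example = io.StringIO(
    "#ID\ttranscript_length\tlabel\n"
    "tx1\t150\tnoncoding\n"
    "tx2\t950\tnoncoding\n"
    "tx3\t1200\tcoding\n"
)
print(select_lncrna_ids(example))
```

Here only `tx2` passes both filters: `tx1` is too short and `tx3` is labeled coding.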
Jupyter Notebook (GitHub):
Jupyter Notebook (NB Viewer):
RESULTS
Some very brief “stats”:
Total P.generosa transcripts ID’d by HISAT2/StringTie: 79,269
Total P.generosa lncRNA ID’d by CPC2 (>= 200bp): 13,606
Percentage of transcripts which are lncRNAs: 17%
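For reference, the percentage above follows directly from the two counts. A quick check in Python (variable names are just for illustration):

```python
total_transcripts = 79_269  # transcripts identified by HISAT2/StringTie
lncrnas = 13_606            # lncRNAs identified by CPC2 (>= 200bp)

pct = 100 * lncrnas / total_transcripts
print(f"{pct:.1f}%")  # about 17.2%, which rounds to the 17% reported
```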
Output folder:
20230426-pgen-HISAT2-stringtie-gffcompare-RNAseq/
lncRNA GTF
20230502-pgen-lncRNA-IDs.gtf (2.2M)
MD5:
9adb7efc18fe1bfedcad24c86da1161f
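To confirm a downloaded copy of the GTF matches the checksum above, something like the following Python sketch should work. It assumes the file has been saved locally under the same name in the current directory; the helper name `md5sum` is hypothetical.

```python
import hashlib
import os

EXPECTED_MD5 = "9adb7efc18fe1bfedcad24c86da1161f"
GTF_PATH = "20230502-pgen-lncRNA-IDs.gtf"  # assumed local copy of the file


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Only run the comparison if the file is actually present
if os.path.exists(GTF_PATH):
    observed = md5sum(GTF_PATH)
    print("MD5 OK" if observed == EXPECTED_MD5 else f"MD5 mismatch: {observed}")
```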