Author

Sam White

Published

May 2, 2023

After trimming P.generosa RNA-seq reads on 20230426 and then aligning and annotating them against the Panopea-generosa-v1.0 genome on 20230426, I proceeded with the final step: lncRNA identification. I used Zach’s notebook entry on lncRNA identification for guidance, starting from the annotated GTF generated by gffcompare during the alignment/annotation step on 20230426. I used [`bedtools getfasta`](https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html) and `CPC2`, with an arbitrary 200bp minimum length, to identify lncRNAs. All of this was done in a Jupyter Notebook (links below).

Jupyter Notebook (GitHub):

20230502-pgen-lncRNA-identification.ipynb

Jupyter Notebook (NB Viewer):

20230502-pgen-lncRNA-identification.ipynb
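For readers who want the gist of the filtering logic without opening the notebook, here is a minimal Python sketch. It is not the code from the notebook. It assumes a tab-delimited CPC2-style results table with `#ID`, `transcript_length`, and `label` columns (check the actual output before relying on those names), and the function name `select_lncrna_ids` and the example rows are made up for illustration.

```python
import csv
import io

MIN_LENGTH = 200  # bp; the arbitrary minimum length used in this entry


def select_lncrna_ids(cpc2_table, min_length=MIN_LENGTH):
    """Return IDs of transcripts labeled noncoding and >= min_length bp.

    Assumes a tab-delimited table whose header row includes the columns
    '#ID', 'transcript_length', and 'label' (an assumption about the
    CPC2 output layout, not something confirmed by this entry).
    """
    reader = csv.DictReader(cpc2_table, delimiter="\t")
    keep = []
    for row in reader:
        is_noncoding = row["label"] == "noncoding"
        long_enough = int(row["transcript_length"]) >= min_length
        if is_noncoding and long_enough:
            keep.append(row["#ID"])
    return keep


# Made-up example rows, not real results.
example = io.StringIO(
    "#ID\ttranscript_length\tlabel\n"
    "tx1\t150\tnoncoding\n"
    "tx2\t950\tnoncoding\n"
    "tx3\t1200\tcoding\n"
)
print(select_lncrna_ids(example))  # ['tx2']
```

The resulting ID list could then be used to subset the gffcompare GTF down to just the lncRNA records.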

RESULTS

Some very brief “stats”:

Total P.generosa transcripts ID’d by HISAT2/StringTie: 79,269

Total P.generosa lncRNA ID’d by CPC2 (>= 200bp): 13,606

Percentage of transcripts which are lncRNAs: 17%
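As a quick sanity check on that percentage, the arithmetic can be reproduced from the two counts above. The variable names are just for illustration.

```python
# Counts reported in this entry.
total_transcripts = 79_269
lncrna_count = 13_606

# Fraction of all transcripts that were identified as lncRNAs.
pct_lncrna = 100 * lncrna_count / total_transcripts
print(f"{pct_lncrna:.1f}%")  # 17.2%
```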

Output folder:

20230426-pgen-HISAT2-stringtie-gffcompare-RNAseq/

lncRNA GTF
- 20230502-pgen-lncRNA-IDs.gtf (2.2M)
- MD5: 9adb7efc18fe1bfedcad24c86da1161f
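
If you download the GTF, one way to confirm it matches the checksum listed above is a short Python check like the sketch below. The local path is an assumption (it presumes the file sits in the current directory under the name listed above), and the helper names `md5sum` and `verify` are hypothetical.

```python
import hashlib

# Checksum listed in this entry for the lncRNA GTF.
EXPECTED_MD5 = "9adb7efc18fe1bfedcad24c86da1161f"


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5sum(path) == expected


# Example usage, assuming the file was saved to the current directory:
# print(verify("20230502-pgen-lncRNA-IDs.gtf"))
```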