After identifying lncRNA in P.generosa, Steven asked that I generate a tissue-specific expression/count matrix (GitHub Issue). After looking through the documentation for StringTie, I decided that StringTie would work for this. The overall approach:
Use tissue-specific BAMs from HISAT2 alignments.
Use the “canonical” lncRNA GTF, representing all lncRNAs found across all tissues, as input to StringTie.
Use StringTie’s expression estimation feature to generate read coverage and expression (FPKM) for each lncRNA.
Use StringTie’s Python script (prepDE.py3) to generate a tissue/sample-specific count matrix.
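As a rough illustration of the steps above, here is a minimal Python sketch that only builds the command lines; it is not the code from the notebook. The BAM file names, the input/output paths, and the thread count are hypothetical placeholders. The StringTie options shown (-e to restrict estimation to the supplied transcripts, -B to write the ballgown table files, -G for the reference GTF) and the prepDE.py3 options (-i, -g, -t) are standard, but the exact invocation used for this analysis may have differed.

```python
from pathlib import Path

# Tissue/sample names, taken from the columns of the final count matrix.
TISSUES = ["ctenidia", "gonad", "heart", "juvenile", "larvae"]


def stringtie_command(tissue, bam_dir, lncrna_gtf, out_dir, threads=4):
    """Build a StringTie expression-estimation command for one tissue.

    Assumption: each tissue has a sorted HISAT2 BAM named
    '<tissue>.sorted.bam' in bam_dir (hypothetical naming).
    """
    tissue_dir = Path(out_dir) / tissue
    return [
        "stringtie",
        str(Path(bam_dir) / f"{tissue}.sorted.bam"),
        "-e",                      # estimate only the transcripts given with -G
        "-B",                      # write ballgown *.ctab files next to the output GTF
        "-G", str(lncrna_gtf),     # "canonical" lncRNA GTF
        "-p", str(threads),        # hypothetical thread count
        "-o", str(tissue_dir / f"{tissue}-pgen-lncRNA-stringtie.gtf"),
    ]


def prepde_command(out_dir):
    """Build the prepDE.py3 command that collects per-tissue GTFs into count matrices."""
    out_dir = Path(out_dir)
    return [
        "prepDE.py3",
        "-i", str(out_dir),  # directory holding one sub-directory per sample
        "-g", str(out_dir / "gene_count_matrix.csv"),
        "-t", str(out_dir / "transcript_count_matrix.csv"),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually execute.
    for tissue in TISSUES:
        print(" ".join(stringtie_command(tissue, "bams", "pgen-lncRNA.gtf", "out")))
    print(" ".join(prepde_command("out")))
```

Keeping one output sub-directory per tissue matters here, because prepDE.py3 uses the sub-directory names as the sample/column names in the resulting matrices.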
This was all run on Raven, using a Jupyter Notebook. Links below:
Jupyter Notebook (NB Viewer):
RESULTS
This produced ballgown expression files, as well as a transcript read count matrix with a column for each tissue/sample. I’m only linking directly to the final matrix file due to the number of samples and redundant ballgown files/structure. To view the organization of the output directory, see the directory tree below.
Output folder:
20230504-pgen-lncRNA-expression/
Transcript count matrix (CSV)
20230504-pgen-lncRNA-expression/transcript_count_matrix.csv
transcript_id  ctenidia  gonad  heart  juvenile  larvae
MSTRG.1.1      34        16     13     93        6
MSTRG.2.1      18        5      2      9         2
MSTRG.3.1      15        9      48     171       60
MSTRG.22.1     4         24     7      27        22
MSTRG.9.1      3         133    1      1681      245
MSTRG.11.1     88        123    77     144       95
MSTRG.12.1     3         81     12     47        50
MSTRG.25.1     6         47     8      0         1
MSTRG.27.1     4         79     9      12        4
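For anyone wanting to work with the matrix, here is a small Python sketch of how it could be loaded and queried. It assumes the file is comma-delimited with a transcript_id column followed by one column per tissue, which is how prepDE.py3 normally writes it. The embedded rows are the preview rows shown above; the function names are my own, for illustration only. These are raw counts with no library-size normalization, so the "top tissue" call is only a quick look, not a differential expression result.

```python
import csv
import io

# Preview rows from transcript_count_matrix.csv, embedded so the example runs on its own.
SAMPLE = """transcript_id,ctenidia,gonad,heart,juvenile,larvae
MSTRG.1.1,34,16,13,93,6
MSTRG.2.1,18,5,2,9,2
MSTRG.3.1,15,9,48,171,60
MSTRG.22.1,4,24,7,27,22
MSTRG.9.1,3,133,1,1681,245
MSTRG.11.1,88,123,77,144,95
MSTRG.12.1,3,81,12,47,50
MSTRG.25.1,6,47,8,0,1
MSTRG.27.1,4,79,9,12,4
"""


def load_counts(handle):
    """Read the count matrix into {transcript_id: {tissue: count}}."""
    counts = {}
    for row in csv.DictReader(handle):
        transcript_id = row.pop("transcript_id")
        counts[transcript_id] = {tissue: int(n) for tissue, n in row.items()}
    return counts


def top_tissue(counts):
    """For each transcript, return the tissue with the highest raw count."""
    return {
        transcript_id: max(per_tissue, key=per_tissue.get)
        for transcript_id, per_tissue in counts.items()
    }


if __name__ == "__main__":
    counts = load_counts(io.StringIO(SAMPLE))
    for transcript_id, tissue in top_tissue(counts).items():
        print(transcript_id, tissue, counts[transcript_id][tissue])
```

To run this against the real file, replace io.StringIO(SAMPLE) with an open file handle, e.g. open("transcript_count_matrix.csv", newline="").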
Directory tree
├── [4.0K] ctenidia
│ ├── [3.9M] ctenidia-pgen-lncRNA-stringtie.gtf
│ ├── [137K] e2t.ctab
│ ├── [997K] e_data.ctab
│ ├── [ 10] i2t.ctab
│ ├── [ 48] i_data.ctab
│ └── [1.2M] t_data.ctab
├── [316K] gene_count_matrix.csv
├── [4.0K] gonad
│ ├── [137K] e2t.ctab
│ ├── [1002K] e_data.ctab
│ ├── [3.9M] gonad-pgen-lncRNA-stringtie.gtf
│ ├── [ 10] i2t.ctab
│ ├── [ 48] i_data.ctab
│ └── [1.2M] t_data.ctab
├── [4.0K] heart
│ ├── [137K] e2t.ctab
│ ├── [990K] e_data.ctab
│ ├── [3.9M] heart-pgen-lncRNA-stringtie.gtf
│ ├── [ 10] i2t.ctab
│ ├── [ 48] i_data.ctab
│ └── [1.2M] t_data.ctab
├── [4.0K] juvenile
│ ├── [137K] e2t.ctab
│ ├── [1.0M] e_data.ctab
│ ├── [ 10] i2t.ctab
│ ├── [ 48] i_data.ctab
│ ├── [3.9M] juvenile-pgen-lncRNA-stringtie.gtf
│ └── [1.2M] t_data.ctab
├── [4.0K] larvae
│ ├── [137K] e2t.ctab
│ ├── [1001K] e_data.ctab
│ ├── [ 10] i2t.ctab
│ ├── [ 48] i_data.ctab
│ ├── [3.9M] larvae-pgen-lncRNA-stringtie.gtf
│ └── [1.2M] t_data.ctab
└── [409K] transcript_count_matrix.csv
5 directories, 32 files