In preparation for differential transcript analysis, I previously ran our RNAseq data through StringTie on 20210726 to identify and quantify transcripts. Differentially expressed transcripts (DETs) and genes (DEGs) will be identified using ballgown. This notebook entry differs from most others: it will simply serve as a “landing page” for accessing and reviewing the analysis, since the analysis will evolve over time and won’t exist as a single computing job with a definitive endpoint.
I’ll update this post as things progress, focusing primarily on important or interesting details and results.
The analysis is part of the following GitHub repo: https://github.com/epigeneticstoocean/2018_L18-adult-methylation
The analysis is taking place via the following R Markdown file: ballgown_analysis.Rmd
ballgown_analysis.Rmd is designed to be maximally reproducible and includes code to download all of the data files needed to run the full analysis. That said, it will not run properly without the directory structure that comes with the GitHub repo linked above. Additionally, that repo contains an R Project, which ballgown_analysis.Rmd relies on to manage file and directory locations. So, it’s best to clone https://github.com/epigeneticstoocean/2018_L18-adult-methylation and then run ballgown_analysis.Rmd from within it.
Finally, one of the goals of this project is to identify how DNA methylation (more specifically, differentially methylated loci) might affect the expression of alternative transcripts.
Some notes on how ballgown works “behind the scenes”:
- Pairwise (two-group) differential transcript/gene expression analysis.
  - Output will be a table of differentially expressed transcripts or genes.
  - Output will not indicate which group the DETs/DEGs belong to. They require “manual” separation based on the value in the fold change (fc) column.
    - Fold change (fc) is in reference to the group that comes first alphanumerically (e.g. with groups 0 and 1, 0 is considered the reference group). Transcripts/genes up-regulated in the first group (e.g. group 0) will have an fc value < 1, while transcripts/genes up-regulated in the second group will have an fc value > 1. (links to developer explanation on Bioconductor forums)
- Multigroup (i.e. > 2 groups) differential transcript/gene expression analysis.
  - Fold change (fc) cannot be used as a means to determine differences.
  - DETs/DEGs will be identified, but it cannot be determined which factor (group) is driving the difference. (links to developer explanation on Bioconductor forums)
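Because the pairwise output has to be split “manually” by the fc column, here is a minimal sketch of that step. The analysis itself is in R; Python is used here only for illustration. It assumes the results table from ballgown’s stattest() has been exported to CSV with columns named feature, id, fc, pval and qval. Those column names, the 0.05 q-value cutoff, the helper names (reference_group, split_by_fc) and the toy values are all hypothetical, not output from this analysis.

```python
import csv
import io

# Hypothetical toy table standing in for an exported ballgown stattest() result.
# Column names and values are made up for illustration only.
EXAMPLE_CSV = """feature,id,fc,pval,qval
transcript,1,0.42,0.001,0.01
transcript,2,2.10,0.002,0.02
transcript,3,1.05,0.600,0.80
transcript,4,0.95,0.010,0.04
"""


def _to_float(value):
    """Convert a CSV field to float, returning None for blank or NA values."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return float(value)


def reference_group(groups):
    """Return the group label that sorts first as a string (the fc reference).

    Assumes plain string ordering matches how the groups were ordered in R;
    this may not hold for mixed-case labels or locale-specific sorting.
    """
    return sorted(str(g) for g in groups)[0]


def split_by_fc(rows, qval_cutoff=0.05):
    """Split significant features by the direction of their fold change.

    fc < 1 -> up-regulated in the reference (first) group
    fc > 1 -> up-regulated in the second group
    Rows with missing values, fc == 1, or qval >= cutoff are skipped.
    """
    up_in_reference, up_in_second = [], []
    for row in rows:
        fc = _to_float(row.get("fc"))
        qval = _to_float(row.get("qval"))
        if fc is None or qval is None or qval >= qval_cutoff:
            continue
        if fc < 1:
            up_in_reference.append(row["id"])
        elif fc > 1:
            up_in_second.append(row["id"])
    return up_in_reference, up_in_second


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
    print("Reference group:", reference_group([1, 0]))
    ref_up, second_up = split_by_fc(rows)
    print("Up in reference group:", ref_up)
    print("Up in second group:", second_up)
```

With the toy table, transcripts 1 and 4 land in the reference-group list, transcript 2 in the second-group list, and transcript 3 is dropped for failing the q-value cutoff. The direction check simply encodes the fc < 1 / fc > 1 rule described above, which is why it only applies to the pairwise case.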