As we continue to analyze the impacts of ocean acidification (OA) on Crassostrea virginica (Eastern oyster) gonads via DNA methylation and RNAseq (GitHub repo: epigeneticstoocean/2018_L18-adult-methylation), we decided to compare the number of transcripts expressed per gene per sample (GitHub Issue). As it turns out, this was quite the challenge. Ultimately, I wasn't able to solve it myself and turned to StackOverflow for a solution. I should've just done this at the beginning, as I got a response (and solution) less than five minutes after posting! Regardless, the data wrangling process (struggle?) was documented in a GitHub Discussion.
The final data wrangling was performed using R and documented in this R Markdown file:
transcript-counts.Rmd
epigeneticstoocean/2018_L18-adult-methylation/blob/main/code/transcript-counts.Rmd
RESULTS
Output file (CSV):
transcript-counts_per-gene-per-sample.csv
epigeneticstoocean/2018_L18-adult-methylation/blob/main/analyses/transcript-counts_per-gene-per-sample.csv
Ultimately, the solution came down to this tiny bit of code (see the R Markdown file linked above for the details):
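# Requires dplyr. Keep gene_name plus the columns whose names start with "FPKM",
# then count, per gene, how many transcripts have FPKM > 0 in each column.
whole_tx_table %>%
  select(starts_with(c("gene_name", "FPKM"))) %>%
  group_by(gene_name) %>%
  summarise(across(everything(), ~ sum(. > 0)))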
That’s it!
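To make the logic concrete, here is a minimal sketch of that same pipeline run on a made-up table. The transcript names, gene names, sample columns (`FPKM.S1`, `FPKM.S2`), and FPKM values are all hypothetical and are not taken from the actual data. The only assumption carried over from the real analysis is that `whole_tx_table` has one row per transcript, a `gene_name` column, and one `FPKM`-prefixed column per sample.

```r
library(dplyr)

# Hypothetical toy input: one row per transcript, one FPKM column per sample.
whole_tx_table <- data.frame(
  t_name    = c("tx1", "tx2", "tx3", "tx4", "tx5"),
  gene_name = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  FPKM.S1   = c(1.2, 0, 3.4, 0, 0),
  FPKM.S2   = c(0, 0, 5.0, 2.1, 0.7)
)

tx_counts <- whole_tx_table %>%
  # Drop everything except the gene name and the per-sample FPKM columns
  select(starts_with(c("gene_name", "FPKM"))) %>%
  # One output row per gene
  group_by(gene_name) %>%
  # For each remaining column, count the transcripts with FPKM > 0
  summarise(across(everything(), ~ sum(.x > 0)))

print(tx_counts)
```

In this toy case, geneA has two expressed transcripts in `FPKM.S1` and one in `FPKM.S2`, while geneB has zero and two, respectively. The trick is that `. > 0` yields a logical vector, and `sum()` of a logical vector counts the `TRUE` values.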