20220930
Continued to review Circos documentation.
- Thought about usefulness (uselessness?) of links for our use case. Possibly interesting to link genes with same GOslims?
Science Hour
Worked on CEABIGR/Circos organization. Should Circos be in it’s own branch? Does this simplify things or complicate things for usage? Better way to keep recommended Circos organization, but setup working directories outside of Circos program install location?
20220929
Pub-a-thon
Lab meeting
- Read Ch. 4 of “The Disordered Cosmos”
Helped reinstall Stacks on Mox for Marta (GitHub Issue)
20220928
- Worked extensively on Circos stuff for CEABIGR project again. Primarily generating Circos-formatted gene expression/methylation files. See CEABIGR
50-circos-formatting
directory.
20220927
Worked extensively on Circos stuff for CEABIGR project. Had to fix some R code, as well as file formatting, bash scripts, and learning how to manipulate the
circos.conf
file. Actually generated a plot:
20220926
epidiverse/snp
pipeline completed, so transferred data and wrote up notebook entry.Generated CEABIGR mean gene expression files, formatted for Circos usage.
20220923
Resubmitted
epidiverse/snp
pipeline on Mox for Crassostrea virginica (Eastern oyster) because I forgot that the pipeline has an aritficial cap of 10 BAMs. Edited thenextflow.config
file to increase cap to 50 and added code in thebio_DNA-methylation.md
doc to count the number of BAMs and specify theepidiverse/snp
command to use that number of BAMs. Prevents the aritficial limit from having any impact.Science Hour. Helped Matt with some Mox/Trinity (system
$PATH
stuff).
20220922
Updated ceabigr predominant isoform R Markdown (GitHub) to generate files for control/exposed female samples.
Due to issue with R loading a package and getting this error:
libgsl.so.23: cannot open shared object file: No such file or directory
, I came across some suggestions that if your current version of R was built with a previous version of Ubuntu (which happens to be my case, since I upgraded Ubuntu earlier this month), I decided to upgrade and build R. Took awhile to compile… I also ended up having to re-install packages. Ugh!Initiated
epidiverse/snp
pipeline on Mox for Crassostrea virginica (Eastern oyster) Bismark BAMs, per this GitHub Issue.
20220921
Updated ceabigr predominant isoform R Markdown (GitHub) to include vectors containing sample types, as well as generate files for control/exposed male samples.
Finished notebook entry for geoduck HISAT2 alignments for lncRNA.
Read more Circos info.
20220920
ceabigr meeting with Yaamini. Decided to generate list of genes with differing predominant isoforms between females and males - these were the only two comparisons Steven had produced so far. List will also use a binary system (i.e.
0
or1
to indicate no different or different, respectively).- Generated file(s). See this Jupyter Notebook (GitHub).
20220919
Did I finally fix the SBATCH script for geoduck Hisat2 alignments? It’s looking that way…
CEABIGR Circos stuff
awk 'BEGIN {OFS="\t"} {print "cvir"$1, $2, $3}' C_virginica-3.0_Gnomon_genes.bed > C_virginica-3.0_Gnomon_genes.bed.circos
Continued reading documentation.
Got distracted exploring how to get a list of all existing GO IDs. The idea being to then map all GOslims to the GO IDs and create a “flat” file that can be used for joining. Mostly reading about using the
GO.db
library in R and how to extract information.
20220916
Science Hour
Continued troubleshooting SBATCH script for geoduck Hisat2 alignments. Despite declaring success yesterday, still realized the script wasn’t running properly. Grrrrrr….
20220915
Retrieved GOslims for single cell RNAseq (scRNAseq) project, per this GitHub Issue.
Pub-a-thon
Permanantly fixed SBATCH script for geoduck Hisat2 alignments and succsufully ran, as part of the lncRNA identification.
20220914
Installed Circos on my computer (VM Ubuntu 22.04LTS):
Couldn’t install needed library:
sudo apt-get -y install libgd2-xpm-dev E: Unable to locate package libgd2-xpm-dev
Possible fix is:
sudo apt-get -y install libgd-dev
Missing Perl modules:
sam@computer:~/programs/circos-0.69-9/bin$ ./circos -modules ok 1.52 Carp ok 0.45 Clone missing Config::General ok 3.80 Cwd ok 2.179 Data::Dumper ok 2.58 Digest::MD5 ok 2.85 File::Basename ok 3.80 File::Spec::Functions ok 0.2311 File::Temp ok 1.52 FindBin missing Font::TTF::Font missing GD missing GD::Polyline ok 2.52 Getopt::Long ok 1.46 IO::File missing List::MoreUtils ok 1.55 List::Util missing Math::Bezier ok 1.999818 Math::BigFloat missing Math::Round missing Math::VecStat ok 1.03_01 Memoize ok 1.97 POSIX missing Params::Validate ok 2.01 Pod::Usage missing Readonly missing Regexp::Common missing SVG missing Set::IntSpan missing Statistics::Basic ok 3.23 Storable ok 1.23 Sys::Hostname ok 2.04 Text::Balanced missing Text::Format ok 1.9767 Time::HiRes
Needed to install
cpanm
:apt-get install cpanminus
Then:
sudo cpanm Clone Config::General Font::TTF::Font GD GD::Polyline List::MoreUtils Math::Bezier Math::Round Math::VecStat Params::Validate Readonly Regexp::Common SVG Set::IntSpan Statistics::Basic Text::Format
Got me to this:
sam@computer:~/programs/circos-0.69-9/bin$ ./circos -modules ok 1.52 Carp ok 0.45 Clone ok 2.65 Config::General ok 3.80 Cwd ok 2.179 Data::Dumper ok 2.58 Digest::MD5 ok 2.85 File::Basename ok 3.80 File::Spec::Functions ok 0.2311 File::Temp ok 1.52 FindBin ok 0.39 Font::TTF::Font ok 2.76 GD ok 0.2 GD::Polyline ok 2.52 Getopt::Long ok 1.46 IO::File ok 0.430 List::MoreUtils ok 1.55 List::Util ok 0.01 Math::Bezier ok 1.999818 Math::BigFloat ok 0.07 Math::Round ok 0.08 Math::VecStat ok 1.03_01 Memoize ok 1.97 POSIX ok 1.30 Params::Validate ok 2.01 Pod::Usage ok 2.05 Readonly ok 2017060201 Regexp::Common ok 2.87 SVG ok 1.19 Set::IntSpan ok 1.6611 Statistics::Basic ok 3.23 Storable ok 1.23 Sys::Hostname ok 2.04 Text::Balanced ok 0.62 Text::Format ok 1.9767 Time::HiRes
Successfully generated test images!
Tidied up a bunch of things in the SBATCH script for geoduck RNAseq HiSat alignments in order to get it to run properly. Currently waiting in queue…
Updated (circos_pgen_karyotype.sh)[https://github.com/RobertsLab/sams-notebook/blob/master/bash_scripts/circos_pgen_karyotype.sh] bash script to allow passing arguments for species abbreviation and FastA index file.
20220913
ceabigr meeting
Finished (I hope!) SBATCH script for geoduck RNAseq HiSat alignments for eventual lncRNA id. Lots of ins and outs and what-have-yous for this thing… Will execute tomorrow when Mox is back online (offline today for maintenance).
20220912
Lab meeting.
Oyster gene expression (ceabigr) meeting with Steven & Yaamini.
Need to make some CIRCOS plots
Continued work on SBATCH script for geoduck RNAseq HiSat alignments for eventual lncRNA id.
20220909
Science Hour and Pub-a-thon
Started working on SBATCH script for geoduck RNAseq HiSat alignments for eventual lncRNA id.
20220908
lab meeting
transferred data to Mox, prepared SBATCH script and trimmed geoduck RNAseq data in preparation for identifying long non-coding RNA
20220907
Computer maintenance: upgraded my laptop Ubuntu virtual machine from 20.04LTS to 22.04LTS.
ProCard reconcilation.
transferred geoduck RNAseq files to Mox in preparation for long non-coding RNA identification.
20220906
Computer maintenance: backed up my laptop virtual machine (took a very long time) to my home server.
Helped with Zach’s GitHub Issue regarding PCR caps vs. films.
Helped with Marta’s GitHub Issue with running a support script in
stacks
.ProCard reconciliation.
Technical issue with accessing the Lab Safety Report Dashboard. Resolved late this afternoon.
20220901
- Continued to mess around with Mixomics R package tutorials. Here’s R Markdown used, as well as some of the plots that were produced (PCA and sparse PCA). Overall, sex is driving factor of differences in gene expression (FPKM) and average gene methylation.
Put together a quick R Markdown doc to go through the tutorial using our gene FPKM expression data and average gene methylation data (generated by Steven):
Some screenshots of plots generated: