20230131
Updated coral GTFs:
Added updated coral genome HISAT2 indexes to Genomic Resources handbook page.
20230130
Resolved the issue with the coral GTFs (GitHub Issue)!!!
Lab meeting.
Added additional coral GFFs/genomes to Roberts Lab Handbook.
20230127
- Encountered an issue with coral GTFs (GitHub Issue) not being compatible with the HISAT2 extract_exons.py script. Working on fixing…
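The entry above does not record what the incompatibility was. As a generic first check, here is a minimal sketch that counts how many `exon` rows in a GTF carry both a `gene_id` and a `transcript_id` attribute. It assumes the HISAT2 exon-extraction script only uses such rows, which is worth confirming against the script itself. The function name `gtf_exon_check` is hypothetical.

```shell
# Count exon rows in a GTF and how many of them carry both gene_id and
# transcript_id in the attribute column (column 9).
# ASSUMPTION: the HISAT2 script skips exon rows lacking either attribute.
gtf_exon_check() {
  awk -F '\t' '
    /^#/ { next }                       # skip comment/header lines
    $3 == "exon" {
      total++
      if ($9 ~ /gene_id "[^"]+"/ && $9 ~ /transcript_id "[^"]+"/) usable++
    }
    END { printf "exons=%d usable=%d\n", total, usable }
  ' "$1"
}

# Usage (file name is a placeholder):
#   gtf_exon_check genome.gtf
```

If `usable` is much smaller than `exons`, the attribute formatting (for example GFF3-style `Parent=` tags in a file named `.gtf`) is a likely place to look.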
20230126
Created HISAT2 genome index for:
- P.acuta (Notebook entry)
Added M.capitata, P.acuta, and P.verrucosa genomes and corresponding HISAT2 indexes to The Roberts Lab Handbook Genomic Resources page.
20230125
Downloaded genome data for the following species. For those hosted on NCBI, I used the very useful curl commands that NCBI now provides on the beta page for a given genome.
P.verrucosa:
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_014529365.1/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,SEQUENCE_REPORT&filename=GCA_014529365.1.zip" -H "Accept: application/zip"
- NOTE: No annotation file(s) (i.e. no GFF) available.
M.capitata:
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_006542545.1/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,SEQUENCE_REPORT&filename=GCA_006542545.1.zip" -H "Accept: application/zip"
- NOTE: No annotation file(s) (i.e. no GFF) available.
P.acuta:
Genome FastA:
wget http://cyanophora.rutgers.edu/Pocillopora_acuta/Pocillopora_acuta_HIv2.assembly.fasta.gz
Genome GFF:
http://cyanophora.rutgers.edu/Pocillopora_acuta/Pocillopora_acuta_HIv2.genes.gff3.gz
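The two NCBI commands above differ only in the accession, so the URL can be generated rather than copied. A minimal sketch, with the URL pattern taken directly from those commands; the function name `ncbi_genome_url` is made up for illustration, and the loop only prints the `curl` commands (a dry run) instead of downloading anything.

```shell
# Build the NCBI Datasets download URL for one genome accession.
# The URL pattern is copied from the curl commands in this entry.
ncbi_genome_url() {
  accession="$1"
  printf '%s/%s/download?include_annotation_type=%s&filename=%s.zip\n' \
    "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession" \
    "$accession" \
    "GENOME_FASTA,GENOME_GFF,SEQUENCE_REPORT" \
    "$accession"
}

# Dry run: print the curl command for each accession instead of running it.
for acc in GCA_014529365.1 GCA_006542545.1; do
  printf 'curl -OJX GET "%s" -H "Accept: application/zip"\n' "$(ncbi_genome_url "$acc")"
done
```

Piping the printed lines to `sh` would run the downloads; keeping the dry run separate makes it easy to eyeball the URLs first.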
Created HISAT2 genome indexes for:
- M.capitata (Notebook entry)
- P.verrucosa (Notebook entry)
Helped resolve Grace’s Trinity issue (GitHub Issue).
20230124
- Put together full notebook entry regarding coral SRA BioProject PRJNA744403 download/QC/trimming: 2023-01-13-SRA-Data---Coral-SRA-BioProject-PRJNA744403-Download-and-QC/index.qmd (Notebook entry)
20230120
- Re-ran QC/trimming SLURM script for coral SRA data (BioProject PRJNA744403) a couple of times. Still dealing with some minor file output organization issues… Also, seems like fastp is trimming data (i.e. output reports indicate average read length after filtering is ~140bp, but the resulting graphs of read lengths after trimming still show 150bp…).
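One way to cross-check the report against the plots is to measure read lengths directly in the trimmed FastQ output. This is a minimal sketch, assuming standard 4-line FastQ records; the function name `fastq_length_stats` and the file name in the usage comment are placeholders, not names from the actual SLURM script.

```shell
# Print read count, mean length, and max length for FastQ records on stdin.
# ASSUMPTION: each record is exactly 4 lines (sequence lines are not wrapped).
fastq_length_stats() {
  awk 'NR % 4 == 2 {                    # line 2 of each record is the sequence
         n++
         len = length($0)
         sum += len
         if (len > max) max = len
       }
       END {
         if (n == 0) { print "reads=0"; exit }
         printf "reads=%d mean=%.1f max=%d\n", n, sum / n, max
       }'
}

# Usage with a gzipped file (file name is a placeholder):
#   gzip -dc trimmed_R1.fastq.gz | fastq_length_stats
```

A mean near 140 together with a max of 150 would suggest that only a subset of reads is being shortened, which could explain plots that still extend to 150bp.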
20230119
Ran QC/trimming SLURM script for coral SRA data (BioProject PRJNA744403).
- Took ~10.5hrs, but needed some adjustments…
Pub-a-thon
20230118
Continued work on this Issue (GitHub Issue) for dealing with coral SRA data (BioProject PRJNA744403).
- gzip-ed all the data.
- Continued working on script to trim data.
20230117
Lab meeting.
Continued work on this Issue (GitHub Issue) for dealing with coral SRA data (BioProject PRJNA744403).
- Continued working on script to trim data.
20230113
- Made some scatter plots from the CEABIGR mean gene methylation coefficients of variation calcs. Not sure if they’re useful or not, but here are some of them (animated GIF - changes every ~4s).
Animated GIF cycling through scatter plots of different comparisons of mean gene methylation coefficients of variation (CoV). Points are colored by absolute value of differences (delta) between mean gene methylation CoV. Purple are lowest differences, while red are greatest differences. Blue line is the linear model regression line, while the red line is an artificial regression line with a slope of 1.
- Began work on this Issue to download/QC some coral BS-seq and RNA-seq data from NCBI SRA (BioProject PRJNA744403). Downloads and conversion from SRA to FastQ took > 12hrs.
20230112
- Spent a very long time trying to update CEABIGR mean gene methylation CoV data frames in list so that I could add a delta of CoVs between comparison groups. Had to resort to using ChatGPT (OpenAI) and the bot solved it in less than a minute! Here was the successful solution:
methylation.transposed.rownames.list <- lapply(methylation.transposed.rownames.list, function(df) {
  df$delta <- abs(df[,1] - df[,2])
  return(df)
})
I had something very similar to this, but didn’t have the return() part of the function. I think that was crucial, as I was getting the delta column by itself as the result, but wasn’t getting the full data frames with the new delta column added to them.
20230111
- Messed around with plotting CEABIGR mean gene methylation CoV, via scatter plots. Trying to decide if this method provides any info or not. Also tried to figure out how to plot all data frames, as they are stored in a list.
20230110
Worked extensively on troubleshooting “missing” row names in a list of data frames in CEABIGR project for coefficients of variation of mean DNA methylation.
Turns out, the row names were present the entire time (i.e. I had written the code correctly from the start), but I couldn’t figure out how to view the data frames within a list so that the row names would be visible.
Additionally, the primary problem was that I wanted row names written in the output files. In the write.csv() call, I had the argument row.names = FALSE! Doh!
Answered Yaamini’s question regarding retrieving FastA sequences from NCBI.
20230109
Read Ch.11 of “The Disordered Cosmos”
Lab meeting
- Discussed Ch.11 of “The Disordered Cosmos”
Wrote recommendation letter draft for Dorothy.
Updated Owl.
20230106
Worked on Dorothy’s recommendation letter.
Science Hour.
20230105
Long lab meeting discussing ways to improve lab “life” with suggestions from everyone on what they’d like to see. Really interesting/informative session!
Continued to work on the Roberts Lab Handbook transcriptome annotation.
20230104
Worked on Linda Rhodes project on figuring out how to use a pre-built SILVA 138 QIIME classifier.
qiime feature-classifier classify-sklearn \
  --i-reads nonchimeras.qza \
  --i-classifier silva-138.1-ssu-nr99-515f-806r-classifier.qza \
  --o-classification classifier-taxonomy-test.qza
This uses a lot of memory. For the testing I was doing on Linda’s marine mammals data set, it required 25GB of RAM, per CPU! As such, I couldn’t run this with more than a single CPU without the command crashing.
- Runtime was somewhere between 1.5 - 2 days.
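The memory constraint above comes down to simple arithmetic: at roughly 25GB per CPU, the number of parallel jobs is limited by available RAM. A rough sketch of that calculation follows; the function name `max_jobs_for_ram` and the 64GB node size are hypothetical, and the final line only prints an abbreviated command using the classifier's `--p-n-jobs` option rather than running QIIME.

```shell
# How many parallel jobs fit in memory, given a per-job RAM cost?
# Both arguments are whole numbers of GB. Returns at least 1, even when
# total RAM is below one job's cost (such a run would likely still fail).
max_jobs_for_ram() {
  total_gb="$1"
  per_job_gb="$2"
  jobs=$(( total_gb / per_job_gb ))
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  echo "$jobs"
}

# Example: a hypothetical 64GB node at ~25GB per job (figure from this entry).
n_jobs=$(max_jobs_for_ram 64 25)
echo "qiime feature-classifier classify-sklearn --p-n-jobs ${n_jobs} ..."
```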