Daily Bits - January 2023 – Sam’s Notebook

20230131

Updated coral GTFs:
Added updated coral genome HISAT2 indexes to Genomic Resources handbook page.

20230130

Resolved the issue with the coral GTFs (GitHub Issue)!!!
Lab meeting.
Added additional coral GFFs/genomes to Roberts Lab Handbook.

20230127

Encountered an issue with coral GTFs (GitHub Issue) not being compatible with HISAT2 extract_exons.py script. Working on fixing…
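The notes do not say what exactly made the GTFs incompatible, so the following is only a rough pre-flight check under assumed failure modes: exon lines that lack nine tab-separated fields, or that are missing a `gene_id` or `transcript_id` attribute. The function name `check_gtf_exons` is hypothetical.

```shell
# Report exon lines in a GTF that look malformed.
# ASSUMPTION: missing fields/attributes are plausible causes of the
# incompatibility; the actual cause is not stated in these notes.
check_gtf_exons() {
  local gtf="$1"
  awk -F'\t' '
    /^#/ { next }                          # skip comment/header lines
    $3 == "exon" {
      if (NF != 9) {
        printf "line %d: expected 9 fields, found %d\n", NR, NF
        next
      }
      if ($9 !~ /gene_id "[^"]+"/)       printf "line %d: no gene_id\n", NR
      if ($9 !~ /transcript_id "[^"]+"/) printf "line %d: no transcript_id\n", NR
    }
  ' "$gtf"
}
```

Running `check_gtf_exons my_annotation.gtf` (file name is a placeholder) prints one line per problem found and nothing if every exon line passes these checks.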

20230126

Created HISAT2 genome index for:
- P.acuta (Notebook entry)
Added M.capitata, P.acuta, and P.verrucosa genomes and corresponding HISAT2 indexes to The Roberts Lab Handbook Genomic Resources page.

20230125

Downloaded genome data for the following species. For those hosted on NCBI, I used the very useful curl commands NCBI now provides on the beta page for a given genome.
- P.verrucosa: curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_014529365.1/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,SEQUENCE_REPORT&filename=GCA_014529365.1.zip" -H "Accept: application/zip"
  - NOTE: No annotation file(s) (i.e. no GFF) available.
- M.capitata: curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_006542545.1/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,SEQUENCE_REPORT&filename=GCA_006542545.1.zip" -H "Accept: application/zip"
  - NOTE: No annotation file(s) (i.e. no GFF) available.
- P.acuta:
  - Genome FastA: wget http://cyanophora.rutgers.edu/Pocillopora_acuta/Pocillopora_acuta_HIv2.assembly.fasta.gz
  - Genome GFF: http://cyanophora.rutgers.edu/Pocillopora_acuta/Pocillopora_acuta_HIv2.genes.gff3.gz
- Created HISAT2 genome indexes for:
  - M.capitata (Notebook entry)
  - P.verrucosa (Notebook entry)
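The two NCBI curl commands above differ only in the accession, so they can be sketched as a small helper. The function names `ncbi_genome_url` and `download_ncbi_genome` are hypothetical; the URL pattern and curl flags are copied from the commands above.

```shell
# Build the NCBI Datasets download URL for a genome accession,
# following the pattern of the curl commands recorded above.
ncbi_genome_url() {
  local accession="$1"
  local types="GENOME_FASTA,GENOME_GFF,SEQUENCE_REPORT"
  printf 'https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/%s/download?include_annotation_type=%s&filename=%s.zip\n' \
    "$accession" "$types" "$accession"
}

# Download one accession as a zip archive (needs network access).
download_ncbi_genome() {
  curl -OJX GET "$(ncbi_genome_url "$1")" -H "Accept: application/zip"
}
```

For example, `download_ncbi_genome GCA_014529365.1` would issue the same request as the P.verrucosa command above.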
Helped resolve Grace’s Trinity issue (GitHub Issue).

20230124

Put together full notebook entry regarding coral SRA BioProject PRJNA744403 download/QC/trimming: 2023-01-13-SRA-Data---Coral-SRA-BioProject-PRJNA744403-Download-and-QC/index.qmd (Notebook entry)

20230120

Re-ran QC/trimming SLURM script for coral SRA data (BioProject PRJNA744403) a couple of times. Still dealing with some minor file output organization issues… Also, the fastp results look inconsistent: the output reports indicate an average read length after filtering of ~140bp, so fastp does seem to be trimming, but the resulting graphs of read lengths after trimming still show 150bp…
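One way to check the trimmed reads independently of the fastp reports is to summarize read lengths straight from a FASTQ file. This is a minimal sketch, assuming standard 4-line FASTQ records; the function name `fastq_length_summary` is hypothetical.

```shell
# Summarize read lengths in a FASTQ file (plain or gzipped).
# Assumes 4-line records, with the sequence on line 2 of each record.
fastq_length_summary() {
  local fq="$1"
  local reader="cat"
  case "$fq" in
    *.gz) reader="gzip -dc" ;;   # decompress gzipped input on the fly
  esac
  $reader "$fq" | awk '
    NR % 4 == 2 {
      len = length($0)
      sum += len
      n++
      if (n == 1 || len < min) min = len
      if (len > max) max = len
    }
    END {
      if (n > 0) printf "reads=%d mean=%.1f min=%d max=%d\n", n, sum / n, min, max
    }
  '
}
```

If the mean comes out near 140 while the maximum stays at 150, the report and the graphs may simply be describing different aspects of the same length distribution, though that is a guess rather than something confirmed here.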

20230119

Ran QC/trimming SLURM script for coral SRA data (BioProject PRJNA744403).
- Took ~ 10.5hrs, but needed some adjustments…
Pub-a-thon

20230118

Continued work on this Issue (GitHub Issue) for dealing with coral SRA data (BioProject PRJNA744403).
- gzip-ed all the data.
- Continued working on script to trim data.

20230117

Lab meeting.
Continued work on this Issue (GitHub Issue) for dealing with coral SRA data (BioProject PRJNA744403).
- Continued working on script to trim data.

20230113

Made some scatter plots from the CEABIGR mean gene methylation coefficients of variation (CoV) calcs. Not sure if they’re useful or not, but here are some of them (animated GIF; changes every ~4s).

Animated GIF cycling through scatter plots of different comparisons of mean gene methylation coefficients of variation (CoV). Points are colored by the absolute value of the difference (delta) between mean gene methylation CoVs. Purple indicates the smallest differences, while red indicates the greatest differences. The blue line is the linear model regression line, while the red line is an artificial reference line with a slope of `1`.

Began work on this Issue to download/QC some coral BS-seq and RNA-seq data from NCBI SRA (BioProject PRJNA744403). Downloads and conversion from SRA to FastQ took > 12hrs.

20230112

Spent a very long time trying to update CEABIGR mean gene methylation CoV data frames in list so that I could add a delta of CoVs between comparison groups. Had to resort to using ChatGPT (OpenAI) and the bot solved it in less than a minute! Here was the successful solution:

methylation.transposed.rownames.list <- lapply(methylation.transposed.rownames.list, function(df) {
  df$delta <- abs(df[,1] - df[,2])
  return(df)
})

I had something very similar to this, but didn’t have the return() call in the function. I think that was crucial: an R function returns the value of its last expression, and without return(df) the last expression was the df$delta assignment. That explains why I was getting the delta column by itself as the result, instead of the full data frames with the new delta column added to them.

20230111

Messed around with plotting CEABIGR mean gene methylation CoV, via scatter plots. Trying to decide if this method provides any info or not. Also tried to figure out how to plot all data frames, as they are stored in a list.

20230110

Worked extensively on troubleshooting “missing” row names in a list of data frames in CEABIGR project for coefficients of variation of mean DNA methylation.
- Turns out, the row names were present the entire time (i.e. I had written the code correctly from the start), but I couldn’t figure out how to view the data frames within a list so that the row names would be visible.
- Additionally, the primary problem was that I wanted row names written in the output files. In the write.csv() call, I had the argument row.names = FALSE! Doh!
Answered Yaamini’s question regarding retrieving FastA sequences from NCBI.

20230109

Read Ch.11 of “The Disordered Cosmos”
Lab meeting
- Discussed Ch.11 of “The Disordered Cosmos”
Wrote recommendation letter draft for Dorothy.
Updated Owl.

20230106

Worked on Dorothy’s recommendation letter.
Science Hour.

20230105

Long lab meeting discussing ways to improve lab “life” with suggestions from everyone on what they’d like to see. Really interesting/informative session!
Continued to work on the Roberts Lab Handbook transcriptome annotation.

20230104

Worked on Linda Rhodes project on figuring out how to use a pre-built SILVA 138 QIIME classifier.
```
qiime feature-classifier classify-sklearn \
  --i-reads nonchimeras.qza \
  --i-classifier silva-138.1-ssu-nr99-515f-806r-classifier.qza \
  --o-classification classifier-taxonomy-test.qza
```
- This uses a lot of memory. For the testing I was doing on Linda’s marine mammals data set, it required 25GB of RAM, per CPU! As such, I couldn’t run this with more than a single CPU without the command crashing.
  - Runtime was somewhere between 1.5 - 2 days.
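The memory constraint above comes down to simple arithmetic: the number of parallel jobs is limited by total RAM divided by the ~25GB needed per CPU. A minimal sketch follows; the function name `max_jobs_for_memory` and the example RAM and CPU values are hypothetical, and only the 25GB figure comes from the testing described above.

```shell
# Estimate how many parallel classifier jobs fit in memory.
# Usage: max_jobs_for_memory TOTAL_GB PER_JOB_GB CPU_COUNT
max_jobs_for_memory() {
  local total_gb="$1" per_job_gb="$2" cpus="$3"
  local jobs=$(( total_gb / per_job_gb ))   # integer division rounds down
  if [ "$jobs" -gt "$cpus" ]; then
    jobs="$cpus"                            # never exceed available CPUs
  fi
  echo "$jobs"                              # 0 means not even one job fits
}
```

For instance, `max_jobs_for_memory 128 25 8` prints `5`. A value like that could then be passed to the classifier's `--p-n-jobs` option, assuming the per-job memory use on other data is similar to what was seen here.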