20220727
- Ran qPCRs using 8 primer sets on Dorothy’s cDNA from yesterday.
20220726
- In lab to finish Dorothy’s mussel RNA extraction and reverse transcription.
20220725
- Read Ch. 7 of “Fresh Banana Leaves” for book club.
- Lab meeting discussing Ch. 7 of “Fresh Banana Leaves.”
20220721
- More testing with `EpiDiverse/wgbs` and `EpiDiverse/snp` pipelines (conda environments). This time ran trimming with an additional 5’ 10bp hard trim, per Bismark recommendations (they actually recommend 8bp).
- Did a lot of reading about Nextflow and trying to get a grasp on how it all works, so we might be able to develop our own pipelines and/or feel confident modifying existing pipelines (e.g. modify the `EpiDiverse/snp` pipeline to analyze all BAMs in a directory without the need to explicitly declare how many at run time). A rough sketch of that kind of change follows.
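To make that concrete, here's a minimal, hypothetical Nextflow sketch of the kind of change meant above. This is not the pipeline's actual code; `params.input` and the channel name are illustrative:

```
// Hypothetical sketch (not the actual EpiDiverse/snp code): build the input
// channel from a glob so every BAM in params.input is picked up automatically,
// with no need to declare the number of files at run time.
Channel
    .fromPath("${params.input}/*.bam")
    .ifEmpty { error "No BAM files found in: ${params.input}" }
    .map { bam -> tuple(bam.baseName, bam) }
    .set { bam_inputs }
```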
20220720
- More testing with the `EpiDiverse/wgbs` pipeline. Per this GitHub Issue, I tried using the conda test profile and things ran smoothly without any errors in the `bam_statistics` part of the pipeline. Steps:
  - Create conda environment: `conda env create --name epidiverse-wgbs-current -f /home/shared/epidiverse-pipelines/wgbs-current`
  - Activate conda environment: `conda activate epidiverse-wgbs-current`
  - Run conda test profile: `NXF_VER=20.07.1 /home/shared/nextflow run /home/shared/epidiverse-pipelines/wgbs-current -profile test,conda`
- Did the same conda test process for `EpiDiverse/snp` and it also ran without issue.
- Started adding instructions for using Nextflow EpiDiverse pipelines on Raven and Mox to the Roberts Lab Handbook.
- Decided to re-run the `EpiDiverse/wgbs` pipeline using the conda environment and enable the FastQC process for post-trimming assessment. This will allow me to compare with our previous trimming results and try to determine why our trimming generates different results when passing our FastQs into the `EpiDiverse/snp` pipeline. A hedged command sketch follows this list.
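For the record, a re-run along these lines would look something like the sketch below. This is hypothetical: it assumes the pipeline exposes a `--fastqc` flag to enable its FastQC process (worth confirming against the pipeline's help output), and the input/reference paths are illustrative placeholders reusing paths from other entries:

```
# Hypothetical sketch of the planned re-run; the --fastqc flag is an assumption
# (confirm against the pipeline's help output) and paths are illustrative.
conda activate epidiverse-wgbs-current

NXF_VER=20.07.1 /home/shared/nextflow run /home/shared/epidiverse-pipelines/wgbs-current \
-profile conda \
--fastqc \
--input /home/shared/8TB_HDD_01/sam/data/O_lurida/BSseq/ \
--reference /home/shared/8TB_HDD_01/sam/data/O_lurida/genomes/Olurida_v081.fa
```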
20220719
- Assisted Dorothy with prep for big RNA extraction from mussel gills. Homogenized 10 control and 10 heat-treated gill samples in 1mL RNAzol RT and stored @ -80C. Extraction to happen next Tuesday.
20220718
- Ran some more tests of the `EpiDiverse/wgbs` and `EpiDiverse/snp` pipelines to compare the Oly trimmed and untrimmed FastQs as inputs.
20220717
- Ran some more tests of the `EpiDiverse/wgbs` and `EpiDiverse/snp` pipelines to compare the Oly trimmed and untrimmed FastQs as inputs.
20220716
- Ran some more tests of the `EpiDiverse/wgbs` and `EpiDiverse/snp` pipelines to compare the Oly trimmed and untrimmed FastQs as inputs.
20220715
- Managed to resolve the `EpiDiverse/wgbs` test issue I encountered yesterday. See the link for details (involves binding a local directory to the Singularity container); a hedged config sketch follows this list.
- Successfully ran the `EpiDiverse/wgbs` pipeline on a subset of the Ostrea lurida BSseq data! Now to test this in the `EpiDiverse/snp` pipeline to see if the output is the same as when running Bismark BAMs through the `EpiDiverse/snp` pipeline…
- Tested Bismark BAMs from a different data set (Panopea generosa: https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/1231/) in the `EpiDiverse/snp` pipeline to see how the results compare to the Ostrea lurida Bismark BAMs.
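For reference, a minimal sketch of that kind of Singularity bind in a Nextflow config, assuming the data live under `/home/shared` (the path is illustrative; see the linked entry for the actual fix):

```
// Minimal sketch: bind a host directory into the Singularity container so the
// pipeline can read local data. The path is illustrative.
singularity {
    enabled    = true
    runOptions = "--bind /home/shared"
}
```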
20220714
- Installed Singularity on Raven (required Go, so installed that, too); a rough from-source install sketch follows this list.
- Tried to run the `EpiDiverse/wgbs` test run on Raven, but it failed. See the GitHub Issue I posted for deets.
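For posterity, a from-source SingularityCE install generally follows the pattern below; the version number is an assumption, not necessarily what went onto Raven:

```
# Illustrative from-source SingularityCE install (version is an assumption,
# not necessarily the one installed on Raven). Requires Go.
export VERSION=3.10.0
wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-ce-${VERSION}.tar.gz
tar -xzf singularity-ce-${VERSION}.tar.gz
cd singularity-ce-${VERSION}

./mconfig                      # configure the build
make -C builddir               # compile
sudo make -C builddir install  # install system-wide
```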
20220713
- Got a response to my `EpiDiverse/snp` GitHub issue regarding the pipeline not processing all BAM files in a directory! Will re-run the pipeline and add the `--take <INTEGER>` parameter to handle all 18 BAM files.
- Fixed an issue with the CEABIGR transcript count stats. The original code assumed that calculations were only performed on the input data, and did not process newly added data generated by a `mutate()` call. Fixed that by using the `-any_of()` function to skip a vector of column names. The `-` preceding `any_of()` is the instruction to exclude the columns entered in the `any_of()` function. A small example follows this list.
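To illustrate the pattern, here's a minimal, self-contained R sketch; the objects and column names are made up, not the actual CEABIGR code:

```
# Hypothetical sketch of the -any_of() exclusion pattern (not the actual
# CEABIGR code). Selecting by exclusion means columns added later by mutate()
# are still included in downstream calculations.
library(dplyr)

counts <- tibble(
  gene_id  = c("g1", "g2"),
  sample_A = c(10, 5),
  sample_B = c(7, 12)
)

# Columns to exclude; any_of() won't error if a listed column is absent.
skip_cols <- c("gene_id", "gene_name")

counts %>%
  mutate(total = sample_A + sample_B) %>%       # newly added column
  summarise(across(-any_of(skip_cols), mean))   # skips only the ID columns
```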
20220712
- In lab: walked Dorothy through the basics of RNA isolation (RNAzol and Direct-zol Mini Kit) and quantification (Qubit RNA BR Assay). Processed two fresh mussel ctenidia samples.
20220711
- Lab meeting: discussed Ch. 6 of “Fresh Banana Leaves.”
- Oyster gene expression meeting: Steven did some plotting of Crassostrea virginica (Eastern oyster) transcript count data.
- Nextflow `epidiverse/snp`: opted to try a subsetted version of the Olurida_v081 genome and BSseq alignments (GitHub Issue):

Looks like it’s working (got past the previous hangups on 20220708)!
```
N E X T F L O W ~ version 20.07.1
Launching `epidiverse/snp` [sick_hoover] - revision: 9c814703c6 [master]
================================================
E P I D I V E R S E - S N P P I P E L I N E
================================================
~ version 1.0
input dir     : /home/shared/8TB_HDD_01/sam/data/O_lurida/BSseq/bams/070322-olymerge-snp
reference     : /home/shared/8TB_HDD_01/sam/data/O_lurida/genomes/Olurida_v081-mergecat98.fa
output dir    : snps
variant calls : enabled
clustering    : enabled
================================================
RUN NAME: sick_hoover
executor >  local (60)
[2c/3853d0] process > SNPS:preprocessing (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[90/dea51f] process > SNPS:masking (zr1394_9_s456_trimmed_bismark_bt2.deduplicated - clustering) [100%] 20 of 20 ✔
[8a/ce2b44] process > SNPS:extracting (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[01/ef9503] process > SNPS:khmer (zr1394_6_s456_trimmed_bismark_bt2.deduplicated) [ 40%] 4 of 10
[-        ] process > SNPS:kwip -
[-        ] process > SNPS:clustering -
[83/238cfa] process > SNPS:sorting (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[03/1b910d] process > SNPS:freebayes (zr1394_7_s456_trimmed_bismark_bt2.deduplicated) [  0%] 0 of 10
[-        ] process > SNPS:bcftools -
[-        ] process > SNPS:plot_vcfstats -
```
Well, well, well! It worked!!
```
N E X T F L O W ~ version 20.07.1
Launching `epidiverse/snp` [sick_hoover] - revision: 9c814703c6 [master]
================================================
E P I D I V E R S E - S N P P I P E L I N E
================================================
~ version 1.0
input dir     : /home/shared/8TB_HDD_01/sam/data/O_lurida/BSseq/bams/070322-olymerge-snp
reference     : /home/shared/8TB_HDD_01/sam/data/O_lurida/genomes/Olurida_v081-mergecat98.fa
output dir    : snps
variant calls : enabled
clustering    : enabled
================================================
RUN NAME: sick_hoover
executor >  local (92)
[2c/3853d0] process > SNPS:preprocessing (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[90/dea51f] process > SNPS:masking (zr1394_9_s456_trimmed_bismark_bt2.deduplicated - clustering) [100%] 20 of 20 ✔
[8a/ce2b44] process > SNPS:extracting (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[a4/dc11ce] process > SNPS:khmer (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[95/79be9f] process > SNPS:kwip [100%] 1 of 1 ✔
[c6/24b89b] process > SNPS:clustering [100%] 1 of 1 ✔
[83/238cfa] process > SNPS:sorting (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[63/2c5bc5] process > SNPS:freebayes (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[00/f4af8b] process > SNPS:bcftools (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔
[df/1e78de] process > SNPS:plot_vcfstats (zr1394_9_s456_trimmed_bismark_bt2.deduplicated) [100%] 10 of 10 ✔

Pipeline execution summary
---------------------------
Name         : sick_hoover
Profile      : docker
Launch dir   : /home/shared/8TB_HDD_01/sam/analyses/20220707-olur-epidiverse
Work dir     : /home/shared/8TB_HDD_01/sam/analyses/20220707-olur-epidiverse/work (cleared)
Status       : success
Error report : -
Completed at : 11-Jul-2022 18:47:04
Duration     : 3h 17m 49s
CPU hours    : 140.1
Succeeded    : 92
```
I’ll obviously share this with Steven, but will also see if I can get the original dataset to run on Mox, which has twice as much memory available as Raven…
Actually, I glanced at the data and it didn’t really work. It only seems to have analyzed some of the BAMs in the directory! There are 18 BAMs, but it only processed 10 of them… Odd (and annoying). I’ll re-run to see if it does the same thing.
20220710
- Created an `erne-bs5` genome index to attempt the full EpiDiverse pipeline, starting with `EpiDiverse/wgbs`:

```
/home/shared/erne-2.1.1-linux/bin/erne-create \
--methyl-hash \
--fasta Olurida_v081.fa \
--output-prefix Olurida_v081
```
- Realized I had experienced these issues previously (see this GitHub Issue and this GitHub Issue), which is not encouraging.
- Tried to run the test protocol for this pipeline “locally” (i.e. offline) using a Mox sbatch script. To do so, I downloaded all of the input files listed in `test.config`. I also downloaded the Singularity image (`singularity pull docker://epidiverse/wgbs:1.0`) and changed the `nextflow.config` file to specify the Singularity image location, like so:
```
// -profile singularity
singularity {
    includeConfig "${baseDir}/config/base.config"
    singularity.enabled = true
    process.container = '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/work/singularity/wgbs_1.0.sif'
}
```
It seemed like that should be all that was needed, but when I execute the test command (`NXF_VER=20.07.1 /gscratch/srlab/programs/nextflow-21.10.6-all run /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/wgbs-1.0 -profile test,singularity`), it fails with this error:
```
executor >  local (10)
[c4/79070c] process > INDEX:erne_bs5_indexing [100%] 1 of 1 ✔
[30/202688] process > INDEX:segemehl_indexing [100%] 1 of 1 ✔
[07/dc2230] process > WGBS:read_trimming (sampleB) [100%] 8 of 8, failed: 8...
[-        ] process > WGBS:read_merging -
[-        ] process > WGBS:fastqc -
[-        ] process > WGBS:erne_bs5 -
[-        ] process > WGBS:segemehl -
[-        ] process > WGBS:erne_bs5_processing -
[-        ] process > WGBS:segemehl_processing -
[-        ] process > WGBS:bam_merging -
[-        ] process > WGBS:bam_subsetting -
[-        ] process > WGBS:bam_filtering -
[-        ] process > WGBS:bam_statistics -
[-        ] process > CALL:bam_processing -
[-        ] process > CALL:Picard_MarkDuplicates -
[-        ] process > CALL:MethylDackel -
[-        ] process > CALL:conversion_rate_estima... -

Pipeline execution summary
---------------------------
Name         : infallible_mccarthy
Profile      : test,singularity
Launch dir   : /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test
Work dir     : /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/work
Status       : failed
Error report : Error executing process > 'WGBS:read_trimming (sampleA)'

Caused by:
  Process `WGBS:read_trimming (sampleA)` terminated with an error exit status (1)

Command executed:
  mkdir fastq fastq/logs
  cutadapt -j 2 -a AGATCGGAAGAGC -A AGATCGGAAGAGC \
  -q 20 -m 36 -O 3 \
  -o fastq/merge.null \
  -p fastq/merge.g null g \
  > fastq/logs/cutadapt.sampleA.merge.log 2>&1

Command exit status:
  1

Command output:
  (empty)

Work dir:
  /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/work/12/6ee9cc7a7372a97f34f21a4f79efb3

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
```
When I look at the Cutadapt log file, this is what is shown:
```
cat cutadapt.sampleA.merge.log

This is cutadapt 2.10 with Python 3.6.7
Command line parameters: -j 2 -a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20 -m 36 -O 3 -o fastq/merge.null -p fastq/merge.g null g
Processing reads on 2 cores in paired-end mode ...
ERROR: Traceback (most recent call last):
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/pipeline.py", line 477, in run
    with xopen(self.file, 'rb') as f:
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/xopen/__init__.py", line 407, in xopen
    return open(filename, mode)
IsADirectoryError: [Errno 21] Is a directory: 'null'
ERROR: Traceback (most recent call last):
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/pipeline.py", line 477, in run
    with xopen(self.file, 'rb') as f:
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/xopen/__init__.py", line 407, in xopen
    return open(filename, mode)
IsADirectoryError: [Errno 21] Is a directory: 'null'
ERROR: Traceback (most recent call last):
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/pipeline.py", line 540, in run
    raise e
IsADirectoryError: [Errno 21] Is a directory: 'null'
Traceback (most recent call last):
  File "/opt/conda/envs/wgbs/bin/cutadapt", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/__main__.py", line 855, in main
    stats = r.run()
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/pipeline.py", line 770, in run
    raise e
IsADirectoryError: [Errno 21] Is a directory: 'null'
```
I posted an Issue, but I’m not really expecting to get a response.
20220709
- Nextflow `epidiverse/snp`: let it run overnight and woke up to the same timeout error message and kernel memory warnings, despite the changes in the config file… Although, I had been running it with the `-resume` flag… I’ll remove all previous `snps/` and `work/` dirs and run with the following command:

```
NXF_VER=20.07.1 /home/shared/nextflow run epidiverse/snp \
-c nextflow-docker_permissions.config \
-profile docker \
--input /home/shared/8TB_HDD_01/sam/data/O_lurida/BSseq/ \
--reference /home/shared/8TB_HDD_01/sam/data/O_lurida/genomes/Olurida_v081.fa
```
- Well, that didn’t work… I’m going to try again with a limited data set, as the `Olurida_v081.fa` has a TON of contigs, which has caused issues with other bisulfite analysis software.
20220708
- Spent time at Science Hour with Steven visualizing some of the Ostrea lurida (Olympia oyster) transcript count stats in R - I mostly just watched and nodded. :)
- Nextflow `epidiverse/snp` problems. Got to deal with these shenanigans:
```
================================================
E P I D I V E R S E - S N P P I P E L I N E
================================================
~ version 1.0
input dir     : /home/shared/8TB_HDD_01/sam/data/O_lurida/BSseq
reference     : /home/shared/8TB_HDD_01/sam/data/O_lurida/genomes/Olurida_v081.fa
output dir    : snps
variant calls : enabled
clustering    : enabled
================================================
RUN NAME: nostalgic_bhabha
executor >  local (34)
[d7/390a8d] process > SNPS:preprocessing (zr1394_all_s456_trimmed_bismark_bt2.deduplicated.sorted) [100%] 10 of 10 ✔
[49/25512a] process > SNPS:masking (zr1394_11_s456_trimmed_bismark_bt2.deduplicated.sorted - variants) [ 60%] 24 of 40, failed: 24, retries: 20
[-        ] process > SNPS:extracting -
[-        ] process > SNPS:khmer -
[-        ] process > SNPS:kwip -
[-        ] process > SNPS:clustering -
[-        ] process > SNPS:sorting -
[-        ] process > SNPS:freebayes -
[-        ] process > SNPS:bcftools -
[-        ] process > SNPS:plot_vcfstats -
Error executing process > 'SNPS:masking (zr1394_17_s456_trimmed_bismark_bt2.deduplicated.sorted - clustering)'

Caused by:
  Process exceeded running time limit (4h)

Command executed:
  change_sam_queries.py -Q -T 16 -t . -G calmd.bam clustering.bam || exit $?
  find -mindepth 1 -maxdepth 1 -type d -exec rm -r {} \;

Command exit status:
  -

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

Work dir:
  /home/shared/8TB_HDD_01/sam/analyses/20220707-olur-epidiverse/work/8b/636e263c8924aa275cd5f1fc7787e8

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
```
Updated the custom config file (which was originally used solely to address permissions issues on the output files generated by the Docker container) to include improved time limits:
```
docker {
    fixOwnership = true
}

process {
    // Process-specific resource requirements
    withLabel:process_low {
        time = 30.d
    }
    withLabel:process_medium {
        time = 30.d
    }
    withLabel:process_high {
        time = 30.d
    }
    withLabel:process_long {
        time = 30.d
    }
}
```
Seemed to fix the timeout issue, as this ran for nearly 6 hrs without a timeout error, but I was still getting the kernel memory warning in the log files and constant failures at the masking step. Added this line to the config file:

```
docker.runOptions = "--memory-swap '-1'"
```
The config file now looks like this (grabbed the idea to use this line to resolve the kernel issue from here):
```
docker {
    docker.runOptions = "--memory-swap '-1'"
    fixOwnership = true
}

process {
    // Process-specific resource requirements
    withLabel:process_low {
        time = 30.d
    }
    withLabel:process_medium {
        time = 30.d
    }
    withLabel:process_high {
        time = 30.d
    }
    withLabel:process_long {
        time = 30.d
    }
}
```
20220707
- In order to start Nextflow `epidiverse/snp`, the genome FastA file requires a FastA index file; this is not documented, but the pipeline throws an error when a FastA index is not found. A quick way to generate one is sketched below.
- Started Nextflow `epidiverse/snp` on Raven.
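For reference, the standard way to generate a FastA index (`.fai`) is `samtools faidx`; a minimal sketch, reusing the genome path from earlier entries:

```
# Create the FastA index (.fai) the pipeline expects, alongside the FastA.
samtools faidx /home/shared/8TB_HDD_01/sam/data/O_lurida/genomes/Olurida_v081.fa

# Produces Olurida_v081.fa.fai in the same directory.
```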
20220706
- Fixed the Nextflow `epidiverse/snp` problem (GitHub Issue).
- Dealt with lake trout BioProject PRJNA316738 data retrieval and QC (GitHub Issue).
20220705
CEABIGR meeting:
- Added gene names to transcript counts files; a hedged sketch of this kind of join follows this list.
- Added code to cover all desired iterations of output files (i.e. the missing controls/males files).
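Purely for illustration, adding gene names to a counts table typically amounts to a join like the dplyr sketch below; the object and column names are hypothetical, not the actual CEABIGR code:

```
# Hypothetical sketch of adding gene names to a transcript counts table
# (object/column names are made up, not the actual CEABIGR code).
library(dplyr)

counts      <- tibble(transcript_id = c("t1", "t2"), count = c(100, 42))
annotations <- tibble(transcript_id = c("t1", "t2"), gene_name = c("geneA", "geneB"))

counts_named <- counts %>%
  left_join(annotations, by = "transcript_id")  # adds the gene_name column
```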