Previous FastQC/MultiQC analysis of the geoduck Illumina HiSeq data (the NMP.fastq.gz files) revealed high levels of overrepresented sequences, high Per Base N Content, failing Per Sequence GC Content, and a few other bad things.
Ran these through TrimGalore! on our Mox HPC node.
Enabled the TrimGalore! option that automatically runs FastQC on the trimmed output files.
TrimGalore! slurm script: 20180328_trim_galore_illumina_geoduck_hiseq_slurm.sh
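The script itself isn't reproduced here, but a minimal sketch of that kind of job looks roughly like the following. All paths, the SBATCH resource values, and the file glob are placeholders, not the values actually used; the `--paired` and `--fastqc` flags are standard TrimGalore! options, with `--fastqc` being what triggers the automatic FastQC run on the trimmed output.

```bash
#!/bin/bash
#SBATCH --job-name=20180328_trim_galore_geoduck_hiseq
#SBATCH --nodes=1
#SBATCH --time=5-00:00:00          # placeholder walltime
#SBATCH --mem=100G                 # placeholder memory request

# Placeholder path to the TrimGalore! install on Mox.
trim_galore=/path/to/trim_galore

# Run TrimGalore! on each HiSeq read pair.
# --paired keeps R1/R2 in sync; --fastqc runs FastQC on the trimmed output.
for r1 in /path/to/geoduck_hiseq/*NMP*_R1_*.fastq.gz   # placeholder glob
do
    r2=${r1/_R1_/_R2_}
    "${trim_galore}" \
    --paired \
    --fastqc \
    --output_dir . \
    "${r1}" "${r2}"
done
```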
Results:
Slurm output file: slurm-153098.out
I received a job status email on 20180330:
SLURM Job_id=153098 Name=20180328_trim_galore_geoduck_hiseq Failed, Run time 1-17:22:47, FAILED, ExitCode 141
The slurm output file didn’t indicate any errors, so I restarted the job and contacted UW IT to see if I could get more info.
UPDATE
Here’s their response:
04/02/2018 9:13 AM PDT - Matt
Hi Sam,
Your job died because of a networking hiccup that caused GPFS (/gscratch filesystem and such) to expel the node from the GPFS cluster. It’s a symptom of a known ongoing network issue that we’re actively working with Lenovo/Intel/IBM. Things like this aren’t happening super frequently, but enough that we recognized something was wrong and started investigating with vendors. Unfortunately, your job was unlucky and got bitten by it.
So, in short, you or your job didn’t do anything wrong. If you haven’t already (and if it is possible for your use case), we would highly recommend building in some sort of periodic state-preserving behavior (and a method to “resume”) into your longer-running jobs. Jobs can unexpectedly die for any number of reasons, and it is nice not to lose days of compute progress when that happens.
-Matt
Well, okay then.
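If I revisit this, one low-effort way to follow Matt's advice for a job like this (where each read pair produces its own trimmed output) is to skip any pair whose output already exists, so a restarted job only redoes the work the failed run never finished. A rough sketch, reusing the same placeholder variables and glob as the sketch above (again, not the actual script):

```bash
# Resume-friendly version of the trimming loop: skip read pairs whose
# trimmed output already exists, so a restarted job only redoes the
# samples the failed run never finished.
for r1 in /path/to/geoduck_hiseq/*NMP*_R1_*.fastq.gz   # placeholder glob
do
    r2=${r1/_R1_/_R2_}
    # TrimGalore! typically writes paired R1 output as <basename>_val_1.fq.gz
    done_file="$(basename "${r1}" .fastq.gz)_val_1.fq.gz"
    if [ -e "${done_file}" ]; then
        echo "Skipping ${r1}: trimmed output already present."
        continue
    fi
    "${trim_galore}" --paired --fastqc "${r1}" "${r2}"
done
```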