I previously created a single metagenome assembly using all of the sequencing data from this project.

Now, we want to compare the different water sample metagenomes to each other.

Here’s how the sample names breakdown:

Sample	Develomental Stage (days post-fertilization)	pH Treatment
MG1	13	8.2
MG2	17	8.2
MG3	6	7.1
MG5	10	8.2
MG6	13	7.1
MG7	17	7.1

SBATCH script (GitHub):

20190327_metagenomics_pgen_megahit.sh

#!/bin/bash
## Job Name
#SBATCH --job-name=megahit
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=25-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190327_metagenomics_pgen_megahit

# Load Python Mox module for Python module availability

module load intel-python3_2017

# Load Open MPI module for parallel, multi-node processing

module load icc_19-ompi_3.1.2

# SegFault fix?
export THREADS_DAEMON_MODEL=1

# Document programs in PATH (primarily for program version ID)

date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log


# variables
wd=$(pwd)
fastq_dir=/gscratch/srlab/sam/data/metagenomics/P_generosa
megahit=/gscratch/srlab/programs/megahit_v1.1.4_LINUX_CPUONLY_x86_64-bin/megahit
bbmap_dir=/gscratch/srlab/programs/bbmap_38.34
samtools=/gscratch/srlab/programs/samtools-1.9/samtools
cpus=28

## Inititalize arrays
fastq_array_R1=()
fastq_array_R2=()
names_array=()

# Create array of fastq R1 files
for fastq in ${fastq_dir}/*R1*.gz
do
  fastq_array_R1+=(${fastq})
done

# Create array of fastq R2 files
for fastq in ${fastq_dir}/*R2*.gz
do
  fastq_array_R2+=(${fastq})
done

# Create array of sample names
## Uses parameter substitution to strip leading path from filename
## Uses awk to parse out sample name from filename
for R1_fastq in ${fastq_dir}/*R1*.gz
do
  names_array+=($(echo ${R1_fastq#${fastq_dir}} | awk -F"_" '{print $3 $4}'))
done

# Create list of fastq files used in analysis
## Uses parameter substitution to strip leading path from filename
for fastq in ${fastq_dir}*.gz
do
  echo ${fastq#${fastq_dir}} >> fastq.list.txt
done

# Loop through samples
for sample in ${!names_array[@]}
do
  sample_name=$(echo ${names_array[sample]})
  mkdir ${sample_name} && cd ${sample_name}
  # Run Megahit using paired-end reads
  ${megahit} \
  -1 ${fastq_array_R1[sample]} \
  -2 ${fastq_array_R2[sample]} \
  --num-cpu-threads ${cpus} \
  --out-prefix ${sample_name}

  # Create FastA index file
  ${samtools} faidx megahit_out/${sample_name}.contigs.fa

  # Determine coverage
  ## Align reads with BBmap BBwrap
  ${bbmap_dir}/bbwrap.sh \
  ref=megahit_out/${sample_name}.contigs.fa \
  in1=${fastq_array_R1[sample]} \
  in2=${fastq_array_R2[sample]} \
  out=${sample_name}.aln.sam.gz

  ## Output contig coverage
  ${bbmap_dir}/pileup.sh \
  in=${sample_name}.aln.sam.gz \
  out=${sample_name}.coverage.txt

  # Return to working directory
  cd ${wd}
done

RESULTS

I’ll assess these when I get Emma’s individual assemblies and create a new post with that comparison.

Output folder:

20190327_metagenomics_pgen_megahit/
- MG1 Output folder:
- 20190327_metagenomics_pgen_megahit/MG1
  - MG1 FastA:
    - 20190327_metagenomics_pgen_megahit/MG1/megahit_out/MG1.contigs.fa
- MG2 Output folder:
  - 20190327_metagenomics_pgen_megahit/MG2
    - MG2 FastA:
    - 20190327_metagenomics_pgen_megahit/MG1/megahit_out/MG2.contigs.fa
- MG3 Output folder:
  - 20190327_metagenomics_pgen_megahit/MG3
    - MG3 Fasta:
    - 20190327_metagenomics_pgen_megahit/MG3/megahit_out/MG3.contigs.fa
- MG5 Output folder:
  - 20190327_metagenomics_pgen_megahit/MG5
    - MG5 FastA:
      - 20190327_metagenomics_pgen_megahit/MG5/megahit_out/MG5.contigs.fa
- MG6 Output folder:
  - 20190327_metagenomics_pgen_megahit/MG6
    - MG6 FastA:
      - 20190327_metagenomics_pgen_megahit/MG6/megahit_out/MG6.contigs.fa
- MG7 Output folder:
  - 20190327_metagenomics_pgen_megahit/MG7
    - MG7 FastA:
      - 20190327_metagenomics_pgen_megahit/MG7/megahit_out/MG7.contigs.fa