Sequence Data Analysis - C.gigas Larvae OA BS-Seq Data

Compared the total amount of data generated for each index. The commands below pipe the output of ‘ls -l’ to awk. Awk sums the file sizes, found in the 5th field ($5) of the ‘ls -l’ output, then prints the sum divided by 1024^3 to convert from bytes to gigabytes.


Index: CTTGTA
$ ls -l 2212_lane2_[C]* | awk '{sum += $5} END {print sum/1024/1024/1024}'
5.33341

Index: GCCAAT
$ ls -l 2212_lane2_[G]* | awk '{sum += $5} END {print sum/1024/1024/1024}'
7.00596

The GCCAAT files contain ~1.3x as much data as the CTTGTA files (7.01 GB vs. 5.33 GB).
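The comparison above can also be scripted so the ratio is computed rather than eyeballed. This is only a sketch: the function names sum_bytes and size_ratio are made up for illustration, and it assumes, like the commands above, that the file size is the 5th field of the ‘ls -l’ output.

```shell
# Sum the sizes (in bytes) of the files passed as arguments.
# Assumes the size is the 5th field of `ls -l`, as in the commands above.
sum_bytes() {
  ls -l "$@" | awk '{sum += $5} END {printf "%.0f\n", sum}'
}

# Print the ratio of two numbers (first / second) to two decimal places.
size_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN {printf "%.2f\n", a / b}'
}
```

Usage would look like: size_ratio "$(sum_bytes 2212_lane2_[G]*)" "$(sum_bytes 2212_lane2_[C]*)". Fed the two totals reported above, size_ratio 7.00596 5.33341 prints 1.31.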

Ran FASTQC on the following files downloaded earlier today:

2212_lane2_CTTGTA_L002_R1_001.fastq.gz
2212_lane2_CTTGTA_L002_R1_002.fastq.gz
2212_lane2_CTTGTA_L002_R1_003.fastq.gz
2212_lane2_CTTGTA_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_001.fastq.gz
2212_lane2_GCCAAT_L002_R1_002.fastq.gz
2212_lane2_GCCAAT_L002_R1_003.fastq.gz
2212_lane2_GCCAAT_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_005.fastq.gz
2212_lane2_GCCAAT_L002_R1_006.fastq.gz

The FASTQC command is below. This command runs FASTQC in a for loop over any files that begin with “2212_lane2_C” or “2212_lane2_G” and outputs the analyses to the Arabidopsis folder on Eagle:

$ for file in /Volumes/nightingales/C_gigas/2212_lane2_[CG]*; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done
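Before launching a long FASTQC run, it can help to preview exactly which files the [CG] bracket pattern expands to. A minimal sketch follows; list_matches is a hypothetical helper name, not part of FASTQC or of the workflow above.

```shell
# Print each existing path among the arguments, one per line.
# If a glob matches nothing, the shell passes the pattern through
# literally; the -e test filters that leftover out.
list_matches() {
  for file in "$@"; do
    if [ -e "$file" ]; then
      printf '%s\n' "$file"
    fi
  done
}
```

Running list_matches /Volumes/nightingales/C_gigas/2212_lane2_[CG]* should list only the files whose names continue with C or G after “2212_lane2_”, which are the same files the for loop would hand to fastqc.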

From within the Eagle/Arabidopsis folder, I renamed the FASTQC output files to prepend today’s date:

$ for file in 2212_lane2_[GC]*; do mv "$file" "20150413_$file"; done
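The rename loop above adds the prefix again if it is accidentally run a second time against already-renamed files. A small sketch of a guarded version is below; prepend_prefix is a hypothetical helper name, and the prefix is passed in rather than hard-coded.

```shell
# Prepend "<prefix>_" to the name of each file given, leaving alone
# any file whose name already starts with that prefix.
prepend_prefix() {
  prefix=$1
  shift
  for file in "$@"; do
    dir=$(dirname "$file")
    base=$(basename "$file")
    case $base in
      "${prefix}_"*) continue ;;   # already renamed, skip
    esac
    mv "$file" "$dir/${prefix}_$base"
  done
}
```

For this entry the call would be prepend_prefix 20150413 2212_lane2_[GC]*; using "$(date +%Y%m%d)" as the first argument would stamp the current date instead.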

Then, I unzipped the .zip files generated by FASTQC to get access to the images, which eliminates the need for screenshots in this notebook entry:

$ for file in 20150413_2212_lane2_[CG]*.zip; do unzip "$file"; done

The unzip output retained the old naming scheme, so I renamed the unzipped folders:

$ for file in 2212_lane2_[GC]*; do mv "$file" "20150413_$file"; done

The FASTQC results are linked below:

20150413_2212_lane2_CTTGTA_L002_R1_001_fastqc.html
20150413_2212_lane2_CTTGTA_L002_R1_002_fastqc.html
20150413_2212_lane2_CTTGTA_L002_R1_003_fastqc.html
20150413_2212_lane2_CTTGTA_L002_R1_004_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_001_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_002_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_003_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_004_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_005_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_006_fastqc.html