Progress on generating bedgraphs from our Olympia oyster transcriptome continues.
Transcriptome assembly with Trinity completed 20180919.
Then, aligned the assembled transcriptome to our genome using Bowtie2.
Finally, I used BEDTools to convert the BAM to BED to bedgraph.
This required an initial indexing of our Olympia oyster genome FastA using samtools faidx tool.
SBATCH script file:
20180924_oly_RNAseq_bedgraphs.sh
#!/bin/bash ## Job Name #SBATCH –job-name=20180924_oly_bedgraphs ## Allocation Definition #SBATCH –account=srlab #SBATCH –partition=srlab ## Resources ## Nodes #SBATCH –nodes=1 ## Walltime (days-hours:minutes:seconds format) #SBATCH –time=5-00:00:00 ## Memory per node #SBATCH –mem=500G ##turn on e-mail notification #SBATCH –mail-type=ALL #SBATCH –mail-user=samwhite@uw.edu ## Specify the working directory for this job #SBATCH –workdir=/gscratch/scrubbed/samwhite/20180924_oly_RNAseq_bedgraphs
Load Python Mox module for Python module availability
module load intel-python3_2017
Document programs in PATH (primarily for program version ID)
date >> system_path.log echo “” >> system_path.log echo “System PATH for $SLURM_JOB_ID” >> system_path.log echo “” >> system_path.log printf “%0.s-” {1..10} >> system_path.log echo ${PATH} | tr : \n >> system_path.log
Set genome assembly FastA
oly_genome_fasta=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa
Set indexed genome assembly file
oly_genome_indexed=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa.fai
Set sorted transcriptome assembly bam file
oly_transcriptome=/gscratch/scrubbed/samwhite/20180919_oly_transcriptome_bowtie2/20180919_Olurida_v081.sorted.bam
Set program paths
bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin samtools=/gscratch/srlab/programs/samtools-1.9/samtools
Index genome FastA
${samtools} faidx ${oly_genome_fasta}
Format indexed genome for bedtools
Requires only two columns: name
length awk -v OFS=‘ {’print $1,$2’} ${oly_genome_indexed} > Olurida_v081.fa.fai.genome
Create bed file
${bedtools}/bamToBed
-i ${oly_transcriptome}
> 20180924_oly_RNAseq.bam.bedCreate bedgraph
Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a).
Screens for portions of reads coming from exons (-split).
Add genome browser track line to header of bedgraph file.
${bedtools}/genomeCoverageBed
-i ${PWD}/20180924_oly_RNAseq.bed
-g Olurida_v081.fa.fai.genome
-bga
-split
-trackline
> 20180924_oly_RNAseq.bed
Alignment was done using the following version of the Olympia oyster genome assembly:
RESULTS:
Output folder:
Indexed and formatted genome file:
Bedgraph file (for IGV):
This doesn’t appear to have worked properly. Here’s a view of the bedgraph file:
<code>
track type=bedGraph
Contig0 0 116746 0
Contig1 0 87411 0
Contig2 0 139250 0
Contig3 0 141657 0
Contig4 0 95692 0
Contig5 0 130522 0
Contig6 0 94893 0
Contig7 0 109667 0
Contig8 0 95943 0
</code>
I’d expect multiple entries for each contig (ideally), indicating start/stop positions for where transcripts align within a given contig. However, this appears to simply be a list of all the genome contigs and their lengths (Start=0, Stop=n).
I would expect to see something li
I’ll look into this further and see where this pipeline goes wrong.