Genomic Resources
Here we compile genomic resources so that they are readily available and briefly described. Where possible, the corresponding index files are kept alongside so these files can be loaded directly in IGV and similar genome browsers.
Related Resources
- Archived versions of this page: 091319
- Nightingales (Google Sheet) : Database of all raw high-throughput sequencing data
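Most entries below list an MD5 checksum. A minimal sketch for checking a downloaded file against the listed value, assuming GNU coreutils md5sum is available (on macOS, md5 -q FILE prints the bare hash instead). The function name verify_md5 is hypothetical:

```shell
#!/bin/sh
# verify_md5 EXPECTED_MD5 FILE
# Hypothetical helper: compares FILE's MD5 with the value listed on this page.
# Assumes GNU coreutils md5sum; reading from stdin avoids md5sum escaping
# unusual filenames in its output.
verify_md5() {
  expected=$1
  file=$2
  actual=$(md5sum < "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

For example: verify_md5 6de9d1239eb10ea0545bed6c4e746d6c Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa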
Chionoecetes bairdi
Genomes
-
cbai_genome_v1.01.fasta (18MB)
-
cbai_genome_v1.0.fasta (19MB)
Transcriptomes
Assembly Stats Table (Google Sheet)
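The BUSCO lines below use BUSCO's short-summary notation: C (complete) is split into S (single-copy) and D (duplicated), and C, F (fragmented) and M (missing) together account for 100% of the n BUSCOs searched. As a quick sanity check on copied values, here is a hypothetical awk-based helper (busco_check) that parses these strings and tests both relationships, allowing 0.15 for rounding to one decimal place:

```shell
#!/bin/sh
# busco_check: hypothetical helper. Reads BUSCO short-summary strings such as
#   C:73.8%[S:45.8%,D:28.0%],F:7.9%,M:18.3%,n:978
# one per line on stdin and checks S + D = C and C + F + M = 100,
# allowing 0.15 for rounding to one decimal place.
busco_check() {
  # Split on every run of characters that is not a digit or a dot, so the
  # numbers land in fields 2..7 (field 1 is empty because the line starts "C:").
  awk -F'[^0-9.]+' '
    function absval(x) { return x < 0 ? -x : x }
    NF >= 7 {
      c = $2; s = $3; d = $4; f = $5; m = $6; n = $7
      sd_ok  = (absval(s + d - c) <= 0.15) ? "yes" : "no"
      sum_ok = (absval(c + f + m - 100) <= 0.15) ? "yes" : "no"
      printf "complete=%s single=%s duplicated=%s fragmented=%s missing=%s n=%s S+D=C:%s C+F+M=100:%s\n", c, s, d, f, m, n, sd_ok, sum_ok
    }'
}
```

Piping the first transcriptome's string below (C:73.8%[S:45.8%,D:28.0%],F:7.9%,M:18.3%,n:978) into busco_check reports yes for both checks.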
-
-
MD5 =
6450d6f5650bfb5f910a5f42eef94913
-
BUSCOs:
C:73.8%[S:45.8%,D:28.0%],F:7.9%,M:18.3%,n:978
-
FastA index (samtools faidx)
-
BLASTx annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW; BLASTx against NCBI C. opilio genome.
-
-
-
MD5 =
aeec8ffbf8fa44fb1750caee6abaf68a
-
BUSCOs:
C:96.5%[S:40.3%,D:56.2%],F:2.2%,M:1.3%,n:978
-
FastA index (samtools faidx)
-
BLASTx annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-UW with non_Alveolata. Derived from cbai_transcriptome_v3.0.fasta
-
-
-
Assembly from 20200518
-
MD5 =
5516789cbad5fa9009c3566003557875
-
BUSCOs:
C:97.6%[S:39.1%,D:58.5%],F:1.6%,M:0.8%,n:978
-
FastA index (samtools faidx)
-
BLASTx annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-UW with no taxonomic filter.
-
-
-
MD5 =
1fb788175f9bb7cd5145370a399ae857
-
BUSCOs:
C:98.3%[S:25.2%,D:73.1%],F:1.4%,M:0.3%,n:978
-
FastA index (samtools faidx)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW with non_Alveolata. Derived from cbai_transcriptome_v2.0.fasta
-
-
-
Also referred to as 20200507.C_bairdi.Trinity.fasta.
-
MD5 =
01adbd54298495c147767b19ee5c0de9
https://gannet.fish.washington.edu/Atumefaciens/20200526_cbai_trinotate_transcriptome-v3.0/20200526.cbai.trinotate.go_annotations.txt -
BUSCOs:
C:98.8%[S:24.9%,D:73.9%],F:0.9%,M:0.3%,n:978
-
FastA index (samtools faidx)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW with no taxonomic filter.
-
-
-
MD5 =
032d1f81c7744736ebeefe7f63ed6d95
-
Assembly from 20200527
-
FastA index (samtools faidx)
- cbai_transcriptome_v1.7.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v1.7.fasta.fai
-
BUSCOs:
C:86.7%[S:66.5%,D:20.2%],F:8.2%,M:5.1%,n:978
-
BLASTx Annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-UW with Arthropoda only reads.
-
-
-
MD5 =
46d77ce86cdbbcac26bf1a6cb820651e
-
FastA index (samtools faidx)
- cbai_transcriptome_v1.6.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v1.6.fasta.fai
-
BUSCOs:
C:91.7%[S:62.6%,D:29.1%],F:6.2%,M:2.1%,n:978
-
BLASTx Annotation (outfmt6)
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW with Arthropoda only reads.
-
-
-
MD5 =
e61d68c45728ffbb91e3d34c087d9aa9
-
BUSCOs: C:91.8%[S:64.0%,D:27.8%],F:5.9%,M:2.3%,n:978
-
FastA index (samtools faidx)
- cbai_transcriptome_v1.5.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v1.5.fasta.fai
-
Updated assembly from 20200330. Also referred to as 20200408.C_bairdi.megan.Trinity.fasta
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019, 2020-GW with Arthropoda only reads.
-
-
-
MD5 =
fb28a203154b44b67ec2e2476d96d326
-
BUSCOs:
C:85.5%[S:64.7%,D:20.8%],F:9.3%,M:5.2%,n:978
-
FastA index (samtools faidx)
- cbai_transcriptome_v1.0.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v1.0.fasta.fasta.fai
-
Initial Trinity assembly from 20200122
-
GO Terms Annotation (Trinotate)
-
internal short-hand: includes 2018, 2019 with Arthropoda only reads.
-
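The BLASTx annotations above are in BLAST tabular format (outfmt 6). With the default column set (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore), a common first step is to keep only the top-scoring hit per contig. A minimal awk sketch, assuming those default 12 tab-separated columns; best_hits is a hypothetical name, and on a bitscore tie the first hit seen is kept:

```shell
#!/bin/sh
# best_hits [FILE...]: hypothetical helper. Keeps the highest-bitscore row
# per query from BLAST outfmt 6 output (assumes the default 12 columns,
# bitscore in column 12). Queries are printed in order of first appearance.
best_hits() {
  awk -F'\t' '
    !($1 in best) { order[++k] = $1; best[$1] = $12 + 0; line[$1] = $0; next }
    $12 + 0 > best[$1] { best[$1] = $12 + 0; line[$1] = $0 }
    END { for (i = 1; i <= k; i++) print line[order[i]] }' "$@"
}
```

With no file arguments it reads stdin, so it can sit at the end of a pipeline.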
Crassostrea gigas - cgigas_uk_roslin_v1
Crassostrea gigas - oyster_v9
Related Resources
-
Compilation of DNA Methylation Genome Feature Tracks (Crassostrea gigas) circa 2015
-
Re-defining Cgigas Canonical features circa 2015
Genome:
-
Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa
-
MD5 = 6de9d1239eb10ea0545bed6c4e746d6c
-
FastA index (samtools faidx): http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.fai
-
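Each FastA index on this page is the 5-column .fai file written by samtools faidx: sequence name, length, byte offset of the first base, bases per line, and bytes per line (including the newline). IGV uses it to fetch a region without reading the whole FastA. To make those columns concrete, here is an illustrative awk re-implementation (fai_sketch, a hypothetical name) that assumes ASCII text, Unix line endings and equal-length sequence lines; it is not a replacement for samtools faidx:

```shell
#!/bin/sh
# fai_sketch [FILE...]: illustrative, hypothetical re-implementation of the
# five .fai columns (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH). Assumes
# ASCII, Unix (LF) line endings and that every sequence line except the last
# has the same length. Use the real samtools faidx to build indexes for IGV.
fai_sketch() {
  awk '
    function emit() {
      if (name != "") printf "%s\t%.0f\t%.0f\t%d\t%d\n", name, len, off, lb, lw
    }
    /^>/ {
      emit()
      name = substr($1, 2)          # ID = first word after ">"
      len = 0; lb = 0; lw = 0
      pos += length($0) + 1         # header line plus its newline
      off = pos                     # byte offset of the first base
      next
    }
    {
      if (lb == 0) { lb = length($0); lw = length($0) + 1 }
      len += length($0)
      pos += length($0) + 1
    }
    END { emit() }' "$@"
}
```

For a file containing ">seq1 / ACGT / AC / >seq2 desc / GG" (one item per line), this prints seq1 6 6 4 5 and seq2 2 25 2 3, the same values samtools faidx writes.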
Bisulfite Genome:
-
Crassostrea_gigas.oyster_v9.dna_sm.toplevel_bisulfite.tar.gz :
http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel_bisulfite.tar.gz
-
Gzipped tarball of bisulfite genome for use with Bismark
-
Creation details here
-
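For context, Bismark's genome preparation step (bismark_genome_preparation) creates C-to-T and G-to-A converted copies of the reference and indexes them with Bowtie 2 (or HISAT2); the tarballs on this page presumably hold that output. The conversion itself is simple, as this illustrative sed sketch shows (bisulfite_convert is a hypothetical name, and this is not Bismark's code):

```shell
#!/bin/sh
# bisulfite_convert CT|GA [FILE...]: hypothetical helper, illustration only.
# CT turns every C into T (top-strand conversion); GA turns every G into A
# (bottom-strand conversion). Header lines (starting with ">") are left
# untouched and lowercase (soft-masked) bases keep their case.
bisulfite_convert() {
  case "${1:-}" in
    CT) shift; sed '/^>/!y/Cc/Tt/' "$@" ;;
    GA) shift; sed '/^>/!y/Gg/Aa/' "$@" ;;
    *) echo "usage: bisulfite_convert CT|GA [FILE...]" >&2; return 1 ;;
  esac
}
```

To use a tarball itself, unpack it with tar -xzf and pass Bismark the folder holding the genome FastA and its Bisulfite_Genome/ subfolder via --genome (assuming the archive unpacks to that layout).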
Genome Feature Tracks
-
Cgigas_v9_gene.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
-
Cgigas_v9_exon.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
-
Cgigas_v9_intron.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
-
Cgigas_v9_TE.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff
- Contains Tandem Repeats and wublastx features.
-
Cgigas_v9_CG.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff
- index:
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff.idx
-
Cgigas_v9_1k5p_gene_promoter.gff :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
-
Cgigas_v9_COMP_gene_prom_TE.bed :
https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
-
Crassostrea_gigas.oyster_v9.40.gff3 :
http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.40.gff3
- MD5 = 90a747fbc94a0a9225c43f75cc40b9db
-
Crassostrea_gigas.oyster_v9.40.abinitio.gff3 :
http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.40.abinitio.gff3
- MD5 = c2a8c388f5a8afb22a115d61dee3dda0
mRNA subset of Crassostrea_gigas.oyster_v9.40.gff3, made with:
grep "mRNA" Crassostrea_gigas.oyster_v9.40.gff3 > Crassostrea_gigas.oyster_v9.40_mRNA.gff3
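Note that grep "mRNA" keeps any line containing the string mRNA anywhere, including the attributes column of other feature types. If only rows whose type (column 3) is exactly mRNA are wanted, a stricter alternative is to match on that column. A minimal sketch; gff_feature is a hypothetical helper that also keeps # header/directive lines:

```shell
#!/bin/sh
# gff_feature TYPE [FILE...]: hypothetical helper. Prints comment/directive
# lines (starting with "#") plus only those features whose type, GFF column 3,
# is exactly TYPE. Columns are split on tabs, as GFF3 requires.
gff_feature() {
  ftype=$1
  shift
  awk -F'\t' -v ftype="$ftype" '/^#/ || $3 == ftype' "$@"
}
```

For example, gff_feature mRNA Crassostrea_gigas.oyster_v9.40.gff3 > mRNA_only.gff3 (output name hypothetical). The Panopea generosa feature GFFs further down were split on column 3 in a similar way.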
Crassostrea virginica
Genomes:
-
Cvirginica_v300.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300.fa
-
MD5 = f9135e323583dc77fc726e9df2677a32
-
FastA index (samtools faidx)
- Cvirginica_v300.fa.fai :
http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300.fa.fai
-
-
GCF_002022765.2_C_virginica-3.0_genomic.fna :
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/022/765/GCF_002022765.2_C_virginica-3.0/GCF_002022765.2_C_virginica-3.0_genomic.fna.gz
- compressed version of Cvirginica_v300.fa (same files)
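Since the NCBI .fna.gz is listed as the same file as Cvirginica_v300.fa, one way to confirm this locally without writing a decompressed copy to disk is to stream it into cmp. A minimal sketch; same_after_gunzip is a hypothetical name:

```shell
#!/bin/sh
# same_after_gunzip GZ_FILE PLAIN_FILE: hypothetical helper.
# Decompresses GZ_FILE to stdout and compares it byte-for-byte with
# PLAIN_FILE; exit status 0 means identical, non-zero means they differ.
same_after_gunzip() {
  gunzip -c "$1" | cmp -s - "$2"
}
```

For example: same_after_gunzip GCF_002022765.2_C_virginica-3.0_genomic.fna.gz Cvirginica_v300.fa && echo identical. Piping gunzip -c into md5sum and comparing against the MD5 listed for Cvirginica_v300.fa is an equivalent check.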
Bisulfite Genomes:
-
Cvirginica_v300_bisulfite.tar.gz :
http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300_bisulfite.tar.gz
-
Gzipped tarball of bisulfite genome for use with Bismark
-
Creation details here
-
Genome Feature Tracks
-
C_virginica-3.0_Gnomon_mRNA.gff3 :
http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_Gnomon_mRNA.gff3
-
C_virginica-3.0_Gnomon_exon.bed :
http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_Gnomon_exon.bed
-
C_virginica-3.0_intron.bed :
http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_intron.bed
-
C_virginica-3.0_CG-motif.bed :
http://eagle.fish.washington.edu/Cvirg_tracks/C_virginica-3.0_CG-motif.bed
-
MD5 = f88c171bccf45a6f3afcf455b6be810f
-
A dead link in this Jupyter Notebook obscures the details of how this track was generated (via Galaxy):
- https://github.com/sr320/nb-2018/blob/master/C_virginica/22-CG-track.ipynb
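Because the Galaxy steps behind the CG-motif track can't be recovered from that notebook, here is an independent sketch (not the original method) of how a CpG BED track can be produced from a FastA: one 2 bp interval per CG dinucleotide, in BED's 0-based, end-exclusive coordinates. cpg_bed is a hypothetical name; it holds each sequence in memory and is slow on chromosome-scale input, so treat it as an illustration:

```shell
#!/bin/sh
# cpg_bed [FILE...]: hypothetical helper, not the Galaxy workflow used for the
# track above. Prints one BED line (name, 0-based start, exclusive end) per
# CG dinucleotide. Each sequence is joined into one string first, so a CG
# split across a line break is still found; matching ignores case, so
# soft-masked "cg" counts.
cpg_bed() {
  awk '
    function scan(   i, n) {
      n = length(seq)
      for (i = 1; i < n; i++)
        if (toupper(substr(seq, i, 2)) == "CG")
          printf "%s\t%d\t%d\n", name, i - 1, i + 1
    }
    /^>/ { if (name != "") scan(); name = substr($1, 2); seq = ""; next }
    { seq = seq $0 }
    END { if (name != "") scan() }' "$@"
}
```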
-
-
C_virginica-3.0_TE-all.gff :
http://owl.fish.washington.edu/halfshell/genomic-databank/C_virginica-3.0_TE-all.gff
-
MD5 = d0d81fc6cf7525bc2c61984bee23521b
-
-
C_virginica-3.0_TE-Cg.gff :
http://owl.fish.washington.edu/halfshell/genomic-databank/C_virginica-3.0_TE-Cg.gff
-
MD5 = 83cd753c171076464fee1165b7e1c6ba
-
Hematodinium sp. (Host: Chionoecetes bairdi)
Transcriptomes
Assembly Stats Table (Google Sheet)
-
hemat_transcriptome_v1.7.fasta
-
internal short-hand: includes 2018, 2019, 2020-UW with Alveolata only reads.
-
MD5 =
f9c8f96a49506e1810ff4004426160d8
-
FastA index (samtools faidx)
-
BUSCOs:
C:15.0%[S:12.2%,D:2.8%],F:12.3%,M:72.7%,n:978
-
BLASTx Annotation
-
GO Terms Annotation
-
-
hemat_transcriptome_v1.6.fasta
-
internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW with Alveolata only reads.
-
MD5 =
f9c8f96a49506e1810ff4004426160d8
-
FastA index (samtools faidx)
-
BUSCOs:
C:26.5%[S:20.7%,D:5.8%],F:11.2%,M:62.3%,n:978
-
BLASTx Annotation
-
GO Terms Annotation
-
-
hemat_transcriptome_v1.5.fasta
-
MD5 =
b8d4a3c1bad2e07da8431bf70bdabfdd
-
BUSCOs:
C:25.6%[S:20.7%,D:4.9%],F:11.7%,M:62.7%,n:978
-
FastA index (samtools faidx)
- hemat_transcriptome_v1.5.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/hemat_transcriptome_v1.5.fasta.fai
-
Updated assembly from 20200330.
-
BLASTx Annotation (txt; 355KB)
-
Trinotate GO Terms Annotation (txt; 2.3MB)
-
internal short-hand: includes 2018, 2019, 2020-GW with Alveolata only reads.
-
-
hemat_transcriptome_v1.0.fasta (3.9MB)
-
MD5 =
fa5eb74767d180af5265d2d1f80b6430
-
BUSCOs:
C:25.1%[S:19.2%,D:5.9%],F:9.5%,M:65.4%,n:978
-
FastA index (samtools faidx)
- hemat_transcriptome_v1.0.fasta.fai :
https://owl.fish.washington.edu/halfshell/genomic-databank/hemat_transcriptome_v1.0.fasta.fai
-
Initial Trinity assembly from 20200122
-
BLASTx Annotation (txt; 308KB)
-
Trinotate GO Terms Annotation (txt; 2.1MB)
-
internal short-hand: includes 2018, 2019 with Alveolata only reads.
-
Metacarcinus magister (Cancer magister)
Genome:
-
mmag_pilon_scaffolds.fasta
-
MD5 = 5dfa2ba11edf0ff8191f112e0b1378d1
-
Not shared publicly until permission is received from NOAA.
-
Roberts Lab members can access on Owl:
/web/halfshell/genomic-databank/mmag_pilon_scaffolds.fasta
-
Original filename: pilon_scaffolds.fasta
-
FastA index (samtools faidx): mmag_pilon_scaffolds.fasta.fai
-
Ostrea lurida
Genome:
-
Olurida_v081.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa
-
MD5 = 3ac56372bd62038f264d27eef0883bd3
-
This is v080 with only contigs > 1000 bp. Details of how v080 was reduced are found here.
-
FastA index (samtools faidx)
- Olurida_v081.fa.fai :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa.fai
-
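A length filter like the one used to derive v081 from v080 can be sketched in awk. This is an assumption about the approach, not the documented method (see the linked details for that); filter_min_length is a hypothetical name. It keeps records strictly longer than the threshold and preserves each record's original line wrapping:

```shell
#!/bin/sh
# filter_min_length MIN [FILE...]: hypothetical helper. Keeps FastA records
# whose sequence is strictly longer than MIN bases ("> 1000bp" means MIN=1000),
# writing each kept record with its original line wrapping.
filter_min_length() {
  min=$1
  shift
  awk -v min="$min" '
    function emit() { if (hdr != "" && len > min + 0) printf "%s\n%s", hdr, body }
    /^>/ { emit(); hdr = $0; body = ""; len = 0; next }
    { body = body $0 "\n"; len += length($0) }
    END { emit() }' "$@"
}
```

For example, filter_min_length 1000 Olurida_v080.fa > v081_check.fa (output name hypothetical). Since line wrapping could differ from the published file, comparing names and lengths in the .fai files is a fairer check than comparing MD5s.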
-
Olurida_v080.fa :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa
-
MD5 = 9258398f554493e08fdc30e9c1409864
-
FastA index (samtools faidx)
- Olurida_v080.fa.fai :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa.fai
-
Also known as pbjelly_sjw_01. Details are found here, though they still need confirmation.
-
Bisulfite Genomes:
-
Olurida_v080_bisulfite.tar.gz :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080_bisulfite.tar.gz
-
Gzipped tarball of bisulfite genome for use with Bismark
-
Creation details here
Transcriptomes:
-
Olurida_transcriptome_v3.fasta
- MD5 = 9da3242af2be0463051ec7e1f39b2593
Tissue-specific transcriptomes generated by Katherine Silliman
Genome Feature Tracks
-
Olurida_v081_genome_snap02.all.renamed.putative_function.domain_added.gff (2.9GB) :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081_genome_snap02.all.renamed.putative_function.domain_added.gff
- MD5 = f54512bd964f45645c34b1e8e403a2b0
-
Olurida_v081-20190709.CDS.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.CDS.gff
-
Olurida_v081-20190709.exon.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.exon.gff
-
Olurida_v081-20190709.gene.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.gene.gff
-
Olurida_v081-20190709.mRNA.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.mRNA.gff
-
Olurida_v081_TE-Cg.gff :
http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081_TE-Cg.gff
-
MD5 = 977fd7cdb460cd0b9df5e875e1e880ea
-
Transposable Element track - more details in Sam's Notebook, including a summary table.
-
-
Olurida_v081_CG-motif.gff :
https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081_CG-motif.gff
Panopea generosa
Genome:
-
Panopea-generosa-v1.0.fa :
https://gannet.fish.washington.edu/Atumefaciens/20191105_swoose_pgen_v074_renaming/Panopea-generosa-v1.0.fa
-
Version of v070 containing only the 18 largest scaffolds (details on subsetting)
-
FastA file and scaffolds were renamed on 20191105 (notebook)
-
MD5 = 32976550b9030126c07920d5f2db179c
-
BUSCO scores:
-
C:71.6%[S:70.7%,D:0.9%],F:4.7%,M:23.7%,n:978
-
-
FastA index (samtools faidx): http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_v074.fa.fai
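Selecting the N largest scaffolds only needs the .fai index: sort on column 2 (length), take the top N names, then extract those sequences. A minimal sketch, assuming a tab-separated samtools faidx index; largest_names and the file names in the usage line are hypothetical, and this is not necessarily how the 18-scaffold subset was made:

```shell
#!/bin/sh
# largest_names N FAI: hypothetical helper. Prints the names of the N longest
# sequences, longest first, from a samtools faidx index (tab-separated;
# column 1 = name, column 2 = length).
largest_names() {
  tab=$(printf '\t')
  sort -t "$tab" -k2,2nr "$2" | head -n "$1" | cut -f 1
}
```

For example: largest_names 18 assembly.fa.fai | xargs samtools faidx assembly.fa > largest18.fa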
Bisulfite Genome:
Genome Feature Tracks:
-
Panopea-generosa-vv0.74.a4
These originate from GenSAS annotation on 20190928
Individual feature GFFs were made with the following shell commands:
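features_array=(CDS exon gene mRNA repeat_region rRNA tRNA)
input="Panopea-generosa-vv0.74.a4-merged-2019-10-07-4-46-46.gff3"
for feature in "${features_array[@]}"
do
  output="Panopea-generosa-vv0.74.a4.${feature}.gff3"
  # Start each output with the first 3 (header) lines; >> appends, so remove old outputs before re-running
  head -n 3 "${input}" \
  >> "${output}"
  # Append only rows whose feature type (column 3) matches
  awk -v feature="$feature" '$3 == feature {print}' "${input}" \
  >> "${output}"
done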
-
Panopea-generosa-vv0.74.a4-merged-2019-10-07-4-46-46.gff3
- Primary GFF containing all features.
-
Transcriptome:
-
Pgenerosa_transcriptome_v5.fasta :
http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_transcriptome_v5.fasta
- MD5 = 5a21424ecbc88c3b01daefe56bed79da
Transcriptome generated from various libraries - details here
QPX
Genome:
- QPX_v017.fasta :
http://eagle.fish.washington.edu/QPX_genome/QPX_v017.fasta
(also available from figshare: https://ndownloader.figshare.com/files/3085550)
CLC v5.1 Mismatch cost = 2; Perform scaffolding = Yes; Mapping mode = Map reads back to contigs (slow); Deletion cost = 3; Similarity fraction = 0.9; Length fraction = 0.8; Insertion cost = 3; Update contigs = Yes; Automatic word size = Yes; Minimum contig length = 10000; Automatic bubble size = Yes; input: filtered_QPX_DNA_GTGAAA_L001_R1 trimmed.
- QPX_v015.fasta :
https://doi.org/10.1371/journal.pone.0074196.s001
De novo assembly was performed with Genomics Workbench v. 5.0 (CLC Bio, Germany) on quality trimmed sequences with the following parameters: mismatch cost = 2, deletion cost = 3, similarity fraction = 0.9, insertion cost = 3, length fraction = 0.8 and minimum contig size of 100 bp for genomic data and 200 bp for transcriptomic data. In order to remove ribosomal RNA sequences from the transcriptome data, consensus sequences were compared to the NCBI nt database using the BLASTn algorithm [59]. Sequences with significant matches (9) were removed and not considered in subsequent analyses.
Manuscript: https://doi.org/10.1371/journal.pone.0074196
Transcriptome:
QPX_Transcriptome v2.1
Subset of version 1 (v1) that only includes sequences with an e-value < 1E-20. Based on the Swiss-Prot BLASTx output, all sequences are oriented 5'-3'. Sequences are restricted to the nucleotides between stop codons; minimum size 200.
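The e-value selection step can be sketched as a filter on column 11 of BLAST tabular (outfmt 6) output. A minimal sketch, assuming the default 12-column layout; ids_below_evalue is a hypothetical name, it only returns the qualifying sequence IDs (which would then be used to pull sequences out of v1), and it does not cover the orientation or stop-codon steps:

```shell
#!/bin/sh
# ids_below_evalue CUTOFF [FILE...]: hypothetical helper. Prints each query ID
# (column 1) that has at least one hit with e-value (column 11) strictly below
# CUTOFF, once, in order of first appearance.
ids_below_evalue() {
  cutoff=$1
  shift
  awk -F'\t' -v cutoff="$cutoff" '$11 + 0 < cutoff + 0 && !seen[$1]++ { print $1 }' "$@"
}
```

For example: ids_below_evalue 1e-20 qpx_v1_swissprot_blastx.tab > keep_ids.txt (file names hypothetical).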