Genomic Resources

This page compiles genomic resources so that they are readily available and briefly described. Where possible, the corresponding index files are kept alongside the data so that the files can be used directly in IGV and similar tools.

Related Resources - Archived Versions of this page - 091319;


Chionoecetes bairdi


Cladocopium goreaui

Genomes:

  • /volume1/web/halfshell/genomic-databank/Cladocopium_goreaui_genome_fa (1.1GB)

    • MD5 checksum: eb4a1a7ac2fc0cbc6f5c178240beb932

    • Downloaded 20230216: https://espace.library.uq.edu.au/view/UQ:fba3259

    • Access to the genome requires agreeing to some licensing provisions (primarily the requirement to cite the publication whenever the genome is used), so we will not be providing any public links to the file.

    • Chen et al., 2022

Genome Indexes (HISAT2):

  • `` (tarball gzip; 563MB)

    • MD5 checksum: ``

    • Needs to be unpacked before use!
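As a minimal sketch of the unpacking step, the helper below extracts a gzipped tarball into a chosen directory. The function name `unpack_tarball` and the file names in the usage comment are hypothetical, since the actual tarball name is not listed here.

```bash
# Unpack a gzipped tarball into a destination directory.
# unpack_tarball is a hypothetical helper name, not an existing lab script.
unpack_tarball() {
  local tarball="$1"
  local dest="${2:-.}"   # default to the current directory
  mkdir -p "${dest}"
  # -x extract, -z gunzip, -f read from the named file, -C change to dest first
  tar -xzf "${tarball}" -C "${dest}"
}

# Example usage (hypothetical file and directory names):
# unpack_tarball genome_index.tar.gz hisat2_index
```

The unpacked directory should then contain the individual HISAT2 index files.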

Genome Feature Tracks


Crassostrea gigas - cgigas_uk_roslin_v1

Genome assembly with mitochondrial DNA included: - cgigas_uk_roslin_v1_genomic-mito.fa: https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa

Genome feature tracks generated from the NCBI RefSeq link in this Jupyter notebook

Crassostrea gigas - oyster_v9

Related Resources

Genome:

  • Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa : http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa

    • MD5 = 6de9d1239eb10ea0545bed6c4e746d6c

    • FastA index (samtools faidx) : http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.fai
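The MD5 values listed on this page can be used to confirm that a download is intact. The following is a minimal sketch, assuming GNU `md5sum` is available (macOS ships `md5` instead); `verify_md5` is a hypothetical helper name.

```bash
# Compare a file's MD5 checksum against an expected value.
# verify_md5 is a hypothetical helper name, not an existing lab script.
verify_md5() {
  local file="$1"
  local expected="$2"
  local actual
  # md5sum prints "<hash>  <filename>"; keep only the hash.
  actual=$(md5sum "${file}" | awk '{print $1}')
  if [ "${actual}" = "${expected}" ]; then
    echo "OK: ${file}"
  else
    echo "MISMATCH: ${file} (expected ${expected}, got ${actual})" >&2
    return 1
  fi
}

# Example usage (requires the downloaded genome file):
# verify_md5 Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa 6de9d1239eb10ea0545bed6c4e746d6c
```

A non-zero return status signals a mismatch, so the function can gate later steps in a script.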

Bisulfite Genome:

Genome Feature Tracks

  • Cgigas_v9_gene.gff : https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff

  • Cgigas_v9_exon.gff : https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff

  • Cgigas_v9_intron.gff : https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff

  • Cgigas_v9_TE.gff : https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff

  • Cgigas_v9_CG.gff : https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff

    • index: https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff.idx
  • Cgigas_v9_1k5p_gene_promoter.gff : https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff

  • Cgigas_v9_COMP_gene_prom_TE.bed : https://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed

  • Crassostrea_gigas.oyster_v9.40.gff3 : http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.40.gff3

    • MD5 = 90a747fbc94a0a9225c43f75cc40b9db
  • Crassostrea_gigas.oyster_v9.40.abinitio.gff3 : http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.40.abinitio.gff3

    • MD5 = c2a8c388f5a8afb22a115d61dee3dda0
  • Crassostrea_gigas.oyster_v9.40_mRNA.gff3

    • grep "mRNA" Crassostrea_gigas.oyster_v9.40.gff3 > Crassostrea_gigas.oyster_v9.40_mRNA.gff3
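Note that `grep "mRNA"` keeps any line containing the string, including lines where it appears only in the attributes column. A stricter alternative, sketched here under the assumption that the GFF3 is tab-delimited with the feature type in column 3, matches that column exactly. The name `extract_feature` is hypothetical.

```bash
# Extract one feature type from a GFF3 file by exact match on column 3.
# extract_feature is a hypothetical helper name used for illustration.
extract_feature() {
  local feature="$1"
  local input="$2"
  # Print only rows whose third tab-delimited field equals the feature type.
  awk -F'\t' -v feature="${feature}" '$3 == feature' "${input}"
}

# Example usage:
# extract_feature mRNA Crassostrea_gigas.oyster_v9.40.gff3 \
#   > Crassostrea_gigas.oyster_v9.40_mRNA.gff3
```

Unlike the `grep` command, this drops header and comment lines, so the output may differ from the file listed above.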

Crassostrea virginica

NCBI FTP

Genomes:

  • Cvirginica_v300.fa : http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300.fa

    • MD5 = f9135e323583dc77fc726e9df2677a32

    • FastA index (samtools faidx)

  • GCF_002022765.2_C_virginica-3.0_genomic.fna.gz : ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/022/765/GCF_002022765.2_C_virginica-3.0/GCF_002022765.2_C_virginica-3.0_genomic.fna.gz

    • Compressed version of Cvirginica_v300.fa (same sequences)

Annotations:

Bisulfite Genomes:

  • Cvirginica_v300_bisulfite.tar.gz : http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300_bisulfite.tar.gz

    • Gzipped tarball of bisulfite genome for use with Bismark

    • Creation details here

Genome Feature Tracks


Gadus macrocephalus (Pacific cod)

NCBI

Genomes

Genome Feature Tracks

Hematodinium sp. (Host: Chionoecetes bairdi)

Transcriptomes

Assembly Stats Table (Google Sheet)

  • hemat_transcriptome_v1.7.fasta

    - internal short-hand: includes 2018, 2019, 2020-UW with _Alveolata_ only reads.
    
    - MD5 = `f9c8f96a49506e1810ff4004426160d8`
    
    - FastA index (```samtools faidx```)
    
        - [hemat_transcriptome_v1.7.fasta.fai](https://gannet.fish.washington.edu/Atumefaciens/20210308_hemat_trinity_v1.6_v1.7/hemat_transcriptome_v1.7.fasta_trinity_out_dir/hemat_transcriptome_v1.7.fasta.fai)
    
    - [Notebook entry](https://robertslab.github.io/sams-notebook/2021/03/08/Transcriptome-Assembly-Hematodinium-Transcriptomes-v1.6-and-v1.7-with-Trinity-on-Mox.html)
    
    - BUSCOs: `C:15.0%[S:12.2%,D:2.8%],F:12.3%,M:72.7%,n:978`
    
        - [Notebook entry](https://robertslab.github.io/sams-notebook/2020/08/14/Transcriptome-Assessment-BUSCO-Metazoa-on-Hematodinium-v1.6-v1.7-v2.1-and-v3.1-on-Mox.html)
    
    - BLASTx Annotation
    
      - [hemat_transcriptome_v1.7.fasta.blastx.outfmt6](https://gannet.fish.washington.edu/Atumefaciens/20200814_hemat_diamond_blastx_v1.6_v1.7_v2.1_v3.1/hemat_transcriptome_v1.7.fasta.blastx.outfmt6)
    
      - [Notebook entry](https://robertslab.github.io/sams-notebook/2020/08/14/Transcriptome-Annotation-Hematodinium-Transcriptomes-v1.6-v1.7-v2.1-v3.1-with-DIAMOND-BLASTx-on-Mox.html)
    
    - GO Terms Annotation
    
      - [20210310.hemat_transcriptome_v1.7.fasta.trinotate.go_annotations.txt](https://gannet.fish.washington.edu/Atumefaciens/20210309_hemat_trinotate_transcriptome-v1.7/20210310.hemat_transcriptome_v1.7.fasta.trinotate.go_annotations.txt) (Trinotate)
    
      - [Notebook entry](https://robertslab.github.io/sams-notebook/posts/2021/2021-03-09-Transcriptome-Annotation---Trinotate-Hematodinium-v1.7-on-Mox/index.html)
    
  • hemat_transcriptome_v1.6.fasta

    - internal short-hand: includes 2018, 2019, 2020-GW, 2020-UW with _Alveolata_ only reads.
    
    - MD5 = `f9c8f96a49506e1810ff4004426160d8`
    
    - FastA index (```samtools faidx```)
    
        - [hemat_transcriptome_v1.6.fasta.fai](https://gannet.fish.washington.edu/Atumefaciens/20210308_hemat_trinity_v1.6_v1.7/hemat_transcriptome_v1.6.fasta_trinity_out_dir/hemat_transcriptome_v1.6.fasta.fai)
    
    - [Notebook entry](https://robertslab.github.io/sams-notebook/2021/03/08/Transcriptome-Assembly-Hematodinium-Transcriptomes-v1.6-and-v1.7-with-Trinity-on-Mox.html)
    
    - BUSCOs: `C:26.5%[S:20.7%,D:5.8%],F:11.2%,M:62.3%,n:978`
    
      - [Notebook entry](https://robertslab.github.io/sams-notebook/2020/08/14/Transcriptome-Assessment-BUSCO-Metazoa-on-Hematodinium-v1.6-v1.7-v2.1-and-v3.1-on-Mox.html)
    
    - BLASTx Annotation
    
      - [hemat_transcriptome_v1.6.fasta.blastx.outfmt6](https://gannet.fish.washington.edu/Atumefaciens/20200814_hemat_diamond_blastx_v1.6_v1.7_v2.1_v3.1/hemat_transcriptome_v1.6.fasta.blastx.outfmt6)
    
      - [Notebook entry](https://robertslab.github.io/sams-notebook/2020/08/14/Transcriptome-Annotation-Hematodinium-Transcriptomes-v1.6-v1.7-v2.1-v3.1-with-DIAMOND-BLASTx-on-Mox.html)
    
    - GO Terms Annotation
    
      - [20210309.hemat_transcriptome_v1.6.fasta.trinotate.go_annotations.txt](https://gannet.fish.washington.edu/Atumefaciens/20210309_hemat_trinotate_transcriptome-v1.6/20210309.hemat_transcriptome_v1.6.fasta.trinotate.go_annotations.txt) (Trinotate)
    
      - [Notebook entry](https://robertslab.github.io/sams-notebook/posts/2021/2021-03-09-Transcriptome-Annotation---Trinotate-Hematodinium-v1.6-on-Mox/index.html)
    
  • hemat_transcriptome_v1.5.fasta

  • hemat_transcriptome_v1.0.fasta (3.9MB)


Metacarcinus magister (Cancer magister)

Genome:

  • mmag_pilon_scaffolds.fasta

    • MD5 = 5dfa2ba11edf0ff8191f112e0b1378d1

    • Not shared publicly until permission received from NOAA.

    • Roberts Lab members can access on Owl: /web/halfshell/genomic-databank/mmag_pilon_scaffolds.fasta

    • Original filename: pilon_scaffolds.fasta

    • FastA index (samtools faidx)

      • mmag_pilon_scaffolds.fasta.fai

Montipora capitata

Genomes:

Genome Indexes (HISAT2)

Genome Feature Tracks


Mytilus trossulus

Transcriptome:

Ostrea lurida

Genome:

  • Olurida_v081.fa : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa

    • MD5 = 3ac56372bd62038f264d27eef0883bd3

    • This is v080 restricted to contigs > 1000bp. Details of how v080 was reduced are found here.

    • FastA index (samtools faidx)

      • Olurida_v081.fa.fai : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa.fai
  • Olurida_v080.fa : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa

    • MD5 = 9258398f554493e08fdc30e9c1409864

    • FastA index (samtools faidx)

      • Olurida_v080.fa.fai : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa.fai
    • Also known as pbjelly_sjw_01. Details found here, though confirmation would be good.

Bisulfite Genomes:

  • Olurida_v080_bisulfite.tar.gz : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080_bisulfite.tar.gz

    • Gzipped tarball of bisulfite genome for use with Bismark

    • Creation details here

Transcriptomes:

Tissue-specific transcriptomes generated by Katherine Silliman

Genome Feature Tracks


Panopea generosa

Genome:

Bisulfite Genome:

Genome Feature Tracks:

  • Panopea-generosa-vv0.74.a4

    These originate from GenSAS annotation on 20190928

    Individual feature GFFs were made with the following shell commands:

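```bash
# Feature types to split into individual GFF3 files
features_array=(CDS exon gene mRNA repeat_region rRNA tRNA)

input="Panopea-generosa-vv0.74.a4-merged-2019-10-07-4-46-46.gff3"

for feature in "${features_array[@]}"
do
  output="Panopea-generosa-vv0.74.a4.${feature}.gff3"
  # Start each output file with the first three (header) lines of the input
  head -n 3 "${input}" > "${output}"
  # Append every record whose third column matches the feature type
  awk -v feature="${feature}" '$3 == feature {print}' "${input}" \
  >> "${output}"
done
```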

- [GFF files and scaffolds were renamed on 20191105](https://robertslab.github.io/sams-notebook/2019/11/05/Data-Wrangling-Rename-Pgenerosa_v074-Files-and-Scaffolds.html) (notebook)

Fasta files:

Jupyter notebook with creation deets (NB Viewer):

CDS FastA description lines look like this:

  • PGEN_.00g000010.m01.CDS01|PGEN_.00g000010.m01::Scaffold_01:2-125

Explanation for CDS:

  • PGEN_.00g000010.m01.CDS01: Unique sequence ID.
  • PGEN_.00g000010.m01: "Parent" ID. Corresponds to unique mRNA ID.
  • Scaffold_01: Originating scaffold.
  • 2-125: Sequence coordinates from scaffold mentioned above.

mRNA FastA description looks like this:

  • PGEN_.00g000030.m01|PGEN_.00g000030::Scaffold_01:49248-52578

Explanation for mRNA:

  • PGEN_.00g000030.m01: Unique sequence ID.
  • PGEN_.00g000030: "Parent" ID. Corresponds to unique gene ID.
  • Scaffold_01: Originating scaffold.
  • 49248-52578: Sequence coordinates from scaffold mentioned above.
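The description-line layout explained above (`ID|PARENT::SCAFFOLD:START-END`) can be split with plain shell parameter expansion. This is a minimal sketch; `parse_header` is a hypothetical helper name, and it assumes the leading `>` of the FastA line has already been removed.

```bash
# Split a description line of the form ID|PARENT::SCAFFOLD:START-END
# into five tab-delimited fields: ID, parent ID, scaffold, start, end.
# parse_header is a hypothetical helper name used for illustration.
parse_header() {
  local header="$1"
  local id="${header%%|*}"          # text before the first "|"
  local rest="${header#*|}"         # PARENT::SCAFFOLD:START-END
  local parent="${rest%%::*}"       # text before "::"
  local location="${rest#*::}"      # SCAFFOLD:START-END
  local scaffold="${location%:*}"   # text before the last ":"
  local coords="${location##*:}"    # START-END
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "${id}" "${parent}" "${scaffold}" "${coords%-*}" "${coords#*-}"
}

# Prints the five fields for the CDS example, separated by tabs.
parse_header "PGEN_.00g000010.m01.CDS01|PGEN_.00g000010.m01::Scaffold_01:2-125"
```

The same function handles the mRNA description lines, since they share the delimiter layout.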

  • Pgenerosa_transcriptome_v5.fasta : http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_transcriptome_v5.fasta

    • MD5 = 5a21424ecbc88c3b01daefe56bed79da

Transcriptome generated from various libraries - details here.


Pocillopora acuta

Genome:

Genome Index (HISAT2):

Genome Feature Tracks:


Pocillopora meandrina

Genome(s):

Genome Indexes (HISAT2):

Genome Feature Tracks


Pocillopora verrucosa

Genomes:

Genome Indexes (HISAT2):

Genome Feature Tracks


Pycnopodia helianthoides

Genome


QPX

Genome:

  • QPX_v017.fasta : http://eagle.fish.washington.edu/QPX_genome/QPX_v017.fasta

CLC v5.1 Mismatch cost = 2; Perform scaffolding = Yes; Mapping mode = Map reads back to contigs (slow); Deletion cost = 3; Similarity fraction = 0.9; Length fraction = 0.8; Insertion cost = 3; Update contigs = Yes; Automatic word size = Yes; Minimum contig length = 10000; Automatic bubble size = Yes; input: filtered_QPX_DNA_GTGAAA_L001_R1 trimmed.

De novo assembly was performed with Genomics Workbench v. 5.0 (CLC Bio, Germany) on quality trimmed sequences with the following parameters: mismatch cost = 2, deletion cost = 3, similarity fraction = 0.9, insertion cost = 3, length fraction = 0.8 and minimum contig size of 100 bp for genomic data and 200 bp for transcriptomic data. In order to remove ribosomal RNA sequences from the transcriptome data, consensus sequences were compared to the NCBI nt database using the BLASTn algorithm [59]. Sequences with significant matches (9) were removed and not considered in subsequent analyses.

Manuscript: https://doi.org/10.1371/journal.pone.0074196

Transcriptome:

QPX_Transcriptome v2.1

Subset of version 1 (v1) that only includes sequences with e-value < 1E-20. Based on Swiss-Prot blastx output, all sequences are oriented 5' - 3'. Sequences are the nucleotides between stop codons; minimum size 200.
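For illustration only, an e-value cutoff like this one could be applied to BLAST tabular output (outfmt 6, where column 11 is the e-value) as sketched below. This is not necessarily how the v2.1 subset was produced; the function name `ids_below_evalue` and the input file name are hypothetical.

```bash
# List query IDs that have at least one hit below an e-value cutoff.
# Assumes BLAST tabular output (outfmt 6): column 1 = query ID, column 11 = e-value.
# ids_below_evalue is a hypothetical helper name used for illustration.
ids_below_evalue() {
  local blast_table="$1"
  local cutoff="${2:-1e-20}"
  # Adding 0 forces numeric comparison of both values.
  awk -F'\t' -v max="${cutoff}" '$11 + 0 < max + 0 {print $1}' "${blast_table}" \
    | sort -u
}

# Example usage (hypothetical input file):
# ids_below_evalue QPX_v1.blastx.outfmt6 1e-20 > ids_to_keep.txt
```

The resulting ID list could then be used to pull the matching sequences from the v1 FastA.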

Salvelinus namaycush (lake trout)

Genome:

Genome Feature Tracks: