Per this GitHub Issue, Steven asked me to take a list of gene names associated with DNA methylation and see whether I could extract the corresponding Pacific geoduck (Panopea generosa) gene IDs, along with the BLAST e-values for each, from our P.generosa genome annotation (see the Genomic Resources wiki for more info).
Here’s the list of gene names provided:
dnmt1
dnmt3a
dnmt3b
dnmt3l
mbd1
mbd2
mbd3
mbd4
mbd5
mbd6
mecp2
Baz2a
Baz2b
UHRF1
UHRF2
Kaiso
zbtb4
zbtb38b
zfp57
klf4
egr1
wt1
ctcf
tet1
tet2
tet3
The operations were run in a Jupyter Notebook. All results are available in the notebook, as well as in the RESULTS section below.
Briefly, here's how the process was run:
1. Scan the GenSAS GFF3 file (Panopea-generosa-vv0.74.a4.gene.gff3) for the gene names in the list.
2. Use the resulting list of matches to scan both GenSAS BLAST results files:
   - Panopea-generosa-vv0.74.a4.5d951a9b74287-blast_functional.tab
   - Panopea-generosa-vv0.74.a4.5d951bcf45b4b-diamond_functional.tab
3. Extract the e-values for any matches.
4. Print a tab-delimited table of P.generosa gene IDs, gene names, and both e-values (BLASTp and DIAMOND), where present.
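The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code from the notebook: it assumes the gene names appear somewhere in the attribute column (column 9) of `gene` features in the GFF3, and that the `*_functional.tab` files put the query ID in the first column and the e-value in column 11, as in BLAST's tabular `-outfmt 6` layout. Those column indices, and the idea that query IDs are gene IDs with a `.` suffix (e.g. a hypothetical `<gene_ID>.m01` transcript ID), are assumptions to check against the real files. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Gene names from the list above (case preserved; matching ignores case).
GENE_NAMES: List[str] = """dnmt1 dnmt3a dnmt3b dnmt3l mbd1 mbd2 mbd3 mbd4 mbd5 mbd6 mecp2
Baz2a Baz2b UHRF1 UHRF2 Kaiso zbtb4 zbtb38b zfp57 klf4 egr1 wt1 ctcf tet1 tet2 tet3""".split()


def find_gene_matches(gff_lines: Iterable[str], names: List[str]) -> Dict[str, str]:
    """Map GFF3 gene IDs to the first listed name found in their attributes.

    ASSUMPTION: the gene name appears somewhere in column 9 of 'gene'
    feature lines. Matching is case-insensitive and whole-word, so
    'tet1' does not match 'TET10'.
    """
    patterns = [(n, re.compile(r"\b" + re.escape(n) + r"\b", re.IGNORECASE)) for n in names]
    matches: Dict[str, str] = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip pragmas, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # only look at well-formed 'gene' features
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        gene_id = attrs.get("ID")
        if gene_id is None:
            continue
        for name, pattern in patterns:
            if pattern.search(fields[8]):
                matches[gene_id] = name
                break  # keep only the first listed name that matches
    return matches


def best_evalues(tab_lines: Iterable[str], gene_ids: Iterable[str],
                 id_col: int = 0, evalue_col: int = 10) -> Dict[str, str]:
    """Lowest e-value (kept as its original string) per gene ID.

    ASSUMPTION: id_col/evalue_col default to the BLAST -outfmt 6 layout;
    verify against the GenSAS *_functional.tab files. A query ID counts
    for a gene if it equals the gene ID or extends it with a '.' suffix.
    """
    wanted = set(gene_ids)
    best: Dict[str, str] = {}
    for line in tab_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(id_col, evalue_col):
            continue  # blank or malformed line
        query = fields[id_col]
        gene_id = next((g for g in wanted if query == g or query.startswith(g + ".")), None)
        if gene_id is None:
            continue  # hit is not for one of the matched genes
        try:
            value = float(fields[evalue_col])
        except ValueError:
            continue  # e.g. a header row
        if gene_id not in best or value < float(best[gene_id]):
            best[gene_id] = fields[evalue_col].strip()
    return best


def results_table(matches: Dict[str, str], blastp: Dict[str, str],
                  diamond: Dict[str, str]) -> str:
    """Tab-delimited table; a missing e-value is left as an empty field."""
    rows = ["gene_ID\tgene_name\tBLASTp_evalue\tDIAMOND_evalue"]
    for gene_id in sorted(matches):
        rows.append("\t".join([gene_id, matches[gene_id],
                               blastp.get(gene_id, ""), diamond.get(gene_id, "")]))
    return "\n".join(rows)
```

To run it on the real data, one would pass an open handle for Panopea-generosa-vv0.74.a4.gene.gff3 to `find_gene_matches(...)`, pass each `*_functional.tab` handle to `best_evalues(...)` together with the resulting dict, and print `results_table(...)`. Keeping only the lowest e-value per gene is a design choice of this sketch; a gene with several transcripts or hits would otherwise produce several rows.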
RESULTS
Tab-delimited:
```
gene_ID	gene_name	BLASTp_evalue	DIAMOND_evalue
PGEN_.00g104080	Baz2b	1.05e-98	5.4e-102
PGEN_.00g104170	Baz2b	3.09e-96	1.2e-109
PGEN_.00g116950	mbd5	6.40e-21	2.8e-20
PGEN_.00g186870	ctcf	1.25e-116
PGEN_.00g192900	UHRF1	2.32e-19
PGEN_.00g202750	mbd2	9.46e-82	2.6e-63
PGEN_.00g209890	mbd2	4.37e-19	9.2e-09
PGEN_.00g209900	mbd4	3.14e-32	8.0e-29
PGEN_.00g243700	egr1	6.24e-58	2.2e-23
PGEN_.00g249090	egr1	4.19e-18	2.6e-06
PGEN_.00g283000	dnmt1	5.03e-10
PGEN_.00g283010	dnmt1	0.0	7.3e-224
```
gene_ID | gene_name | BLASTp_evalue | DIAMOND_evalue |
---|---|---|---|
PGEN_.00g104080 | Baz2b | 1.05E-98 | 5.40E-102 |
PGEN_.00g104170 | Baz2b | 3.09E-96 | 1.20E-109 |
PGEN_.00g116950 | mbd5 | 6.40E-21 | 2.80E-20 |
PGEN_.00g186870 | ctcf | 1.25E-116 | |
PGEN_.00g192900 | UHRF1 | 2.32E-19 | |
PGEN_.00g202750 | mbd2 | 9.46E-82 | 2.60E-63 |
PGEN_.00g209890 | mbd2 | 4.37E-19 | 9.20E-09 |
PGEN_.00g209900 | mbd4 | 3.14E-32 | 8.00E-29 |
PGEN_.00g243700 | egr1 | 6.24E-58 | 2.20E-23 |
PGEN_.00g249090 | egr1 | 4.19E-18 | 2.60E-06 |
PGEN_.00g283000 | dnmt1 | 5.03E-10 | 8.20E-28 |
PGEN_.00g283010 | dnmt1 | 0 | 7.30E-224 |
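The pipe table above renders the e-values with two decimals and an upper-case `E` (e.g. `5.4e-102` becomes `5.40E-102`, and `0.0` becomes `0`), which looks like spreadsheet scientific formatting. The sketch below shows one way to produce that table directly from the tab-delimited output; it only mimics the formatting seen above and is an assumption, not how the table was actually made. The function names are hypothetical.

```python
from typing import List


def format_evalue(raw: str) -> str:
    """Two-decimal upper-case E notation (5.4e-102 -> 5.40E-102); zero -> '0'."""
    raw = raw.strip()
    if not raw:
        return ""  # no hit for this gene in that results file
    value = float(raw)
    return "0" if value == 0 else format(value, ".2E")


def to_pipe_table(tsv_text: str) -> str:
    """Convert gene_ID/gene_name/BLASTp/DIAMOND tab-delimited text into a pipe table."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    out: List[str] = [" | ".join(header) + " |", "|".join(["---"] * len(header)) + "|"]
    for line in lines[1:]:
        cells = line.split("\t")
        cells += [""] * (len(header) - len(cells))  # pad rows missing an e-value
        # First two columns are ID and name; the rest are e-values.
        cells = cells[:2] + [format_evalue(c) for c in cells[2:]]
        out.append(" | ".join(cells) + " |")
    return "\n".join(out)
```

Empty cells come out as `|  |` (two spaces), which renders the same as the blank cells in the table above.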