Per this GitHub issue, I’m identifying transposable elements (TEs) in the Crassostrea gigas genome. Even though the C. gigas genome should be fully annotated, Steven wants a set of analyses comparable to the Crassostrea virginica TE mapping we previously performed.
I used the Crassostrea gigas genome we have linked on our GitHub Genomic Resources wiki:
Analysis was performed in the following Jupyter Notebook (GitHub):
RESULTS
The analysis took ~24 hours to complete.
Output folder:
Genome used (from our Genomic Resources wiki):
GFF file:
Summary table (text):
==================================================
file name:     Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa
sequences:     7658
total length:  557717710 bp (491860439 bp excl N/X-runs)
GC level:      33.42 %
bases masked:  160369613 bp ( 32.60 %)
==================================================
                   number of       length   percentage
                   elements*     occupied  of sequence
--------------------------------------------------
Retroelements        48481     19773596 bp    4.02 %
   SINEs:             2498       317084 bp    0.06 %
   Penelope           5749      1808270 bp    0.37 %
   LINEs:            26463     10472676 bp    2.13 %
     CRE/SLACS          15         1289 bp    0.00 %
     L2/CR1/Rex       1712       307207 bp    0.06 %
     R1/LOA/Jockey     299        21470 bp    0.00 %
     R2/R4/NeSL        218        69735 bp    0.01 %
     RTE/Bov-B        8417      3631379 bp    0.74 %
     L1/CIN4           983        64189 bp    0.01 %
   LTR elements:     19520      8983836 bp    1.83 %
     BEL/Pao          2050      1349545 bp    0.27 %
     Ty1/Copia        2139       189535 bp    0.04 %
     Gypsy/DIRS1     11971      6501545 bp    1.32 %
       Retroviral     1263        69288 bp    0.01 %

DNA transposons     299050     85782505 bp   17.44 %
   hobo-Activator     9348      2278556 bp    0.46 %
   Tc1-IS630-Pogo    32515      8695261 bp    1.77 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac           4136       747000 bp    0.15 %
   Tourist/Harbinger 11590      2828277 bp    0.58 %
   Other (Mirage,      232        14514 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:       109149     49075277 bp    9.98 %

Total interspersed repeats:   154631378 bp   31.44 %

Small RNA:             830        93282 bp    0.02 %

Satellites:           2087       401812 bp    0.08 %
Simple repeats:     110847      4687373 bp    0.95 %
Low complexity:      16716       787611 bp    0.16 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs

The query species was assumed to be root
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
run with rmblastn version 2.6.0+
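As a quick sanity check on how the percentages in the summary table are derived, here is a minimal Python sketch. It is not part of the original notebook; the variable and function names are my own, and the numbers are copied by hand from the table. It assumes the percentages use the N/X-excluded length as the denominator (as the table's footnote states) and that "Total interspersed repeats" is the sum of the three top-level classes.

```python
# Values copied by hand from the RepeatMasker summary table (not parsed from the file).
TOTAL_LENGTH_BP = 557_717_710    # "total length"
LENGTH_EXCL_N_BP = 491_860_439   # "excl N/X-runs"
BASES_MASKED_BP = 160_369_613    # "bases masked"

# Top-level interspersed repeat classes and their occupied lengths (bp).
interspersed_bp = {
    "Retroelements": 19_773_596,
    "DNA transposons": 85_782_505,
    "Unclassified": 49_075_277,
}


def percent(part_bp, whole_bp):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part_bp / whole_bp, 2)


# Percent masked with the N/X-excluded length vs. the full length as denominator.
masked_pct_excl_n = percent(BASES_MASKED_BP, LENGTH_EXCL_N_BP)
masked_pct_full = percent(BASES_MASKED_BP, TOTAL_LENGTH_BP)

# Sum of the top-level classes, to compare with "Total interspersed repeats".
total_interspersed_bp = sum(interspersed_bp.values())
total_interspersed_pct = percent(total_interspersed_bp, LENGTH_EXCL_N_BP)

print("masked % (excl N/X-runs):", masked_pct_excl_n)
print("masked % (full length):  ", masked_pct_full)
print("total interspersed bp:   ", total_interspersed_bp)
print("total interspersed %:    ", total_interspersed_pct)
for name, bp in interspersed_bp.items():
    print(f"{name}: {percent(bp, LENGTH_EXCL_N_BP)} %")
```

The denominator matters when comparing against another genome's table: a percentage computed against the full assembly length (including N runs) will come out lower than the figure reported here.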
I’ve put together the TE comparison requested in the GitHub Issue mentioned above in a Google Sheet: