Continuing our various attempts at annotating our geoduck genome assemblies, I will be re-annotating our Pgenerosa_v070 assembly (see Genome Resources GitHub wiki for deets) and realized I hadn’t previously run RepeatMasker on it. Running RepeatMasker will generate a GFF that I can supply to MAKER to aid in repeat identification.
I ran RepeatMasker 4.0.7 on Roadrunner (Apple Xserve running Ubuntu 16.04 LTS) using the “all” species setting.
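For reference, a run like this could be assembled roughly as follows. This is only a sketch, not the command from the notebook: the output directory name and the number of parallel jobs are assumptions, and only the “all” species setting and the GFF output come from this entry. The flags used (`-species`, `-gff`, `-pa`, `-dir`) are standard RepeatMasker options.

```python
import shlex


def build_repeatmasker_cmd(fasta, out_dir, threads=4):
    """Return a RepeatMasker argument list for an all-species run with GFF output."""
    return [
        "RepeatMasker",
        "-species", "all",    # the "all" species setting used for this run
        "-gff",               # also write a GFF of the masked repeats
        "-pa", str(threads),  # parallel jobs (assumed value, not from the notebook)
        "-dir", out_dir,      # output directory (hypothetical name)
        fasta,
    ]


cmd = build_repeatmasker_cmd("Pgenerosa_v070.fa", "repeatmasker_out", threads=4)
print(shlex.join(cmd))
# To actually launch the run, one could pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list keeps the arguments unambiguous and makes it easy to log exactly what was run.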
All of it is detailed in this Jupyter Notebook (GitHub):
RESULTS
This took ~4 days to run - longer than I expected.
Will use the GFF in subsequent MAKER annotation.
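As a rough sketch of that next step: MAKER reads pre-identified repeats from the `rm_gff` option in `maker_opts.ctl`. The helper below (a hypothetical function, not part of MAKER) sets that option in the control file text. The GFF file name is an assumption based on how RepeatMasker usually names its output, and since MAKER expects GFF3 here, the RepeatMasker GFF may need reformatting first (not shown).

```python
import re


def set_rm_gff(ctl_text, gff_path):
    """Point the rm_gff option at gff_path, keeping any trailing comment on the line."""
    pattern = re.compile(r"^rm_gff=[^#\n]*", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in the path.
    new_text, n_subs = pattern.subn(lambda m: f"rm_gff={gff_path} ", ctl_text, count=1)
    if n_subs == 0:
        # Option not present at all: append it.
        new_text = ctl_text.rstrip("\n") + f"\nrm_gff={gff_path}\n"
    return new_text


# Example control file fragment (abbreviated; a real maker_opts.ctl has many more options).
example_ctl = (
    "model_org=all #select a model organism for RepBase masking in RepeatMasker\n"
    "rm_gff= #pre-identified repeat elements from an external GFF3 file\n"
)
updated = set_rm_gff(example_ctl, "Pgenerosa_v070.fa.out.gff")
print(updated)
```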
Output folder:
Summary table (text):
Output GFF:
SUMMARY TABLE
==================================================
file name: Pgenerosa_v070.fa
sequences: 313649
total length: 2205688688 bp (2005531528 bp excl N/X-runs)
GC level: 33.92 %
bases masked: 175175579 bp ( 8.73 %)
==================================================
                     number of      length   percentage
                     elements*    occupied  of sequence
--------------------------------------------------
Retroelements           565711    87788537 bp    4.38 %
   SINEs:               332333    39506023 bp    1.97 %
   Penelope               6883      788411 bp    0.04 %
   LINEs:               142649    32744907 bp    1.63 %
      CRE/SLACS           1237      100944 bp    0.01 %
      L2/CR1/Rex         40317     7764197 bp    0.39 %
      R1/LOA/Jockey      10137     2942539 bp    0.15 %
      R2/R4/NeSL          3825      551996 bp    0.03 %
      RTE/Bov-B          26939     6768723 bp    0.34 %
      L1/CIN4            21435     4046589 bp    0.20 %
   LTR elements:         90729    15537607 bp    0.77 %
      BEL/Pao             6594      918331 bp    0.05 %
      Ty1/Copia          16409     1268565 bp    0.06 %
      Gypsy/DIRS1        50972    11376086 bp    0.57 %
         Retroviral       9680      690936 bp    0.03 %
DNA transposons         259955    34987123 bp    1.74 %
   hobo-Activator        29756     3192075 bp    0.16 %
   Tc1-IS630-Pogo        67456     9717356 bp    0.48 %
   En-Spm                    0           0 bp    0.00 %
   MuDR-IS905                0           0 bp    0.00 %
   PiggyBac               1553      121136 bp    0.01 %
   Tourist/Harbinger      7596     1054167 bp    0.05 %
   Other (Mirage,         1803      123196 bp    0.01 %
    P-element, Transib)
Rolling-circles              0           0 bp    0.00 %
Unclassified:            99928    13654973 bp    0.68 %
Total interspersed repeats:      136430633 bp    6.80 %
Small RNA:               42601     2192413 bp    0.11 %
Satellites:              33350     6282246 bp    0.31 %
Simple repeats:         596607    32793030 bp    1.64 %
Low complexity:          75831     3754962 bp    0.19 %
==================================================
* most repeats fragmented by insertions or deletions
have been counted as one element
Runs of >=20 X/Ns in query were excluded in % calcs
The query species was assumed to be root
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
run with rmblastn version 2.6.0+
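A quick sanity check on how the percentages in the table are derived, using only numbers from the summary above. Per the note about excluded X/N runs, the denominator is the length excluding N/X-runs (2005531528 bp), not the total length. The variable and function names here are mine, purely for illustration.

```python
TOTAL_LENGTH_BP = 2_205_688_688   # total length, including N/X-runs
LENGTH_EXCL_N_BP = 2_005_531_528  # length excluding N/X-runs

# (bp occupied, percentage reported in the summary table)
reported = {
    "bases masked": (175_175_579, 8.73),
    "Total interspersed repeats": (136_430_633, 6.80),
    "Retroelements": (87_788_537, 4.38),
    "DNA transposons": (34_987_123, 1.74),
    "Simple repeats": (32_793_030, 1.64),
}


def percent_of(bp, denominator=LENGTH_EXCL_N_BP):
    """Percentage of the sequence occupied by bp bases."""
    return 100 * bp / denominator


for name, (bp, table_pct) in reported.items():
    print(f"{name}: computed {percent_of(bp):.2f} %, table {table_pct:.2f} %")

# The retroelement total is the sum of its SINE, LINE, and LTR subtotals.
retro_sum = 39_506_023 + 32_744_907 + 15_537_607

# Against the full length (N runs included), the masked fraction is lower.
masked_pct_of_total = percent_of(175_175_579, TOTAL_LENGTH_BP)
print(f"bases masked as % of total length: {masked_pct_of_total:.2f} %")
```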