Mobile elements
- plasmids
- bacteriophages:
  - up to 20% of the genome
  - most common transporters of virulence genes in bacteria
  - have site specificity
- transposable elements:
  - up to 2 kbp
  - no site specificity
Tandem repeats
- satellites (spanning megabases of DNA, associated with heterochromatin)
- minisatellites (repeat units in the range 6-100 bp, spanning hundreds of base-pairs)
- microsatellites (repeat units in the range 1-5 bp, spanning a few tens of nucleotides).
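The unit-length ranges above can be turned into a small classifier. This is a minimal sketch: the function names `repeat_unit` and `classify_tandem_repeat` are made up for illustration, and only perfect (mismatch-free) tandem repeats are handled. Satellites are defined above by total span rather than unit length, so units longer than 100 bp are left unclassified.

```python
def repeat_unit(seq: str) -> str:
    """Return the shortest unit whose exact repetition yields seq."""
    n = len(seq)
    for k in range(1, n + 1):
        # A unit of length k must divide the sequence evenly.
        if n % k == 0 and seq[:k] * (n // k) == seq:
            return seq[:k]
    return seq  # only reached for the empty string


def classify_tandem_repeat(unit_len: int) -> str:
    """Classify a tandem repeat by repeat-unit length in bp (ranges from the notes)."""
    if unit_len < 1:
        raise ValueError("repeat unit length must be positive")
    if unit_len <= 5:
        return "microsatellite"
    if unit_len <= 100:
        return "minisatellite"
    # Assumption: the notes give no unit-length range for satellites.
    return "unclassified"
```

For example, `classify_tandem_repeat(len(repeat_unit("CACACACA")))` looks at the 2 bp unit `CA`.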
Insertion Elements (IS)
- 0.7-2.5 kbp
- small, genetically compact (1-2 ORFs): transposase and/or reverse transcriptase
- end in short terminal inverted repeat sequences (IR), 10-40 bp
- ISfinder (reference database of IS elements)
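A candidate element can be checked for the terminal inverted repeats mentioned above by comparing its 5' end with the reverse complement of its 3' end. This is a minimal sketch, not how ISfinder works: `terminal_ir_length` is a hypothetical helper, it requires exact matches (real IRs are often imperfect), and the 40 bp cap simply mirrors the upper bound in the notes.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_ir_length(seq: str, max_len: int = 40) -> int:
    """Length of the exact terminal inverted repeat, capped at max_len.

    The first k bases of seq form an inverted repeat with the last k bases
    when they equal the first k bases of the reverse complement of seq.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    # The two arms cannot overlap, so never look past the midpoint.
    limit = min(max_len, len(seq) // 2)
    k = 0
    while k < limit and seq[k] == rc[k]:
        k += 1
    return k
```

A result in the 10-40 range on a 0.7-2.5 kbp sequence would be consistent with the IS profile above; allowing a few mismatches would be the natural extension.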
Software Packages
~/bin/RepeatSearch.amos prefix
...
10: $(BINPATH)/repeat-match -n $(REPEATLEN) $(PREFIX).fasta | $(SCRIPTPATH)/repeat-match2gff.pl > $(REPEATS).gff
20: $(SCRIPTPATH)/extractfromfastagff.pl $(PREFIX).fasta $(REPEATS).gff > $(REPEATS).fasta
30: $(BINPATH)/nucmer -maxmatch $(REPEATS).fasta $(REPEATS).fasta -p $(REPEATS)
40: $(BINPATH)/show-coords -c -l -r -o -H $(REPEATS).delta | awk '{print $18,$19}' | ~/bin/cluster.pl > $(REPEATS).cluster
50: $(SCRIPTPATH)/extractfromfastanames.pl -f $(REPEATS).cluster < $(REPEATS).fasta > $(REPEATS).cluster.fasta
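In step 40, fields 18 and 19 of the `show-coords -c -l` output are the reference and query sequence names, so `cluster.pl` receives one pair of repeat names per alignment. The source of `cluster.pl` is not shown here; the sketch below assumes it does single-linkage clustering (connected components over the pairs) and uses a union-find structure. The function name `cluster_pairs` and the output ordering are illustrative choices.

```python
from collections import defaultdict


def cluster_pairs(pairs):
    """Group names into connected components given (name_a, name_b) pairs."""
    parent = {}

    def find(x):
        # Register unseen names as their own root, then walk to the root
        # while halving the path to keep later lookups short.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for name in parent:
        groups[find(name)].append(name)
    # Largest clusters first; ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

Because the repeats are aligned against themselves, each sequence should also pair with itself, so a repeat with no other hits would still come out as a singleton cluster under this assumption.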
Library:
$ ls /fs/szdevel/dpuiu/RepeatMasker/Libraries/RepeatMaskerLib.embl
$ ~/bin/readseq.sh -f Fasta -o RepeatMaskerLib.fasta RepeatMaskerLib.embl
$ infoseq RepeatMaskerLib.fasta | getSummary.pl -c 1 -t Len
     #elem  min  max    mean  median  n50   sum
Len  9055   4    35042  2205  890     4846  19966330
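The columns of the summary can be recomputed from a plain list of sequence lengths. This is a sketch only: `getSummary.pl` is not shown, so the N50 definition used here (the largest length L such that sequences of length at least L cover half of the total bases) is an assumption, and `summarize` is a hypothetical name.

```python
from statistics import median


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(lengths):
    """Return the same columns as the summary line above, as a dict."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": total / len(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
        "sum": total,
    }
```

As a consistency check on the reported line, 19966330 / 9055 is about 2205, which agrees with the mean column.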
Articles