Latest revision as of 18:25, 18 February 2010

Mobile elements

plasmids
bacteriophages:
- up to 20% of the genome
- most common transporters of virulence genes in bacteria
- have site specificity
transposable elements
- up to 2 kbp
- no site specificity

Tandem repeats

satellites (spanning megabases of DNA, associated with heterochromatin)
minisatellites (repeat units in the range 6-100 bp, spanning hundreds of base-pairs)
microsatellites (repeat units in the range 1-5 bp, spanning a few tens of nucleotides).
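The unit-length ranges above can be turned into a small lookup. This is a minimal sketch: the function name `classify_tandem_repeat` is hypothetical, and because the notes describe satellites only by total span, units longer than 100 bp are reported as "unclassified" rather than guessed.

```python
def classify_tandem_repeat(unit_len: int) -> str:
    """Classify a tandem repeat by the length of its repeat unit (in bp).

    Ranges follow the notes: 1-5 bp microsatellite, 6-100 bp minisatellite.
    """
    if unit_len < 1:
        raise ValueError("repeat unit length must be at least 1 bp")
    if unit_len <= 5:
        return "microsatellite"
    if unit_len <= 100:
        return "minisatellite"
    # The notes give no unit-length range above 100 bp (assumption: leave open).
    return "unclassified"


if __name__ == "__main__":
    for unit in (2, 6, 100, 171):
        print(unit, classify_tandem_repeat(unit))
```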

Insertion Elements (IS)

0.7-2.5 kbp
small, genetically compact (1-2 ORFs): transposase and/or reverse transcriptase
end in short terminal inverted repeat sequences (IR), 10-40 bp
ISFinder
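The terminal inverted repeat property can be checked directly on a candidate sequence: the first k bases should equal the reverse complement of the last k bases. The sketch below is illustrative only. The function names are hypothetical, it requires a perfect match, and real IS elements often carry imperfect IRs, so a production check would need to tolerate mismatches.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_ir_length(seq: str, min_len: int = 10, max_len: int = 40) -> int:
    """Return the longest k in [min_len, max_len] such that the first k bases
    equal the reverse complement of the last k bases, or 0 if there is none.

    Assumption: perfect matches only; the 10-40 bp window follows the notes.
    """
    seq = seq.upper()
    upper = min(max_len, len(seq) // 2)
    for k in range(upper, min_len - 1, -1):
        if seq[:k] == reverse_complement(seq[-k:]):
            return k
    return 0


if __name__ == "__main__":
    left = "GGTAGCTAACGT"
    element = left + "A" * 50 + reverse_complement(left)
    print(terminal_ir_length(element))
```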

Software Packages

MUMmer repeat-match:
- does not classify the repeats
- works fine for small genomes

  ~/bin/RepeatSearch.amos prefix
 ...
 10: $(BINPATH)/repeat-match -n $(REPEATLEN) $(PREFIX).fasta | $(SCRIPTPATH)/repeat-match2gff.pl > $(REPEATS).gff
 20: $(SCRIPTPATH)/extractfromfastagff.pl $(PREFIX).fasta $(REPEATS).gff > $(REPEATS).fasta
 30: $(BINPATH)/nucmer -maxmatch $(REPEATS).fasta $(REPEATS).fasta -p $(REPEATS)
 40: $(BINPATH)/show-coords  -c -l  -r -o -H $(REPEATS).delta  | awk '{print $18,$19}' | ~/bin/cluster.pl > $(REPEATS).cluster
 50: $(SCRIPTPATH)/extractfromfastanames.pl -f $(REPEATS).cluster < $(REPEATS).fasta > $(REPEATS).cluster.fasta

- does not work on large genomes

 split the genome into smaller pieces (10 Mbp?)
 align them to one another:
 nucmer -l 20 -c 65   -g 90  -b 200  (default): takes too long
 nucmer -l 35 -c 2500 -g 100
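The splitting step for large genomes could look like the following. This is a sketch under stated assumptions: the function name `split_sequence` and the `name:start-end` chunk naming are hypothetical, the 10 Mbp default simply mirrors the tentative figure in the notes, and repeats that straddle a chunk boundary will be cut in two.

```python
from typing import Iterator, Tuple


def split_sequence(
    name: str, seq: str, chunk_len: int = 10_000_000
) -> Iterator[Tuple[str, str]]:
    """Yield (chunk_name, chunk_sequence) pieces of at most chunk_len bases.

    Chunk names carry 1-based inclusive coordinates so that alignment hits
    can be mapped back to the original sequence.
    """
    if chunk_len < 1:
        raise ValueError("chunk_len must be positive")
    for start in range(0, len(seq), chunk_len):
        end = min(start + chunk_len, len(seq))
        yield f"{name}:{start + 1}-{end}", seq[start:end]


if __name__ == "__main__":
    # Tiny demonstration with a 4-base chunk size instead of 10 Mbp.
    for chunk_name, chunk_seq in split_sequence("chr", "ACGTACGTAC", chunk_len=4):
        print(f">{chunk_name}\n{chunk_seq}")
```

Each chunk can then be written to its own FASTA file and aligned pairwise with nucmer using the options listed above.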

RepeatModeler
RepeatScout
RepeatMasker; RepBase: mostly eukaryotic genomes

 Library:
   $ ls /fs/szdevel/dpuiu/RepeatMasker/Libraries/RepeatMaskerLib.embl 
 
   $ ~/bin/readseq.sh -f Fasta -o RepeatMaskerLib.fasta RepeatMaskerLib.embl
 
   $ infoseq RepeatMaskerLib.fasta | getSummary.pl -c 1 -t Len
             #elem   min     max     mean    median  n50     sum
     Len     9055    4       35042   2205    890     4846    19966330
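The summary columns above (#elem, min, max, mean, median, n50, sum) could be reproduced roughly as follows. getSummary.pl is a local script whose exact definitions are not shown here, so the N50 definition used below (the length L such that sequences of length at least L hold at least half of the total bases) is an assumption, as are the function names.

```python
from statistics import mean, median
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return N50: walking from longest to shortest, the length at which the
    running total first reaches half of the summed length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Compute the same columns as the summary table in the notes."""
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }


if __name__ == "__main__":
    print(summarize([2, 3, 4, 5, 6]))
```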

Articles

Mobile DNA in obligate intracellular bacteria

@@ Line 1: / Line 1: @@
-= Packages =
+= Mobile elements =
+* plasmids
+* bacteriophages:
+** up to 20% of the genome
+** most common transporters of virulence genes in bacteria
+** have site specificity
+* transposable elements
+** up to 2 kbp
+** no site specificity
+= Tandem repeats =
+* satellites (spanning megabases of DNA, associated with heterochromatin)
+* minisatellites (repeat units in the range 6-100 bp, spanning hundreds of base-pairs)
+* microsatellites (repeat units in the range 1-5 bp, spanning a few tens of nucleotides).
+= Insertion Elements (IS) =
+* 0.7-2.5 kbp
+* small, genetically compact (1-2 ORFs): transposase and/or reverse transcriptase
+* end in short terminal inverted repeat sequences (IR), 10-40 bp
+* [http://www-is.biotoul.fr/ ISFinder]
+= Software Packages =
+* [http://mummer.sourceforge.net/manual/#repeat MUMmer repeat-match]:
+** does not classify the repeats
+** works fine for small genomes
+   ~/bin/RepeatSearch.amos prefix
+  ...
+: $(BINPATH)/repeat-match -n $(REPEATLEN) $(PREFIX).fasta | $(SCRIPTPATH)/repeat-match2gff.pl > $(REPEATS).gff
+: $(SCRIPTPATH)/extractfromfastagff.pl $(PREFIX).fasta $(REPEATS).gff > $(REPEATS).fasta
+: $(BINPATH)/nucmer -maxmatch $(REPEATS).fasta $(REPEATS).fasta -p $(REPEATS)
+: $(BINPATH)/show-coords  -c -l  -r -o -H $(REPEATS).delta  | awk '{print $18,$19}' | ~/bin/cluster.pl > $(REPEATS).cluster
+: $(SCRIPTPATH)/extractfromfastanames.pl -f $(REPEATS).cluster < $(REPEATS).fasta > $(REPEATS).cluster.fasta
+** does not work on large genomes
+  split the genome into smaller pieces (10 Mbp?)
+  align them to one another:
+  nucmer -l 20 -c 65   -g 90  -b 200  (default): takes too long
+  nucmer -l 35 -c 2500 -g 100
-* [http://mummer.sourceforge.net/manual/#repeat MUMer repeat-match] : does not classify the repeats
 * [http://www.repeatmasker.org/RepeatModeler.html RepeatModeler]
 * [http://bix.ucsd.edu/repeatscout/ RepeatScout]
 * [http://www.repeatmasker.org/ RepeatMasker]; [http://www.girinst.org/server/RepBase RepBase]: mostly eukaryotic genomes
+  Library:
+    $ ls /fs/szdevel/dpuiu/RepeatMasker/Libraries/RepeatMaskerLib.embl
+    $ ~/bin/readseq.sh -f Fasta -o RepeatMaskerLib.fasta RepeatMaskerLib.embl
+    $ infoseq RepeatMaskerLib.fasta | getSummary.pl -c 1 -t Len
+              #elem   min     max     mean    median  n50     sum
+      Len     9055    4       35042   2205    890     4846    19966330
+* [http://minisatellites.u-psud.fr/GPMS/ Microorganisms Tandem Repeats Database (Online,FR)]
+* [http://crispr.u-psud.fr/Server/CRISPRfinder.php/ CRISPRfinder (Online,FR)]; [http://nar.oxfordjournals.org/cgi/content/full/gkm360v2 Article]
+* [http://tandem.bu.edu/trf/trf.html TRF]
+= Articles =
+* [http://www.nature.com/nrmicro/journal/v3/n9/full/nrmicro1233.html Mobile DNA in obligate intracellular bacteria]

Repeat search: Difference between revisions
