Assemblers

Denovo

minimus

hash-overlap minimum overlap length:

    40 bp default : too large for contig assemblies
    20 bp minimum overlap ok; 
    minimizer window length must be >=15bp; could these values be dropped lower?
    very slow on large sequences (ex Ps.fasta, Ps.plasmid.fasta) even if USE_SIMPLE_OVERLAP=1 !!! WHY???
    analyzes internal overlaps, not the sequence end ones only

tigger can handle "well" overlaps as short as 5bp

minimus1

hash-overlap was replaced with a nucmer based overlapper
discards CONTAINED/IDENTITY alignments in order to avoid collapsing repeats
the middle of the contigs could be masked to make the alignment process faster; but CONTAINED overlaps would appear as BEGIN?END

Example:

  ....
  OVERLAP=16
  MINID=94
  MAXTRIM=20
  ...
  20: $(NUCMER) -maxmatch -l $(OVERLAP) -c $(OVERLAP) $(SEQS) $(SEQS) -p $(PREFIX)
  24: $(SHOWCOORDS) -H -c -l -o -r -I $(MINID) $(ALIGN) | egrep 'BEGIN|END' > $(COORDS)
  25: $(SCRIPTDIR)/nucmer2ovl.pl -ignore $(MAXTRIM) -tab $(COORDS)> $(OVLTAB)
  26: $(SCRIPTDIR)/ovl2OVL.pl $(OVLTAB)  > $(OVL)
  28: $(BINDIR)/bank-transact -z -b $(BANK) -m $(OVL)

minimus2

merges 2 assemblies

...

SSAKE

Short Sequence Assembly by K-mer search and 3' read Extension
Rene L Warren, Granger G Sutton, Steven JM Jones, Robert A Holt. "Assembling millions of short DNA sequences using SSAKE" Bioinformatics. Vol. 23 no. 4 2007, pages 500–501
Uses a greedy assembly algorithm: progressively searches for perfect 3’-most k-mers using a DNA prefix tree to identify overlaps between any two sequences; stringently clusters short reads into contigs
Copyright (c) 2006-2008 Canada's Michael Smith Genome Science Centre.
Current release: SSAKE 3.2 (12/07/2007)
Distribution: PERL script; runs on Linux systems
Parameters:

 -m  Minimum number of reads needed to call a base during overhang consensus build up (default -m 16)
 -o  Minimum number of reads needed to call a base during an extension (default -o 2)
 -r  Minimum base ratio used to accept a overhang consensus base (default -r 0.6)

Example:

 $ grep -c "^>" Pa.seq           # total number of reads
 $ (time ssake_3.0.pl -f Pa.seq) > ssake_3.0.log             # version 3.0
 $ (time ssake_3.2.pl -f Pa.seq -t 1) > ssake_3.2.log        # version 3.2
 $ ln -s Pa*contigs  Pa.fasta
 $ ln -s Pa*singlets Pa.singlets.seq
 $ cat Pa.fasta        | grep "^>" | perl -ane '/read(\d+)/; print $1, "\n";' | getSummaryDescriptive.pl -t assembled_reads    # number of assembled reads
 $ cat Pa.singlets.seq | grep "^>" | perl -ane '/read(\d+)/; print $1, "\n";' | getSummaryDescriptive.pl -t singletons         # number of singletons
 $ grep -v N Pa.seq              # number of reads which contain ambiguities and will be discarder from the beginning (should be added to the singletons)

Velvet

Sequence assembler for very short reads
Daniel R. Zerbino and Ewan Birney, "Velvet: Algorithms for De Novo Short Read Assembly Using De Bruijn Graphs", Genome Res. published online Mar 18, 2008
Uses Bruijn graphs to assemble sequences: first uses an error correction algorithm to merge sequences which belong together, then a repeat solver separates paths sharing local overlaps
Distribution: C program, to be compiled with gcc
Open Source, GPL agreement, EBI.
Current release: 0.5.07 (27/02/2008)
Parameters:

 - hash_length: odd integer (if even, it will be decremented) <= 31 (if above, will be reduced)

* minimum overlap length (30-40X coverage data set):
    18bp usually gives the fewest contigs
    24bp is also ok
    15bp is too low => too many short contigs
*  -long option for including "long sequences" (Sanger, 454 or even reference sequences) in the assembly; did not work on including AMOScmp contigs; Sanger& 454 not tested

* velvet_assy.afg lists the assembled reads (TLE messages)
* some reads extend beyond the contig ends
* there are some reads shared by multiple contigs (status:D) ; at least one of the instances occurs at a contig end (goes beyond the contig end)
* many contigs overlap by a few bp
* minimum_ctg length =2*minimum_ovl-1 (23=>45, 21=>41 ...)

Example:

 $ velveth . ovl prefix.seq                                # ovl should be an odd integer like 21, 23 
 $ velvetg . -read_trkg yes
 
 $ bank-transact -c -z -b prefix.bnk -m velvet_assy.afg
 $ bank2contig prefix.bnk  > prefix.contig
 $ bank2fasta -b prefix.bnk > prefix.fasta
 
 $ listReadPlacedStatus -S -E prefix.bnk > prefix.singletons
 $ extractfromfastanames.pl prefix.seq prefix.singletons > prefix.singletons.seq
 
 $ listReadPlacedStatus prefix.bnk > prefix.status
 $ grep D prefix.status  | awk '{print $5,$6}' | sort -u > prefix.pairs

Edena

Edena: Exact DE Novo Assembler
Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel, "De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer." Genome Res. Published April 3, 2008
Genomic Research Laboratory, Infectious Diseases Service, Geneva University Hospitals, Switzerland
Current release: 2.1.1 (03/17/2008)
Distribution: Compiled executables for Linux-32,Linux-64
Uses a traditional read overlap-layout paradigm (adjusted for short reads)
Parameters:

 --overlapCutoff [int]    Only consider overlaps >= than the specified size. (default 22)
 --minContigSize [int]    Minimum size of the contigs to output. (default 100)

contigs don't overlap

Example:

 $ edena -r prefix.seq -p ovl
 $ edena -e prefix.ovl -p prefix
 $ ln -s prefix_contigs.fasta prefix.fasta
 $ cat *_info.txt                                 # assembly stats
 $ grep assembled Pa_info.txt
 $ cat Pa_contigs.fasta | grep "^>" | perl -ane '/(\d+)$/; print $1, "\n";' | getSummaryDescriptive.pl -t assembled

Comparative

AMOScmp

one should increase the make-consensus error rate if ref and qry are not that similar
all reads from layout are included in consensus
casm-layout: if one sequence end does not align it is left as singleton; the adjacent read to that end is left as singletons if they should "theoretically" overlap;
all CONTAINED reads should be assembled but the one that are adjacent to dirty reads are left as singletons as well !!!
There are some large negative gaps in scaffold file (Solexa read assemblies); Some contigs are "CONTAINED" in the contig just before the large negative gap; all these contig id's are larger than the .scaff last contig ; Contig alignments to the ref have breaks
casm-layout -g 10000 (defaul max gap) seems to produce the problem above; this value should be reduced (Ex 100)

Example:

 $ sort -nk4 AMOS.scaff | head
 322 BE 87996 -24519
 210 BE 7460 -7324
 1271 BE 60490 -4112
 17 BE 1022 -944
 ...

 $more AMOS.scaff
 ...
 322 BE 87996 -24519     # 322 contains contigs 1770, 1771 ...
 1770 BE 39 1100
 1771 BE 130 738
 ...
 1780 BE 72 25004

 $ tail -1 AMOS.scaff
 1767 BE 39264 0

AMOScmp-shortReads-alignmentTrimmed

Defaults:

  MINCLUSTER  = 16
  MINMATCH    = 16
  MINLEN      = 20  # delta-filter -l 20
  MINOVL      = 10  # previously 5 ; mis-assemblies occurred
  MAXTRIM     = 10
  MAXGAP      = 100 # default 10,000 (casm-layout -g) creates some incorrect contigs
  MAJORITY    = 70  # previously 50 
  CONSERR     = 0.06
  ALIGNWIGGLE = 2   # different than the default AMOScmp 15

 * Reads are trimmed based on nucmer alignment to a reference genome; reads that are adjacent to 0 cvg regions are not trimmed
 * If reads are not trimmed CONSERR should be increased, otherwise make-consensus fails
 * If reads are not trimmed, the avg contig length is significantly shorter than if reads are trimmed based on alignments => 
   !!! read trimming is important.
 * assembly is very fragmented if the percent identity between reads and the reference is less than 98%
 * simulations run on a PA-b1 10K read subset using PA14 as reference; random snps were introduced into PA14 to decrease the %identity; (MINCLUSTER=12, MINMATCH=16, MINLEN=20)
   99.4% : 1 contig  (original data, no snps added to PA14)
   98%   : 2 contigs
   96%   : 10 contigs
   93%   : 50 contigs
   93%   : 37 contigs (MINMATCH=14)

AMOScmp-shortReads

Defaults

  MINCLUSTER  = 20
  MINMATCH    = 20
  MINOVL      = 5
  MAXTRIM     = 10
  MAJORITY    = 50
  CONSERR     = 0.06
  ALIGNWIGGLE = 2

 * No read trimming is done

AMOScmp2

Runs AMOScmp of 2 data sets at once (with different paarmeters)

maq

At high %identity (close to 100%) maq outperforms AMOScmp-shortReads-alignmentTrimmed
At lower % identity (<97%) AMOScmp-shortReads-alignmentTrimmed outperforms maq

Other Scripts

EMBOSS megamerge

 * Can merge well short overlapping contigs
 * Does not work well when the overlap identity is low
 * Min ovl length=2

~dpuiu/bin/nucmer.pl

It is a nucmer wrapper which runs nucmer with decreasingly alignment size on the reads not aligned at the previous iterations:
Shorter run time, less disk space
Default parameters:

 -len 20,16,12         : nucmer minmatch (-l)
 -cluster 20,16,14     : nucmer mincluster (-c)
 -filter 20,20,20      : delta-filter minlen (-l)

Could be used by AMOScmp-shortReads, AMOScmp-shortReads-alignmentTrimmed

Cases

No reference sequence

One data set, multiple denovo assemblers

Example:

 * Solexa data
 * edena & velvet assemblers

Solution:

 * run minimus on the contigs

Multipls data sets, one(multiple) denovo assemblers

Example:

* Solexa & 454 data
* velvet assembly on Solexa
* newbler assembly on 454

Solution:

 * run minimus on the contigs

One reference sequence

Example:

 * Solexa data
 * AMOScmp on Solexa
 * velvet on Solexa

Solution:

 * remove the redundant velvet contigs (CONTAINED in AMOScmp contigs) (Optional; leaving the CONTAINED contigs had very similar results, CONSERR did not have to be reduced)
 * run AMOScmp on AMOScmp & velvet contigs => join some AMOScmp gaps
 * run minimus to further merge contigs

Multiple reference sequences

Example:

 * Solexa data
 * AMOScmp on Solexa using Ref1
 * AMOScmp on Solexa using Ref2

Solution:

 * remove the redundant AMOScmp Ref2 contigs (CONTAINED in AMOScmp contigs) (optional)
 * run AMOScmp on AMOScmp Ref1 & Ref2 contigs => join some AMOScmp Ref1 gaps
 * run minimus to further merge contigs

Examples

Pseudomonas_syringae

Reference:

 Name           Length  %GC
 NC_004578.1    6397126 58.40
 NC_004633.1    73661   55.15
 NC_004632.1    67473   56.17

Repeats:

 desc    #repeats   min     max     mean    stdev    sum
 50bp+   991        50      7362    393.73  792.41   390192
 100bp+  429        100     7362    815.36  1060.29  349793

Solexa reads

 Type            #reads       min     max     mean
 Solexa          6340136      32      32      32   (~31x coverage)

Assemblies:

 Assembler   type         input-data  #reads         #ctgs   min     max     mean      stdev     ctgs-sum      #singletons       
 AMOScmp     comparative  Solaxa      6340136        187     20      577929  34863.06  91692.34  6519394       698638(11%)
 velvet      denovo       Solaxa      6340136        25161   45      5057    241.83    212.61    6084887
 edena       denovo       Solaxa      6340136        14084   100     5075    210.92    145.68    2970720       4893301(77%)

Merged assemblies(contigs&singletons):

 assemblers     type         input-data                #reads  #ctgs   min     max     mean     stdev   ctgs-sum   comments    
 AMOScmp-merged ?            AMOScmp(contigs)          187     166     20      804024  39272.2  121124  6519189    #merged 187-166=21 negative gaps  out of a total of 32
 minimus(ovl20) denovo       velvet(contigs)           25161   19121   45      5057    311.3    297.27  5952381    #merged 25161-19121=6040 (25%) gaps
 minimus(ovl15) denovo       velvet(contigs)           25161   16343   45      9903    361.32   359.78  5905143    #merged 25161-16343=8818 (35%) gaps
 minimus(ovl40) denovo       edena+velvet(contigs)     39245   23644   45      6688    257.15   232.94  6080063    #very few 40bp overlaps are found
 minimus(ovl20) denovo       edena+velvet(contigs)     39245   18603   45      6688    322.32   311.02  5996244

Simulated 32bp exact match reads

 Type            #reads       min     max     mean
 Sim(ulated)     6538167      32      32      32   ( 32x coverage)

Single assemblies:

 Assembler   type         input-data  #reads         #ctgs   min     max     mean      stdev     ctgs-sum      #singletons       
 edena-sim   denovo       Sim         6538167        2068    100     47881   2994.03   4857.76   6191673       198699(3%)
 velvet-sim  denovo       Sim         6538167        2207    45      56810   2820.91   5348.36   6225757       123591(2%)

454 reads

 Type            #reads       min     max     mean
 454             77466        35      371     240

Pseudomonas aeruginosa b1 (PAb1)

References:

 Name           Length  %GC
 PA14           6537648 66.29
 PACS2          6492423 66.33
 PAO1           6264404 66.56
 ...

Solexa reads

 Type            #reads       min     max     mean
 Solexa          8627900      33      33      33    (~43X coverage)

Assemblies:

All contigs:

 Assembler      type         #ctgs   min     max     mean      stdev      ctgs-sum      #singletons    comments
 AMOSCmp-PA14   comparative  2053    17      170485  3011.84   11917.53   6183320       1127399
 AMOSCmp-PAO1   comparative  2797    17      75626   2161.19   5812.2     6044851       1592525
 AMOScmp-PA2192 comparative  5816    17      133129  1072.8    3725.22    6239454       1601299        largest assembly
 maq-PA14       comparative  991     33      155551  6199.79   17445.05   6143996       1197385
 velvet         denovo       10684   45      16239   640.34    825.24     6841458       1241079        much better than Ps !!!
 edena          denovo       11180   100     11300   552.36    610.52     6175460       3955865 (46%)  much better than Ps !!!
 ssake          denovo       185030  34      5490    77.21     141.23     14287079      3056893

200bp+ contigs:

 Assembler      type         #ctgs   min     max     mean      stdev      ctgs-sum
 AMOSCmp-PA14   comparative  428     203     170485  14262.09  22852.74   6104175
 AMOSCmp-PAO1   comparative  865     200     75626   6893.96   8766.63    5963278
 AMOSCmp-PA2192 comparative  1299    200     133129  4683.46   6735.52    6083817
 maq-PA14       comparative  368     200     155551  16581.7   25475.92   6102067
 velvet         denovo       7382    200     16239   877.05    896.35     6474426
 edena          denovo       8316    200     11300   692.54    651.24     5759209
 ssake          denovo       12532   200     5490    486       329.93     6090567

Merged assemblies(contigs&singletons):

 Assembler           type   input-data          #reads   #ctgs   min   max     mean    stdev      ctgs-sum       comments
 AMOSCmp-PA14-merge  ?      AMOSCmp-PA14(ctgs)  2053     1931    17    170485  3201.7  12981.57   6182486        2053-1931=122 gaps closed

Staphylococcus aureus MW2

Data

Reads:

 3857879 35bp Solexa reads

Reference:

 NC_003923      2820462 32.83  Staphylococcus aureus subsp. aureus MW2, complete genome.
 AP004832       20654  28.38  Staphylococcus aureus plasmid pMW2 DNA, complete sequence
 total          2841116

Coverage: 47.52X

Other strains:

 NC_002952.2    2902619 32.81  Staphylococcus aureus subsp. aureus MRSA252, complete genome
 NC_002953.3    2799802 32.85  Staphylococcus aureus subsp. aureus MSSA476, complete genome
 NC_002951.2    2809422 32.82  Staphylococcus aureus subsp. aureus COL, complete genome
 NC_009632.1    2906507 32.95  Staphylococcus aureus subsp. aureus JH1, complete genome
 NC_009487.1    2906700 32.95  Staphylococcus aureus subsp. aureus JH9, complete genome
 NC_009782.1    2880168 32.88  Staphylococcus aureus subsp. aureus Mu3, complete genome
 NC_002758.2    2878529 32.88  Staphylococcus aureus subsp. aureus Mu50, complete genome
 NC_003923.1    2820462 32.83  Staphylococcus aureus subsp. aureus MW2, complete genome
 NC_002745.2    2814816 32.84  Staphylococcus aureus subsp. aureus N315, complete genome
 NC_007795.1    2821361 32.87  Staphylococcus aureus subsp. aureus NCTC 8325, complete genome
 NC_009641.1    2878897 32.89  Staphylococcus aureus subsp. aureus str. Newman, complete genome
 NC_007622.1    2742531 32.78  Staphylococcus aureus RF122, complete genome
 NC_007793.1    2872769 32.75  Staphylococcus aureus subsp. aureus USA300, complete genome
 NC_010079.1    2872915 32.76  Staphylococcus aureus subsp. aureus USA300_TCH1516, complete genome

Sequence assemblies

AMOScmp

OVL: 10 bp

Output stats:

 All:
               #elem   min     max     mean    median  n50     sum
 ctg           17      60      658745  167134  102144  485748  2841283
 ctg(200bp+)   15      2090    658745  189411  102265  485748  2841158
 singl         101422  23      35      35      35      35      3549538
 breaks        0

Only 2.62% of the reads are singletons
101232 out of 101422 AMOS singletons (99.81%) are also velvet singletons

velvet

OVL: 23 bp

Output stats:

               #elem   min     max     mean    median  n50     sum
 ctg           1777    45      22892   1583    252     5337    2813162
 ctg(100bp+)   1146    100     22892   2421    1167    5472    2774471
 ctg(200bp+)   939     200     22892   2924    1764    5512    2745649
 singl         289207  35      35      35      35      35      10122245
 breaks 0

7.49% reads are singletons
101232 out of 289207 velvet singletons are also AMOS singletons
13189 reads are duplicates, connect 1126 contig pairs (1242 unique ctgs in these pairs)
1114 out of 1242 contig overlaps were identified by running nucmer (-c 12 -l 12)

 ovlSize count
 22      740
 21      99
 20      55
 19      36
 18      52
 17      33
 16      17
 15      29
 14      21
 13      15
 12      17
 Total   1114

0cvg regions

 $show-coords Sa.all.delta | ~/bin/getNucmerCoverage.pl -M 0 | getSummary.pl -c 3
       #elem   min     max     mean    median  n50     sum
 gap   305     1       570     34      19      70      10521

OVL: 21 bp

               #elem   min     max     mean    median  n50     sum
 all           2099    41      18254   1342    258     4384    2817753
 45bp+         1830    45      18254   1534    448     4391    2806596
 100bp+        1337    100     18254   2075    1083    4558    2774667
 200bp+        1115    200     18254   2461    1486    4625    2743892
 singl         224006

OVL: 23 bp seems to generate a better assembly on this data set !!!

edena

Output stats:

               #elem   min     max     mean    median  n50     sum
 ctg           1175    100     22893   2350    1094    5514    2760953
 ctg(200bp+)   935     204     22893   2918    1766    5594    2727929

Contig assemblies

AMOScmp on velvet contigs

Min Ovl: 10bp Min Cluster(nucmer -c): 40 (many velvet contigs are 45-65 bp)

 All:
       #elem   min     max     mean    median  n50     sum
 ctg   567     45      73756   4890    455     20846   2772573
 singl 5       45      11556   3159    58      11556   15794
 all   572     45      73756   4875    455     20846   2788367
 breaks 2 (1 contig with 2 overlapping alignments; no repeat; why is nucmer breaking the alignment?)
 
 200 bp+:
       #elem   min     max     mean    median  n50     sum
 ctg   339     201     73756   8118    3413    20846   2752170
 singl 2       4086    11556   7821    11556   11556   15642
 all   341     201     73756   8117    3494    20846   2767812
 breaks 2

Reduced the number of contigs from 1777 to 567 !!!

minimus on velvet contigs

Min Ovl: 16bp

 All:
       #elem   min     max     mean    median  n50     sum
 ctg   316     45      39490   5812    3243    12483   1836527
 singl 797     45      22892   1204    161     4724    959356
 all   1113    45      39490   2512    299     9399    2795883
 breaks 20
 
 200 bp+:
       #elem   min     max     mean    median  n50     sum
 ctg   258     202     39490   7093    5069    12483   1829967
 singl 368     200     22892   2504    1347    5210    921478
 all   626     200     39490   4395    2313    9508    2751445
 breaks 19

Reduced the number of contigs from 1777 to 1113 !!!

minimus1 on velvet contigs

Min Ovl: 16bp

 All:
       #elem   min     max     mean    median  n50     sum
 ctg   277     70      32618   6623    4709    11914   1834597
 singl 1022    45      19128   948     96      5057    968872
 all   1299    45      32618   2158    154     9526    2803469
 breaks 15
 
 200 bp+:
       #elem   min     max     mean    median  n50     sum
 ctg   247     200     32618   7413    6004    11984   1830980
 singl 354     200     19128   2590    1397    5294    916800
 all   601     200     32618   4572    2495    9757    2747780
 breaks 15

Reduced the number of contigs from 1777 to 1299 !!!
Generated fewer, larger big contigs than minimus

minimus1 on velvet contigs

Min Ovl: 14bp; Uses only the overlaps represented by Duplicate reads

 All:
       #elem   min     max     mean    median  n50     sum
 ctg   240     70      27522   3955    1996    8075    949307
 singl 1179    45      19128   1575    240     5337    1856945
 all   1419    45      27522   1978    344     6270    2806252
 breaks 11
 
 200 bp+:
       #elem   min     max     mean    median  n50     sum
 ctg   196     201     27522   4815    3010    8075    943828
 singl 612     200     19128   2962    1857    5472    1813005
 all   808     200     27522   3412    2055    6337    2756833
 breaks 11

Fewer contigs get merged!!!

minimus1 on edena contigs

 All:
       #elem   min     max     mean    median  n50     sum
 ctg   67      240     24824   5610    4814    8671    375847
 singl 1022    100     19924   2331    1084    5455    2382674
 all   1089    100     24824   2533    1183    5933    2758521
 breaks 27
 
 200 bp+:
       #elem   min     max     mean    median  n50     sum
 ctg   67      240     24824   5610    4814    8671    375847
 singl 811     204     19924   2902    1766    5588    2353676
 all   878     204     24824   3109    1862    5984    2729523
 breaks 27

Edena contig ends get trimmed. There are fewer overlap between them than velvet contigs!!!

Staphylococcus aureus MW2 "mutated" 10% of sequence are random SNPs

Sequence assemblies

AMOScmp

OVL: 10 bp ...

               #elem   min     max     mean    median  n50     sum
 all           22431   20      705     107     87      124     2409008
 200bp+        1997    200     705     267     246     260     533882
 singl         1934772

velvet

Same as the one above

Contig assemblies

AMOScmp on velvet contigs

Min Ovl: 10bp Min Match (nucmer -l): 20 Min Cluster (nucmer -c): 40

All:

       #elem   min     max     mean    median  n50     sum
 ctg   496     45      47387   5504    2076    14370   2729975
 singl 629     45      11556   112     60      131     70262
 all   1125    45      47387   2489    96      13785   2800237
 breaks 6

200 bp+:

       #elem   min     max     mean    median  n50     sum
 ctg   405     204     47387   6715    3277    14370   2719753
 singl 28      200     11556   1042    288     11556   29165
 all   433     200     47387   6349    2878    14036   2748918
 breaks 4

Min Ovl: 10bp Min Match (nucmer -l): 16 Min Cluster (nucmer -c): 32

 All:
       #elem   min     max     mean    median  n50     sum
 ctg   544     45      53996   5044    1215    16039   2744166
 singl 385     45      11556   133     53      5057    51297
 all   929     45      53996   3009    112     15857   2795463
 breaks 8
 
 200 bp+:
       #elem   min     max     mean    median  n50     sum
 ctg   391     201     53996   6980    3156    16443   2729048
 singl 8       222     11556   3543    4086    11556   28343
 all   399     201     53996   6911    3156    16039   2757391
 breaks 6

Assembly merge

Assemblers

Denovo

minimus

minimus1

minimus2

SSAKE

Velvet

Edena

Comparative

AMOScmp

AMOScmp-shortReads-alignmentTrimmed

AMOScmp-shortReads

AMOScmp2

maq

Other Scripts

EMBOSS megamerge

~dpuiu/bin/nucmer.pl

Cases

No reference sequence

One data set, multiple denovo assemblers

Multipls data sets, one(multiple) denovo assemblers

One reference sequence

Multiple reference sequences

Examples

Pseudomonas_syringae

Solexa reads

Simulated 32bp exact match reads

454 reads

Pseudomonas aeruginosa b1 (PAb1)

Solexa reads

Staphylococcus aureus MW2

Data

Sequence assemblies

AMOScmp

velvet

OVL: 23 bp

OVL: 21 bp

edena

Contig assemblies

AMOScmp on velvet contigs

minimus on velvet contigs

minimus1 on velvet contigs

minimus1 on velvet contigs

minimus1 on edena contigs

Staphylococcus aureus MW2 "mutated" 10% of sequence are random SNPs

Sequence assemblies

AMOScmp

velvet

Contig assemblies

AMOScmp on velvet contigs

Navigation menu

Search