Assembly merge
Assemblers
Denovo
Minimus
* hash-overlap overlap: 40 bp by default, which is too large for contig assemblies
* 20 bp minimum overlap; minimizer window length must be >= 15 bp; could these values be lowered?
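To illustrate why the minimum overlap setting matters when merging contigs, here is a minimal sketch of an exact suffix-prefix overlap test. It is not the hash-overlap algorithm itself (which tolerates mismatches and uses minimizers); the function name and the toy sequences are assumptions made up for this example.

```python
def suffix_prefix_overlap(a, b, min_overlap=20):
    """Length of the longest exact suffix of `a` that is also a prefix of `b`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    Simplified illustration: real overlappers allow sequencing errors.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate first and shrink down to the threshold.
    for length in range(max_len, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


# Two toy contigs that share a 24 bp region (hypothetical sequences).
shared = "TTGACCGATAGGCTAACGTAGCAT"
contig_a = "GGGCAGCA" + shared
contig_b = shared + "CCATTGGA"

print(suffix_prefix_overlap(contig_a, contig_b, min_overlap=40))  # threshold too high
print(suffix_prefix_overlap(contig_a, contig_b, min_overlap=20))  # overlap detected
```

With a 40 bp threshold the 24 bp overlap is rejected, while a 20 bp threshold accepts it. This is the situation the notes above describe for contig ends.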
Velvet
* overlap: 18 bp usually gives the fewest contigs
* 15 bp is too low => too many short contigs
Edena
* contigs don't overlap
Comparative
AMOScmp
Cases
No reference sequence
One data set, multiple denovo assemblers
Example:
* Solexa data
* edena & velvet assemblers
Solutions:
* merge the contigs of the 2 assemblies
* run minimus on them
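The first step above, pooling the contigs of two assemblies into one input set, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the labels, and the toy contigs are hypothetical, and the conversion of the pooled FASTA into the assembler's own input format is not shown.

```python
def tag_fasta(lines, label):
    """Yield FASTA lines with every header prefixed by `label`.

    The prefix keeps contig names from different assemblers from colliding
    (both assemblers may emit a record called "contig_1").
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield f">{label}_{line[1:]}"
        elif line:  # skip blank lines
            yield line


def merge_contig_sets(sets):
    """Pool several contig sets (label -> iterable of FASTA lines) into one FASTA text."""
    merged = []
    for label, lines in sets.items():
        merged.extend(tag_fasta(lines, label))
    return "\n".join(merged) + "\n"


# Toy input standing in for the edena and velvet contig files.
edena = [">contig_1", "ACGTACGT", ">contig_2", "GGCATTAC"]
velvet = [">contig_1", "TTGACCGA"]

merged = merge_contig_sets({"edena": edena, "velvet": velvet})
print(merged)
```

With real data, open file handles can be passed in place of the lists, since the functions only iterate over lines.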
Multiple data sets, one (or multiple) denovo assemblers
Example:
* Solexa & 454 data
* velvet assembler for each set
One reference sequence
Few indels, few rearrangements
Many indels, few rearrangements
Few indels, many rearrangements
Multiple reference sequences
Examples
Pseudomonas_syringae
Reference:
Name         Length   %GC
NC_004578.1  6397126  58.40
NC_004633.1  73661    55.15
NC_004632.1  67473    56.17
Repeats:
desc    #repeats  min  max   mean    stdev    sum
50bp+   991       50   7362  393.73  792.41   390192
100bp+  429       100  7362  815.36  1060.29  349793
Data:
Type         #reads   min  max  mean
Solexa       6340136  32   32   32
Sim(ulated)  6538167  32   32   32
454          77466    35   371  240
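As a rough guide to how deep each data set is, the expected coverage can be estimated as (number of reads x mean read length) / total reference length. The sketch below uses the reference lengths and read counts listed on this page; the variable and function names are made up for the example, and the estimate ignores unmapped or low-quality reads.

```python
# Reference replicon lengths (bp), as listed in the Reference table.
reference_lengths = {
    "NC_004578.1": 6397126,
    "NC_004633.1": 73661,
    "NC_004632.1": 67473,
}
genome_size = sum(reference_lengths.values())

# (number of reads, mean read length in bp), as listed in the Data table.
datasets = {
    "Solexa": (6340136, 32),
    "Sim": (6538167, 32),
    "454": (77466, 240),
}


def coverage(num_reads, mean_length, genome):
    """Expected average depth: total sequenced bases divided by genome size."""
    return num_reads * mean_length / genome


for name, (num_reads, mean_length) in datasets.items():
    print(f"{name}: {coverage(num_reads, mean_length, genome_size):.1f}x")
# Solexa: 31.0x
# Sim: 32.0x
# 454: 2.8x
```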
Single assemblies:
Assembler   type         input-data  #reads   #ctgs  min  max     mean      stdev     ctgs-sum  #singl
edena       denovo       Solexa      6340136  14084  100  5075    210.92    145.68    2970720   4893301(77%)
velvet      denovo       Solexa      6340136  25161  45   5057    241.83    212.61    6084887
edena-sim   denovo       Sim         6538167  2068   100  47881   2994.03   4857.76   6191673   198699(3%)
velvet-sim  denovo       Sim         6538167  2207   45   56810   2820.91   5348.36   6225757   123591(2%)
AMOScmp     comparative  Solexa      6340136  187    20   577929  34863.06  91692.34  6519394   698638(11%)
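The per-assembly columns (#ctgs, min, max, mean, stdev, ctgs-sum) can be recomputed from a list of contig lengths. The sketch below shows one way to do that. The function name and the toy lengths are hypothetical, and the sample standard deviation is used as an assumption, since the page does not say which definition produced the stdev column.

```python
import statistics


def contig_stats(lengths):
    """Summary statistics for a list of contig lengths (bp).

    Uses the sample standard deviation (statistics.stdev), which needs
    at least two contigs; the original tables may use another definition.
    """
    return {
        "#ctgs": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(statistics.mean(lengths), 2),
        "stdev": round(statistics.stdev(lengths), 2),
        "ctgs-sum": sum(lengths),
    }


# Toy contig lengths, not taken from any assembly on this page.
example_lengths = [100, 150, 200, 350]
stats = contig_stats(example_lengths)
for column, value in stats.items():
    print(f"{column}\t{value}")
```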
Merged assemblies(contigs&singletons):
assemblers      type    input-data             #reads  #ctgs  min  max   mean    stdev   ctgs-sum  comments
minimus(ovl40)  denovo  edena+velvet(contigs)  39245   23644  45   6688  257.15  232.94  6080063   #very few 40bp overlaps are found
minimus(ovl20)  denovo  edena+velvet(contigs)  39245   18603  45   6688  322.32  311.02  5996244
minimus(ovl20)  denovo  velvet(contigs)        25161   19121  45   5057  311.3   297.27  5952381   #merged 25161-19121=6040 (25%) gaps
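The comment on the last row subtracts the output contig count from the input contig count to estimate how many joins the merge made. The same arithmetic can be applied to every row, as sketched below. This assumes that the #reads column holds the number of input contigs for these runs (39245 = 14084 edena + 25161 velvet contigs); the function name is made up for the example.

```python
def merge_summary(contigs_in, contigs_out):
    """Return (sequences removed by merging, fraction of the input set)."""
    merged = contigs_in - contigs_out
    return merged, merged / contigs_in


# (input contigs, output contigs) from the merged-assembly table.
runs = {
    "minimus(ovl40) edena+velvet": (39245, 23644),
    "minimus(ovl20) edena+velvet": (39245, 18603),
    "minimus(ovl20) velvet": (25161, 19121),
}

for name, (contigs_in, contigs_out) in runs.items():
    merged, fraction = merge_summary(contigs_in, contigs_out)
    print(f"{name}: {merged} fewer sequences ({fraction:.1%} of input)")
```

For the velvet-only run this gives 6040 fewer sequences, about 24% of the input, which is close to the roughly 25% noted in the table.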