Comparative assemblies
Latest revision as of 13:51, 27 February 2008
AMOScmp pipeline
Short reads (Solexa)
Modified parameters
Modified AMOScmp pipeline: ~dpuiu/bin/AMOScmp
Alignment:
* Lower nucmer alignment/cluster sizes: defaults are 20/65; drop to 16/16 (Solexa_read_len/2)
  Can go as low as 14/14; 12/12 gives too many spurious alignments:
  -D MINMATCH=20 -D MINCLUSTER=20
* Run nucmer multiple times:
  all reads: given alignment/cluster size
  unaligned reads: smaller alignment/cluster size
  still-unaligned reads: even smaller alignment/cluster size
  ...
* Use promer instead of nucmer: alignment/cluster sizes of 6/11 (in AA)
Layout:
* Drop casm-layout min ovl from 10 to 5:
  -D MINOVL=5
* Drop casm-layout majority from 70 to 50:
  -D MAJORITY=50
Consensus:
* Drop make-consensus alignment wiggle from 15 to 2:
  -D ALIGNWIGGLE=2
* Use make-consensus -x option ???
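The "run nucmer multiple times" idea can be sketched as a loop over a decreasing match-size schedule, where each pass only sees the reads that the previous pass left unaligned. This is a toy illustration, not the pipeline's code: the function names are hypothetical, and a plain longest-exact-substring test stands in for nucmer's real seeding and clustering.

```python
def longest_exact_match(read, reference):
    """Length of the longest substring of `read` found verbatim in `reference`.

    Toy stand-in for an aligner's minimum-match test (assumption: exact
    matches only, forward strand only).
    """
    best = 0
    n = len(read)
    for i in range(n):
        # Only try substrings longer than the best one found so far.
        for j in range(i + best + 1, n + 1):
            if read[i:j] in reference:
                best = j - i
            else:
                break  # a longer substring starting at i cannot match either
    return best


def multi_pass_align(reads, reference, min_match_schedule):
    """Align reads in several passes with decreasing minimum match sizes.

    reads: dict mapping read id -> sequence.
    Returns (aligned, unaligned), where `aligned` maps each read id to the
    minimum match size of the pass in which it first aligned.
    """
    aligned = {}
    remaining = dict(reads)
    for min_match in min_match_schedule:
        still_unaligned = {}
        for read_id, seq in remaining.items():
            if longest_exact_match(seq, reference) >= min_match:
                aligned[read_id] = min_match
            else:
                still_unaligned[read_id] = seq
        remaining = still_unaligned  # next pass only sees the leftovers
    return aligned, sorted(remaining)


# Made-up data; the schedule is scaled down to suit the 12 bp toy reads.
reference = "AAAACCCCGGGGTTTTACGTACGT"
reads = {"r1": "CCCCGGGGTTTT", "r2": "GGGGTTAAAAAA", "r3": "TGCATGCATGCA"}
aligned, unaligned = multi_pass_align(reads, reference, [8, 6, 4])
```

Recording the pass at which each read aligned makes it easy to treat reads recovered at the smallest sizes with more suspicion, since those are the ones most likely to be spurious.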
Read trimming
* Quality trimming: too stringent
* Align to reference using nucmer (small -c <n> -l <n>); trim reads to alignment coordinates
* Identify 0 cvg regions; don't trim reads adjacent to these regions
* Update read clr's; run AMOScmp
Example:
  $ show-coords -c -l -o -r -H $(PREFIX).delta | $(SCRIPTDIR)/getNucmerCoverage.pl -M 0 > $(PREFIX).0cvg
  $ delta2clr.pl -zero_cvg $(PREFIX).0cvg -read_len $(READLEN) < $(PREFIX).delta > $(PREFIX).clr
  $ awk '{print $1}' $(PREFIX).clr > $(PREFIX).seqs
  $ updateClrRanges -i $(PREFIX).bnk $(PREFIX).clr
  $ dumpreads -I $(PREFIX).seqs $(BANK) > $(PREFIX).seq
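The trimming rule above can be sketched as follows. This is one possible reading of the steps, not the logic of getNucmerCoverage.pl or delta2clr.pl, whose internals are not shown here. Assumptions made for the sketch: coordinates are 1-based and inclusive, all alignments are on the forward strand, "adjacent" means the alignment directly abuts a zero-coverage region, and the record field names are hypothetical.

```python
def zero_coverage_regions(ref_len, alignments):
    """Return (start, end) reference intervals that no alignment covers."""
    covered = sorted((a["ref_start"], a["ref_end"]) for a in alignments)
    gaps = []
    next_uncovered = 1
    for start, end in covered:
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


def clear_ranges(alignments, gaps, read_len):
    """Trim each read to its aligned span, except next to zero coverage.

    A read end that abuts a zero-coverage region keeps its unaligned bases,
    since they may be the only sequence reaching into that region.
    """
    gap_starts = {start for start, _ in gaps}
    gap_ends = {end for _, end in gaps}
    clr = {}
    for a in alignments:
        left, right = a["read_start"], a["read_end"]
        if a["ref_start"] - 1 in gap_ends:   # gap immediately to the left
            left = 1
        if a["ref_end"] + 1 in gap_starts:   # gap immediately to the right
            right = read_len
        clr[a["read_id"]] = (left, right)
    return clr


# Made-up example: 36 bp reads on a 100 bp reference.
alignments = [
    {"read_id": "r1", "ref_start": 1, "ref_end": 30, "read_start": 1, "read_end": 30},
    {"read_id": "r2", "ref_start": 25, "ref_end": 56, "read_start": 1, "read_end": 32},
    {"read_id": "r3", "ref_start": 71, "ref_end": 100, "read_start": 7, "read_end": 36},
]
gaps = zero_coverage_regions(100, alignments)
clr = clear_ranges(alignments, gaps, read_len=36)
```

In this example r1 is trimmed to its aligned span, while r2 and r3 keep their full length because they flank the uncovered region.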
Contig merging
* Identify adjacent contig end overlaps
* Overlaps might be too short to be identified by alignment programs
* Programs that do alignment & sequence merging:
  * EMBOSS merger: does not handle long sequences
  * fastaMerge.pl
    Input: multiFasta file; contigs must be ordered and oriented; only checks adjacent contig ends
Example:
  $ fastaMerge.pl -min 5 -max 30 -id 0.8 $(PREFIX).fasta -debug 1 > $(PREFIX).merge.fasta
  ctg1_id  ctg2_id  ovl_len  ovl_id
  20       21       10       1
  34       35       18       1
  36       37       9        0.88
  ...
2008_0109_AMOSCmp-PA14-relaxed-17-nucmer-redo2 assembly: # contigs 2053 -> 1927
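A minimal sketch of merging adjacent contig ends, in the spirit of the -min/-max/-id options shown above. It is not fastaMerge.pl's implementation: the function names are hypothetical, overlaps are assumed to be ungapped, the longest qualifying overlap is preferred, and the left contig's bases are kept in the overlapping region.

```python
def end_overlap(left, right, min_len=5, max_len=30, min_id=0.8):
    """Find the longest ungapped overlap between the end of `left` and the
    start of `right`. Returns (length, identity), or None if no overlap of
    min_len..max_len bases reaches min_id."""
    longest = min(max_len, len(left), len(right))
    for n in range(longest, min_len - 1, -1):
        matches = sum(a == b for a, b in zip(left[-n:], right[:n]))
        identity = matches / n
        if identity >= min_id:
            return n, identity
    return None


def merge_adjacent(contigs, **overlap_options):
    """Merge neighbouring contigs whose ends overlap.

    contigs: list of (id, sequence), already ordered and oriented.
    Returns (merged, report): merged is a list of (joined ids, sequence),
    report lists (ctg1_id, ctg2_id, ovl_len, ovl_id) for each merge.
    """
    if not contigs:
        return [], []
    first_id, first_seq = contigs[0]
    merged = [([first_id], first_seq)]
    report = []
    for contig_id, seq in contigs[1:]:
        prev_ids, prev_seq = merged[-1]
        hit = end_overlap(prev_seq, seq, **overlap_options)
        if hit is None:
            merged.append(([contig_id], seq))
            continue
        ovl_len, ovl_id = hit
        report.append((prev_ids[-1], contig_id, ovl_len, round(ovl_id, 2)))
        # Keep the left contig's copy of the overlap, append the rest.
        merged[-1] = (prev_ids + [contig_id], prev_seq + seq[ovl_len:])
    return [("+".join(ids), seq) for ids, seq in merged], report


# Made-up contigs: the first two share a 7 bp end overlap.
contigs = [
    ("20", "TTTTTTTTTTACGGTCA"),
    ("21", "ACGGTCACCCCCCCCCC"),
    ("22", "GAGAGAGAGA"),
]
merged, report = merge_adjacent(contigs, min_len=5, max_len=30, min_id=0.8)
```

With overlaps as short as 5 bases and an identity threshold of 0.8, chance matches are plausible, so each reported overlap is worth checking before the merge is accepted.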
Multiple references
* Find the most similar genome: the reference to which the largest number of reads align
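Choosing the reference could be sketched as counting the distinct reads aligned to each candidate genome. The input format, a list of (read id, reference id) pairs, and the reference names are assumptions made for illustration. In practice the pairs would come from parsing the alignment output.

```python
def most_similar_reference(hits):
    """Pick the reference with the most distinct aligned reads.

    hits: iterable of (read_id, reference_id) pairs, one per alignment.
    Returns (best_reference, counts); best_reference is None if there are
    no hits. Ties are broken by reference name to keep the result stable.
    """
    reads_per_ref = {}
    for read_id, ref_id in hits:
        # A set ensures a read with several alignments is counted once.
        reads_per_ref.setdefault(ref_id, set()).add(read_id)
    counts = {ref: len(reads) for ref, reads in reads_per_ref.items()}
    if not counts:
        return None, counts
    best = max(sorted(counts), key=lambda ref: counts[ref])
    return best, counts


# Made-up hits against two hypothetical references.
hits = [
    ("r1", "refA"), ("r1", "refA"), ("r2", "refA"),
    ("r1", "refB"), ("r2", "refB"), ("r3", "refB"),
]
best, counts = most_similar_reference(hits)
```

Counting distinct reads rather than raw alignments keeps repeats in one genome from inflating its score.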