Comparative assemblies
AMOScmp pipeline
Short reads (Solexa)
Modified parameters
Modified AMOScmp pipeline: ~dpuiu/bin/AMOScmp
Alignment:
* Lower the nucmer alignment/cluster sizes: defaults are 20/65; drop to 16/16 (Solexa_read_len/2)
Can go as low as 14/14; 12/12 gives too many spurious alignments:
-D MINMATCH=20 -D MINCLUSTER=20
* Run nucmer multiple times:
all reads: given alignment/cluster size
unaligned reads: smaller alignment/cluster size
remaining unaligned reads: even smaller alignment/cluster size
...
* Use promer instead of nucmer: alignment/cluster sizes of 6/11 (in amino acids)
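The multi-round nucmer idea above can be sketched as a parameter schedule. This is a minimal sketch, not part of the modified AMOScmp pipeline: the function names, the step of 2 and the file names are assumptions made for illustration; only the nucmer options -l (min match), -c (min cluster) and -p (output prefix) are real. Extracting the unaligned reads between rounds is left out.

```python
def nucmer_rounds(read_len, rounds=2, step=2, floor=14):
    """Return (minmatch, mincluster) pairs for successive nucmer rounds.

    Starts at read_len // 2 (16 for 32 bp reads) and shrinks by `step`
    each round, never going below `floor` (12/12 was too noisy).
    """
    size = read_len // 2
    sizes = []
    for _ in range(rounds):
        if size < floor:
            break
        sizes.append((size, size))
        size -= step
    return sizes


def nucmer_command(ref, reads, prefix, minmatch, mincluster):
    """Build (but do not run) one nucmer command line as an argument list."""
    return ["nucmer", "-l", str(minmatch), "-c", str(mincluster),
            "-p", prefix, ref, reads]


if __name__ == "__main__":
    # Hypothetical file names: round N aligns only the reads that
    # round N-1 left unaligned.
    for i, (minmatch, mincluster) in enumerate(nucmer_rounds(32), 1):
        cmd = nucmer_command("ref.fasta", f"round{i}.reads.fasta",
                             f"round{i}", minmatch, mincluster)
        print(" ".join(cmd))
```

Each command list can be passed to subprocess.run once the reads for that round have been written out.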
Layout:
* Drop casm-layout minimum overlap from 10 to 5:
-D MINOVL=5
* Drop casm-layout majority from 70 to 50:
-D MAJORITY=50
Consensus:
* Drop make-consensus alignment wiggle from 15 to 2:
-D ALIGNWIGGLE=2
* Use make-consensus -x option ???
Read trimming
* Quality trimming: too stringent
* Align to reference using nucmer (small -c <n> -l <n>); trim reads to alignment coordinates
* Identify zero-coverage (0 cvg) regions; don't trim reads adjacent to these regions
* Update the read clear ranges (clr); run AMOScmp
Example:
$ show-coords -c -l -o -r -H $(PREFIX).delta | $(SCRIPTDIR)/getNucmerCoverage.pl -M 0 > $(PREFIX).0cvg
$ delta2clr.pl -zero_cvg $(PREFIX).0cvg -read_len $(READLEN) < $(PREFIX).delta > $(PREFIX).clr
$ awk '{print $1}' $(PREFIX).clr > $(PREFIX).seqs
$ updateClrRanges -i $(PREFIX).bnk $(PREFIX).clr
$ dumpreads -I $(PREFIX).seqs $(BANK) > $(PREFIX).seq
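The trimming logic behind these steps can be sketched as follows. This is a hedged illustration only and does not reproduce getNucmerCoverage.pl or delta2clr.pl: the function names, the 1-based inclusive input coordinates, the 0-based half-open clear range and the "adjacent" margin of 1 bp are all assumptions, and only forward-strand alignments are handled.

```python
def zero_coverage(ref_len, alignments):
    """Return reference intervals (1-based, inclusive) covered by no read.

    alignments: iterable of (start, end) reference coordinates,
    1-based inclusive, with start <= end.
    """
    gaps = []
    next_uncovered = 1
    for start, end in sorted(alignments):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


def clear_range(read_len, read_start, read_end, ref_start, ref_end,
                gaps, margin=1):
    """Return an assumed 0-based half-open clear range for one read.

    The read is trimmed to its aligned part (read_start..read_end,
    1-based inclusive) unless its alignment on the reference lies within
    `margin` bases of a zero-coverage region; then it is kept whole.
    """
    for gap_start, gap_end in gaps:
        if ref_end + margin >= gap_start and ref_start - margin <= gap_end:
            return (0, read_len)
    return (read_start - 1, read_end)


if __name__ == "__main__":
    gaps = zero_coverage(100, [(1, 30), (25, 50), (61, 90)])
    print(gaps)
    print(clear_range(32, 3, 31, 10, 38, gaps))  # trimmed to the alignment
    print(clear_range(32, 1, 26, 25, 50, gaps))  # next to a gap: kept whole
```

Keeping untrimmed reads next to zero-coverage regions lets their unaligned ends extend into the gap instead of being clipped away.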
Contig merging
* Identify adjacent contig end overlaps
* Overlaps might be too short to be identified by alignment programs
* Programs that do alignment & sequence merging:
* EMBOSS merger: does not handle long sequences
* fastaMerge.pl
Input: multiFasta file; contigs must be ordered and oriented; only checks adjacent contig ends
Example:
$ fastaMerge.pl -min 5 -max 30 -id 0.8 $(PREFIX).fasta -debug 1 > $(PREFIX).merge.fasta
ctg1_id  ctg2_id  ovl_len  ovl_id
20       21       10       1
34       35       18       1
36       37       9        0.88
...
2008_0109_AMOSCmp-PA14-relaxed-17-nucmer-redo2 assembly: number of contigs reduced from 2053 to 1927
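Checking adjacent contig ends for short overlaps can be sketched like this. It is a minimal sketch under stated assumptions, not the algorithm used by fastaMerge.pl: the overlap is ungapped, identity is the fraction of matching positions, the longest qualifying overlap wins, and the function names are made up for illustration. The defaults mirror the -min 5 -max 30 -id 0.8 values from the example.

```python
def end_overlap(left, right, min_len=5, max_len=30, min_id=0.8):
    """Find the longest ungapped overlap between the end of `left`
    and the start of `right`.

    Returns (overlap_length, identity) or None if nothing qualifies.
    """
    longest = min(max_len, len(left), len(right))
    for n in range(longest, min_len - 1, -1):
        matches = sum(a == b for a, b in zip(left[-n:], right[:n]))
        identity = matches / n
        if identity >= min_id:
            return n, identity
    return None


def merge_adjacent(contigs, **kwargs):
    """Merge ordered, oriented contigs whose adjacent ends overlap.

    Only neighbouring contigs are compared; where an overlap is found,
    the overlapping bases are taken from the left contig.
    """
    if not contigs:
        return []
    merged = [contigs[0]]
    for ctg in contigs[1:]:
        hit = end_overlap(merged[-1], ctg, **kwargs)
        if hit:
            merged[-1] += ctg[hit[0]:]
        else:
            merged.append(ctg)
    return merged


if __name__ == "__main__":
    contigs = ["ACGTACGTTTGACCA", "TTGACCAGGGCATAT", "CCCCCCCCCC"]
    print(end_overlap(contigs[0], contigs[1]))
    print(merge_adjacent(contigs))
```

With thresholds this low, very short overlaps can pass by chance, so the reported overlap length and identity are worth reviewing before accepting a merge.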
Multiple references
* Find the most similar genome: the one to which the largest number of reads align
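Picking the reference by read count can be sketched as below. This is an illustrative sketch only: the input is assumed to be (read_id, reference_id) pairs already parsed from the alignment output, and the function name and the reference IDs in the usage example are hypothetical.

```python
from collections import defaultdict


def most_similar_reference(hits):
    """Return (best_reference, counts) from (read_id, reference_id) pairs.

    Each read is counted once per reference, however many alignments
    it has there. Returns (None, {}) if there are no hits.
    """
    reads_per_ref = defaultdict(set)
    for read_id, ref_id in hits:
        reads_per_ref[ref_id].add(read_id)
    counts = {ref: len(reads) for ref, reads in reads_per_ref.items()}
    if not counts:
        return None, {}
    best = max(counts, key=counts.get)
    return best, counts


if __name__ == "__main__":
    hits = [("r1", "refA"), ("r2", "refA"), ("r2", "refA"),
            ("r1", "refB"), ("r3", "refA")]
    print(most_similar_reference(hits))
```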