Comparative assemblies


AMOScmp pipeline

Short reads (Solexa)

Modified parameters

 * Smaller nucmer alignment/cluster sizes: defaults are 20/65; drop to 16/16, or as low as 14/14; 12/12 gives too many spurious alignments:
   -D MINMATCH=20 -D MINCLUSTER=20 
 * Drop casm-layout minimum overlap from 10 to 5:
   -D MINOVL=5 
 * Drop casm-layout majority from 70 to 50: 
   -D MAJORITY=50 
 * Drop make-consensus alignment wiggle from 15 to 2:
   -D ALIGNWIGGLE=2
 * Use make-consensus -x option ???
 * Use promer instead of nucmer: alignment/cluster sizes of 6/11 (in AA)
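
The -D overrides above can be collected in one place and turned into a single command line. The following is only a sketch: the helper name `amoscmp_command` and the prefix `asm` are hypothetical, the 16/16 values are one of the settings mentioned above, and it assumes AMOScmp takes the assembly prefix as its last argument.

```python
def amoscmp_command(prefix, overrides):
    """Build an argv list of the form: AMOScmp -D KEY=VALUE ... prefix.

    `prefix` and this helper are illustrative; only the -D KEY=VALUE
    form comes from the notes above.
    """
    cmd = ["AMOScmp"]
    for key, value in overrides.items():
        cmd += ["-D", f"{key}={value}"]
    cmd.append(prefix)  # assumption: the prefix is the final positional argument
    return cmd


# Relaxed settings for short reads, taken from the list above.
relaxed = {
    "MINMATCH": 16,
    "MINCLUSTER": 16,
    "MINOVL": 5,
    "MAJORITY": 50,
    "ALIGNWIGGLE": 2,
}

if __name__ == "__main__":
    print(" ".join(amoscmp_command("asm", relaxed)))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues.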

Read trimming

 * Quality trimming: too stringent
 * Align to reference using nucmer (small -c -l); trim reads to alignment coordinates
 * Identify zero-coverage regions; don't trim reads adjacent to these regions
 * Update the reads' clear ranges (clr); run AMOScmp
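
The trimming steps above can be sketched roughly as follows. This is an illustration, not the script actually used: it assumes one forward-strand alignment per read, 0-based half-open coordinates, and a hypothetical dict layout (`read`, `read_len`, `read_start`, `read_end`, `ref_start`, `ref_end`) for alignments already parsed from the nucmer output.

```python
def zero_coverage_regions(ref_len, alignments):
    """Return half-open [start, end) reference intervals covered by no read."""
    covered = [False] * ref_len
    for aln in alignments:
        for pos in range(aln["ref_start"], aln["ref_end"]):
            covered[pos] = True
    regions, start = [], None
    for pos, cov in enumerate(covered):
        if not cov and start is None:
            start = pos                      # a gap opens here
        elif cov and start is not None:
            regions.append((start, pos))     # the gap closes here
            start = None
    if start is not None:
        regions.append((start, ref_len))     # gap runs to the end of the reference
    return regions


def new_clear_ranges(ref_len, alignments):
    """Trim each read to its aligned part, except next to zero-coverage regions."""
    gaps = zero_coverage_regions(ref_len, alignments)
    gap_starts = {s for s, _ in gaps}
    gap_ends = {e for _, e in gaps}
    clr = {}
    for aln in alignments:
        adjacent = aln["ref_end"] in gap_starts or aln["ref_start"] in gap_ends
        if adjacent:
            # Keep the full read: its unaligned end may extend into the gap.
            clr[aln["read"]] = (0, aln["read_len"])
        else:
            clr[aln["read"]] = (aln["read_start"], aln["read_end"])
    return clr
```

Reads bordering a gap keep their full length so that any unaligned bases can still help close the gap during assembly.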

Contig merging

 * Identify adjacent contig end overlaps
 * Overlaps might be too short to be identified by alignment programs
 * Programs that do alignment & sequence merging:
     * EMBOSS merger: does not handle long sequences
     * fastaMerge.pl
       Input: multiFasta file; contigs must be ordered and oriented; only checks adjacent contig ends
       Example: ~dpuiu/bin/fastaMerge.pl -min 5 -max 30 -id 0.8 file.fasta -debug 1 > file.merge.fasta
         ctg1_id ctg2_id ovl_len ovl_id
         20      21      10      1
         34      35      18      1
         36      37      9       0.88
         ...
     2008_0109_AMOSCmp-PA14-relaxed-17-nucmer-redo2 assembly: # contigs 2053 -> 1927
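
As a rough illustration of checking adjacent contig ends for short overlaps, here is a minimal sketch. It is not the fastaMerge.pl implementation: it compares suffix and prefix without gaps, takes the longest overlap that meets the identity threshold, and keeps the left contig's bases in the overlap. The function names are hypothetical; the default parameters only mirror the -min, -max and -id values in the example above.

```python
def end_overlap(left, right, min_len=5, max_len=30, min_id=0.8):
    """Find the longest ungapped overlap between the end of `left` and the
    start of `right` with identity >= min_id.

    Returns (overlap_length, identity), or None if no overlap qualifies.
    """
    longest = min(max_len, len(left), len(right))
    for n in range(longest, min_len - 1, -1):
        matches = sum(a == b for a, b in zip(left[-n:], right[:n]))
        if matches / n >= min_id:
            return n, matches / n
    return None


def merge_adjacent(contigs, **kwargs):
    """Merge ordered, oriented contig sequences whose adjacent ends overlap."""
    if not contigs:
        return []
    merged = [contigs[0]]
    for nxt in contigs[1:]:
        hit = end_overlap(merged[-1], nxt, **kwargs)
        if hit:
            merged[-1] += nxt[hit[0]:]   # append only the non-overlapping part
        else:
            merged.append(nxt)
    return merged
```

Because very short overlaps can match by chance, a stricter identity threshold for the shortest lengths may be worth considering.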

Multiple references