Cbcb:Pop-Lab:How do I run the new Bambus

Revision as of 23:12, 13 September 2008 by Sergek (talk | contribs)

The new Bambus (also known as Bambus 2) consists of three executables that are run in order on a supplied AMOS bank. Important: Bambus 2 is still an early beta, so it is advisable to back up the .bnk directory before using it.

The first program is clk. It finds all mated reads within contigs and converts the mate distances so that they are relative to contigs rather than reads. The second program is Bundler. Bundler joins the contig link messages generated by clk, wherever they can be joined, to create consensus links between contigs. It can output multiple contig links for a single pair of contigs. The final program is OrientContigs. OrientContigs uses the contig links to orient and order the contigs into scaffolds, and also performs some simplification by joining contigs. Each of the programs is covered in more detail below. To get more help on running any program, use -h.
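
The run order described above can be wrapped in a small script. The following is a minimal Python sketch, not part of Bambus 2 itself: the function names (pipeline_commands, backup_bank, run_pipeline) and the ".bak" suffix are made up for illustration, and it assumes the three executables are on the PATH and accept -b as shown in the examples on this page.

```python
import shutil
import subprocess
from pathlib import Path


def pipeline_commands(bank):
    """Return the three Bambus 2 invocations, in the order they must run."""
    return [
        ["clk", "-b", bank],
        ["Bundler", "-b", bank],
        ["OrientContigs", "-b", bank],
    ]


def backup_bank(bank, suffix=".bak"):
    """Copy the bank directory next to itself (hypothetical naming scheme).

    copytree raises FileExistsError if the backup already exists, so an
    earlier backup is never silently overwritten.
    """
    src = Path(bank)
    dst = src.with_name(src.name + suffix)
    shutil.copytree(src, dst)
    return dst


def run_pipeline(bank):
    """Back up the bank, then run each step; stop at the first failure."""
    backup_bank(bank)
    for cmd in pipeline_commands(bank):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("data.bnk") would first create data.bnk.bak and then run the three programs in sequence. Because check=True is set, a failing step raises an exception and the later steps are not run.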

1. clk

 - Modifies the bank to create contig edges.
 - Example: clk -b[ank] data.bnk
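
To illustrate what "relative to contigs rather than reads" means, here is a rough Python sketch of the usual scaffolding geometry. It is an assumption for illustration only and is not taken from the clk source: the function name, the coordinate convention (0-based offsets, a forward read in contig A, a reverse read in contig B, contig A placed before contig B) and the example numbers are all hypothetical.

```python
def implied_gap(len_a, read_a_start, read_b_end, insert_size):
    """Gap between the end of contig A and the start of contig B.

    Assumed layout (hypothetical convention):
      - the forward read starts at offset read_a_start in contig A
      - its mate, on the reverse strand, ends at offset read_b_end in contig B
      - insert_size is the library distance between the outer ends of the pair
    A negative result means the mate pair implies that the contigs overlap.
    """
    span_in_a = len_a - read_a_start  # part of the insert covered by contig A
    span_in_b = read_b_end            # part of the insert covered by contig B
    return insert_size - span_in_a - span_in_b


# A 3000 bp insert with 300 bp inside contig A and 200 bp inside contig B
gap = implied_gap(len_a=1000, read_a_start=700, read_b_end=200, insert_size=3000)
print(gap)  # 2500
```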

2. Bundler

 - Bundles contig edges together to create contig links.
 - Example: Bundler -b[ank] data.bnk [-t[ype] comma separated list of edge types]
 - The -t[ype] option restricts processing to the listed contig edge types. ALL means use any type. The types are defined in src/AMOS/Link_AMOS.hh.
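
The idea of bundling can be sketched as follows. This is a simplified Python illustration and not the algorithm Bundler actually implements: it assumes each contig edge is a (contig_a, contig_b, distance) tuple and groups edges for the same pair whose distances lie within a hypothetical tolerance of each other. It also shows why one pair of contigs can end up with more than one link when the edges disagree.

```python
from collections import defaultdict
from statistics import mean


def bundle_edges(edges, tolerance):
    """Group edges per contig pair into links of mutually close distances.

    edges:     iterable of (contig_a, contig_b, distance) tuples
    tolerance: maximum difference between neighbouring distances in one link
    Returns a list of (contig_a, contig_b, mean_distance, edge_count).
    """
    by_pair = defaultdict(list)
    for a, b, dist in edges:
        by_pair[(a, b)].append(dist)

    links = []
    for (a, b), dists in sorted(by_pair.items()):
        dists.sort()
        cluster = [dists[0]]
        for d in dists[1:]:
            if d - cluster[-1] <= tolerance:
                cluster.append(d)  # consistent with the current link
            else:
                links.append((a, b, mean(cluster), len(cluster)))
                cluster = [d]      # start a second link for the same pair
        links.append((a, b, mean(cluster), len(cluster)))
    return links


edges = [("c1", "c2", 100), ("c1", "c2", 110), ("c1", "c2", 500), ("c2", "c3", 50)]
links = bundle_edges(edges, tolerance=50)
# c1/c2 yields two links: one supported by two edges, one by a single edge
```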

3. OrientContigs

 - Orients and orders the contigs based on the links. This program uses a greedy algorithm to orient and order contigs relative to an arbitrary start contig. Edges that contradict the current scaffold are marked bad and ignored for the rest of the analysis. They are still output but do not affect any subsequent calculations.
 - The output is a dot-formatted file as well as a file in NCBI AGP scaffold format.
 - Note that this program does not currently linearize the scaffolds but maintains them as a graph. It also recursively simplifies common patterns in the graph. Currently the patterns are bubbles and straight lines. For example, the straight line A->B->C will be simplified to just A. The bubble in which A links to both B and C, and both B and C link to D (A->B->D and A->C->D), will become A as well. This simplification is repeated until the graph is stable.
 - Note that the simplification updates the bank in a destructive way by removing contigs and replacing them (as well as their edges) with updated contigs. Therefore it is necessary to make a backup of the bank before running.
 - Example: OrientContigs -b[ank] <bank_name> [-a[ll] -[r]noreduce -[n]noagressive]
 - The -all option specifies whether disconnected contigs should be output as their own scaffolds or ignored. The -noreduce option turns off the simplification described above. Finally, the -noaggressive option will mark edges that move a contig more than 3 standard deviations away as bad instead of attempting to reconcile the positions.
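
The straight-line and bubble simplification can be illustrated with a small Python sketch. It works on a plain dictionary rather than an AMOS bank and is only an approximation of the idea under stated assumptions: the graph representation (contig name mapped to a set of successor contigs, with every contig present as a key), the function names and the exact matching rules are all hypothetical, and orientation and distances are ignored.

```python
def predecessors(graph):
    """Map each contig to the set of contigs that link to it."""
    preds = {node: set() for node in graph}
    for node, succs in graph.items():
        for s in succs:
            preds[s].add(node)
    return preds


def simplify(graph):
    """Collapse straight lines and bubbles until the graph stops changing.

    Returns (simplified_graph, members), where members maps each surviving
    contig to the list of original contigs that were merged into it.
    """
    graph = {n: set(s) for n, s in graph.items()}  # work on a copy
    members = {n: [n] for n in graph}
    changed = True
    while changed:
        changed = False
        preds = predecessors(graph)
        for a in sorted(graph):
            succs = graph[a]
            if len(succs) == 1:
                # Straight line: A -> B where B has no other incoming link.
                (b,) = succs
                if b != a and preds[b] == {a}:
                    graph[a] = graph.pop(b)
                    members[a] += members.pop(b)
                    changed = True
                    break
            elif len(succs) >= 2:
                # Bubble: every branch leaves A and rejoins at the same contig.
                targets = [graph[b] for b in succs]
                sink = set.union(*targets)
                if (all(preds[b] == {a} for b in succs)
                        and all(len(t) == 1 for t in targets)
                        and len(sink) == 1
                        and a not in sink):
                    for b in sorted(succs):
                        members[a] += members.pop(b)
                        del graph[b]
                    graph[a] = sink
                    changed = True
                    break
    return graph, members


line = {"A": {"B"}, "B": {"C"}, "C": set()}
bubble = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
print(simplify(line))    # ({'A': set()}, {'A': ['A', 'B', 'C']})
print(simplify(bubble))  # ({'A': set()}, {'A': ['A', 'B', 'C', 'D']})
```

In the bubble case the branches B and C are folded into A first, which leaves the straight line A->D, and that is then folded in the next pass. This mirrors the repeated simplification described above. Unlike OrientContigs, the sketch returns a new graph and leaves its input untouched.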