Cbcb:Pop-Lab:How do I run the new Bambus: Difference between revisions

Revision as of 21:19, 31 October 2009

The new Bambus (aka Bambus 2) consists of four executables that run in order on a supplied AMOS bank. Bambus 2 is still an early beta, so it is advisable to back up the bank (.bnk) directory before using it. Program documentation is also available on the command line by typing <command> -h.

The first program is clk. It finds all mated reads within contigs and converts the mate distances so that they are relative to contigs rather than reads. The second program is Bundler, which combines the contig link messages generated by clk, where possible, into consensus links between contigs. It can output multiple contig links for a single pair of contigs. The third program is MarkRepeats, which identifies repeats using two methods: shortest paths, and examining the A-statistic (a-stat) on a component-by-component basis. The final program is OrientContigs, which uses the contig links to orient and order the contigs into scaffolds and also performs some simplification by joining contigs. Each program is covered in more detail below.
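Because the four programs must run in a fixed order and modify the bank in place, a small driver script can help. The following is a minimal sketch, not part of Bambus 2: it uses only the flags shown on this page, assumes "-b" is accepted as the short form of "-bank" (as the -b[ank] notation suggests), and writes the backup to a hypothetical "<bank>.backup" directory.

```python
import shutil
import subprocess


def bambus2_commands(bank, prefix="asm"):
    """Return the four Bambus 2 stages, in run order, as argument lists.

    Only flags shown on this page are used; "-b" is assumed to be the short
    form of "-bank". Check `<command> -h` for what your build supports.
    """
    return [
        ["clk", "-b", bank],                               # contig edges
        ["Bundler", "-b", bank],                           # contig links
        ["MarkRepeats", "-b", bank],                       # repeat detection
        ["OrientContigs", "-b", bank, "-prefix", prefix],  # scaffolds
    ]


def run_bambus2(bank, prefix="asm", dry_run=True):
    """Back up the bank, then run each stage, stopping at the first failure.

    With dry_run=True nothing is copied or executed; the command lines that
    would be run are returned instead.
    """
    backup = bank.rstrip("/") + ".backup"  # hypothetical backup location
    if not dry_run:
        # The stages modify the bank in place, so keep an untouched copy.
        shutil.copytree(bank, backup)
    executed = []
    for cmd in bambus2_commands(bank, prefix):
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a stage exits non-zero.
            subprocess.run(cmd, check=True)
    return executed
```

Calling run_bambus2("data.bnk") only lists the four command lines; pass dry_run=False to copy the bank and actually run them. The sketch does not pass MarkRepeats results to OrientContigs via -repeats, because this page does not say where MarkRepeats writes its output.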

1. clk

- Modifies the bank to create contig edges.
- Example: clk -b[ank] data.bnk
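To illustrate what "converting mate distances to be relative to contigs" means, here is a hypothetical sketch for a single orientation case; it is not clk's code, and the ContigEdge structure and parameter names are invented for illustration. It assumes both contigs are forward, read 1 lies forward in contig A starting at 0-based offset r1_start, read 2 lies reverse-complemented in contig B ending (exclusive) at offset r2_end, and the library insert size is measured between the outer ends of the pair.

```python
from dataclasses import dataclass


@dataclass
class ContigEdge:
    """A contig-level edge implied by one mate pair (hypothetical structure)."""
    contig_a: str
    contig_b: str
    gap_mean: float  # estimated gap between the end of A and the start of B
    gap_sd: float    # inherited from the library's insert-size spread


def mate_to_contig_edge(contig_a, len_a, r1_start,
                        contig_b, r2_end,
                        insert_mean, insert_sd):
    """Convert a read-relative mate distance into a contig-relative gap.

    Handles one orientation case only (see the assumptions above).
    """
    # Bases of the fragment inside contig A: from read 1's start to A's end
    in_a = len_a - r1_start
    # Bases of the fragment inside contig B: from B's start to read 2's end
    in_b = r2_end
    # The rest of the insert must lie between A and B; a negative value
    # means the contigs are predicted to overlap.
    gap = insert_mean - in_a - in_b
    return ContigEdge(contig_a, contig_b, gap, insert_sd)
```

For a 3,000 bp insert whose first read starts 400 bp before the end of a 5,000 bp contig and whose mate ends 300 bp into the next contig, mate_to_contig_edge("ctgA", 5000, 4600, "ctgB", 300, 3000, 300) estimates a 2,300 bp gap.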

2. Bundler

- Bundles contig edges together to create contig links.
- Example: Bundler -b[ank] data.bnk [-t[ype] comma-separated list of edge types]
  - The -t[ype] option restricts processing to the listed contig edge types; ALL means use any type. The types are defined in src/AMOS/Link_AMOS.hh.

3. MarkRepeats

- Runs the shortest-path and connected-component repeat detection algorithms. This requires AMOS to be built with the Boost Graph Library available to it.
- Example: MarkRepeats -b[ank] data.bnk [-redundancy X -aggressive]
  - The -redundancy option ignores links containing fewer than X edges.
  - The -aggressive option marks contigs as repetitive based on a global A-stat calculation rather than a per-connected-component one.
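The A-stat method can be sketched as below. The A-statistic is the log-odds that a contig with k reads over length Δ is single-copy rather than a collapsed two-copy repeat, given a read arrival rate n/G: A = Δ·(n/G) − k·ln 2. This hypothetical sketch covers only the A-stat part (not the shortest-path method). It estimates the arrival rate per connected component of the contig graph by default, or once over all contigs when aggressive is set, mirroring the -aggressive option. The union-find grouping, the input layout, and the threshold of 0 are assumptions, not documented MarkRepeats behavior.

```python
import math


def a_stat(contig_len, reads_in_contig, arrival_rate):
    """A-statistic: log-odds that a contig is single-copy rather than two-copy.

    arrival_rate is the expected number of reads starting per base (n / G).
    Positive values favour a unique contig; negative values suggest that
    copies of a repeat have been collapsed into it.
    """
    return contig_len * arrival_rate - reads_in_contig * math.log(2)


def mark_repeats(contigs, links, aggressive=False, threshold=0.0):
    """Flag contigs whose A-statistic falls below a threshold (sketch).

    contigs: dict contig_id -> (length, read_count)
    links:   iterable of (contig_a, contig_b) pairs
    aggressive: use one global arrival rate instead of one per component.
    threshold: hypothetical cutoff, not a documented MarkRepeats default.
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Union-find root lookup with path halving
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in links:
        parent[find(a)] = find(b)

    def group(c):
        # One global group when aggressive, otherwise the contig's component
        return None if aggressive else find(c)

    totals = {}
    for c, (length, reads) in contigs.items():
        tot_len, tot_reads = totals.get(group(c), (0, 0))
        totals[group(c)] = (tot_len + length, tot_reads + reads)

    repeats = set()
    for c, (length, reads) in contigs.items():
        tot_len, tot_reads = totals[group(c)]
        if a_stat(length, reads, tot_reads / tot_len) < threshold:
            repeats.add(c)
    return repeats
```

The choice of rate matters: a component with unusually high coverage looks unique when judged against its own arrival rate, but can look repetitive against a global rate, which is why the global (aggressive) mode may flag more contigs.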

4. OrientContigs

- Orients and orders the contigs based on the links. This program uses a greedy algorithm to orient and order contigs relative to an arbitrary start contig. Edges that contradict the current scaffold are marked bad and ignored for the rest of the analysis. They are still output but do not affect any subsequent calculations.
  
  The output includes a dot-formatted file, an NCBI AGP scaffold file, and XML files compatible with Bambus 1 tools.
  
  Note that this program does not currently linearize the scaffolds but maintains them as graphs. It also recursively simplifies common patterns in the graph; currently the patterns are straight lines and bubbles. For example, the straight line A->B->C is simplified to just A. Likewise, the bubble
  
      A->B->D
       \>C/>
  
  (A->B->D together with A->C->D) also becomes A. This simplification is repeated until the graph is stable.
  
  Note that the simplification updates the bank destructively, removing contigs and replacing them (as well as their edges) with updated contigs. Marking edges as BAD or GOOD also updates the bank destructively. Therefore, back up the bank before running this program.
- Example: OrientContigs -b[ank] <bank_name> -prefix asm [-all -noreduce -redundancy X -repeats Y -aggressive]
  - The -prefix option specifies the prefix to use for all output files.
  - The -all option controls whether disconnected contigs are output as their own scaffolds or skipped.
  - The -noreduce option turns off the graph simplification described above.
  - The -redundancy option ignores links containing fewer than X edges.
  - The -repeats option reads a file Y of repeat contigs, one contig ID per line. Repeat contigs and their links are not used for ordering or orienting any other data in the graph. Repeats are currently not resolved and are output as single-contig scaffolds. The file may list known repeats, or the repeats identified by MarkRepeats (above) may be used.
  - The -aggressive option does not mark edges that move a contig more than 3 standard deviations away as bad; instead, it tries to reconcile the positions.
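One way a greedy orient-and-order pass of this kind could work is sketched below; it is a hypothetical illustration, not OrientContigs' implementation. Starting from an arbitrary contig, it follows the best-supported links first (a heaviest-link-first order assumed here), assigns each newly reached contig an orientation and a position, and marks a link bad when it contradicts the placement already made: wrong orientation, or a position more than 3 standard deviations off (the non-aggressive behavior). Links with fewer than min_edges edges and links touching repeat contigs are skipped, mirroring -redundancy and -repeats. The link tuple layout and the centre-to-centre offset convention are simplifying assumptions; -aggressive reconciliation and the graph simplification are not modeled. Every contig is emitted, which corresponds to running with -all.

```python
import heapq
from collections import defaultdict


def orient_contigs(contigs, links, min_edges=1, repeats=frozenset(), max_sd=3.0):
    """Greedily orient and place contigs into scaffolds (hypothetical sketch).

    links: list of (a, b, same_orientation, offset, sd, edge_count), where
           offset is the distance from a's centre to b's centre measured in
           a's forward frame (a simplifying assumption of this sketch).
    Returns (placement, bad): placement maps contig -> (scaffold id,
    orientation +1/-1, centre position); bad holds indices into links.
    """
    adj = defaultdict(list)
    for i, (a, b, _same, _offset, _sd, n) in enumerate(links):
        if n >= min_edges and a not in repeats and b not in repeats:
            adj[a].append(i)
            adj[b].append(i)

    placement, bad, used = {}, set(), set()
    scaffold = 0
    for start in sorted(contigs):
        if start in placement:
            continue
        placement[start] = (scaffold, 1, 0.0)  # arbitrary start: forward at 0
        # Max-heap on edge count: best-supported links are considered first
        heap = [(-links[i][5], i) for i in adj[start]]
        heapq.heapify(heap)
        while heap:
            _, i = heapq.heappop(heap)
            if i in used:
                continue
            used.add(i)
            a, b, same, offset, sd, _n = links[i]
            s = 1 if same else -1
            # Walk the link from whichever end is already placed
            if a in placement:
                src, dst, shift = a, b, offset
            else:
                src, dst, shift = b, a, -s * offset
            _, o_src, p_src = placement[src]
            o_dst = o_src * s
            p_dst = p_src + o_src * shift
            if dst not in placement:
                placement[dst] = (scaffold, o_dst, p_dst)
                for j in adj[dst]:
                    if j not in used:
                        heapq.heappush(heap, (-links[j][5], j))
            else:
                _, o_old, p_old = placement[dst]
                # Contradicts the scaffold: wrong orientation, or position
                # more than max_sd standard deviations from the prediction
                if o_old != o_dst or abs(p_old - p_dst) > max_sd * sd:
                    bad.add(i)
        scaffold += 1
    return placement, bad
```

Because a bad link is only detected between contigs that are already placed, it never moves anything, which matches the description above: bad edges are kept in the output but ignored for the rest of the analysis.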