Dpuiu alignment




Latest revision as of 18:57, 5 October 2009

Filtering

Programs:

  • filter-anc.pl -max <i> -percent <i> -w <i> -W <i>
 max:      max distance between the ref_intercept (default 1,000,000)
 percent:  min percent agreement for keeping an alignment (default 25)
 w:        min window size (default 4)   => 1+w*2
 W:        max window size (default 100) => 1+W*2
 Bacterial genomes (a few Mbp): use -max 100000  -W 10
 Mammalian genomes (a few Gbp): use -max 1000000 -W 100
 
  • merge-anc.pl -max <i>
 max:      max distance between the ref_intercept (default 1,000,000)
  • Example:
 cat PA14-PA7.delta     | ~/bin/shrinkIds.pl | ~/bin/DELTA/delta2anc.pl > PA14-PA7.anc 
 cat PA14-PA7.anc       | ~/bin/DELTA/filter-anc.pl -max 100000 -W 10   > PA14-PA7.filter.anc
 cat PA14-PA7.filter.anc | ~/bin/DELTA/merge-anc.pl -max 100000         > PA14-PA7.merge.anc
 cat PA14-PA7.merge.anc | ~/bin/DELTA/anc2delta.pl                      > PA14-PA7.merge.delta

 wc -l *anc
  691 PA14-PA7.anc
  528 PA14-PA7.filter.anc
    9 PA14-PA7.merge.anc
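
The line counts above can be turned into a quick summary of how much each step reduces the data. The snippet below is a minimal sketch; `pct_kept` is a hypothetical helper written for this note and is not one of the DELTA scripts.

```shell
# pct_kept KEPT TOTAL
# Hypothetical helper: prints KEPT as a percentage of TOTAL, one decimal place.
pct_kept() {
    awk -v kept="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * kept / total }'
}

# Filter step: 528 of the 691 input lines remain.
pct_kept 528 691   # prints 76.4
```

For the PA14-PA7 example, filtering keeps about three quarters of the lines, and merging then reduces the 528 filtered lines to 9.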



Pairwise Alignment Processing (poster)

Daniela Puiu*, Arthur Delcher, Steven Salzberg

University of Maryland, College Park

Abstract

  • We present a software tool for sequence alignment processing.
  • The tool filters and clusters local alignments.
  • The tool was initially developed for finding syntenic regions in large eukaryotic genomes but could also be used for analyzing rearrangements in microbial genomes.
  • The software takes as input alignment data, which is generally very fragmented (low identity) and noisy (repeats, rearrangements), and tries to filter and cluster it in
  • A moving-window approach compares each alignment to its neighbors and decides whether it agrees with them. Alignments that clearly disagree are discarded.
  • The process is run iteratively with increasingly large window sizes until
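
The moving-window idea described above can be sketched in a few lines of shell and awk. This is an illustration under stated assumptions, not the filter-anc.pl implementation: the input format (one ref_intercept value per line, in alignment order), the agreement test (two alignments agree when their intercepts differ by at most `max`), and the helper name `window_filter` are all hypothetical. The sketch makes a single pass at one window size; the schedule for growing the window from -w to -W is not specified here and is left out.

```shell
# window_filter MAX PERCENT W
# Hypothetical sketch: reads one intercept per line on stdin and prints the
# intercepts of alignments that agree with enough of their neighbors.
window_filter() {
    awk -v max="$1" -v pct="$2" -v w="$3" '
        { x[NR] = $1 }                          # load all intercepts in input order
        END {
            for (i = 1; i <= NR; i++) {
                agree = 0; n = 0
                # Window of 1 + 2*w alignments centered on i, clipped at both ends.
                for (j = i - w; j <= i + w; j++) {
                    if (j < 1 || j > NR || j == i) continue
                    n++
                    d = x[j] - x[i]; if (d < 0) d = -d
                    if (d <= max) agree++       # assumed agreement test
                }
                # Keep when the share of agreeing neighbors reaches pct
                # (an alignment with no neighbors is kept).
                if (n == 0 || 100 * agree / n >= pct) print x[i]
            }
        }'
}

# Five intercepts, one clear outlier; window of 1 + 2*2 = 5 alignments.
printf '100\n120\n90000000\n110\n130\n' | window_filter 1000 25 2
# prints 100, 120, 110 and 130, one per line; the outlier is dropped
```

With the default -w 4 the window would span 1+4*2 = 9 alignments, and with -W 100 it would span 1+100*2 = 201.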


Motivation

Sequence alignment is one of the most common problems in bioinformatics. Because of its complexity, there is no single general solution, and multiple algorithms have been implemented for aligning different types of data sets. There is always a tradeoff between speed and accuracy.


Objectives

Materials and Methods

Results

References

Acknowledgments