Difference between revisions of "Partition"

Line 23:
   [A]
   info = animals that start with A
-  regexp = [Aa].*
+  regexp = a.*
   [B]
   info = animals that start with B
-  regexp = [Bb].*
+  regexp = b.*
   ...

Revision as of 14:33, 13 August 2009

Summary

Partition is a Python script that takes regular expressions and metadata as input and builds an XML file of the matching header information from a FASTA-formatted file. Partition.py is located at:

   /fs/szasmg/metagenomics/Partition/Partition.py [stable]
   /fs/szasmg/metagenomics/Partition/MetaPart/src/Partition.py [latest, but possibly unstable builds]

Options

-p    Populate the given partition/XML.
-b    Given the input file, build a partition.
-m    Metadata file that will be used to populate the partitions.
-h    Header information for the metadata. If not given, the column information is read from the first line of the metadata file.
-f    Input FASTA file.
-s    Split the FASTA file based on the partition information and write the output to the given directory.
-o    Name of the output .part file.
-c    Convert an old partition format into the new XML format.

Tutorial

Format of an input regular expression (*.re) file

Example of an input file that clusters animals by their first letter:

  [animals]
  info = all animals
  [A]
  info = animals that start with A
  regexp = a.*
  [B]
  info = animals that start with B
  regexp = b.*
  ...
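
To make the format concrete, here is a minimal sketch of how a file like this could be read and applied to sequence headers. It is not taken from Partition.py: it assumes the file can be parsed as INI-style text with Python's standard configparser, that a section without a regexp entry (such as [animals]) only carries descriptive info, and that a regexp is matched case-sensitively from the start of the header. The helper names load_partitions and assign, the RE_TEXT string, and the sample headers are all hypothetical.

```python
import configparser
import re

# Hypothetical .re content modeled on the example above
# (the trailing "..." placeholder is left out so the text parses).
RE_TEXT = """\
[animals]
info = all animals
[A]
info = animals that start with A
regexp = a.*
[B]
info = animals that start with B
regexp = b.*
"""


def load_partitions(text):
    """Return {section name: compiled regexp} for sections that define a regexp."""
    # interpolation=None keeps '%' characters in patterns from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    partitions = {}
    for name in parser.sections():
        if parser.has_option(name, "regexp"):
            partitions[name] = re.compile(parser.get(name, "regexp"))
    return partitions


def assign(headers, partitions):
    """Group each header under the first partition whose regexp matches it.

    Assumption: matching is anchored at the start of the header (re.match)
    and is case-sensitive; headers that match nothing are dropped.
    """
    groups = {name: [] for name in partitions}
    for header in headers:
        for name, pattern in partitions.items():
            if pattern.match(header):
                groups[name].append(header)
                break
    return groups


if __name__ == "__main__":
    parts = load_partitions(RE_TEXT)
    print(assign(["aardvark", "baboon", "bat", "cat"], parts))
```

Under these assumptions a lowercase pattern such as a.* does not match a capitalized header such as "Aardvark". Whether Partition.py itself treats case this way is not stated here.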