From Cbcb
Revision as of 14:52, 13 August 2009 by Cmhill (talk | contribs)


Partition is a Python script that takes regular expressions and metadata as input and builds an XML file of matching header information from a FASTA-formatted file. It is located at:

   /fs/szasmg/metagenomics/Partition/ [stable]
   /fs/szasmg/metagenomics/Partition/MetaPart/src/ [latest, but possibly unstable builds]


-p    Populate the given partition/XML.
-b    Given the input file, build a partition.
-m    Metadata file that will be used to populate the partitions.
-h    Header information for the metadata. If not given, the column information is read from the first line of the metadata file.
-f    Input fasta file.
-s    Split the fasta file based on the partition information and write the output to the given directory.
-o    Name of the output .part file.
-c    Convert an old partition format into the new xml format.


Format of an input regular expression (*.re) file

The format of an input regular expression file is a list of key = value lines:

  key1 = value1
  key2 = value2
  key1 = value1

Example of an input file that clusters animals by their first letter:

  info = all animals
  info = animals that start with A
  regexp = a.*
  info = animals that start with B
  regexp = b.*

Using the format above, the partition can only have two levels. It is possible to have more than two levels, but the input file then needs to be an XML file (explained below).
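
To make the two-level layout concrete, here is a minimal Python sketch of how a file in the key = value format above could be read and applied. It is not Partition's own parser. The function names parse_re_file and assign are hypothetical, and the sketch assumes that an info line with no regexp of its own labels the top level, that each regexp belongs to the info line just before it, and that matching is case-insensitive and anchored at the start of the name.

```python
import re

# The animal example from this page, as the text of a *.re file.
EXAMPLE = """\
info = all animals
info = animals that start with A
regexp = a.*
info = animals that start with B
regexp = b.*
"""


def parse_re_file(text):
    """Return (top_level_info, groups) from 'key = value' lines.

    Assumption: the first info line that has no regexp labels the top
    level; every regexp is paired with the info line preceding it.
    """
    top_level = None
    groups = []
    pending_info = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first '=' only, so a regexp may itself contain '='.
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "info":
            if pending_info is not None and top_level is None and not groups:
                top_level = pending_info
            pending_info = value
        elif key == "regexp":
            # Assumption: case-insensitive matching.
            pattern = re.compile(value, re.IGNORECASE)
            groups.append({"info": pending_info, "regexp": pattern})
            pending_info = None
    return top_level, groups


def assign(names, groups):
    """Put each name into the first group whose regexp matches its start."""
    result = {group["info"]: [] for group in groups}
    for name in names:
        for group in groups:
            if group["regexp"].match(name):
                result[group["info"]].append(name)
                break
    return result


if __name__ == "__main__":
    top, parsed = parse_re_file(EXAMPLE)
    print(top)
    print(assign(["aardvark", "bear", "Ant", "cat"], parsed))
```

Names that match no regexp (such as "cat" here) are simply left out of the result. How Partition itself treats unmatched headers is not covered by this sketch.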