Partition

From Cbcb

Revision as of 14:53, 13 August 2009

Summary

Partition is a Python script that takes regular expressions and metadata as input and builds an XML file of matching header information from a FASTA-formatted file. Partition.py is located at:

   /fs/szasmg/metagenomics/Partition/Partition.py [stable]
   /fs/szasmg/metagenomics/Partition/MetaPart/src/Partition.py [latest, but possibly unstable builds]
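For readers unfamiliar with FASTA, the "header information" mentioned above comes from the record header lines, which begin with `>`. The snippet below is only an illustrative sketch of pulling those headers out of a FASTA file. It is not code from Partition.py, and the function name `fasta_headers` is hypothetical.

```python
from typing import Iterable, List


def fasta_headers(lines: Iterable[str]) -> List[str]:
    """Return the header text (without the leading '>') of every FASTA record.

    Hypothetical helper for illustration; not part of Partition.py.
    """
    headers = []
    for line in lines:
        # Each FASTA record starts with a header line beginning with '>';
        # the sequence lines that follow it are ignored here.
        if line.startswith(">"):
            headers.append(line[1:].strip())
    return headers


if __name__ == "__main__":
    example = [
        ">aardvark sample 1\n",
        "ACGTACGT\n",
        ">bear sample 2\n",
        "GGCCTTAA\n",
    ]
    print(fasta_headers(example))  # ['aardvark sample 1', 'bear sample 2']
```

An open file handle can be passed directly, e.g. `with open("reads.fasta") as fh: fasta_headers(fh)`, where `reads.fasta` is a placeholder name.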

Options

-p    Populate the given partition/XML.
-b    Build a partition from the given input file.
-m    Metadata file that will be used to populate the partitions.
-h    Header information for the metadata. If it is not given, the column information is read from the first line of the metadata file.
-f    Input FASTA file.
-s    Split the FASTA file according to the partition information and write the output to the given directory.
-o    Name of the output .part file.
-c    Convert a partition in the old format into the new XML format.

Tutorial

Format of an input regular expression (*.re) file

The format of the *.re input file is:

  [name]
  key1 = value1
  key2 = value2
  ...
  [name2]
  key1 = value1
  ...

where [name] is the name of a partition, and the key-value pairs that follow it are that partition's attributes.
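A minimal sketch of a reader for this format follows. It is for illustration only: `parse_re_file` is a hypothetical name, not a function in Partition.py. It assumes that blank lines can be ignored, that a `[name]` line opens a new partition, and that every other line is a `key = value` pair split at the first `=`.

```python
import re
from typing import Dict, List, Tuple

# A section header such as "[animals]".
SECTION = re.compile(r"^\[(?P<name>[^\]]+)\]$")


def parse_re_file(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse *.re text into an ordered list of (partition name, attributes).

    Hypothetical sketch; the real Partition.py parser may behave differently.
    """
    sections: List[Tuple[str, Dict[str, str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        match = SECTION.match(line)
        if match:
            # A [name] line starts a new partition with no attributes yet.
            sections.append((match.group("name"), {}))
            continue
        if "=" not in line or not sections:
            raise ValueError(f"unexpected line: {raw!r}")
        # Split at the first '=' only, so values (e.g. regexps) may contain '='.
        key, value = line.split("=", 1)
        sections[-1][1][key.strip()] = value.strip()
    return sections
```

Keeping the sections in file order matters for the next step, because the first section acts as the parent of the ones that follow it.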

Example of an input file that clusters animals by their first letter:

  [animals]
  info = all animals
  [A]
  info = animals that start with A
  regexp = a.*
  [B]
  info = animals that start with B
  regexp = b.*
  ...
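To show how the example would cluster sequences, the sketch below groups header strings under each child partition ([A], [B]) whose `regexp` matches. This page does not state Partition.py's exact matching rule, so the sketch assumes that a pattern must match at the start of the header (`re.match`) and that matching is case-insensitive. `assign` and `CHILDREN` are hypothetical names.

```python
import re
from typing import Dict, List

# Child partitions from the example above; [animals] is their parent.
CHILDREN: Dict[str, str] = {
    "A": "a.*",
    "B": "b.*",
}


def assign(headers: List[str], children: Dict[str, str]) -> Dict[str, List[str]]:
    """Group headers under every child partition whose regexp matches them.

    Assumptions: matching is anchored at the start of the header (re.match)
    and case-insensitive. Headers that match no child are dropped.
    """
    compiled = {name: re.compile(p, re.IGNORECASE) for name, p in children.items()}
    groups: Dict[str, List[str]] = {name: [] for name in children}
    for header in headers:
        for name, regex in compiled.items():
            if regex.match(header):
                groups[name].append(header)
    return groups


if __name__ == "__main__":
    print(assign(["Aardvark", "bear", "cat", "baboon"], CHILDREN))
    # {'A': ['Aardvark'], 'B': ['bear', 'baboon']}
```

Because `re.match` anchors at the start, "baboon" lands only in [B] even though it contains an "a". With `re.search` it would also fall into [A].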

With the format above, a partition can have only two levels. More levels are possible, but then the input file must be an XML file (explained below).
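The two-level limit follows from the flat layout: the first section is the parent, and each later section can only be its direct child. A nested XML tree has no such limit. The sketch below uses `xml.etree.ElementTree` to show this. The element name `partition`, its attributes, the function `to_xml`, and the extra `[Aa]` level are all hypothetical. The actual XML schema used by Partition.py is the one described in the following section and may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def to_xml(sections: List[Tuple[str, Dict[str, str]]]) -> ET.Element:
    """Nest every section after the first under the first one (two levels).

    Hypothetical element and attribute names, for illustration only.
    """
    (root_name, root_attrs), *children = sections
    root = ET.Element("partition", {"name": root_name, **root_attrs})
    for name, attrs in children:
        ET.SubElement(root, "partition", {"name": name, **attrs})
    return root


if __name__ == "__main__":
    root = to_xml([
        ("animals", {"info": "all animals"}),
        ("A", {"info": "animals that start with A", "regexp": "a.*"}),
        ("B", {"info": "animals that start with B", "regexp": "b.*"}),
    ])
    # A third level, which the flat *.re format cannot express:
    ET.SubElement(root[0], "partition", {"name": "Aa", "regexp": "aa.*"})
    print(ET.tostring(root, encoding="unicode"))
```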