Partition

From Cbcb
Revision as of 14:52, 13 August 2009

Summary

Partition is a Python script that takes regular expressions and metadata as input and builds an XML file of matching header information from a FASTA-formatted file. Partition.py is located at:

   /fs/szasmg/metagenomics/Partition/Partition.py [stable]
   /fs/szasmg/metagenomics/Partition/MetaPart/src/Partition.py [latest, but possibly unstable builds]
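The core idea (match FASTA headers against regular expressions and record the matches in XML) can be sketched in a few lines of Python. This is not Partition.py's code. It assumes a simple name-to-regexp mapping, assigns each header to the first pattern that matches at its start, and uses made-up element and attribute names (`partition`, `sequence`, `name`, `header`) rather than the script's real XML schema. The helpers `fasta_headers` and `build_partition_xml` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def fasta_headers(lines):
    """Yield each FASTA header line without its leading '>'."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()


def build_partition_xml(lines, patterns):
    """Group FASTA headers under the first partition whose regexp matches.

    `patterns` maps a partition name to a regular expression (an assumed
    input shape). Element and attribute names are hypothetical, not the
    schema Partition.py actually writes.
    """
    root = ET.Element("partition", name="root")
    children = {name: ET.SubElement(root, "partition", name=name) for name in patterns}
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    for header in fasta_headers(lines):
        for name, rx in compiled.items():
            if rx.match(header):  # anchored at the start of the header
                ET.SubElement(children[name], "sequence", header=header)
                break  # first matching partition wins; unmatched headers are skipped
    return root


if __name__ == "__main__":
    fasta = [">ecoli_seq1", "ACGT", ">bsub_seq2", "GGCC", ">ecoli_seq3", "TTAA"]
    tree = build_partition_xml(fasta, {"E. coli": r"ecoli_.*", "B. subtilis": r"bsub_.*"})
    print(ET.tostring(tree, encoding="unicode"))
```

The real script also folds in metadata (the -m and -h options), which this sketch leaves out.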

Options

-p    Populate the given partition/XML.
-b    Build a partition from the given input file.
-m    Metadata file that will be used to populate the partitions.
-h    Header information for the metadata. If this option is not given, the column information is read from the first line of the metadata file.
-f    Input FASTA file.
-s    Split the FASTA file based on the partition information and write the output to the given directory.
-o    Name of the output .part file.
-c    Convert an old partition format into the new XML format.
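The options above map naturally onto Python's `argparse`. The sketch below is a hypothetical declaration, not the parser inside Partition.py: it assumes every flag takes a single value (the metavar names are invented for illustration), and it disables argparse's built-in help because Partition.py uses `-h` for metadata header information.

```python
import argparse


def make_parser():
    """Hypothetical parser mirroring the documented Partition.py flags."""
    # add_help=False frees -h, which Partition.py uses for metadata headers.
    p = argparse.ArgumentParser(prog="Partition.py", add_help=False)
    p.add_argument("-p", metavar="PARTITION_XML", help="populate the given partition/XML")
    p.add_argument("-b", metavar="RE_FILE", help="build a partition from the input file")
    p.add_argument("-m", metavar="METADATA", help="metadata file used to populate partitions")
    p.add_argument("-h", metavar="HEADER", help="header (column) information for the metadata")
    p.add_argument("-f", metavar="FASTA", help="input FASTA file")
    p.add_argument("-s", metavar="OUTDIR", help="split the FASTA file into this directory")
    p.add_argument("-o", metavar="OUT_PART", help="name of the output .part file")
    p.add_argument("-c", metavar="OLD_PARTITION", help="convert an old partition to XML")
    return p


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    args = make_parser().parse_args(["-b", "animals.re", "-o", "animals.part"])
    print(args.b, args.o)
```

Flags that are not given come back as `None`, so a script built this way would check which of -p, -b, -s, or -c was supplied to decide what to do. Whether the real script accepts these exact combinations is not documented on this page.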

Tutorial

Format of an input regular expression (*.re) file

The format of an *.re file is:

  [name]
  key1 = value1
  key2 = value2
  ...
  [name2]
  key1 = value1
  ...

Example of an input file that clusters animals by their first letter:

  [animals]
  info = all animals
  [A]
  info = animals that start with A
  regexp = a.*
  [B]
  info = animals that start with B
  regexp = b.*
  ...
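This layout uses the same `[section]` and `key = value` syntax that Python's standard `configparser` module reads, so the example above can be loaded with it. A minimal sketch, assuming the first section is the top-level partition and each later section is a child carrying a `regexp` key; `read_re_file` is a hypothetical helper, not part of Partition.py.

```python
import configparser

# The animals example above, without its trailing "...".
RE_FILE = """\
[animals]
info = all animals
[A]
info = animals that start with A
regexp = a.*
[B]
info = animals that start with B
regexp = b.*
"""


def read_re_file(text):
    """Parse *.re text into (root_name, root_info, [(name, info, regexp), ...])."""
    # interpolation=None keeps '%' characters in regexps from being interpreted.
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(text)
    root, *rest = cp.sections()  # assumed: first section is the top level
    children = [(name, cp[name].get("info", ""), cp[name]["regexp"]) for name in rest]
    return root, cp[root].get("info", ""), children


if __name__ == "__main__":
    print(read_re_file(RE_FILE))
```

Note that `configparser` lowercases keys (not section names) and, by default, rejects duplicate section names; Partition.py's own reader may behave differently.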

With the format above, a partition can have only two levels. Deeper hierarchies are possible, but the input file must then be an XML file (explained below).
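To make the animal example concrete, here is a sketch of how the [A] and [B] sections would sort a few names. It assumes each regexp is anchored at the start of the name (`re.match`), that matching is case-sensitive, and that a name goes to the first section it matches; how Partition.py resolves these details is not stated on this page. `cluster` is a hypothetical helper.

```python
import re

# The [A] and [B] sections from the example, as (name, regexp) pairs.
PARTITIONS = [("A", "a.*"), ("B", "b.*")]


def cluster(names, partitions):
    """Assign each name to the first partition whose regexp matches at its start."""
    groups = {name: [] for name, _ in partitions}
    unmatched = []
    for item in names:
        for name, rx in partitions:
            if re.match(rx, item):  # anchored at the start; case-sensitive
                groups[name].append(item)
                break
        else:
            unmatched.append(item)  # no section matched
    return groups, unmatched


groups, unmatched = cluster(["ant", "bear", "cat", "bee"], PARTITIONS)
print(groups)     # {'A': ['ant'], 'B': ['bear', 'bee']}
print(unmatched)  # ['cat']
```

Anchoring matters here: with `re.search` instead of `re.match`, "cat" would land in [A] because "a.*" matches the "at" inside it. Likewise, "Ant" would match neither section, since the example's regexps are lowercase.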