Partition
Revision as of 14:52, 13 August 2009
Summary
Partition is a Python script that takes regular expressions and metadata as input and builds an XML file of matching header information from a FASTA-formatted file. Partition.py is located at:
/fs/szasmg/metagenomics/Partition/Partition.py [stable]
/fs/szasmg/metagenomics/Partition/MetaPart/src/Partition.py [latest, but possibly unstable builds]
Options
-p  Populate the given partition/XML.
-b  Given the input file, build a partition.
-m  Metadata file that will be used to populate the partitions.
-h  Header information for the metadata. If not present, the column information for the metadata is taken from the first line of the metadata file.
-f  Input FASTA file.
-s  Split the FASTA file based on the partition information and write the output to the given directory.
-o  Name of the output .part file.
-c  Convert an old partition format into the new XML format.
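The option list above can be mirrored with Python's standard argparse module. This is a hypothetical sketch, not Partition.py's actual code (which this page does not show): it assumes every flag takes exactly one value, and the metavar names and the build_parser function are invented for illustration. One detail worth noting is that -h here means header information rather than help, so argparse's automatic -h/--help has to be disabled and help moved to --help only.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented Partition.py flags.

    Assumption: each flag takes a single string value.
    """
    # add_help=False because the documented -h flag means "header information",
    # which would otherwise collide with argparse's built-in -h/--help.
    p = argparse.ArgumentParser(prog="Partition.py", add_help=False)
    p.add_argument("--help", action="help", help="show this message and exit")
    p.add_argument("-p", metavar="PARTITION_XML", help="populate the given partition/XML")
    p.add_argument("-b", metavar="RE_FILE", help="build a partition from the input file")
    p.add_argument("-m", metavar="METADATA", help="metadata file used to populate the partitions")
    p.add_argument("-h", metavar="HEADER",
                   help="metadata header; if absent, the first line of the metadata is used")
    p.add_argument("-f", metavar="FASTA", help="input FASTA file")
    p.add_argument("-s", metavar="OUT_DIR", help="split the FASTA file by partition into this directory")
    p.add_argument("-o", metavar="OUT_PART", help="name of the output .part file")
    p.add_argument("-c", metavar="OLD_PARTITION", help="convert an old partition into the new XML format")
    return p
```

For example, build_parser().parse_args(["-b", "animals.re", "-o", "animals.part"]) sets args.b and args.o and leaves every other option as None; the file names are placeholders.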
Tutorial
Format of an input regular expression (*.re) file
The format of an input file is:
   [name]
   key1 = value1
   key2 = value2
   ...
   [name2]
   key1 = value1
   ...
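A reader for this layout fits in a few lines of Python. The sketch below is an assumption about how such a file could be parsed, not Partition.py's own parser: it treats each [name] line as the start of a section, splits every other non-blank line on its first =, so regular expressions that themselves contain = survive intact, and returns the sections in file order. The function name parse_re_file is hypothetical, and the literal ... in the format description is read as "more entries", not as text to put in a real file, so the sketch rejects it.

```python
import re

# A section header is a whole line of the form [name].
SECTION = re.compile(r"^\[(?P<name>[^\]]+)\]$")


def parse_re_file(text):
    """Parse the [name] / key = value layout into a list of (name, dict) pairs.

    Hypothetical sketch: leading/trailing whitespace is ignored, blank lines
    are skipped, and each key = value line is split on its FIRST '='.
    """
    sections = []
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        m = SECTION.match(line)
        if m:
            # Start a new section; later key = value lines belong to it.
            current = (m.group("name"), {})
            sections.append(current)
            continue
        if "=" not in line or current is None:
            raise ValueError(f"line {lineno}: expected [name] or key = value, got {raw!r}")
        key, value = line.split("=", 1)
        current[1][key.strip()] = value.strip()
    return sections


example = """
    [animals]
    info = all animals
    [A]
    info = animals that start with A
    regexp = a.*
"""
for name, keys in parse_re_file(example):
    print(name, keys)
# animals {'info': 'all animals'}
# A {'info': 'animals that start with A', 'regexp': 'a.*'}
```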
Example of an input file that clusters animals by their first letter:
    [animals]
    info = all animals
    [A]
    info = animals that start with A
    regexp = a.*
    [B]
    info = animals that start with B
    regexp = b.*
    ...
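To show what the regexp keys do, the sketch below applies the two leaf sections of the example to a few names. It assumes Python's re.match semantics (the pattern must match at the start of the string, case-sensitively), that a name matching several regexps is placed in each of them, and that unmatched names are dropped. Partition.py may behave differently, and in practice the strings being matched would presumably be FASTA header lines rather than animal names. The function name cluster and the GROUPS dictionary are hypothetical.

```python
import re

# Leaf sections copied from the animals example; the remaining letters ("...") are omitted.
GROUPS = {
    "A": {"info": "animals that start with A", "regexp": "a.*"},
    "B": {"info": "animals that start with B", "regexp": "b.*"},
}


def cluster(names, groups):
    """Map each group name to the names its regexp matches.

    Assumptions: re.match semantics (anchored at the start, case-sensitive);
    a name matching several regexps lands in each group; unmatched names are dropped.
    """
    compiled = {g: re.compile(spec["regexp"]) for g, spec in groups.items()}
    clusters = {g: [] for g in groups}
    for name in names:
        for g, pattern in compiled.items():
            if pattern.match(name):
                clusters[g].append(name)
    return clusters


print(cluster(["ant", "bear", "bat", "cat"], GROUPS))
# {'A': ['ant'], 'B': ['bear', 'bat']}
```

Because a.* is lowercase and the match is case-sensitive here, a name such as Aardvark would not be placed in A; compiling the patterns with re.IGNORECASE would change that.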
With the format above, a partition can have only two levels. Multiple levels are possible, but the input file must then be an XML file (explained below).