Cbcb:Pop-Lab:Ted-Report
== Older Entries ==
[[Cbcb:Pop-Lab:Ted-Report-2009 | 2009]]

== Summer 2009 Goals ==
* Research traditional approaches taken to gene-level analysis of metagenomic data
* Critically evaluate the traditional approaches, both in general and in the context of current Pop Lab projects
* Identify portions of the analysis that can be automated
* Develop scalable tools to perform the automated analysis
 
== May 15, 2009 ==
* Read [[VisANT]] paper and user manual[http://visant.bu.edu/]. Determined VisANT will work for manual metabolic pathway analysis of even large scale data sets and can be automated by running in "Batch Mode". <br>
* Need to read about FastHMM[http://www.microbesonline.org/fasthmm/]
* Still need to make "Welcome Wiki" for n00bs (read: new members)
 
== May 22, 2009 ==
* Made Welcome Wiki
* Read metagenomics papers
* Determined that VisANT can be used with Bo's data by importing it as MicroArray data
 
== May 29, 2009 ==
* Took an early Summer vacation last weekend:
** Drove to NC to see a friend graduate with BSs in CS & Physics
** Went sailing for the first time at my girlfriend's parents' place in VA
* Refined Welcome Wiki
* Read metagenomics/pathway reconstruction/analysis papers
* Organized reading group for Palsson Systems Bio book
 
== June 5, 2009 ==
* Read metagenomics/pathway reconstruction/analysis papers and first two chapters of Palsson book.
* Currently building test set for incorporation of Phymm into metagenomics pipeline.
** A single archaeal genome was chosen from the findings of Mihai's 2006 Science paper analyzing the human distal gut.
** Two pairs of bacterial genomes were chosen for the test set using columns on the NCBI RefSeq ''Complete Genomes'' website[http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi]:
*# The pairs of bacterial genomes were taken from the ''Groups'' Bacteroidetes/Chlorobi and Firmicutes because they are the two predominant groups present in the human gut.
*# I chose genomes with a complete set of NCBI ''Tools'' available.
*# After this I attempted to choose genomes with significantly different GC content.
*# Finally, preference was given to organisms I recognized from gut microbiome papers/discussions, or failing that, just a really awesome name.
 
=== Description of Trial Metagenomic Dataset ===
 
::{| class="wikitable" border="1"
|-
!  Organism
!  Classification
!  Genome Length
|- align="center"
|  align="left"|Methanobrevibacter smithii ATCC 35061
|  archaea
|  1,853,160 bp
|- align="center"
|  align="left"|Bacteroides fragilis NCTC 9343
|  bacteroidetes
|  5,205,140 bp
|- align="center"
|  align="left"|Porphyromonas gingivalis W83
|  bacteroidetes
|  2,343,476 bp
|- align="center"
|  align="left"|Aster yellows witches'-broom phytoplasma AYWB
|  firmicutes
|  706,569 bp
|- align="center"
|  align="left"|Bacillus subtilis subsp. subtilis str. 168
|  firmicutes
|  4,214,630 bp
|}
Note: plasmid DNA was not included <br>
The combined length for all the genomes is: 14,322,975 bp
 
== June 12, 2009 ==
* Today is my birthday!!! :D
* Last week's meeting was a success!
** The books came in Wednesday so the reading is now readily available.
** Read chapter 3 and part of the 2006 paper for Friday.
* Met with Arthur to discuss Phymm.
** I have gotten better at using MetaSim and have generated the previously described test data set composed of 1 million 200bp reads, approximately 15x coverage (actual average read length: 215.42bp).
** I have been relearning how to use the AMOS package in preparation for piping the output from Phymm into it.
** Note: It appears that Phymm can be "parallelized" by dividing the query file into smaller files and merging the output files. According to Arthur, each read is scored independently. So the only limits are the number of reads and the number of processors.
*** I have parsed the test set into 10 files and intend to run them simultaneously on Ginkgo.
** I backed up his copy of RefSeq Bacteria/Archaea and removed the test sequences from the original database. I now need to find a way to remove the sequences from the BLAST database and I'm ready to go.
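The split-and-merge approach above can be sketched in Python. The FASTA parsing and round-robin chunking are generic; one-chunk-per-processor is my assumption about the intended layout, not Phymm's actual interface:

```python
# Sketch: split a multi-FASTA read set into n roughly equal chunks so that
# each chunk can be scored by Phymm independently (each read is scored on
# its own, so chunks can run in parallel and be merged afterward).
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    for block in text.split(">")[1:]:
        lines = block.splitlines()
        records.append((lines[0], "".join(lines[1:])))
    return records

def split_fasta(records, n_chunks):
    """Distribute records round-robin so chunk sizes stay balanced."""
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks
```

Each chunk would then be written to its own file and submitted as a separate Phymm job.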
 
== June 19 & 26, 2009 ==
* Forgot to update these weeks so this is from memory
* I moved all my stuff to /fs/szattic-asmg4/tgibbons/
** Hopefully they get the backups set up soon...
* Arthur and I got phymmBL to run on those files I mentioned last time
* I concatenated the output files back together using merge sort:
 sort -um results.03.combined_<input file prefix>* > <output file prefix>.fna
* I wrote a post-processing script for the phymm output called phymm2minimus.pl
** The script writes to STDOUT the completeness of classification at each taxonomic level
** Version 3 of the script then makes a second pass over the phymm output, creating bins for each taxonomic level
** The scripts and test files are located in /fs/szattic-asmg4/tgibbons/phymm/
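The binning step that phymm2minimus.pl performs can be sketched roughly as follows; the label format and the completeness definition here are my assumptions, not the script's exact behavior:

```python
from collections import defaultdict

# Sketch: group read IDs into bins keyed by their taxonomic label at one
# level, and report "completeness" = fraction of reads that got any label.
def bin_reads(labels):
    """labels: dict mapping read_id -> taxon name, or None if unclassified."""
    bins = defaultdict(list)
    classified = 0
    for read_id, taxon in labels.items():
        if taxon is None:
            continue                      # unclassified reads stay unbinned
        bins[taxon].append(read_id)
        classified += 1
    completeness = classified / len(labels) if labels else 0.0
    return dict(bins), completeness
```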
 
== July 3, 2009 ==
I'm back to reading about metabolic network analysis and finally getting some tractable ideas <br>
I'm trying to create a data structure that will store the matrix: <br>
RN.#####            (potential reactions capable by query sequences)
        .EC.#####  (array of ECs of proteins necessary for reaction)
                  .# (stoichiometry of protein in reaction; not sure if this will work as a child of a pointer)
        .CO.#####  (array of compounds necessary for reaction)
                  .# (stoichiometry of compound in reaction; not sure if this will work as a child of a pointer)
        .r.bool    (reversibility of reaction, shown by => / <=> in KEGG RN equations)
 
Plus two additional data structures for manipulation by protein and/or compound:
EC.#####            (array of ECs in query)
        .RN.#####    (array of reactions in which each KO participates)
 
CO.#####            (array of compounds used in any reactions)
        .RN.#####    (array of reactions in which each compound participates)
        .ChEBI.##### (ChEBI number for compound, or maybe just use the ChEBI number in the first place)
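As a rough illustration, the RN/EC/CO structure above maps naturally onto nested dictionaries; the identifiers below are placeholders, not real KEGG records:

```python
# Sketch of the RN / EC / CO structure above as plain nested dicts.
reactions = {
    "R00001": {
        "EC": {"1.1.1.1": 1},        # ECs of proteins needed -> stoichiometry
        "CO": {"C00001": -1,         # compounds -> stoichiometric coefficient
               "C00002": 1},         # (negative = consumed, positive = produced)
        "reversible": True,          # <=> vs => in the KEGG RN equation
    },
}

# The two inverted indexes for manipulation by protein or compound:
reactions_by_enzyme = {"1.1.1.1": ["R00001"]}
reactions_by_compound = {"C00001": ["R00001"], "C00002": ["R00001"]}
```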
 
This data structure should be thought of as an R x C matrix plus metadata (such as necessary proteins). Other thoughts on the use of the matrix:
* It would be interesting to evaluate the matrix using chemical moieties (e.g. CoA, AMP) as described in section 6.5 of Palsson's book, but that would require more significant processing of the KEGG information and might break the above-described data structure. Palsson offers a list of common metabolic moieties in Table 6.2 on p. 102, which he has adapted from another bioengineering book. The KEGG database implicitly treats enzymes as moieties.
* A common question will be to find the metabolites required to make the essential building blocks for an organism. To answer this we will need a list of said building blocks, or target products. Palsson offers such a list in Table 6.1 on p. 98, which he has adapted from the same bioengineering book.
* Next we will likely want to fill in the gaps. This may require storing additional information from the assignment of protein sequences into KOs, as well as expanding the arrays. Alternatively, it may be better to simply create all-new data structures, but that would likely end up being redundant. With that information we could then attempt statistical analysis of partial reactions to determine the likelihood that the missing component was simply overlooked by sampling.
* A further step may be to evaluate the gene set's ability to take up the necessary precursors through either passive or active means.
* Another interesting form of analysis, when possible, would be to multiply the reaction matrix by the concentrations of substrates in the environment (presumably determined by mass spec) and observe the potential output concentrations.
 
Useful files I'm considering using for the network analysis:
* [ftp://ftp.genome.jp/pub/kegg/genes/ko ftp://ftp.genome.jp/pub/kegg/genes/ko] contains detailed information about each KO
* [ftp://ftp.genome.jp/pub/kegg/ligand/enzyme/enzyme ftp://ftp.genome.jp/pub/kegg/ligand/enzyme/enzyme] contains detailed enzyme information by EC number instead of KO
* [ftp://ftp.genome.jp/pub/kegg/ligand/reaction/reaction ftp://ftp.genome.jp/pub/kegg/ligand/reaction/reaction] contains detailed information about each RN, including enzyme and compound information
* [ftp://ftp.genome.jp/pub/kegg/ligand/compound/compound ftp://ftp.genome.jp/pub/kegg/ligand/compound/compound] contains, as you might be guessing by now, compound information
 
Proposed method of building arrays:
# Assign protein sequences to KOs
# Build list of reactions associated with each KO
# Eliminate any reaction lacking necessary protein(s)
#* Alternatively, store the partial reactions elsewhere or mark them as incomplete or something
# Build R, KO, and C from the final list of reactions using the files above
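A minimal sketch of steps 2–4 above, assuming a hypothetical reaction_info table keyed by RN number; per the alternative in step 3, partial reactions are kept separately rather than discarded:

```python
# Sketch: keep only reactions whose required enzymes are all present in the
# query's assigned enzyme set, then collect the compound list for the
# surviving reactions. reaction_info format is hypothetical:
#   {rn: {"EC": [ec numbers required], "CO": [compound ids]}}
def build_network(assigned_enzymes, reaction_info):
    complete, partial = [], []
    for rn, info in reaction_info.items():
        if set(info["EC"]) <= set(assigned_enzymes):
            complete.append(rn)
        else:
            partial.append(rn)       # stored for later gap-filling statistics
    compounds = sorted({c for rn in complete for c in reaction_info[rn]["CO"]})
    return complete, partial, compounds
```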
 
== July 10, 2009 ==
Goals for July:
* Decide on a method for assigning protein sequences to KOs and implement it
* Implement data structure described in July 3rd entry using C++
** Test output by printing to tab-delimited files that can be viewed in Excel
** Start playing around with software tools using test data
* Finish Part II of the Palsson book
Algorithm for elucidating all necessary precursors (warning: iz fugly):
# Push all the precursors onto a stack
# Iterate through precursors
# For any reaction containing a positive coefficient for the precursor, push the reactants (compounds with negative coefficients) of the reaction onto a stack
# *A miracle happens*
# Print the output as a network of reactions(?)
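Steps 1–3 of the algorithm can be sketched as a stack walk; skipping compounds already seen stands in for the "miracle" step by preventing infinite loops on cyclic networks:

```python
# Sketch of the stack-based precursor walk above. For each target compound,
# find reactions that produce it (positive coefficient) and push that
# reaction's reactants (negative coefficients) to be resolved in turn.
# Returns the targets plus every upstream compound reachable from them.
def required_precursors(targets, reactions):
    """reactions: dict mapping rn -> {compound: stoichiometric coefficient}."""
    stack = list(targets)
    needed = set()
    while stack:
        compound = stack.pop()
        if compound in needed:
            continue                  # already expanded; also breaks cycles
        needed.add(compound)
        for coeffs in reactions.values():
            if coeffs.get(compound, 0) > 0:          # reaction produces it
                stack.extend(c for c, v in coeffs.items() if v < 0)
    return needed
```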
 
== July 17, 2009 ==
Talked to Dan:
* Regenerated test set (see June 5 entry) of 1,000,000 reads with default 454 error model and a uniform read length of 200bp
Other than that I'm trying to read up on pathway analysis and then help James with his project for Volker, but complications from moving last weekend are making this a disappointingly unproductive week.
 
== July 24 & 31, 2009 ==
* I have been learning bash scripting to finish integrating phymm into the AMOS pipeline. I seem to be nearly done, but I'm running into problems with the MetaSim data set. Hopefully repeating the procedures will at least help smooth out the edges.
* I've also resumed using Google Code via SVN to back up and log my work, as well as make it available to the lab: code.google.com/p/pop-lab/
** Note: Google Code is currently testing an implementation of Mercurial and should begin supporting it for the general user pool soon!
* As for the pathway analysis, I realized at this week's lab meeting that I've been brainstorming more or less exactly what Bo's been implementing. So I'm back to reading without a strong sense of direction.
* After much playing around with MetaSim I became more comfortable with how to use it and figured out some of the quirks not described in the manual. I've generated another test data set from the same 5 sequences and found some strange inconsistencies. The same number of files was output; however, the new files are smaller, while the phymm output files are larger. Phymm also appears to have created significantly different bins for the new data set, despite it being generated from the same sequences. The average completeness at a given taxonomic level for the first set was about 98%. For the new set the average is around 80% (with the noteworthy exception of Family, at 98%).
 
== August 7, 2009 ==
* I have apparently fixed all the bugs in my phymm2minimus.pl script regarding handling special characters in the NCBI taxonomy file! I am pretty happy/excited about this. Consequently, I have another set of bins I can attempt to assemble now.
* Arrrgh! toAmos is still telling me the fasta sequence entries have 208 entries while the fasta trace entries have 312 entries! Blaaaarrgh!!!
 
== August 28, 2009 ==
Focus on automated analysis of host-pathogen interactions in metagenomic samples
 
== September 25, 2009 ==
I've been making good progress testing the effects of prebinning on minimus assemblies. So far I'm still using the test set described in the [[Cbcb:Pop-Lab:Ted-Report#June_5.2C_2009|June 5 entry]], although I'm now generating reads using a simpler perl script written by Arthur Brady instead of MetaSim.
 
== October 9, 2009 ==
Generated three sets of a million 200bp reads, assembled each with Minimus, then for each set:
* Labeled each read with PhymmBL (which generates labels with BLAST as an intermediate step)
* Generated bins at six phylogenetic levels (phylum-phylotype) using each labeling method (BLAST & PhymmBL)
* Assembled each bin using Minimus
* Combined the contigs from each bin along with the unbinned reads (not all reads have a classification at all phylogenetic levels), and then assembled the full set with Minimus
I then calculated the n50s and compared each assembly with dnadiff:
* Unlike with the reads generated with MetaSim, I did not observe a ~2.5x increase in n50 size. In fact, the n50 was a little bit lower after binning.
** Unfortunately, because the MetaSim reads were suspect due to the excessive incompleteness of their taxonomic assignments, these results seem more reliable
** The n50s were
* The dnadiff reports were also not encouraging, showing a reduction in total length of all contigs from the expected 14Mbps to 11Mbps
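For reference, the n50 statistic used in these comparisons can be computed as follows (a generic sketch, not the exact script used here):

```python
# n50: the contig length L such that contigs of length >= L together
# account for at least half of the total assembled bases.
def n50(contig_lengths):
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:      # reached half the total length
            return length
    return 0
```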
Next step:
* Find out what went wrong. For now this means looking for:
** Differences in read assignments between the binned and unbinned assemblies
** Homogeneity of reads in bins (i.e. make sure that even if reads from a single genome have been scattered across many bins, no bin contains reads from a variety of genomes)
To do list for next week:
# Check for homogeneity in Phymm Bins
# Create gene annotation/comparison pipeline for Volker
# Modify Phymm
 
== October 16, 2009 ==
* Check for homogeneity in Phymm Bins
** Traced back read labels to original sequences to discover the source labels are lost while sampling reads :(
** So now I'm generating a new set (testSet10) with equal 10x coverage for each of the five genomes and will cat the multi-fasta read files instead of the genome fasta files. Then, I'll repeat two days of computation and finally be where I thought I'd started. The new test will have the following composition:
 
=== Description of Regenerated Trial Metagenomic Dataset ===
 
::{| class="wikitable" border="1"
|-
!  Organism
!  ID
!  Classification
!  Genome Length
!  # 200bp Reads
|- align="center"
|  align="left"|Methanobrevibacter smithii ATCC 35061
|  NC_009515
|  archaea
|  1,853,160 bp
|  92,658
|- align="center"
|  align="left"|Bacteroides fragilis NCTC 9343
|  NC_003228
|  bacteroidetes
|  5,205,140 bp
|  260,257
|- align="center"
|  align="left"|Porphyromonas gingivalis W83
|  NC_002950
|  bacteroidetes
|  2,343,476 bp
|  117,174
|- align="center"
|  align="left"|Aster yellows witches'-broom phytoplasma AYWB
|  NC_007716
|  firmicutes
|  706,569 bp
|  35,328
|- align="center"
|  align="left"|Bacillus subtilis subsp. subtilis str. 168
|  NC_000964
|  firmicutes
|  4,214,630 bp
|  210,732
|}
 
* Create gene annotation/comparison pipeline for Volker
*# BLAST genes against genomes
*#* All genes vs all genomes probably isn't a very practical approach. Consider an initial setup step that BLASTs all genes vs all genomes and stores the resulting hits in a database for later use, or allowing the user to select the queries and targets. The first approach would probably be better because it wouldn't take up <i>too</i> much space and would allow rapid, flexible analysis. I should talk to Bo about setting up something like that.
*# Rank genes by %ID ratio between user-defined partition
*# Provide other associated information for genes, such as genomic location
** This is all inspired by feedback I got from Volker and Petros Karakousis at Johns Hopkins. After showing them an ordered list of genes that were ranked by the ratio of their presence in virulent strains minus their presence in non-virulent strains, they had many questions about the top hits (where "virulence" was defined as being a pulmonary pathogen). They found it interesting that there were nearly 80 genes perfectly conserved across all virulent strains, yet completely absent in the non-virulent strains. They didn't recognize most of the genes, however, and seemed curious about the kinds of data that could be provided by James' other scripts.
* <s>Modify Phymm</s>
** <s>Create personal directory in phymmTests filled with mostly links back to phymmTests</s>
** <s>done.</s> <s>Broke something.</s> Fixed it.
* I have midterms the next two weeks, so things will probably be slow going. Should have known when I resisted studying at the beginning of the semester that it would catch up with my research eventually.
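The ranking described above (presence in virulent strains minus presence in non-virulent strains) can be sketched as follows; the input format is hypothetical:

```python
# Sketch: score each gene by (fraction of virulent strains containing it)
# minus (fraction of non-virulent strains containing it), then rank.
# presence: dict mapping gene -> set of strain names with a BLAST hit.
def rank_genes(presence, virulent, nonvirulent):
    scores = {}
    for gene, strains in presence.items():
        v = len(strains & virulent) / len(virulent)
        n = len(strains & nonvirulent) / len(nonvirulent)
        scores[gene] = v - n          # +1.0 = perfectly virulence-associated
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```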
 
== October 30, 2009 ==
Trying to frame mycobacterium genomic analysis such that it's appropriate to use as my project in Carl's Biological Networks class:
* Have a collection of mycobacterial genomes and plasmids, some annotated, some not
* Need to compare networks across genomes/plasmids
** Look into pathBLAST, Genemerge, other comparative genomic software
* First must:
** Quickly annotate genes in novel sequences
*** Gene finding software
*** Blast known genes in other strains
*** Find some way to combine results of both
** Identify pathways in all sequences
*** Assign genes to KO #s
 
=== NCBI Info Summary for Mycobacterium Genome Sequences ===
 
* [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi NCBI] has 22 completed genomic sequences representing as many strains, plus 9 plasmid sequences non-uniformly distributed among them
* [http://www.genome.jp/kegg/catalog/org_list.html KEGG] has pathway annotations for 21 strains
* 3 strains were annotated by TIGR, none containing plasmids. These will be treated as the highest confidence annotations.
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=88 Mycobacterium avium 104]
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=92 Mycobacterium smegmatis str. MC2 155]
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=223 Mycobacterium tuberculosis CDC1551]
* 5 strains were annotated by JGI, including 3 strains containing a total of 6 plasmids. These will be considered as having the next highest level of confidence.
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=15760 Mycobacterium gilvum PYR-GCK] - contains 3 plasmids
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=16079 Mycobacterium sp. JLS]
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=16081 Mycobacterium sp. KMS] - contains 2 plasmids
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=15762 Mycobacterium sp. MCS] - contains 1 plasmid
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=15761 Mycobacterium vanbaalenii PYR-1]
* 6 strains were annotated by other centers I've heard regularly referenced and will probably treat with some intermediate level of confidence if needed.
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=89 Mycobacterium bovis AF2122/97] - Sanger
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=90 Mycobacterium leprae TN] - Sanger
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=16725 Mycobacterium marinum M] - Sanger, 1 plasmid
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=15642 Mycobacterium tuberculosis F11] - Broad
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=224 Mycobacterium tuberculosis H37Rv] - Sanger
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=21055 Mycobacterium tuberculosis KZN 1435] - Broad
* 8 strains were annotated by various other institutes and will be treated with relatively low confidence.
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=15691 Mycobacterium abscessus] - Genoscope, 1 plasmid
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=91 Mycobacterium avium subsp. paratuberculosis K-10] - U Minnesota
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=18279 Mycobacterium bovis BCG] - Fiocruz Brazil
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=18059 Mycobacterium bovis BCG str. Pasteur 1173P2] - Institut Pasteur
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=31211 Mycobacterium bovis BCG str. Tokyo 172] - Japan BCG
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=31271 Mycobacterium leprae Br4923] - Institut Pasteur
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=18883 Mycobacterium tuberculosis H37Ra] - Chinese National HGC, Shanghai
** [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=16230 Mycobacterium ulcerans Agy99] - Institut Pasteur, 1 plasmid
* All of these, plus the two novel sequences, will comprise the study set.
** Kansasii
** Gastri
 
== November 6, 2009 ==
Start testing Minimus for assembling metagenomic data sets from [http://www.nature.com/nmeth/journal/v4/n6/full/nmeth1043.html Nature Methods paper (Mavromatis et al. 2007)] under the following conditions:
 
=== Metagenomic Assembly Performance Analysis ===
 
::{| class="wikitable" border="1"
|-
!  Dataset
!  Read Length
!  Error Model
!  Overall Coverage
!  Dispersion
!  Total Assemblies
|- align="center"
|  align="left"|simMC, simLC, simHC
|  200bp, 400bp, 600bp
|  none, 454
|  1x, 3x, 10x
|  uniform, random skew (light, medium, heavy), dominant organism(s)
|  54 x dispersion cases
|- align="center"
|  align="left"|simMC, simLC, simHC
|  50bp, 100bp
|  none, Solexa
|  1x, 3x, 10x
|  uniform, random skew (light, medium, heavy), dominant organism(s)
|  36 x dispersion cases
|}
Current plan of attack for analyzing mycobacterial genomes (listed as [http://www.genome.jp/kegg/soap/doc/keggapi_manual.html KEGG SOAP API commands]):
* [http://www.genome.jp/kegg/soap/doc/keggapi_manual.html#label:106 get_genes_by_organism]
* get_*_by_gene where * represents: enzymes, best_neighbors, best_best_neighbors, reverse_best_neighbors, paralogs, motifs, ko, or pathways
Will build stoichiometry arrays with these functions and manipulate them with python
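Assembling the downloaded coefficients into stoichiometry arrays might look like the following sketch (the per-reaction coefficient format is an assumption about how the KEGG results would be stored):

```python
# Sketch: assemble a stoichiometric matrix S (rows = compounds,
# columns = reactions) from per-reaction coefficient dicts.
def stoichiometric_matrix(reactions):
    """reactions: dict rn -> {compound: coefficient}; returns (S, rows, cols)."""
    cols = sorted(reactions)
    rows = sorted({c for coeffs in reactions.values() for c in coeffs})
    S = [[reactions[rn].get(c, 0) for rn in cols] for c in rows]
    return S, rows, cols
```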
 
== November 13, 2009 ==
 
=== Subsets of Mycobacterial Strains ===
Met with Volker and trimmed down the study set considerably. I am now focusing on three phenotypically defined subsets:
 
# Pulmonary Pathogens
#* H37Rv
#* CDC1551
#* Bovis
# Facultative Pathogens
#* Abscessus
#* Avium
#* Kansasii W58
#* Bovis BCG
#* H37Ra
# Environmental Non-Pathogenic
#* Smegmatis
#* Gastri
#* Vanbaalenii
#* Gilvum
 
With the intention of making the following comparisons: <br>
 
# Pulmonary vs Fac-Path
# Pulmonary vs Attenuated
# Fac-Path vs Non-Path
# Gastri vs Kansasii
#* Non-path vs Fac-path
#* Closely related
# Bovis vs (H37Rv & CDC1551)
#* Bovis is not very contagious between humans, but just as deadly
# Pulmonary vs Non-Path
#* If time
 
=== Downloading Information From KEGG ===
 
Using SOAPpy to download information by gene for complete sets of genes takes hours. Luckily I've discovered that Python supports "pickling" data structures, which saves them to files that can be easily loaded by other programs. I've generated the following Python pickles. <br>
 
Most of the pickles are generated directly from SOAPpy functions as follows: <br>
"x_by_y.kegg" is generated with the associated SOAPpy function "get_x_by_y(y[i])" over all i. <br>
For nested data structures, "x_by_y_by_z.kegg", I nested the functions as you would expect except where otherwise noted.
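The caching pattern is roughly the following; the fetch callable stands in for a SOAPpy get_x_by_y call, and the file name is illustrative:

```python
import pickle

# Sketch of the "x_by_y" caching pattern: fetch once per key via some
# (slow) lookup function, then pickle the whole dict for fast reloading.
def build_and_pickle(keys, fetch, path):
    data = {k: fetch(k) for k in keys}    # e.g. fetch = get_enzymes_by_gene
    with open(path, "wb") as fh:
        pickle.dump(data, fh)
    return data

def load_pickle(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)
```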
 
Information by Organism:
# pathways_by_organism.kegg
#* each pathway name is appended by the organism name
#* filled directly from the associated SOAPpy function
# genPaths_by_organism.kegg
#* generalized pathway names
#* created by stripping the appended organism name
# genes_by_organism.kegg
#* filled directly from the associated SOAPpy function
Information by Gene:
# enzymes_by_gene.kegg
#* filled directly from the associated SOAPpy function
# kos_by_gene.kegg
#* filled directly from the associated SOAPpy function
Information by Enzyme:
# reactions_by_enzyme.kegg
#* filled directly from the associated SOAPpy function
# compounds_by_enzyme.kegg
#* filled directly from the associated SOAPpy function
Information by Reaction:
# enzymes_by_reaction.kegg
#* filled directly from the associated SOAPpy function
# compounds_by_reaction.kegg
#* filled directly from the associated SOAPpy function
Information by Pathway:
# reactions_by_pathway.kegg
#* filled directly from the associated SOAPpy function
 
=== Computationally Comparing the Metabolic Networks ===
 
For an early comparison, I like the representation scheme used in [http://www.pnas.org/content/105/38/14482.full this paper by Borenstein et al.] They represent pathways as directed networks with compounds as nodes and reactions as edges.
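Under that scheme, building the directed compound graph from stoichiometric coefficients is straightforward (a sketch, not Borenstein et al.'s code):

```python
# Sketch of the compound-node representation: each reaction contributes
# directed edges from every substrate (negative coefficient) to every
# product (positive coefficient).
def compound_graph(reactions):
    """reactions: dict rn -> {compound: coefficient}; returns a set of edges."""
    edges = set()
    for coeffs in reactions.values():
        substrates = [c for c, v in coeffs.items() if v < 0]
        products = [c for c, v in coeffs.items() if v > 0]
        for s in substrates:
            for p in products:
                edges.add((s, p))
    return edges
```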


== January 15, 2010 ==

=== Minimus Documentation ===
* Simple use case
** Necessary tools for set up (toAmos)
* Option 1
* Option 2
* etc.

=== Comparative Network Analysis pt. 2 ===

Revision as of 16:00, 12 January 2010