Main Page: Difference between revisions

From pattaran
Consult the [//meta.wikimedia.org/wiki/Help:Contents User's Guide] for information on using the wiki software.
 
==COLLABORATIVE: ABI Development: Methodology for Pattern Creation, Imprint Validation, and Discovery from the Annotated Biological Web (PattArAn)==
 
===Acknowledgement===
 
This project is funded by the National Science Foundation, Division of Biological Infrastructure ([http://www.nsf.gov/div/index.jsp?div=DBI NSF DBI]).
 
[http://nsf.gov/awardsearch/showAward.do?AwardNumber=1147144 UMD award]  [http://nsf.gov/awardsearch/showAward.do?AwardNumber=1146256 U. Iowa award] [http://nsf.gov/awardsearch/showAward.do?AwardNumber=1146300 St. Bonaventure U award]
 
===Intellectual Merit===
 
The project will develop a methodology that exploits the wealth of annotation knowledge,
notably Gene Ontology (GO) and Plant Ontology (PO) annotations of Arabidopsis genes.
 
Motivated by the availability of rich and as yet underused collections of gene annotations,
the project aims to facilitate the discovery of hidden knowledge that could be the basis of further
scientific research. The methodology will extract patterns of interest from annotation graphs (pattern
discovery). Literature-based methods will then extract sentences that validate the biological meaning
underlying these patterns (pattern validation).
 
To demonstrate the methodology, the PattArAn tool (Patterns in Arabidopsis Annotations) will be
customized for Arabidopsis. PattArAn will present the user with a graphical view of patterns
of Arabidopsis genes and their associated GO and PO controlled vocabulary (CV) terms. The project will
develop graph data mining techniques and efficient algorithms to identify dense subgraphs (DSG) and to
perform graph summarization (GS). It will also develop algorithms to mine the literature for the sentences
relevant to an extracted pattern (referred to as the imprint). PattArAn will enable iterative
exploration and will incorporate allied steps such as consulting gene function prediction.
The project will involve collaboration with biologists to build and refine annotation graphs and
to validate patterns, ensuring relevance to their research.
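
As a rough illustration of what identifying a dense subgraph means on an annotation graph, the sketch below uses greedy peeling, a standard textbook heuristic that repeatedly removes a lowest-degree node and keeps the densest intermediate node set. This is not the project's DSG algorithm, which the publications describe as handling additional restrictions; the function name, the density measure (edges divided by nodes), and the toy gene and term identifiers are all assumptions made for the example.

```python
from collections import defaultdict


def densest_subgraph(edges):
    """Greedy peeling heuristic (illustrative only, not the PattArAn method).

    Repeatedly removes a minimum-degree node and returns the intermediate
    node set with the highest density, measured as |E| / |V|.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    nodes = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_nodes = 0.0, set()

    while nodes:
        density = num_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        # Pick a minimum-degree node; ties are broken by name for determinism.
        u = min(nodes, key=lambda n: (len(adj[n]), str(n)))
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        adj[u].clear()
        nodes.remove(u)

    return best_nodes, best_density


# Hypothetical toy graph: three genes sharing two terms, plus one stray edge.
toy_edges = [(g, t) for g in ("g1", "g2", "g3") for t in ("t1", "t2")]
toy_edges.append(("g4", "t3"))
dense_nodes, dense_score = densest_subgraph(toy_edges)
print(sorted(dense_nodes), dense_score)
```

On the toy graph, the three genes that share both terms form the densest part, and the stray gene-term pair is peeled away first.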
 
===Broader Impact===
 
This project contributes broadly to the Arabidopsis thaliana community:
PattArAn may help Arabidopsis curators manage GO-PO annotations, and it can
also be used to bootstrap an annotation database for other plant species. The
project will offer significant research and educational experiences for graduate
students (University of Maryland and University of Iowa) and undergraduate students (St.
Bonaventure University). Team members will continue to mentor women and students
from under-represented communities, participate in outreach activities, and lead a
Journal Club. The outcomes of this research project will be disseminated
via biology and bioinformatics venues.
 
==Publications==
 
An Evaluation of Metrics to Compute Concept Similarity Based on Evidence from Ontologies
Guillermo Palma, Eric Haag, Louiqa Raschid, Andreas Thor, Maria Esther Vidal
Email louiqa@umiacs.umd.edu for a copy.
 
PAnG - Finding Patterns in Annotation Graphs
P. Anderson, J. Benik, L. Raschid, A. Thor and M. E. Vidal
Proceedings of the ACM SIGMOD International Conference (Demonstration Paper), 2012. [http://www.informatik.uni-trier.de/~ley/db/conf/sigmod/sigmod2012.html#AndersonTBRV12]
 
MeSH: a window into full text for document summarization.
Sanmitra Bhattacharya, Viet Ha-Thuc, Padmini Srinivasan
Bioinformatics [ISMB/ECCB] 27(13): 120-128 (2011) [http://www.informatik.uni-trier.de/~ley/db/journals/bioinformatics/bioinformatics27.html#BhattacharyaHS11]
 
Link Prediction for Annotation Graphs using Graph Summarization
Andreas Thor, Philip Anderson, Louiqa Raschid,
Saket Navlakha, Barna Saha, Samir Khuller, Xiao-Ning Zhang
Proceedings of the International Semantic Web Conference 2011. [http://www.informatik.uni-trier.de/~ley/db/conf/semweb/iswc2011-1.html#ThorARNSKZ11]
 
Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs
Barna Saha, Allison Hoch, Samir Khuller, Louiqa Raschid, Xiao-Ning Zhang
In Research in Computational Molecular Biology.
Lecture Notes in Computer Science 6044 Springer ISBN 978-3-642-12682-6.
Also Proceedings of RECOMB 2010. Bonnie Berger, editor. Pages 456-472, 2010. [http://www.informatik.uni-trier.de/~ley/db/conf/recomb/recomb2010.html#SahaHKRZ10]
 
==Visualization Tools and Datasets==


===PattArAn semEP visualization===
 
===PattArAn DSG+GS graph summary portal for Arabidopsis===
 
Arabidopsis genes, GO and PO annotations as of Fall 2012 [http://pattaran.umiacs.umd.edu/]
This site will let you search for genes based on GO or PO terms and visualize their annotations
as well as graph summaries.
 
===PattArAn ALI portal for Arabidopsis===
 
This is a manually curated "ground truth" database of the literature-based imprint
for triplets or doublets of selected Arabidopsis genes and their GO and PO annotations.
[http://pang.umiacs.umd.edu/sentenceutility/]
 
===PAnG portal for LinkedCT===
 
Clinical trials, conditions, and interventions from LinkedCT.org as of xxx 2012 [http://pang.umiacs.umd.edu/linkedct.html]
This site lets you search for clinical trials based on conditions and interventions and
visualize the graphs and graph summaries.
 
===Dataset===
 
Format for semEP input. The file names were chosen arbitrarily.
genes.txt - list of genes.
geneGO.txt - gene-GO annotations; kept in a separate file because it needs to be refreshed.
GO.txt - contains GO ID, GO term, GO link to AmiGO?
GOGOdist.txt - distance for pairs of GO terms. [Check with Guillermo re: computing using d_tax code.]
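
The four files above could be read along the following lines. This is only a sketch: the page does not specify a delimiter or column order, so the code assumes tab-separated text with one record per line, the gene ID or GO ID in the first column, and the distance in the third column of GOGOdist.txt. The function names are hypothetical and are not part of semEP.

```python
import csv
from collections import defaultdict


def parse_rows(lines):
    """Yield stripped, non-empty rows from tab-separated lines (assumed layout)."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        cells = [cell.strip() for cell in row]
        if any(cells):
            yield cells


def load_semep_input(genes, gene_go, go, go_go_dist):
    """Load the four inputs; each argument is an iterable of text lines,
    for example an open file handle.

    Assumed columns (not confirmed by the source):
      genes.txt     gene
      geneGO.txt    gene, GO ID
      GO.txt        GO ID, GO term, optional link
      GOGOdist.txt  GO ID, GO ID, distance
    """
    gene_list = [row[0] for row in parse_rows(genes)]

    annotations = defaultdict(set)
    for row in parse_rows(gene_go):
        annotations[row[0]].add(row[1])

    go_terms = {row[0]: row[1] for row in parse_rows(go)}

    # Distances are stored under an unordered pair so lookup order does not matter.
    distances = {
        frozenset(row[:2]): float(row[2]) for row in parse_rows(go_go_dist)
    }
    return gene_list, dict(annotations), go_terms, distances
```

Keeping geneGO.txt as its own argument mirrors the note above: the annotations can be reloaded after a refresh without touching the gene list or the distance file.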
 
== Summer 2012 Imprint Experiment ==
 
Mahmuda Khan's experience with finding the imprints for sentences is described here.
[[File:Mahmuda_Summer2001.pptx]]
 
 
Overview of the project and instructions to the supervisors are available here.
[[File:Overview.docx]]
 
Annotation flowchart.
[[File:FlowchartAnnotation.pdf]]
 
Padmini's worksheet [https://docs.google.com/document/d/19ISt8qu4RRnpnCOJOg5H-_u6SWLRHU6RNVHlqo_9bxk/edit]
 
DILS poster [http://homepage.cs.uiowa.edu/~psriniva/UMD-UI/DILS_Poster_Final.pptx]


== Getting started ==

Latest revision as of 06:04, 25 June 2014
