Main Page
COLLABORATIVE: ABI Development: Methodology for Pattern Creation, Imprint Validation, and Discovery from the Annotated Biological Web (PattArAn)
Acknowledgement
This project is funded by the Advances in Biological Informatics (ABI) program of the National Science Foundation's Division of Biological Infrastructure (NSF DBI).
UMD award U. Iowa award St. Bonaventure U award
Intellectual Merit
The project will develop a methodology that exploits the wealth of annotation knowledge, notably Gene Ontology (GO) and Plant Ontology (PO) annotations of Arabidopsis genes.
Motivated by the availability of rich and as yet insufficiently tapped collections of gene annotations, the project aims to facilitate the discovery of hidden knowledge that could be the basis of further scientific research. The methodology will extract patterns of interest from annotation graphs (pattern discovery). Literature-based methods will extract sentences that validate the biological meaning underlying these patterns (pattern validation).
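Pattern validation searches the literature for sentences that support a pattern. The Python sketch below is a deliberately naive co-mention filter, not the project's literature-mining method. It assumes a pattern is given as a list of gene names and a list of ontology term labels, and it keeps only sentences that mention at least one of each. The function name and the example text are hypothetical.

```python
import re


def co_mention_sentences(text, genes, terms):
    """Return sentences mentioning at least one gene and at least one term.

    Naive heuristic (hypothetical helper): split on sentence-final punctuation
    followed by whitespace, then match names as case-insensitive whole
    words or phrases.
    """
    gene_res = [re.compile(rf"\b{re.escape(g)}\b", re.IGNORECASE) for g in genes]
    term_res = [re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms]
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if any(r.search(s) for r in gene_res)
            and any(r.search(s) for r in term_res)]


# Toy text standing in for an abstract or full-text passage.
text = ("FLC represses flowering in Arabidopsis. The weather was mild. "
        "AP1 is expressed in the flower.")
print(co_mention_sentences(text, ["FLC", "AP1"], ["flowering", "flower"]))
# ['FLC represses flowering in Arabidopsis.', 'AP1 is expressed in the flower.']
```

A real imprint method would need synonym handling and some notion of whether a sentence actually asserts the relationship; co-mention only gives candidate sentences.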
To demonstrate the methodology, the PattArAn tool (Patterns in Arabidopsis Annotations) will be customized for Arabidopsis. PattArAn will give the user a graphical view of patterns of Arabidopsis genes and their associated GO and PO controlled vocabulary (CV) terms. The project will develop graph data mining techniques and efficient algorithms to identify dense subgraphs (DSG) and to perform graph summarization (GS). It will also develop algorithms that mine the literature for sentences relevant to an extracted pattern; this set of sentences is referred to as the pattern's imprint. PattArAn will support iterative exploration and will incorporate related steps such as consulting gene function predictions. The project will involve collaboration with biologists to build and refine annotation graphs and to validate patterns, so that the results are relevant to their research.
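As a rough illustration of dense subgraph discovery, the Python sketch below uses Charikar's greedy peeling algorithm, a standard 1/2-approximation for the densest subgraph problem (maximize edges per vertex). It is a stand-in for intuition only, not the restricted DSG algorithm developed in this project, and the gene and term names in the example are hypothetical.

```python
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling (Charikar, 2000) for the densest subgraph problem.

    Repeatedly deletes a minimum-degree vertex and remembers the vertex set
    with the highest density |E(S)| / |S| seen along the way; this gives a
    1/2-approximation. This simple version is O(n^2); a bucket queue makes
    it near-linear.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; sets also drop duplicate edges
            adj[u].add(v)
            adj[v].add(u)
    alive = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_set = 0.0, set(alive)
    while alive:
        density = num_edges / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
        # Peel off a vertex of minimum remaining degree.
        v = min(alive, key=lambda x: len(adj[x]))
        nbrs = adj.pop(v)
        for w in nbrs:
            adj[w].discard(v)
        num_edges -= len(nbrs)
        alive.remove(v)
    return best_set, best_density


# Toy gene-term annotation edges (hypothetical names).
edges = [("g1", "t1"), ("g1", "t2"), ("g1", "t3"),
         ("g2", "t1"), ("g2", "t2"), ("g2", "t3"),
         ("g3", "t4")]
subgraph, density = greedy_densest_subgraph(edges)
print(sorted(subgraph), density)  # ['g1', 'g2', 't1', 't2', 't3'] 1.2
```

In this toy graph the two genes sharing three terms form the densest block, which is the kind of shared-annotation pattern a DSG step is meant to surface; the pendant annotation of g3 is peeled away first.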
Broader Impact
This project will contribute broadly to the Arabidopsis thaliana community; PattArAn may help Arabidopsis curators manage GO-PO annotations. It can also be used to bootstrap an annotation database for other plant species. The project will offer significant research and educational experiences for graduate students (University of Maryland and University of Iowa) and undergraduate students (St. Bonaventure University). Team members will continue to mentor women and students from under-represented communities, participate in outreach activities, and lead a journal club. The outcomes of this research project will be disseminated through biology and bioinformatics venues.
Publications
An Evaluation of Metrics to Compute Concept Similarity Based on Evidence from Ontologies. Guillermo Palma, Eric Haag, Louiqa Raschid, Andreas Thor, Maria Esther Vidal. Email louiqa@umiacs.umd.edu for a copy.
PAnG - Finding Patterns in Annotation Graphs. P. Anderson, J. Benik, L. Raschid, A. Thor, M. E. Vidal. Proceedings of the ACM SIGMOD International Conference on Management of Data (Demonstration Paper), 2012. [1]
MeSH: A Window into Full Text for Document Summarization. Sanmitra Bhattacharya, Viet Ha-Thuc, Padmini Srinivasan. Bioinformatics (ISMB/ECCB 2011) 27(13): 120-128, 2011. [2]
Link Prediction for Annotation Graphs Using Graph Summarization. Andreas Thor, Philip Anderson, Louiqa Raschid, Saket Navlakha, Barna Saha, Samir Khuller, Xiao-Ning Zhang. Proceedings of the International Semantic Web Conference (ISWC), 2011. [3]
Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs. Barna Saha, Allison Hoch, Samir Khuller, Louiqa Raschid, Xiao-Ning Zhang. Research in Computational Molecular Biology (Proceedings of RECOMB 2010), Bonnie Berger (ed.), Lecture Notes in Computer Science 6044, Springer, pp. 456-472, 2010. ISBN 978-3-642-12682-6. [4]
Visualization Tools and Datasets
PattArAn semEP visualization
PattArAn DSG+GS graph summary portal for Arabidopsis
Arabidopsis genes and their GO and PO annotations as of Fall 2012 [5]. This site lets you search for genes by GO or PO term and visualize their annotations as well as graph summaries.
PattArAn ALI portal for Arabidopsis
This is a manually curated "ground truth" database of the literature-based imprints for triplets or doublets of selected Arabidopsis genes and their GO and PO annotations. [6]
PAnG portal for LinkedCT
Clinical trials, conditions, and interventions from LinkedCT.org as of xxx 2012 [7]. This site lets you search for clinical trials by condition and intervention and visualize the graphs and graph summaries.
Dataset
Format for semEP input. I picked the file names arbitrarily.
genes.txt - list of genes.
geneGO.txt - gene-GO annotations; kept in a separate file because it needs to be refreshed.
GO.txt - GO ID, GO term, and GO link to AmiGO(?).
GOGOdist.txt - distance for each pair of GO terms. [Check with Guillermo re: computing using d_tax code.]
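The column layout of these files is not spelled out above. The Python sketch below assumes tab-separated files with one record per line: geneGO.txt as a gene and a GO ID, GOGOdist.txt as two GO IDs and a numeric distance. These layouts, the helper names, and the toy rows are hypothetical, and the sketch does not compute d_tax; it only shows how the annotation and distance files might be read and queried together.

```python
import csv
import io
from collections import defaultdict


def load_gene_go(fh):
    """Read 'gene<TAB>GO ID' rows (assumed layout) into {gene: set of GO IDs}."""
    annotations = defaultdict(set)
    for row in csv.reader(fh, delimiter="\t"):
        if len(row) >= 2 and not row[0].startswith("#"):  # skip blank/comment lines
            annotations[row[0].strip()].add(row[1].strip())
    return dict(annotations)


def load_go_distances(fh):
    """Read 'GO ID<TAB>GO ID<TAB>distance' rows (assumed layout).

    Pairs are stored as frozensets, i.e. the distance is assumed symmetric.
    """
    distances = {}
    for row in csv.reader(fh, delimiter="\t"):
        if len(row) >= 3 and not row[0].startswith("#"):
            distances[frozenset((row[0].strip(), row[1].strip()))] = float(row[2])
    return distances


def go_distance(distances, a, b):
    """Distance between two GO IDs; a term is assumed to be at distance 0 from itself."""
    return 0.0 if a == b else distances[frozenset((a, b))]


# Toy input in place of the real files (made-up IDs and distance).
gene_go = load_gene_go(io.StringIO("GENE1\tGO:A\nGENE1\tGO:B\n"))
dists = load_go_distances(io.StringIO("GO:A\tGO:B\t0.5\n"))
print(gene_go["GENE1"] == {"GO:A", "GO:B"}, go_distance(dists, "GO:B", "GO:A"))  # True 0.5
```

Keying pairs by frozenset means GOGOdist.txt only needs each unordered pair once, which matches the "distance for each pair of GO terms" description if the distance is indeed symmetric.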
Summer 2012 Imprint Experiment
Mahmuda Khan's experience with finding imprint sentences is described here. File:Mahmuda Summer2001.pptx
An overview of the project and instructions for the supervisors are available here. File:Overview.docx
Annotation flowchart. File:FlowchartAnnotation.pdf
Padmini's worksheet [8]
DILS poster [9]