Main Page

Revision as of 06:04, 25 June 2014 by Louiqa (talk | contribs) (→‎Dataset)


COLLABORATIVE: ABI Development: Methodology for Pattern Creation, Imprint Validation, and Discovery from the Annotated Biological Web (PattArAn)

Acknowledgement

This project is funded by the National Science Foundation (NSF) Division of Biological Infrastructure (DBI).
UMD award | U. Iowa award | St. Bonaventure U. award

Intellectual Merit

The project will develop a methodology that exploits the wealth of annotation knowledge,
notably Gene Ontology (GO) and Plant Ontology (PO) annotations of Arabidopsis genes.
Motivated by the availability of rich and as yet insufficiently tapped collections of gene annotations,
the project aims to facilitate the discovery of hidden knowledge that could form the basis of further
scientific research. The methodology will extract patterns of interest from annotation graphs (pattern
discovery). Literature-based methods will then extract sentences that validate the biological meaning
underlying these patterns (pattern validation).
To demonstrate the methodology, the PattArAn tool (Patterns in Arabidopsis Annotations) will be
customized for Arabidopsis. PattArAn will provide the user with a graphical presentation of patterns
of Arabidopsis genes and their associated GO and PO controlled vocabulary (CV) terms. Graph data
mining techniques and efficient algorithmic solutions to identify dense subgraphs (DSG) and to
perform graph summarization (GS) will be developed. Algorithms to mine the literature for relevant
sentences for an extracted pattern (referred to as the imprint) will also be developed. PattArAn will
enable iterative exploration and will incorporate allied steps such as consulting gene function predictions.
The project will involve collaboration with biologists to build and refine annotation graphs and to
validate patterns, ensuring relevance to their research.
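
To make the notion of a dense subgraph concrete, here is a minimal Python sketch of the standard greedy "peeling" heuristic for the densest-subgraph problem (density = number of edges / number of vertices). This is not the project's DSG algorithm, which adds restrictions specific to annotation graphs; it is a generic illustration. The gene and term labels (geneA, GO:x, PO:y, ...) are hypothetical placeholders, not real annotations.

```python
from collections import defaultdict


def densest_subgraph(edges):
    """Greedy peeling: repeatedly drop a minimum-degree vertex and
    remember the densest intermediate subgraph (edges / vertices)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    if not remaining:
        return set(), 0.0
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes = set(remaining)
    best_density = num_edges / len(remaining)

    while len(remaining) > 1:
        # Pick a vertex of minimum degree (ties broken by label).
        v = min(remaining, key=lambda x: (len(adj[x]), x))
        num_edges -= len(adj[v])
        for w in adj[v]:
            adj[w].discard(v)
        del adj[v]
        remaining.remove(v)
        density = num_edges / len(remaining)
        if density > best_density:
            best_nodes, best_density = set(remaining), density
    return best_nodes, best_density


# Hypothetical annotation graph: three genes share one GO and one PO term;
# a fourth gene has a single unrelated annotation.
example_edges = [
    ("geneA", "GO:x"), ("geneA", "PO:y"),
    ("geneB", "GO:x"), ("geneB", "PO:y"),
    ("geneC", "GO:x"), ("geneC", "PO:y"),
    ("geneD", "GO:z"),
]
nodes, density = densest_subgraph(example_edges)
print(sorted(nodes), density)
```

In this toy graph the heuristic keeps the three genes that share both terms and discards the gene with the single unrelated annotation, which is the kind of pattern a DSG step is meant to surface.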

Broader Impact

This project will contribute broadly to the Arabidopsis thaliana community.
PattArAn may assist Arabidopsis curators in managing GO-PO annotations. It can
also be used to bootstrap an annotation database for other plant species. The
project will offer significant research and educational experiences for graduate
students (University of Maryland and University of Iowa) and undergraduate students
(St. Bonaventure University). Team members will continue to mentor women and students
from under-represented communities, participate in outreach activities, and lead a
Journal Club. The outcomes of this research project will be disseminated
via biology and bioinformatics venues.

Publications

An Evaluation of Metrics to Compute Concept Similarity Based on Evidence from Ontologies
Guillermo Palma, Eric Haag, Louiqa Raschid, Andreas Thor, Maria Esther Vidal
Email louiqa@umiacs.umd.edu for a copy.
PAnG - Finding Patterns in Annotation Graphs
P. Anderson, J. Benik, L. Raschid, A. Thor and M. E. Vidal
Proceedings of the ACM SIGMOD International Conference (Demonstration Paper), 2012. [1]
MeSH: a window into full text for document summarization.
Sanmitra Bhattacharya, Viet Ha-Thuc, Padmini Srinivasan
Journal of Bioinformatics [ISMB/ECCB] 27(13): 120-128 (2011) [2]
Link Prediction for Annotation Graphs using Graph Summarization
Andreas Thor, Philip Anderson, Louiqa Raschid, Saket Navlakha, Barna Saha, Samir Khuller, Xiao-Ning Zhang
Proceedings of the International Semantic Web Conference 2011. [3]
Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs
Barna Saha, Allison Hoch, Samir Khuller, Louiqa Raschid, Xiao-Ning Zhang
In Research in Computational Molecular Biology.
Lecture Notes in Computer Science 6044 Springer ISBN 978-3-642-12682-6.
Also Proceedings of RECOMB 2010. Bonnie Berger, editor. Pages 456-472, 2010. [4]

Visualization Tools and Datasets

PattArAn semEP visualization

PattArAn DSG+GS graph summary portal for Arabidopsis

Arabidopsis genes, GO and PO annotations as of Fall 2012 [5]
This site will let you search for genes based on GO or PO terms and visualize their annotations
as well as graph summaries.

PattArAn ALI portal for Arabidopsis

This is a manually curated "ground truth" database corresponding to the literature-based imprint
for triplets or doublets of selected Arabidopsis genes and their GO and PO annotations. [6]

PAnG portal for LinkedCT

Clinical trials, conditions, and interventions from LinkedCT.org as of xxx 2012 [7]
This site will let you search for clinical trials based on conditions and interventions and
visualize the graphs and graph summaries.

Dataset

Format for semEP input. The file names below were chosen arbitrarily.
genes.txt - list of genes.
geneGO.txt - gene-GO annotations; kept in a separate file because the annotations need to be refreshed.
GO.txt - GO ID, GO term, and GO link to AmiGO?
GOGOdist.txt - distance for each pair of GO terms. [Check with Guillermo re: computing using d_tax code.]
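
The column layout of these files is not specified here, so the following Python sketch is only one possible reading: it assumes tab-separated text with one record per line (gene; gene and GO ID; GO ID, term, and optional link; two GO IDs and a distance). The function names and the sample identifiers are hypothetical. It loads the four inputs and reports GO term pairs that are used in annotations but have no entry in GOGOdist.txt.

```python
import csv


def read_rows(lines):
    """Yield stripped, tab-separated rows, skipping blank lines."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        cells = [cell.strip() for cell in row]
        if any(cells):
            yield cells


def load_semep_input(genes_lines, gene_go_lines, go_lines, dist_lines):
    """Parse the four inputs (assumed layout, see lead-in) into Python structures."""
    genes = [row[0] for row in read_rows(genes_lines)]
    annotations = {gene: set() for gene in genes}
    for row in read_rows(gene_go_lines):
        annotations.setdefault(row[0], set()).add(row[1])
    terms = {row[0]: row[1] for row in read_rows(go_lines)}
    # Distances are stored under an unordered pair of GO IDs.
    distances = {frozenset(row[:2]): float(row[2]) for row in read_rows(dist_lines)}
    return genes, annotations, terms, distances


def missing_distances(annotations, distances):
    """List pairs of annotated GO terms that lack a distance entry."""
    used = sorted(set().union(*annotations.values()))
    return [(a, b) for i, a in enumerate(used) for b in used[i + 1:]
            if frozenset((a, b)) not in distances]


# Hypothetical in-memory stand-ins for the four files.
genes, annotations, terms, distances = load_semep_input(
    ["geneA", "geneB"],
    ["geneA\tGO:x", "geneA\tGO:y", "geneB\tGO:y", "geneB\tGO:z"],
    ["GO:x\tterm x", "GO:y\tterm y", "GO:z\tterm z"],
    ["GO:x\tGO:y\t0.25", "GO:y\tGO:z\t0.5"],
)
missing = missing_distances(annotations, distances)
print(missing)
```

With real files, each argument could be an open file handle, for example `open("genes.txt", newline="")`. Keeping geneGO.txt separate means only that one input has to be re-read when annotations are refreshed.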

Summer 2012 Imprint Experiment

Mahmuda Khan's experience with finding the imprints for sentences is described here. 
File:Mahmuda Summer2001.pptx


Overview of the project and instructions to the supervisors are available here.
File:Overview.docx
Annotation flowchart.
File:FlowchartAnnotation.pdf
Padmini's worksheet [8]
DILS poster [9]

Getting started