Semantic Search Engine project (Karsha Pariksha)
Project Introduction
Karsha Pariksha, Document Annotation and Semantic Search (DASS), is a tool to annotate financial documents using terms from the Financial Industry Business Ontology (FIBO), together with a semantics-based search engine. We initially focus on bond prospectuses from the Electronic Municipal Market Access system (EMMA), managed by the Municipal Securities Rulemaking Board (MSRB). EMMA is a comprehensive, centralized online source that provides free access to municipal disclosures and data on municipal bonds. These bonds are issued by states, counties, cities, or their agencies to finance public-purpose projects such as schools, roads, bridges, utilities, affordable housing, airports, hospitals, and other public facilities and programs.
Software Deliverables
1. Enhance the current EMMA search facility
The Karsha document annotation and search tool can already search EMMA documents, but the search facility needs to be enhanced so that searches can be saved and retrieved later.
EMMA search web page: http://emma.msrb.org/Search/Search.aspx
This page is made available through our tool to conduct searches and download documents. We may need to save the search index (search fields) and the identifiers of the resulting documents.
Save the results into the Karsha database.
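A minimal sketch of how a saved search could be modelled, assuming each search is stored as the EMMA search fields the user filled in plus the identifiers of the documents it returned. The names SavedSearch and SavedSearchStore, the field keys, and the in-memory map standing in for the Karsha database are all hypothetical; a real version would persist the same two pieces of data in database tables.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a saved EMMA search: the search fields that were used
// and the identifiers of the documents the search returned.
record SavedSearch(long id, String name, Map<String, String> searchFields,
                   List<String> documentIds, Instant savedAt) {}

// In-memory stand-in for the Karsha database table holding saved searches.
class SavedSearchStore {
    private final Map<Long, SavedSearch> byId = new HashMap<>();
    private long nextId = 1;

    // Saves a search and returns its generated id. Inputs are copied so that
    // later changes by the caller do not alter the stored search.
    long save(String name, Map<String, String> searchFields, List<String> documentIds) {
        long id = nextId++;
        byId.put(id, new SavedSearch(id, name, Map.copyOf(searchFields),
                List.copyOf(documentIds), Instant.now()));
        return id;
    }

    // Retrieves a previously saved search, if one exists with this id.
    Optional<SavedSearch> find(long id) {
        return Optional.ofNullable(byId.get(id));
    }

    // Lists saved searches whose name contains the given text (case-insensitive).
    List<SavedSearch> findByName(String text) {
        List<SavedSearch> matches = new ArrayList<>();
        for (SavedSearch s : byId.values()) {
            if (s.name().toLowerCase().contains(text.toLowerCase())) {
                matches.add(s);
            }
        }
        return matches;
    }
}
```

Keeping the document identifiers rather than the documents themselves means a saved search can be reopened without querying EMMA again, while the documents themselves are served from the local store.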
Optimize the document download script so that it detects previously downloaded documents and records a pointer to where each such document is already stored in the database, rather than downloading it again.
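The download optimization can be sketched as a lookup before each download. This assumes each EMMA document has a stable identifier that is safe to use as a file name; the class name, the fetcher function (standing in for the real HTTP download), the .pdf extension, and the in-memory map of stored locations are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical downloader that skips documents which are already stored locally.
class DocumentDownloader {
    // Stand-in for the database table mapping a document id to its stored location.
    private final Map<String, Path> storedLocations = new HashMap<>();
    // Hypothetical function performing the actual download from EMMA.
    private final Function<String, byte[]> fetcher;
    private final Path storageRoot;

    DocumentDownloader(Function<String, byte[]> fetcher, Path storageRoot) {
        this.fetcher = fetcher;
        this.storageRoot = storageRoot;
    }

    // Returns where the document is stored, downloading it only on the first request.
    Path obtain(String documentId) {
        Path existing = storedLocations.get(documentId);
        if (existing != null) {
            return existing; // already downloaded: reuse the stored pointer
        }
        byte[] content = fetcher.apply(documentId);
        Path location = storageRoot.resolve(documentId + ".pdf");
        try {
            Files.createDirectories(storageRoot);
            Files.write(location, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        storedLocations.put(documentId, location);
        return location;
    }
}
```

Keying on the identifier avoids a network request entirely. If the same document can appear under different identifiers, hashing the downloaded content would also catch those duplicates, at the cost of downloading before checking.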
Save the transaction data listed under Trade Activity into the database.
Each contract has the following metadata, which needs to be saved into the database:
Dated Date: 11/28/2012
Maturity Date: 12/01/2014
Interest Rate: 5%
Principal Amount At Issuance: $13,265,000
Initial Offering Price/Yield: 109.446
This work requires adding to or modifying the current database schema to capture the above data.
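One way to capture the contract metadata is a typed record parsed from label/value pairs like those above. This is a sketch under assumptions: the record and method names are hypothetical, dates are assumed to be in MM/dd/yyyy format, and Initial Offering Price/Yield is kept as a single decimal because EMMA shows price and yield in one field.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical typed representation of the per-contract metadata shown on EMMA.
record ContractMetadata(LocalDate datedDate, LocalDate maturityDate,
                        BigDecimal interestRatePercent, BigDecimal principalAtIssuance,
                        BigDecimal initialOfferingPriceOrYield) {

    private static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Builds the record from label/value pairs as they appear on the page,
    // e.g. "Interest Rate" -> "5 %", "Principal Amount At Issuance" -> "$13,265,000".
    static ContractMetadata fromLabels(Map<String, String> fields) {
        return new ContractMetadata(
                LocalDate.parse(fields.get("Dated Date").trim(), US_DATE),
                LocalDate.parse(fields.get("Maturity Date").trim(), US_DATE),
                number(fields.get("Interest Rate")),
                number(fields.get("Principal Amount At Issuance")),
                number(fields.get("Initial Offering Price/Yield")));
    }

    // Strips dollar signs, thousands separators, percent signs and whitespace, then parses exactly.
    private static BigDecimal number(String raw) {
        return new BigDecimal(raw.replaceAll("[$,%\\s]", ""));
    }
}
```

Storing dates as dates and amounts as exact decimals (for example in DATE and DECIMAL columns) rather than as raw strings makes it possible to query contracts by maturity or interest rate.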
2. Tools to conduct experiments on Pariksha
Cluster EMMA documents based on the Okapi similarity score
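As a possible starting point for the clustering experiment, the sketch below scores documents with Okapi BM25 and groups them by single-link clustering over a similarity threshold. The parameter values k1 = 1.2 and b = 0.75 are common defaults, not values from Karsha; the non-negative IDF variant ln((N - n + 0.5)/(n + 0.5) + 1), the averaging of the two one-way scores into a symmetric similarity, the threshold, and all names are assumptions chosen for illustration. Documents are assumed to be tokenized already.

```java
import java.util.*;

// Minimal Okapi BM25 scorer plus threshold-based single-link clustering (illustrative sketch).
class Bm25Clusterer {
    private final List<List<String>> docs;
    private final List<Map<String, Integer>> termFreqs = new ArrayList<>();
    private final Map<String, Integer> docFreq = new HashMap<>();
    private final double avgDocLength;
    private final double k1 = 1.2, b = 0.75; // common defaults, assumed here

    Bm25Clusterer(List<List<String>> docs) {
        this.docs = docs;
        long totalLength = 0;
        for (List<String> doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc) tf.merge(term, 1, Integer::sum);
            termFreqs.add(tf);
            for (String term : tf.keySet()) docFreq.merge(term, 1, Integer::sum);
            totalLength += doc.size();
        }
        avgDocLength = docs.isEmpty() ? 0 : (double) totalLength / docs.size();
    }

    // BM25 score of document d for the given query terms.
    double score(List<String> query, int d) {
        Map<String, Integer> tf = termFreqs.get(d);
        int n = docs.size();
        double lengthNorm = 1 - b + b * docs.get(d).size() / avgDocLength;
        double total = 0;
        for (String term : query) {
            int f = tf.getOrDefault(term, 0);
            if (f == 0) continue; // term absent from this document
            int df = docFreq.get(term);
            double idf = Math.log((n - df + 0.5) / (df + 0.5) + 1);
            total += idf * f * (k1 + 1) / (f + k1 * lengthNorm);
        }
        return total;
    }

    // Symmetric similarity: each document's distinct terms are used as a query against the other.
    double similarity(int i, int j) {
        List<String> qi = new ArrayList<>(termFreqs.get(i).keySet());
        List<String> qj = new ArrayList<>(termFreqs.get(j).keySet());
        return (score(qi, j) + score(qj, i)) / 2;
    }

    // Single-link clustering via union-find: documents are joined when similarity exceeds the threshold.
    List<List<Integer>> cluster(double threshold) {
        int n = docs.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (similarity(i, j) > threshold) parent[find(parent, i)] = find(parent, j);
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int i = 0; i < n; i++) groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        return new ArrayList<>(groups.values());
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }
}
```

Because BM25 is a query-to-document score, it is not symmetric on its own; averaging both directions is one simple way to obtain a pairwise similarity, and other clustering methods (for example k-medoids over the same scores) could be swapped in.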
FIBO annotations of financial contracts based on recommended FIBO terms
FIBO annotations of financial contracts based on the AnnSim and AnnSim^+ metrics. This requires converting the FIBO debt section into OWL format to map the relationships among FIBO terms.
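AnnSim, as we understand its definition in the Palma et al. paper in the reading list, compares two annotation sets A1 and A2 by finding a 1-1 maximum weighted matching M between their terms and computing 2 · Σ(x,y)∈M sim(x, y) / (|A1| + |A2|). The sketch below assumes the term-level similarity sim is supplied as a function (in Pariksha it would presumably be derived from the FIBO hierarchy once it is available in OWL). It finds the matching by exhaustive search, which is exponential and only suitable for small annotation sets; a Hungarian-algorithm implementation would scale better. The class name and returning 0 for two empty sets are conventions of this sketch.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Illustrative AnnSim computation over two annotation sets with a caller-supplied term similarity.
class AnnSim {
    // AnnSim(A1, A2) = 2 * (weight of a maximum 1-1 matching) / (|A1| + |A2|).
    static <T> double annSim(List<T> a1, List<T> a2, ToDoubleBiFunction<T, T> termSim) {
        if (a1.isEmpty() && a2.isEmpty()) return 0.0; // convention chosen for this sketch
        double[][] w = new double[a1.size()][a2.size()];
        for (int i = 0; i < a1.size(); i++)
            for (int j = 0; j < a2.size(); j++)
                w[i][j] = termSim.applyAsDouble(a1.get(i), a2.get(j));
        double best = bestMatching(w, 0, new boolean[a2.size()]);
        return 2 * best / (a1.size() + a2.size());
    }

    // Exhaustive search over 1-1 matchings: each row is either left unmatched
    // or paired with an unused column; returns the maximum total weight.
    private static double bestMatching(double[][] w, int row, boolean[] usedCols) {
        if (row == w.length) return 0.0;
        double best = bestMatching(w, row + 1, usedCols); // leave this row unmatched
        for (int col = 0; col < usedCols.length; col++) {
            if (!usedCols[col]) {
                usedCols[col] = true;
                best = Math.max(best, w[row][col] + bestMatching(w, row + 1, usedCols));
                usedCols[col] = false;
            }
        }
        return best;
    }
}
```

The matching step matters: greedily pairing the most similar terms first can give a lower total than the optimal 1-1 assignment.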
Reading List
Current tool (username/password: admin/admin) and source code repository
FIBO terms and related diagrams
Measuring Relatedness Between Scientific Entities in Annotation Datasets Guillermo Palma, Maria-Esther Vidal, Eric Haag, Louiqa Raschid and Andreas Thor
Comparing the Disease Signatures of Drugs Using Shared Annotations and Ontological Relatedness Guillermo Palma, Maria-Esther Vidal, Louiqa Raschid and Andreas Thor
Paper submitted by Karsha to the ICTER conference
Tutorials on JSP and servlets
Possible WSO2 tools