Semantic Search Engine project (Karsha Pariksha)

From ngfci

Project Introduction

Karsha Pariksha, the Document Annotation and Semantic Search (DASS) tool, annotates financial documents with terms from the Financial Industry Business Ontology (FIBO) and provides a semantics-based search engine. We initially focus on bond prospectuses from the Electronic Municipal Market Access (EMMA) system, managed by the Municipal Securities Rulemaking Board (MSRB). EMMA is a comprehensive, centralized online source providing free access to municipal disclosures and municipal bond data. These bonds are issued by states, counties, cities, or their agencies to finance public-purpose projects such as schools, roads, bridges, utilities, affordable housing, airports, hospitals, and other public facilities and programs.


Software Deliverables

1. Enhance the current EMMA search facility

The Karsha document annotation and search tool can already search EMMA documents, but the search facility needs to be enhanced so that searches can be saved and retrieved later.

EMMA search web page: http://emma.msrb.org/Search/Search.aspx (made available through our tool to conduct searches and download documents). We may need to save the search index (search fields) and the identifiers of the resulting documents.

Save the results into the Karsha database.
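One way to persist a search is sketched below, assuming a SQLite-style relational store. The table names (`saved_search`, `search_result`), the functions `save_search` and `load_search`, and the sample identifiers are hypothetical and not part of the existing Karsha schema. The search fields are kept as JSON so that adding a new EMMA search field does not require a schema change.

```python
import json
import sqlite3

# Hypothetical schema: one row per saved search, one row per returned document identifier.
SCHEMA = """
CREATE TABLE IF NOT EXISTS saved_search (
    search_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name      TEXT NOT NULL,
    fields    TEXT NOT NULL,             -- search fields serialized as JSON
    saved_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS search_result (
    search_id INTEGER NOT NULL REFERENCES saved_search(search_id),
    doc_id    TEXT NOT NULL,             -- EMMA document identifier
    PRIMARY KEY (search_id, doc_id)
);
"""


def save_search(conn, name, fields, doc_ids):
    """Store the search fields and the identifiers of the documents the search returned."""
    cur = conn.execute(
        "INSERT INTO saved_search (name, fields) VALUES (?, ?)",
        (name, json.dumps(fields, sort_keys=True)),
    )
    search_id = cur.lastrowid
    # INSERT OR IGNORE drops duplicate identifiers within one result list.
    conn.executemany(
        "INSERT OR IGNORE INTO search_result (search_id, doc_id) VALUES (?, ?)",
        [(search_id, d) for d in doc_ids],
    )
    conn.commit()
    return search_id


def load_search(conn, search_id):
    """Return (fields, doc_ids) for a saved search so it can be re-run or reviewed."""
    row = conn.execute(
        "SELECT fields FROM saved_search WHERE search_id = ?", (search_id,)
    ).fetchone()
    if row is None:
        raise KeyError(search_id)
    doc_ids = [r[0] for r in conn.execute(
        "SELECT doc_id FROM search_result WHERE search_id = ? ORDER BY doc_id",
        (search_id,),
    )]
    return json.loads(row[0]), doc_ids


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    sid = save_search(conn, "CA school bonds", {"state": "CA", "keyword": "school"},
                      ["DOC-2", "DOC-1"])  # hypothetical identifiers
    print(load_search(conn, sid))
```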

Optimize the document download script so that it detects previously downloaded documents and stores a pointer to the document's existing location in the database instead of downloading it again.
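A minimal sketch of the download check follows, assuming the database keeps one row per EMMA document along with its storage location. The `document` table, `fetch_document`, and the `download`/`store` hooks are hypothetical names. The function first looks the document up by identifier and skips the download entirely when it is found. As an additional assumption, it also compares a SHA-256 content hash, so that an identical file fetched under a different identifier points at the existing copy rather than being stored twice.

```python
import hashlib
import sqlite3

# Hypothetical table mapping an EMMA document identifier to where its file already lives.
SCHEMA = """
CREATE TABLE IF NOT EXISTS document (
    doc_id    TEXT PRIMARY KEY,
    sha256    TEXT NOT NULL,
    stored_at TEXT NOT NULL
);
"""


def fetch_document(conn, doc_id, url, download, store):
    """Return the storage location for doc_id, downloading only if it is not known yet.

    download(url) -> bytes and store(doc_id, data) -> location are caller-supplied
    hooks (hypothetical stand-ins for the real HTTP client and file store).
    """
    row = conn.execute(
        "SELECT stored_at FROM document WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is not None:
        return row[0]  # already downloaded: reuse the existing pointer
    data = download(url)
    digest = hashlib.sha256(data).hexdigest()
    # Same bytes under another identifier: point at the copy we already have.
    dup = conn.execute(
        "SELECT stored_at FROM document WHERE sha256 = ?", (digest,)
    ).fetchone()
    location = dup[0] if dup is not None else store(doc_id, data)
    conn.execute(
        "INSERT INTO document (doc_id, sha256, stored_at) VALUES (?, ?, ?)",
        (doc_id, digest, location),
    )
    conn.commit()
    return location
```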

Save the transaction data listed under Trade Activity into the database.

Each contract has metadata such as the following, which needs to be saved into the database:

Dated Date: 11/28/2012
Maturity Date: 12/01/2014
Interest Rate: 5%
Principal Amount At Issuance: $13,265,000
Initial Offering Price/Yield: 109.446

This work requires adding to or modifying the current database schema to capture the above data.
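As an illustration of the schema change, the sketch below parses the label/value pairs listed above into typed fields and stores them in a hypothetical `contract_metadata` table. The table, its `contract_id` key, the column names, and the function names are assumptions. Dates are stored as ISO strings and amounts as decimal text, so no precision is lost in SQLite.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal

# Hypothetical table for the per-contract metadata listed above.
CONTRACT_DDL = """
CREATE TABLE IF NOT EXISTS contract_metadata (
    contract_id                  TEXT PRIMARY KEY,  -- hypothetical key
    dated_date                   TEXT,              -- ISO 8601 date
    maturity_date                TEXT,
    interest_rate_pct            TEXT,              -- decimals stored as text
    principal_amount_usd         TEXT,
    initial_offering_price_yield TEXT
);
"""


@dataclass
class ContractMetadata:
    dated_date: date
    maturity_date: date
    interest_rate_pct: Decimal
    principal_amount_usd: Decimal
    initial_offering_price_yield: Decimal


def _date(text):
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()


def _number(text):
    # Drop currency symbols, thousands separators, and percent signs.
    cleaned = text.replace("$", "").replace(",", "").replace("%", "")
    return Decimal(cleaned.strip())


def parse_metadata(text):
    """Parse 'Label: value' lines into a typed record."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            fields[label.strip()] = value
    return ContractMetadata(
        dated_date=_date(fields["Dated Date"]),
        maturity_date=_date(fields["Maturity Date"]),
        interest_rate_pct=_number(fields["Interest Rate"]),
        principal_amount_usd=_number(fields["Principal Amount At Issuance"]),
        initial_offering_price_yield=_number(fields["Initial Offering Price/Yield"]),
    )


def save_metadata(conn, contract_id, m):
    """Insert or update one contract's metadata row."""
    conn.execute(
        "INSERT OR REPLACE INTO contract_metadata VALUES (?, ?, ?, ?, ?, ?)",
        (contract_id, m.dated_date.isoformat(), m.maturity_date.isoformat(),
         str(m.interest_rate_pct), str(m.principal_amount_usd),
         str(m.initial_offering_price_yield)),
    )
    conn.commit()
```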


2. Tools to conduct experiments on Pariksha

Cluster EMMA documents based on the Okapi similarity score
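Assuming "Okapi similarity score" refers to the Okapi BM25 ranking function, the sketch below scores every tokenized document as a query against every other document, averages the two directions (BM25 is not symmetric), and groups documents whose averaged score reaches a threshold using single-link clustering (connected components). The IDF uses the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)). The values of `k1` and `b`, the threshold, and the choice of clustering method are illustrative assumptions, not tuned or prescribed settings.

```python
import math
from collections import Counter


def bm25_scores(docs, k1=1.2, b=0.75):
    """Pairwise Okapi BM25 scores; scores[i][j] treats document i as the query against j."""
    n = len(docs)
    tfs = [Counter(d) for d in docs]
    lengths = [len(d) for d in docs]
    avgdl = sum(lengths) / n
    df = Counter(term for tf in tfs for term in tf)  # document frequency per term
    idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1.0) for t, c in df.items()}
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        query_terms = set(docs[i])
        for j in range(n):
            if i == j:
                continue
            tf = tfs[j]
            norm = k1 * (1 - b + b * lengths[j] / avgdl)  # document-length normalization
            scores[i][j] = sum(
                idf[t] * tf[t] * (k1 + 1) / (tf[t] + norm)
                for t in query_terms if t in tf
            )
    return scores


def cluster(docs, threshold, k1=1.2, b=0.75):
    """Single-link clusters: documents join when their symmetrized BM25 score >= threshold."""
    s = bm25_scores(docs, k1, b)
    parent = list(range(len(docs)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if (s[i][j] + s[j][i]) / 2 >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Single-link grouping is used here only because it needs no preset number of clusters; k-medoids or hierarchical clustering over the same score matrix would be equally valid choices for the experiments.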

FIBO annotations of financial contracts based on recommended FIBO terms

FIBO annotations of financial contracts based on the AnnSim and AnnSim+ metrics. This requires converting the FIBO debt section into OWL format to map the relationships among FIBO terms.
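A sketch of the AnnSim idea follows, based on our reading of the Palma et al. paper in the reading list: two annotation sets A and B are compared by finding the best one-to-one matching between their terms and normalizing the matched similarity as 2 * w / (|A| + |B|). The toy taxonomy stands in for the FIBO debt relationships that the OWL conversion would provide. Its term names are hypothetical (not the actual FIBO hierarchy), and the path-based `term_sim` measure and the brute-force matching (only feasible for small sets) are assumptions; check the paper for the exact term similarity and matching it uses. AnnSim+ is not shown.

```python
from itertools import permutations

# Toy taxonomy (child -> parent); hypothetical term names, not the actual FIBO hierarchy.
PARENT = {
    "Debt": None,
    "Bond": "Debt",
    "Loan": "Debt",
    "MunicipalBond": "Bond",
    "CorporateBond": "Bond",
    "RevenueBond": "MunicipalBond",
    "GeneralObligationBond": "MunicipalBond",
}


def ancestors(term):
    """Path from term up to the single root, term first."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def term_sim(x, y):
    """Assumed path-based taxonomic similarity in [0, 1]."""
    if x == y:
        return 1.0
    px, py = ancestors(x), ancestors(y)
    lca = next(t for t in px if t in py)       # lowest common ancestor
    up = px.index(lca) + py.index(lca)          # edges from x and y up to the LCA
    depth = (len(px) - 1) + (len(py) - 1)       # edges from the root down to x and y
    return 1.0 - up / depth


def ann_sim(a, b):
    """AnnSim-style score: 2 * weight of the best 1-1 matching / (|A| + |B|).

    The best matching is found by brute force over permutations, so this is
    only practical for small annotation sets. Empty input scores 0.0.
    """
    a, b = list(a), list(b)
    if not a or not b:
        return 0.0
    if len(a) > len(b):
        a, b = b, a
    best = max(
        sum(term_sim(x, y) for x, y in zip(a, perm))
        for perm in permutations(b, len(a))
    )
    return 2 * best / (len(a) + len(b))
```

For realistic annotation sets, the permutation search would be replaced by a polynomial-time maximum-weight bipartite matching such as the Hungarian algorithm.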


Reading List

Karsha DASS wiki

Current tool (username/password: admin/admin) and source code repository

FIBO terms and related diagrams

Measuring Relatedness Between Scientific Entities in Annotation Datasets. Guillermo Palma, Maria-Esther Vidal, Eric Haag, Louiqa Raschid and Andreas Thor.

Comparing the Disease Signatures of Drugs Using Shared Annotations and Ontological Relatedness. Guillermo Palma, Maria-Esther Vidal, Louiqa Raschid and Andreas Thor.

Paper submitted to the ICTER conference by the Karsha project: Ontology Based Annotation Mechanism for Financial Documents.pdf

Tutorials on JSP and servlets

Possible WSO2 tools