KarshaAnnSS: Difference between revisions
Revision as of 05:52, 14 September 2012
Karsha is free and open-source software licensed by the Lanka Software Foundation.
Karsha: Document Annotation and Semantic Search (DASS)
Karsha DASS is a repository of financial documents that have been annotated using terms from the Financial Industry Business Ontology (FIBO). Documents can also be annotated using other ontologies and/or thesauri. We are developing a sample repository comprising a collection of bond prospectuses (for corporate and municipal bonds) and their supplements.
Karsha constructs a Lucene index over the sections of each document, indexing the keywords within sentences. It uses Okapi cosine keyword-based similarity to compare each section (sentence) of the document with the definitions of ontology terms, and recommends the top K terms. We focus on FIBO because it provides an excellent set of definitions.
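The matching step can be illustrated with a minimal sketch. It is not Karsha's code and does not use Lucene: it assumes that "Okapi" refers to Okapi BM25 term weighting and scores each ontology definition against a sentence with a plain-Python BM25 function. How Karsha combines Okapi weighting with cosine similarity is not stated here, so that part is left out. The function name `bm25_top_k`, the parameters `k1` and `b`, and the three definitions are hypothetical placeholders, not actual FIBO text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic keywords."""
    return re.findall(r"[a-z]+", text.lower())


def bm25_top_k(sentence, definitions, k=2, k1=1.2, b=0.75):
    """Rank ontology terms by the BM25 score of their definition
    against one sentence, and return the k best (term, score) pairs."""
    docs = {term: tokenize(text) for term, text in definitions.items()}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Document frequency: in how many definitions each keyword occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    query = set(tokenize(sentence))
    scores = {}
    for term, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for word in query:
            if word not in tf:
                continue
            idf = math.log(1 + (n_docs - df[word] + 0.5) / (df[word] + 0.5))
            norm = tf[word] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[word] * (k1 + 1) / norm
        scores[term] = score

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical definitions for illustration only (not taken from FIBO).
definitions = {
    "Bond": "a debt instrument under which the issuer owes the holder "
            "a debt and pays interest",
    "Issuer": "a party that issues a security",
    "MaturityDate": "the date on which the principal of a debt "
                    "instrument is due to be repaid",
}
sentence = "The principal is repaid on the date stated in the supplement."

recommended = bm25_top_k(sentence, definitions, k=2)
for term, score in recommended:
    print(f"{term}: {score:.3f}")
```

In this sketch the sentence plays the role of the query and each term definition plays the role of an indexed document. A real deployment would delegate tokenization, stop-word handling, and scoring to the Lucene index rather than to hand-written code.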