KarshaAnnSS

Latest revision as of 05:53, 14 September 2012

Karsha is free and open-source software licensed by the Lanka Software Foundation

Karsha: Document Annotation and Semantic Search (DASS)

Karsha DASS is a repository of financial documents that have been annotated using
terms from the Financial Industry Business Ontology (FIBO).
Documents can also be annotated using other ontologies and/or thesauri.
We are developing a sample repository comprising a collection of bond prospectuses
(corporate and municipal bonds) and their supplements.

Karsha constructs a Lucene index of the sections of each document
(indexing the keywords within sentences).
It uses Okapi cosine keyword-based similarity to compare the sections (sentences)
of the document with the definitions of ontology terms and chooses/recommends the
top K terms. We focus on FIBO since it provides an excellent set of definitions.
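The matching step described above can be sketched roughly as follows. This is a minimal illustration only, not Karsha's implementation: it uses plain TF-IDF cosine similarity in pure Python rather than Lucene or Okapi weighting, and the function names, the stopword list, and the term definitions are all hypothetical placeholders (the definitions are not actual FIBO text).

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not Karsha's list).
STOPWORDS = {"a", "an", "and", "in", "of", "on", "that", "the", "which"}

def tokenize(text):
    """Lowercase the text and keep alphabetic keywords that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (keyword -> weight) per input text."""
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_terms(sentence, definitions, k=3):
    """Score a sentence against each term definition and return the top k."""
    terms = list(definitions)
    vectors = tfidf_vectors([sentence] + [definitions[t] for t in terms])
    sentence_vec, definition_vecs = vectors[0], vectors[1:]
    scored = [(t, cosine(sentence_vec, v)) for t, v in zip(terms, definition_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical definitions, written for this example only.
definitions = {
    "Bond": "a debt instrument in which the issuer owes the holder a debt and pays interest",
    "Issuer": "a party that issues a financial instrument",
    "Maturity Date": "the date on which the principal of a debt instrument becomes due",
}
sentence = "The issuer will pay interest on the bonds semiannually."

for term, score in recommend_terms(sentence, definitions, k=2):
    print(f"{term}: {score:.3f}")
```

Because the sketch matches exact keywords, "bonds" does not match "bond" and "issues" does not match "issuer"; a real index would normally apply stemming or a similar analysis step before comparison.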
Potential use cases include the following:
- Rank and retrieve documents using FIBO search terms.
- Cluster documents to better understand the contents of a repository.
- Compare pairs of documents for similarities as well as gaps or dissimilarity.
- Karsha can be extended to include sentence understanding so that one can answer
  more refined questions such as 'Which of these instruments is likely to be
  impacted by a fluctuation of XYZ?'
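The first and third use cases above could be approached along these lines once documents carry term annotations. This is a hedged sketch under stated assumptions: the annotation layout (document id mapped to term scores), the document names, the scores, and both function names are invented for illustration and do not describe Karsha's actual data model.

```python
def rank_documents(annotations, query_terms):
    """Rank documents by the summed annotation scores of the query terms."""
    scores = {
        doc: sum(terms.get(t, 0.0) for t in query_terms)
        for doc, terms in annotations.items()
    }
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    # Drop documents that match none of the query terms.
    return [(doc, score) for doc, score in ranked if score > 0]

def compare_documents(annotations, doc_a, doc_b):
    """Report shared terms, gaps on each side, and Jaccard overlap."""
    a, b = set(annotations[doc_a]), set(annotations[doc_b])
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }

# Hypothetical annotations: document id -> {ontology term: similarity score}.
annotations = {
    "corp_bond_prospectus": {"Bond": 0.9, "Issuer": 0.6},
    "muni_bond_prospectus": {"Bond": 0.7, "Municipality": 0.8},
}

print(rank_documents(annotations, ["Bond"]))
print(compare_documents(annotations, "corp_bond_prospectus", "muni_bond_prospectus"))
```

Treating each document as a set of annotated terms keeps the comparison simple; the "only_a" and "only_b" entries correspond to the gaps between a pair of documents.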