
Webarc:Temporal Okapi Retrieval Method

From Adapt

Revision as of 22:42, 10 November 2009

What It Does

Implemented as a new retrieval method within the Lemur toolkit (see http://lemurproject.org/lemur/progexamples.php#exam-newmethod for details on adding retrieval methods), this method is based on Okapi BM25 and takes advantage of new statistics parameters added to the index, using them when scoring documents.
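To illustrate why a BM25-based method needs both index-wide statistics (document count, average document length) and term-wide statistics (document frequency), here is a minimal sketch of the standard Okapi BM25 weight of one query term in one document. It is not taken from the TempOkapiRetMethod source: the names `CollectionStats` and `bm25TermWeight` are hypothetical, k1 = 1.2 and b = 0.75 are the conventional defaults rather than values confirmed for this method, and the query-term-frequency factor is omitted.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical index-wide statistics, analogous to the values passed to setCollectionStats.
struct CollectionStats {
    double docCount;      // N: number of documents in the collection
    double avgDocLength;  // average document length, in terms
};

// Standard Okapi BM25 weight for one query term in one document (sketch, not the webarc code).
// df: number of documents containing the term (a term-wide statistic)
// tf: occurrences of the term in this document
// docLength: length of this document, in terms
double bm25TermWeight(const CollectionStats &cs, double df, double tf,
                      double docLength, double k1 = 1.2, double b = 0.75) {
    assert(cs.avgDocLength > 0.0);
    // Robertson/Sparck Jones IDF; negative for terms found in more than half the documents.
    double idf = std::log((cs.docCount - df + 0.5) / (df + 0.5));
    // Document-length normalisation relative to the collection average.
    double lengthNorm = k1 * ((1.0 - b) + b * docLength / cs.avgDocLength);
    return idf * (tf * (k1 + 1.0)) / (tf + lengthNorm);
}
```

Because N, the average document length and df all enter the score, supplying them from outside the current index changes the scores even when the query and the document are unchanged.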

How To Use

Call it the same way you would call the original Lemur/Indri toolkit APIs, as described in http://lemurproject.org/lemur/progexamples.php. The main differences are that you instantiate TempOkapiRetMethod as your RetrievalMethod object, set index-wide statistics (with setCollectionStats) before calling scoreCollection, and pass term-wide statistics as a parameter to scoreCollection.

Example:

#include <cstdio>
#include <map>
#include "LemurIndriIndex.hpp"
#include "ArrayAccumulator.hpp"
#include "StringQuery.hpp"
#include "IndexedReal.hpp"
#include "TempOkapiRetMethod.hpp"   // assumed header name for the webarc retrieval method

void runQuery(void)
{
   // open index (backslashes in a Windows path must be escaped in a C++ string literal)
   const char *strIndex = "Y:\\data\\wikipedia\\lemur_index\\monthly\\month-003";
   lemur::api::Index *idx = new lemur::index::LemurIndriIndex();
   if (!idx->open(strIndex)) {
         printf("Open Index Failed: %s\n", strIndex);
         fflush(stdout);
         delete idx;
         return;
   }
   lemur::retrieval::ArrayAccumulator *accu = new lemur::retrieval::ArrayAccumulator(idx->docCount());
   // declared as TempOkapiRetMethod (not the RetrievalMethod base class) so that
   // setCollectionStats and the extended scoreCollection are accessible
   lemur::retrieval::TempOkapiRetMethod *retMethod = new lemur::retrieval::TempOkapiRetMethod(*idx, *accu);

   // obtain statistics first (this step may seem redundant here because only one index is used,
   // but in real experiments these values override the local statistics of the current index)
   int docCount = idx->docCount();
   long colTermCount = idx->termCount();
   float docAvgLen = idx->docLengthAvg();   // an average, so keep the fractional part
   int termCountUnique = idx->termCountUnique();

   // term ID -> number of documents containing that term
   std::map<lemur::api::TERMID_T, double> docCount_t;
   lemur::parse::StringQuery *qryterms = new lemur::parse::StringQuery();

   // construct query and term statistics
   const char *queryTerms[] = { "university", "maryland" };
   for (int i = 0; i < 2; i++) {
      qryterms->add(queryTerms[i]);
      lemur::api::TERMID_T ti = idx->term(queryTerms[i]);
      docCount_t[ti] = idx->docCount(ti);
   }

   // set index-wide stats
   retMethod->setCollectionStats(docAvgLen, docCount, colTermCount, termCountUnique);

   // finally score docs against query
   lemur::api::IndexedRealVector results;

   retMethod->scoreCollection(*qryterms, docCount_t, results);

   // results are returned in 'results'

   // release everything allocated above
   delete qryterms;
   delete retMethod;
   delete accu;
   delete idx;
   return;
}
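The comment in the example says that, in real experiments, the statistics of the current index are overridden. One plausible way to build such overriding values, assuming each document belongs to exactly one monthly index, is to sum the counts across indexes. The sketch below is hypothetical: `IndexStats`, `GlobalStats` and `aggregateStats` are illustrative names, not part of Lemur or the webarc code. Document frequencies are keyed by term spelling because Lemur assigns term IDs per index, and the unique-term count is left out because vocabularies overlap and cannot simply be summed.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical per-index statistics, e.g. read from one monthly index
// with docCount(), termCount() and docCount(termID).
struct IndexStats {
    long docCount;                          // documents in this index
    long termCount;                         // total term occurrences in this index
    std::map<std::string, long> docFreq;    // term spelling -> documents containing it
};

// Hypothetical collection-wide statistics built from several indexes.
struct GlobalStats {
    long docCount = 0;
    long termCount = 0;
    double avgDocLength = 0.0;
    std::map<std::string, long> docFreq;
};

// Sums counts across indexes (assumes the indexes hold disjoint document sets).
// The average document length is recomputed from the totals instead of
// averaging the per-index averages, which would weight small indexes too heavily.
GlobalStats aggregateStats(const std::vector<IndexStats> &indexes) {
    GlobalStats g;
    for (const IndexStats &s : indexes) {
        g.docCount += s.docCount;
        g.termCount += s.termCount;
        for (const auto &entry : s.docFreq) {
            g.docFreq[entry.first] += entry.second;
        }
    }
    if (g.docCount > 0) {
        g.avgDocLength = static_cast<double>(g.termCount) / g.docCount;
    }
    return g;
}
```

For a given monthly index, the aggregated document frequencies can then be mapped back to that index's term IDs with `idx->term(spelling)` before filling `docCount_t`.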

Notes

N/A

Source Code

svn co http://narasvn.umiacs.umd.edu/repository/src/webarc/lemur-4.10