Webarc:Temporal Okapi Retrieval Method
From Adapt
What It Does
Implemented as a new retrieval method within the Lemur toolkit (see http://lemurproject.org/lemur/progexamples.php#exam-newmethod for details), this method takes advantage of new statistics parameters added to the index. It is based on Okapi BM25 and uses these new statistics parameters when scoring documents.
How To Use
Usage is similar to calling the original Lemur/Indri toolkit API, as described in http://lemurproject.org/lemur/progexamples.php. The main differences are that you instantiate TempOkapiRetMethod as your RetrievalMethod object, set the index-wide statistics before calling scoreCollection, and pass the term-wide statistics as a parameter to scoreCollection.
Example:
#include <cstdio>
#include <map>
#include <utility>
#include "LemurIndriIndex.hpp"
#include "ArrayAccumulator.hpp"
#include "StringQuery.hpp"
#include "IndexedReal.hpp"
#include "TempOkapiRetMethod.hpp" // header name assumed; check the webarc source tree

void runQuery(void) {
  // open index (backslashes must be escaped inside a C++ string literal)
  const char *indexPath = "Y:\\data\\wikipedia\\lemur_index\\monthly\\month-003";
  lemur::api::Index *idx = new lemur::index::LemurIndriIndex();
  if (!idx->open(indexPath)) {
    printf("Open Index Failed: %s\n", indexPath);
    fflush(stdout);
    delete idx;
    return;
  }
  lemur::retrieval::ArrayAccumulator *accu =
      new lemur::retrieval::ArrayAccumulator(idx->docCount());
  // use the concrete type: setCollectionStats() and this scoreCollection() overload
  // belong to TempOkapiRetMethod, not to the RetrievalMethod base class
  lemur::retrieval::TempOkapiRetMethod *retMethod =
      new lemur::retrieval::TempOkapiRetMethod(*idx, *accu);

  // obtain statistics first (this step may seem redundant here because only one index
  // is used, but in real experiments these values override the local statistics of the
  // current index)
  int docCount = idx->docCount();
  long colTermCount = idx->termCount();
  double docAvgLen = idx->docLengthAvg(); // average length is fractional; don't truncate to int
  int termCountUnique = idx->termCountUnique();

  // construct the query and the term statistics (document frequency of each query term)
  std::map<lemur::api::TERMID_T, double> docCount_t;
  lemur::parse::StringQuery *qryterms = new lemur::parse::StringQuery();
  const char *queryWords[] = { "university", "maryland" };
  for (int i = 0; i < 2; i++) {
    qryterms->add(queryWords[i]);
    lemur::api::TERMID_T ti = idx->term(queryWords[i]);
    docCount_t.insert(std::pair<lemur::api::TERMID_T, double>(ti, idx->docCount(ti)));
  }

  // set index-wide stats (must be done before calling scoreCollection)
  retMethod->setCollectionStats(docAvgLen, docCount, colTermCount, termCountUnique);

  // finally score docs against the query; results are returned in 'results'
  lemur::api::IndexedRealVector results;
  retMethod->scoreCollection(*qryterms, docCount_t, results);

  // clean up (retMethod and accu refer to idx, so delete them before it)
  delete qryterms;
  delete retMethod;
  delete accu;
  delete idx;
}
Notes
N/A
Source Code
svn co http://narasvn.umiacs.umd.edu/repository/src/webarc/lemur-4.10