Webarc:Temporal Okapi Retrieval Method
From Adapt
Latest revision as of 15:10, 11 November 2009
What It Does
TempOkapiRetMethod is implemented as a new retrieval method within the Lemur toolkit (see http://lemurproject.org/lemur/progexamples.php#exam-newmethod for details). It is based on Okapi BM25 and takes advantage of the new statistics parameters added to the index, using them when scoring.
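To illustrate how scoring can depend on externally supplied statistics, here is a minimal sketch of a textbook BM25 term score in which the collection statistics are passed in by the caller rather than read from the index. It is not taken from the TempOkapiRetMethod source, whose exact formula may differ; the names CollectionStats and bm25TermScore and the default values of k1 and b are assumptions made for this example.

```cpp
#include <cmath>

// Hypothetical container for statistics supplied by the caller
// (not a Lemur type; introduced only for this sketch).
struct CollectionStats {
    double docCount;      // number of documents in the collection
    double avgDocLength;  // average document length in terms
};

// Textbook BM25 score of one query term for one document.
// tf        : frequency of the term in the document
// docLength : length of the document in terms
// docFreq   : number of documents containing the term
// k1, b     : usual BM25 tuning parameters (default values are assumptions)
double bm25TermScore(double tf, double docLength, double docFreq,
                     const CollectionStats &stats,
                     double k1 = 1.2, double b = 0.75)
{
    // Inverse document frequency computed from the supplied statistics.
    double idf = std::log((stats.docCount - docFreq + 0.5) / (docFreq + 0.5));
    // Document length normalization against the supplied average length.
    double norm = k1 * ((1.0 - b) + b * docLength / stats.avgDocLength);
    return idf * (tf * (k1 + 1.0)) / (tf + norm);
}
```

Because docCount, avgDocLength and docFreq are all arguments, the same document can receive a different score when statistics from a different (for example, larger) collection are supplied.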
How To Use
Usage is similar to calling the original Lemur/Indri toolkit APIs, as described in http://lemurproject.org/lemur/progexamples.php. The main differences are that you instantiate TempOkapiRetMethod as your RetrievalMethod object, set index-wide stats before calling scoreCollection, and pass term-wide stats as a parameter when calling scoreCollection.
Example:
#include <cstdio>
#include <map>
#include <utility>
// The Lemur toolkit headers for the classes used below are also required.

void runQuery(void)
{
  // open index (backslashes in the Windows path must be escaped)
  const char *strIndex = "Y:\\data\\wikipedia\\lemur_index\\monthly\\month-003";
  lemur::api::Index *idx = new lemur::index::LemurIndriIndex();
  if (!(idx->open(strIndex))) {
    printf("Open Index Failed: %s\n", strIndex);
    fflush(stdout);
    delete idx;
    return;
  }

  lemur::retrieval::ArrayAccumulator *accu =
      new lemur::retrieval::ArrayAccumulator(idx->docCount());
  // declared with the concrete type so that setCollectionStats is accessible
  lemur::retrieval::TempOkapiRetMethod *retMethod =
      new lemur::retrieval::TempOkapiRetMethod(*idx, *accu);

  // obtain statistics first (this step may seem redundant in this example,
  // as we are dealing with only one index, but it is necessary in real
  // experiments, where we need to override the local statistics found in
  // the current index)
  int docCount = idx->docCount();
  long colTermCount = idx->termCount();
  double docAverageLength = idx->docLengthAvg();
  int termCountUnique = idx->termCountUnique();
  std::map<lemur::api::TERMID_T, double> docCount_t;

  // construct query and term statistics
  lemur::parse::StringQuery *qryterms = new lemur::parse::StringQuery();
  const char *words[] = { "university", "maryland" };
  for (int i = 0; i < 2; i++) {
    lemur::api::Term tt;
    tt.spelling(words[i]);
    qryterms->add(words[i]);
    lemur::api::TERMID_T ti = idx->term(tt.spelling());
    docCount_t.insert(
        std::pair<lemur::api::TERMID_T, double>(ti, idx->docCount(ti)));
  }

  // set index-wide stats
  retMethod->setCollectionStats(docAverageLength, docCount, colTermCount,
                                termCountUnique);

  // build the query representation, as in the standard Lemur examples
  lemur::api::QueryRep *qr = retMethod->computeQueryRep(*qryterms);

  // finally score docs against query
  lemur::api::IndexedRealVector results;
  retMethod->scoreCollection(*qr, docCount_t, results);
  // results are returned in 'results'

  // clean up
  delete qr;
  delete qryterms;
  delete retMethod;
  delete accu;
  delete idx;
  return;
}
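The comment in the example says that, in real experiments, the local statistics of the current index are overridden. One plausible way to obtain such override values is to combine the statistics of several index partitions (for example, monthly indexes) before passing them to setCollectionStats and scoreCollection. The sketch below shows that idea only; the page does not describe how the override values are actually computed, and PartitionStats, GlobalStats and combineStats are hypothetical names. Terms are keyed by spelling rather than by TERMID_T so that the example does not depend on Lemur.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical per-partition statistics, as they might be read from one index.
struct PartitionStats {
    long docCount;                                // documents in this partition
    long termCount;                               // term occurrences in this partition
    std::map<std::string, long> docCountPerTerm;  // documents containing each term
};

// Hypothetical combined statistics used to override the local ones.
struct GlobalStats {
    long docCount = 0;
    long termCount = 0;
    double docAverageLength = 0.0;
    std::map<std::string, double> docCountPerTerm;
};

// Sum the counts over all partitions and derive the average document length.
GlobalStats combineStats(const std::vector<PartitionStats> &parts)
{
    GlobalStats g;
    for (const PartitionStats &p : parts) {
        g.docCount += p.docCount;
        g.termCount += p.termCount;
        for (const auto &entry : p.docCountPerTerm) {
            g.docCountPerTerm[entry.first] += static_cast<double>(entry.second);
        }
    }
    if (g.docCount > 0) {
        g.docAverageLength = static_cast<double>(g.termCount) / g.docCount;
    }
    return g;
}
```

The number of unique terms is deliberately left out: vocabularies of different partitions overlap, so that value cannot be obtained by summing and would have to be computed separately.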
Notes
N/A
Source Code
svn co http://narasvn.umiacs.umd.edu/repository/src/webarc/lemur-4.10