Webarc:Berkeley DB Wrapper for Carryover DB
From Adapt
Revision as of 23:21, 9 November 2009
What It Does
Provides two Java wrapper classes for Berkeley DB, intended to be called from C/C++ code via JNI.
How To Build
In Eclipse, export the 'mwbdbwrap' project as a JAR:
- Right-click 'mwbdbwrap' in the Package Explorer and select 'Export'.
- Select mwbdbwrap/src (it should already be selected).
- Enter <your directory>/mwbdbwrap.jar as the export destination.
- Select 'Export generated class files and resources'.
- Select 'Add directory entries' under Options.
- Click 'Finish'.
Usage
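Example Usage
The following C++ header, for an Indri document iterator, shows how the wrapper classes are accessed through JNI.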
/*==========================================================================
* Copyright (c) 2003-2004 University of Massachusetts. All Rights Reserved.
*
* Use of the Lemur Toolkit for Language Modeling and Information Retrieval
* is subject to the terms of the software license set forth in the LICENSE
* file included with this software, and also available at
* http://www.lemurproject.org/license.html
*
*==========================================================================
*/
//
// BDBTaggedDocumentIterator
//
// 22 September 2009 -- scsong
//
#ifndef INDRI_TRECDOCUMENTITERATOR_BDB_HPP
#define INDRI_TRECDOCUMENTITERATOR_BDB_HPP

#include <cstdio>   // FILE
#include <string>
#include <fstream>
#include <jni.h>
#include "indri/DocumentIterator.hpp"
#include "indri/Buffer.hpp"
#include "indri/UnparsedDocument.hpp"

namespace indri
{
  namespace parse
  {
    class BDBTaggedDocumentIterator : public DocumentIterator {
    private:
      UnparsedDocument _document;
      // std::ifstream _mfin;
      FILE *_in;
      indri::utility::Buffer _buffer;
      indri::utility::Buffer _metaBuffer;
      std::string _lastMetadataTag;
      char* _fileName;
      // std::string _bdbName;

      bool _readLine( char*& beginLine, size_t& lineLength );

      // Document and metadata delimiters, set through setTags()
      const char* _startDocTag;
      const char* _endDocTag;
      const char* _endMetadataTag;

      // JNI state: the JVM, its environment, and the Java-side database object
      JavaVM* _jvm;
      JNIEnv* _jniEnv;
      jobject _bdb;

      // Cached class, method, and field IDs of the Java wrapper classes
      jclass _clsRevisionDatabase;
      jclass _clsRevisionData;
      jmethodID _mid_RevisionDatabase_getNext;
      jmethodID _mid_RevisionDatabase_construct;
      jmethodID _mid_RevisionDatabase_close;
      jfieldID _fid_RevisionData_date;
      jfieldID _fid_RevisionData_fileName;
      jfieldID _fid_RevisionData_offset;

      int _startDocTagLength;
      int _endDocTagLength;
      int _endMetadataTagLength;

      void _create_vm();

      // C++-side copy of the fields of a Java RevisionData object
      class RevisionData {
      private:
        JNIEnv* _rdenv;
      public:
        long date;
        const char* filename;
        long offset;
        RevisionData(JNIEnv *env, jobject obj, jfieldID date, jfieldID fname, jfieldID offset);
        ~RevisionData();
      };

      void _openDB(const char* dbName);
      void _closeDB();
      RevisionData* _getNextDocument();
      UnparsedDocument* _nextDocument();

    public:
      BDBTaggedDocumentIterator();
      ~BDBTaggedDocumentIterator();

      void setTags( const char* startDoc, const char* endDoc, const char* endMetadata );
      void open( const std::string& filename );
      void close();
      UnparsedDocument* nextDocument();
    };
  }
}

#endif // INDRI_TRECDOCUMENTITERATOR_BDB_HPP
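The header above only declares the iterator; the page does not show a calling loop. The following is a minimal sketch of how an object with this public interface (open, nextDocument, close) might be drained. It assumes that nextDocument() returns a null pointer once no documents remain, which is not stated in the source. countDocuments, StubDocument, and StubIterator are hypothetical names introduced so the sketch compiles without Indri or a JVM; with the real class, setTags() would presumably be called before open().

```cpp
#include <cstddef>
#include <string>

// Drains any iterator exposing open(), nextDocument(), and close(),
// and returns the number of documents it produced.
// Assumption: nextDocument() returns a null pointer when exhausted.
template <typename Iterator>
std::size_t countDocuments(Iterator& it, const std::string& name) {
    std::size_t count = 0;
    it.open(name);
    while (it.nextDocument() != nullptr) {
        ++count;
    }
    it.close();
    return count;
}

// Hypothetical stand-ins for UnparsedDocument and BDBTaggedDocumentIterator,
// used only to keep this sketch independent of Indri and JNI.
struct StubDocument {};

class StubIterator {
    StubDocument _doc;
    int _remaining = 0;
public:
    void open(const std::string&) { _remaining = 3; }  // pretend the DB holds 3 documents
    StubDocument* nextDocument() { return _remaining-- > 0 ? &_doc : nullptr; }
    void close() { _remaining = 0; }
};
```

Writing the loop as a template keeps it usable with the stub here and, under the same assumption about the end-of-input signal, with the real iterator.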
Output Files
Carryover DB directories are created in the same directory that holds the Fresh DBs. Each new directory is named by appending '-co' to the corresponding Merge DB name. For example, if the Merge DB for a given month is named <month-003>, the Carryover DB is named <month-003-co>.
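The naming rule can be expressed as a small helper. This is a sketch only: carryoverDbName and carryoverDbPath are hypothetical functions, not part of the wrapper, and the '/' path separator assumes a Unix-like system.

```cpp
#include <string>

// Hypothetical helper: derives the Carryover DB directory name
// from a Merge DB name by appending "-co".
std::string carryoverDbName(const std::string& mergeDbName) {
    return mergeDbName + "-co";
}

// Hypothetical helper: places the Carryover DB directory next to the
// Fresh DBs. `freshDbParent` is the directory containing the Fresh DBs.
std::string carryoverDbPath(const std::string& freshDbParent,
                            const std::string& mergeDbName) {
    std::string path = freshDbParent;
    if (!path.empty() && path.back() != '/') {
        path += '/';  // add a separator only when one is missing
    }
    return path + carryoverDbName(mergeDbName);
}
```

For instance, a Merge DB named month-003 under a placeholder directory /data/dbs would map to /data/dbs/month-003-co.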
Notes
- Make sure that the Berkeley DB Java Edition JAR (for example, je-3.3.87.jar) is reachable, for instance via CLASSPATH, when using this wrapper.
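When the JVM is started from C++ through the JNI Invocation API, one common alternative to the CLASSPATH environment variable is to pass a -Djava.class.path=... option string in a JavaVMOption to JNI_CreateJavaVM. How this wrapper's _create_vm() actually sets the class path is not shown here, so the following is only a sketch of building such an option string. buildClassPathOption is a hypothetical helper and the JAR paths in the usage note are placeholders.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins JAR paths into a "-Djava.class.path=..."
// JVM option string. The separator is ':' on Unix-like systems and
// ';' on Windows.
std::string buildClassPathOption(const std::vector<std::string>& jars,
                                 char separator = ':') {
    std::string option = "-Djava.class.path=";
    for (std::size_t i = 0; i < jars.size(); ++i) {
        if (i > 0) {
            option += separator;  // separator goes between entries only
        }
        option += jars[i];
    }
    return option;
}
```

Calling buildClassPathOption({"/opt/lib/je-3.3.87.jar", "/opt/lib/mwbdbwrap.jar"}) yields a string that lists both the Berkeley DB JAR and the exported wrapper JAR, so both are visible to the embedded JVM.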
Source Code
svn co http://narasvn.umiacs.umd.edu/repository/src/webarc/colstate