Webarc:Berkeley DB Wrapper for Carryover DB
What It Does
Provides two Java wrapper classes around Berkeley DB that are meant to be called from C/C++ code via JNI.
How To Build
In Eclipse, export the 'mwbdbwrap' project as a JAR:
- Right-click 'mwbdbwrap' in the Package Explorer and select 'Export'.
- Select mwbdbwrap/src (it should already be selected).
- Enter <your directory>/mwbdbwrap.jar as the export destination.
- Select 'Export generated class files and resources'.
- Select 'Add directory entries' under the options.
- Click 'Finish'.
Usage
Example Usage
/*==========================================================================
 * Copyright (c) 2003-2004 University of Massachusetts. All Rights Reserved.
 *
 * Use of the Lemur Toolkit for Language Modeling and Information Retrieval
 * is subject to the terms of the software license set forth in the LICENSE
 * file included with this software, and also available at
 * http://www.lemurproject.org/license.html
 *
 *==========================================================================
 */

//
// BDBTaggedDocumentIterator
//
// 22 September 2009 -- scsong
//

#ifndef INDRI_TRECDOCUMENTITERATOR_BDB_HPP
#define INDRI_TRECDOCUMENTITERATOR_BDB_HPP

#include <string>
#include <fstream>
#include <jni.h>
#include "indri/DocumentIterator.hpp"
#include "indri/Buffer.hpp"
#include "indri/UnparsedDocument.hpp"

namespace indri {
  namespace parse {
    class BDBTaggedDocumentIterator : public DocumentIterator {
    private:
      UnparsedDocument _document;
      // std::ifstream _mfin;
      FILE *_in;
      indri::utility::Buffer _buffer;
      indri::utility::Buffer _metaBuffer;
      std::string _lastMetadataTag;
      char* _fileName;
      // std::string _bdbName;

      bool _readLine( char*& beginLine, size_t& lineLength );

      const char* _startDocTag;
      const char* _endDocTag;
      const char* _endMetadataTag;

      JavaVM* _jvm;
      JNIEnv* _jniEnv;
      jobject _bdb;
      jclass _clsRevisionDatabase;
      jclass _clsRevisionData;
      jmethodID _mid_RevisionDatabase_getNext;
      jmethodID _mid_RevisionDatabase_construct;
      jmethodID _mid_RevisionDatabase_close;
      jfieldID _fid_RevisionData_date;
      jfieldID _fid_RevisionData_fileName;
      jfieldID _fid_RevisionData_offset;

      int _startDocTagLength;
      int _endDocTagLength;
      int _endMetadataTagLength;

      void _create_vm();

      class RevisionData {
      private:
        JNIEnv* _rdenv;
      public:
        long date;
        const char* filename;
        long offset;
        RevisionData(JNIEnv *env, jobject obj, jfieldID date, jfieldID fname, jfieldID offset);
        ~RevisionData();
      };

      void _openDB(const char* dbName);
      void _closeDB();
      RevisionData* _getNextDocument();
      UnparsedDocument* _nextDocument();

    public:
      BDBTaggedDocumentIterator();
      ~BDBTaggedDocumentIterator();

      void setTags( const char* startDoc, const char* endDoc, const char* endMetadata );
      void open( const std::string& filename );
      void close();
      UnparsedDocument* nextDocument();
    };
  }
}

#endif // INDRI_TRECDOCUMENTITERATOR_BDB_HPP
Output Files
Carryover DBs are written to new directories created in the same directory that holds the Fresh DBs. Each new directory is named by appending '-co' to the corresponding Merge DB name. For example, for a given month, if the Merge DB is named <month-003>, the Carryover DB is named <month-003-co>.
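The naming rule above can be sketched in C++ as follows. This is a minimal illustration only: the helper names `carryoverDbName` and `carryoverDbPath` are hypothetical and not part of the wrapper, and the sketch assumes '/' as the path separator.

```cpp
#include <string>

// Hypothetical helper (not part of the wrapper): derive the Carryover DB
// directory name from a Merge DB name by appending the "-co" suffix.
std::string carryoverDbName(const std::string& mergeDbName) {
    return mergeDbName + "-co";
}

// Hypothetical helper: build the Carryover DB path inside the directory
// that holds the Fresh DBs. Assumes '/' as the path separator.
std::string carryoverDbPath(const std::string& freshDbParentDir,
                            const std::string& mergeDbName) {
    std::string path = freshDbParentDir;
    // Add a separator only if the parent directory does not end with one.
    if (!path.empty() && path.back() != '/') {
        path += '/';
    }
    return path + carryoverDbName(mergeDbName);
}
```

With a Merge DB named month-003 under an illustrative directory /data/dbs, `carryoverDbPath("/data/dbs", "month-003")` yields /data/dbs/month-003-co.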
Notes
- Make sure that the Berkeley DB Java Edition JAR file (for example, je-3.3.87.jar) is reachable (for example, via CLASSPATH) when using this wrapper.
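When the JVM is started from C/C++ through JNI, one common way to make the JAR files reachable is to pass a `-Djava.class.path=...` option at JVM creation. The sketch below only builds that option string; the helper name `classPathOption` and the JAR locations used with it are assumptions for illustration, not something the wrapper itself defines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join JAR paths into a "-Djava.class.path=" option.
// The separator defaults to ':' (Unix-like systems); pass ';' on Windows.
std::string classPathOption(const std::vector<std::string>& jars,
                            char separator = ':') {
    std::string option = "-Djava.class.path=";
    for (std::size_t i = 0; i < jars.size(); ++i) {
        if (i > 0) {
            option += separator;  // separator goes between entries only
        }
        option += jars[i];
    }
    return option;
}
```

For example, `classPathOption({"/opt/je-3.3.87.jar", "/opt/mwbdbwrap.jar"})` produces `-Djava.class.path=/opt/je-3.3.87.jar:/opt/mwbdbwrap.jar`. A mutable copy of that string would then be assigned to the `optionString` field of a `JavaVMOption` before calling `JNI_CreateJavaVM`.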
Source Code
svn co http://narasvn.umiacs.umd.edu/repository/src/webarc/colstate