Webarc:Berkeley DB Wrapper for Carryover DB

Revision as of 23:21, 9 November 2009 by Scsong (talk | contribs)

What It Does

Provides two Java wrapper classes for Berkeley DB that C/C++ code can call via JNI.

How To Build

In Eclipse, export the 'mwbdbwrap' project as a JAR:

  1. Right-click 'mwbdbwrap' in Package Explorer and select 'Export'.
  2. Select mwbdbwrap/src (it should already be selected).
  3. Enter <your directory>/mwbdbwrap.jar as the export destination.
  4. Select 'Export generated class files and resources'.
  5. Select 'Add directory entries' under options.
  6. Click 'Finish'.

Usage

Example Usage

/*==========================================================================
* Copyright (c) 2003-2004 University of Massachusetts.  All Rights Reserved.
*
* Use of the Lemur Toolkit for Language Modeling and Information Retrieval
* is subject to the terms of the software license set forth in the LICENSE
* file included with this software, and also available at
* http://www.lemurproject.org/license.html
*
*==========================================================================
*/


//
// BDBTaggedDocumentIterator
//
// 22 September 2009 -- scsong
//

#ifndef INDRI_TRECDOCUMENTITERATOR_BDB_HPP
#define INDRI_TRECDOCUMENTITERATOR_BDB_HPP

#include <string>
#include <cstdio>   // FILE, used by the _in member below
#include <fstream>
#include <jni.h>
#include "indri/DocumentIterator.hpp"
#include "indri/Buffer.hpp"
#include "indri/UnparsedDocument.hpp"


namespace indri
{
	namespace parse
	{

		class BDBTaggedDocumentIterator : public DocumentIterator {
		private:
			UnparsedDocument _document;
//			std::ifstream _mfin;
			FILE *_in;
			indri::utility::Buffer _buffer;
			indri::utility::Buffer _metaBuffer;
			std::string _lastMetadataTag;
			char* _fileName;
//			std::string _bdbName;

			bool _readLine( char*& beginLine, size_t& lineLength );

			const char* _startDocTag;
			const char* _endDocTag;
			const char* _endMetadataTag;
			JavaVM* _jvm;
			JNIEnv* _jniEnv;
			jobject _bdb;
			jclass _clsRevisionDatabase;
			jclass _clsRevisionData;
			jmethodID _mid_RevisionDatabase_getNext;
			jmethodID _mid_RevisionDatabase_construct;
			jmethodID _mid_RevisionDatabase_close;
			jfieldID _fid_RevisionData_date;
			jfieldID _fid_RevisionData_fileName;
			jfieldID _fid_RevisionData_offset;

			int _startDocTagLength;
			int _endDocTagLength;
			int _endMetadataTagLength;

			void _create_vm();

			class RevisionData {
			private:
				JNIEnv* _rdenv;
			public:
				long date;
				const char* filename;
				long offset;
				RevisionData(JNIEnv *env, jobject obj, jfieldID date, jfieldID fname, jfieldID offset);
				~RevisionData();
			};
			void _openDB(const char* dbName);
			void _closeDB();
			RevisionData* _getNextDocument();
			UnparsedDocument* _nextDocument();

		public:
			BDBTaggedDocumentIterator();
			~BDBTaggedDocumentIterator();

			void setTags( const char* startDoc, const char* endDoc, const char* endMetadata );

			void open( const std::string& filename );
			void close();

			UnparsedDocument* nextDocument();

		};
	}
}

#endif // INDRI_TRECDOCUMENTITERATOR_BDB_HPP
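
The header above only declares the iterator. As a rough sketch of the calling pattern it implies (open, repeated nextDocument(), close), the following uses stand-in types so that it compiles without Indri or a JVM. FakeIterator, countDocuments, and the reduced UnparsedDocument are hypothetical names introduced here, and the assumption that nextDocument() returns a null pointer once the input is exhausted is not confirmed by the header itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for indri::parse::UnparsedDocument (hypothetical, reduced to one field).
struct UnparsedDocument {
    std::string text;
};

// Stand-in exposing the same open/nextDocument/close method names as
// BDBTaggedDocumentIterator. It serves empty documents from memory
// instead of reading them through the Berkeley DB wrapper.
class FakeIterator {
    std::vector<UnparsedDocument> _docs;
    std::size_t _next = 0;
public:
    explicit FakeIterator(std::size_t documentCount) : _docs(documentCount) {}
    void open(const std::string& /*name*/) { _next = 0; }
    void close() {}
    // Assumption: a null pointer signals that no documents remain.
    UnparsedDocument* nextDocument() {
        return _next < _docs.size() ? &_docs[_next++] : nullptr;
    }
};

// Drives any iterator with this interface and counts the documents it yields.
template <typename Iterator>
std::size_t countDocuments(Iterator& it, const std::string& name) {
    std::size_t count = 0;
    it.open(name);
    while (it.nextDocument() != nullptr) {
        ++count;
    }
    it.close();
    return count;
}
```

With the real class, setTags() would presumably be called before open(); the tag strings to pass are not given on this page, so they are left out of the sketch.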

Output Files

Carryover DB directories are created in the same directory that holds the Fresh DBs. Each one is named by appending '-co' to the name of the corresponding Merge DB. For example, if the Merge DB for a given month is named <month-003>, the Carryover DB is named <month-003-co>.
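
The naming rule can be written down as a small helper. This is only an illustration of the rule as described above; carryoverDbName and carryoverDbPath are hypothetical names and are not part of the wrapper, and the '/' path separator is an assumption.

```cpp
#include <string>

// Hypothetical helper: derives the Carryover DB directory name from a
// Merge DB name by appending "-co".
std::string carryoverDbName(const std::string& mergeDbName) {
    return mergeDbName + "-co";
}

// Hypothetical helper: builds the full path of the Carryover DB directory
// inside the directory that holds the Fresh DBs ('/' separator assumed).
std::string carryoverDbPath(const std::string& parentDir,
                            const std::string& mergeDbName) {
    if (parentDir.empty()) {
        return carryoverDbName(mergeDbName);
    }
    const bool endsWithSlash = parentDir.back() == '/';
    return parentDir + (endsWithSlash ? "" : "/") + carryoverDbName(mergeDbName);
}
```

For instance, carryoverDbName("month-003") yields "month-003-co".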

Notes

  • Make sure that the Berkeley DB Java Edition jar (for example, je-3.3.87.jar) is reachable, for example via CLASSPATH, when using this wrapper.
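
When the JVM is started from C/C++ through the JNI invocation API, one common way to make jars reachable is to pass a -Djava.class.path=... option at JVM creation. The sketch below only assembles that option string; classPathOption is a hypothetical helper, the jar locations are placeholders, and whether the wrapper's own _create_vm() works this way is not stated on this page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins class path entries into a single
// "-Djava.class.path=..." option string. The separator is ':' on
// Unix-like systems and ';' on Windows.
std::string classPathOption(const std::vector<std::string>& entries,
                            char separator = ':') {
    std::string option = "-Djava.class.path=";
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (i > 0) {
            option += separator;
        }
        option += entries[i];
    }
    return option;
}
```

The resulting string would typically be assigned to JavaVMOption::optionString in the JavaVMInitArgs passed to JNI_CreateJavaVM, with both mwbdbwrap.jar and the Berkeley DB jar listed as entries.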

Source Code

svn co http://narasvn.umiacs.umd.edu/repository/src/webarc/colstate