
WarcManager

From Adapt

Revision as of 14:06, 27 April 2011 by Toaster

Overview

The Warc Manager is a tool that helps archives quickly browse, search, and analyze collections of web crawl data. It is a lightweight, database-backed web application that indexes a collection of WARC data and provides a browsing interface to it.


Installation

The installation consists of a few simple steps.

1. Create the database and set up permissions.

  • mysql> create database webarc;
  • mysql> grant all on webarc.* to webarc@localhost identified by 'PASSWORD';
  • mysql> use webarc;
  • mysql> source schema.sql;
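The four statements above can also be fed to the mysql client in one batch instead of being typed at the prompt. The helper below is a minimal sketch, not part of the Warc Manager distribution: the function name webarc_setup_sql is hypothetical, and it assumes schema.sql sits in the directory where mysql is run and that the chosen password contains no single quotes.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the Warc Manager): print the step-1
# statements so they can be piped into the mysql client in one go.
# ASSUMPTIONS: schema.sql is in the current directory when mysql runs, and
# the password contains no single quote (it is pasted inside '...').
webarc_setup_sql() {
  db_pass="$1"   # the password the webarc user will use
  cat <<EOF
create database webarc;
grant all on webarc.* to webarc@localhost identified by '$db_pass';
use webarc;
source schema.sql;
EOF
}
```

For example, `webarc_setup_sql 'PASSWORD' | mysql -u root -p` runs the statements as the MySQL root user; `source` is a mysql client command and is also honored when statements arrive on standard input.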

2. Install Tomcat and the JDBC driver

  • $ tar -xf apache-tomcat-6.0.32.tar.gz
  • $ cp mysql-connector-java-5.0.7-bin.jar apache-tomcat-6.0.32/lib
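Because the database Resource in step 3 is container-managed (auth="Container"), Tomcat itself loads the MySQL driver, so the jar has to be in Tomcat's lib directory rather than only inside the web application. A quick check, sketched as a hypothetical shell function (has_mysql_driver is an illustrative name, not part of the distribution), might look like this:

```shell
#!/bin/sh
# Hypothetical check: succeed if a MySQL Connector/J jar is present in
# Tomcat's lib directory, where container-managed DataSources find drivers.
has_mysql_driver() {
  tomcat_home="$1"   # e.g. apache-tomcat-6.0.32
  for jar in "$tomcat_home"/lib/mysql-connector-java-*.jar; do
    # If nothing matches, the glob stays literal and the -f test fails.
    [ -f "$jar" ] && return 0
  done
  return 1
}
```

Usage: `has_mysql_driver apache-tomcat-6.0.32 && echo "JDBC driver installed"`.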

3. Install and configure the Warc Manager

  • $ cp warc-webapp-1.0-SNAPSHOT.war apache-tomcat-6.0.32/webapps/warc.war
  • $ mkdir -p apache-tomcat-6.0.32/conf/Catalina/localhost
  • $ cp context.xml apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml
  • edit apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml and make sure the password attribute of the Resource element matches the password you set in step 1:
  <Resource auth="Container" driverClassName="com.mysql.jdbc.Driver" maxActive="20" maxIdle="10" maxWait="-1" name="jdbc/warcdb" password="webarc" testOnBorrow="true" type="javax.sql.DataSource" url="jdbc:mysql://localhost/webarc" username="webarc" validationQuery="SELECT 1"/>
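Instead of hand-editing the copied file, the descriptor can be written with the chosen password filled in. This is a minimal sketch under a stated assumption: it presumes the shipped context.xml is simply a <Context> element wrapping the Resource line shown above, so compare it with the real file before relying on it. The function name write_warc_context is hypothetical, and the password must not contain XML special characters such as " or &.

```shell
#!/bin/sh
# Hypothetical helper: write Tomcat's per-application context descriptor
# for the Warc Manager with the database password substituted in.
# ASSUMPTION: the real context.xml is a <Context> wrapping this Resource.
# The password must not contain ", &, or < (it is pasted into XML as-is).
write_warc_context() {
  dest="$1"      # e.g. apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml
  db_pass="$2"   # must match the password used in the GRANT statement
  cat > "$dest" <<EOF
<Context>
  <Resource auth="Container" driverClassName="com.mysql.jdbc.Driver"
            maxActive="20" maxIdle="10" maxWait="-1"
            name="jdbc/warcdb" type="javax.sql.DataSource"
            url="jdbc:mysql://localhost/webarc"
            username="webarc" password="$db_pass"
            testOnBorrow="true" validationQuery="SELECT 1"/>
</Context>
EOF
}
```

Usage: `write_warc_context apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml 'PASSWORD'`, then start Tomcat so it picks up the descriptor.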

Detailed Installation

Indexing Web Content

Searching