WarcManager

Revision as of 14:10, 27 April 2011 by Toaster (talk | contribs)

Overview

The Warc Manager is a tool that helps archives quickly browse, search, and analyze collections of web crawl data. It is a lightweight, database-backed web application that indexes a collection of WARC data and provides a browsing interface to it.


Installation

The installation consists of a few simple steps.

1. Create the database and set up permissions.

  • mysql> create database webarc;
  • mysql> grant all on webarc.* to webarc@localhost identified by 'PASSWORD';
  • mysql> use webarc;
  • mysql> source schema.sql;
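The same statements can also be fed to the mysql client in one go instead of being typed at the prompt. The following is only a sketch: webarc_setup_sql is a hypothetical helper name, not part of the Warc Manager distribution, and it refuses passwords containing a single quote rather than trying to escape them.

```shell
#!/bin/sh
# Hypothetical helper: print the step 1 statements for a given password.
# It only prints SQL; nothing is sent to the database by this function.
webarc_setup_sql() {
    # $1: password for the webarc database user
    case "$1" in
        *"'"*)
            # A single quote would break the quoted SQL string below.
            echo "password must not contain a single quote" >&2
            return 1
            ;;
    esac
    printf '%s\n' \
        "create database webarc;" \
        "grant all on webarc.* to webarc@localhost identified by '$1';" \
        "use webarc;" \
        "source schema.sql;"
}
```

Assuming a MySQL account that is allowed to create databases (root is used here purely as an example), this could be run from the directory containing schema.sql as: webarc_setup_sql 'PASSWORD' | mysql -u root -p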

2. Install Tomcat and the JDBC driver

  • $ tar -xf apache-tomcat-6.0.32.tar.gz
  • $ cp mysql-connector-java-5.0.7-bin.jar apache-tomcat-6.0.32/lib

3. Install and configure the Warc Manager

  • $ cp warc-webapp-1.0-SNAPSHOT.war apache-tomcat-6.0.32/webapps/warc.war
  • $ mkdir -p apache-tomcat-6.0.32/conf/Catalina/localhost
  • $ cp context.xml apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml
  • Edit apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml and make sure the password in the Resource element matches the one you specified in step 1:


 <Resource auth="Container" driverClassName="com.mysql.jdbc.Driver" maxActive="20"
 maxIdle="10" maxWait="-1" name="jdbc/warcdb" password="PASSWORD" testOnBorrow="true"
 type="javax.sql.DataSource" url="jdbc:mysql://localhost/webarc" username="webarc"
 validationQuery="SELECT 1"/>
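A mismatch between the password in warc.xml and the one granted in the database is easy to miss, so it can help to print the configured values before starting Tomcat. This is a minimal sketch rather than a real XML parser: resource_attr is a hypothetical helper, and it assumes a single Resource element whose attribute values are double-quoted and contain neither a double quote nor a ">" character.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute of the <Resource>
# element in a Tomcat context file.
resource_attr() {
    # $1: path to the context file, $2: attribute name
    # Join the file into one line so attributes wrapped across lines still match,
    # then capture the quoted value that follows the attribute name.
    tr '\n' ' ' < "$1" |
        sed -n "s/.*<Resource[^>]*[[:space:]]$2=\"\([^\"]*\)\".*/\1/p"
}
```

For example, resource_attr apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml password prints the configured password, and the same call with username or url shows the account and database that the web application will connect to.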

Detailed Installation

Indexing Web Content

Searching