WarcManager

  • Source Code Repository: https://subversion.umiacs.umd.edu/warc-utils/
  • Nightly builds: http://adaptvm01.umiacs.umd.edu:8080/jenkins/job/Warc%20Manager%202/

Revision as of 14:06, 27 April 2011

Overview

The Warc Manager is a tool to help archives quickly browse, search, and analyze archives of web crawl data. The manager is a lightweight, database-backed web application that indexes a collection of WARC data and provides a browsing interface to it.


Installation

The installation consists of a few simple steps.

1. Create the database and set up permissions.

  • mysql> create database webarc;
  • mysql> grant all on webarc.* to webarc@localhost identified by 'PASSWORD';
  • mysql> use webarc;
  • mysql> source schema.sql;
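
The first two statements above can also be generated non-interactively. The following is a minimal sketch, not part of the Warc Manager distribution: the function name `webarc_setup_sql` is hypothetical, and it only prints the SQL so it can be reviewed before being sent to the `mysql` client. It assumes the password contains no single quotes or backslashes. Note that MySQL 8.0 no longer accepts `identified by` inside `grant`; on such versions the account would have to be created with `create user` first.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL from step 1 for a given password.
# It only writes text to stdout; nothing is sent to a database here.
webarc_setup_sql() {
    password=$1
    if [ -z "$password" ]; then
        echo "usage: webarc_setup_sql PASSWORD" >&2
        return 1
    fi
    # Same statements as in step 1, with the password substituted in.
    # Assumption: the password has no single quotes or backslashes.
    printf "create database webarc;\n"
    printf "grant all on webarc.* to webarc@localhost identified by '%s';\n" "$password"
}
```

Assuming an administrative account named root, `webarc_setup_sql 'PASSWORD' | mysql -u root -p` would create the database and account, and `mysql -u root -p webarc < schema.sql` would then load the schema in place of the `use webarc;` and `source schema.sql;` pair.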

2. Install Tomcat and the JDBC driver.

  • $ tar -xf apache-tomcat-6.0.32.tar.gz
  • $ cp mysql-connector-java-5.0.7-bin.jar apache-tomcat-6.0.32/lib
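
The driver copy in step 2 fails silently in later steps if the jar ends up in the wrong place, so a small guard can help. This is a sketch under stated assumptions: `install_jdbc_driver` is a hypothetical name, and the function only checks that the jar and the Tomcat `lib` directory exist before running the same `cp` as above.

```shell
#!/bin/sh
# Hypothetical helper: copy a JDBC driver jar into a Tomcat lib directory.
# Arguments: path to the driver jar, path to the unpacked Tomcat directory.
install_jdbc_driver() {
    jar=$1
    tomcat_home=$2
    if [ ! -f "$jar" ]; then
        echo "driver jar not found: $jar" >&2
        return 1
    fi
    if [ ! -d "$tomcat_home/lib" ]; then
        echo "no lib directory under: $tomcat_home" >&2
        return 1
    fi
    # Same copy as in step 2, after both paths have been checked.
    cp "$jar" "$tomcat_home/lib/"
}
```

With the file names used above, this would be called as `tar -xf apache-tomcat-6.0.32.tar.gz && install_jdbc_driver mysql-connector-java-5.0.7-bin.jar apache-tomcat-6.0.32`.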

3. Install and configure the Warc Manager

  • $ cp warc-webapp-1.0-SNAPSHOT.war apache-tomcat-6.0.32/webapps/warc.war
  • $ mkdir -p apache-tomcat-6.0.32/conf/Catalina/localhost
  • $ cp context.xml apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml
  • Edit apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml and make sure the password attribute on the Resource line matches the password you specified in step 1:
  <Resource auth="Container" driverClassName="com.mysql.jdbc.Driver" maxActive="20" maxIdle="10" maxWait="-1" name="jdbc/warcdb" password="webarc" testOnBorrow="true" type="javax.sql.DataSource" url="jdbc:mysql://localhost/webarc" username="webarc" validationQuery="SELECT 1"/>
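
Step 3 can be sketched as two small shell functions. Both names (`deploy_warc_manager`, `resource_password`) are hypothetical and not part of the Warc Manager. The first repeats the copy and mkdir commands above; the second prints the value of the `password` attribute found in a context file, so it can be compared with the password chosen in step 1. The `sed` expression assumes the attribute is written as `password="..."` with double quotes, as in the Resource line shown above.

```shell
#!/bin/sh
# Hypothetical helper: deploy the war and context file as in step 3.
# Arguments: war file, context.xml, unpacked Tomcat directory.
deploy_warc_manager() {
    war=$1
    context=$2
    tomcat_home=$3
    conf_dir="$tomcat_home/conf/Catalina/localhost"
    cp "$war" "$tomcat_home/webapps/warc.war" || return 1
    mkdir -p "$conf_dir" || return 1
    cp "$context" "$conf_dir/warc.xml"
}

# Hypothetical helper: print the password attribute value(s) found in
# the given context file (one per matching line, nothing if absent).
resource_password() {
    sed -n 's/.*[[:space:]]password="\([^"]*\)".*/\1/p' "$1"
}
```

For example, after `deploy_warc_manager warc-webapp-1.0-SNAPSHOT.war context.xml apache-tomcat-6.0.32`, running `resource_password apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml` shows the password Tomcat will use for the `jdbc/warcdb` data source.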

Detailed Installation

Indexing Web Content

Searching