WarcManager

From Adapt

=Overview=
The Warc Manager is a tool that helps archives quickly browse, search, and analyze collections of web crawl data. It is a lightweight, database-backed web application that indexes a collection of WARC data and provides a browsing interface to it.
* Source Code Repository: https://subversion.umiacs.umd.edu/warc-utils/
* Nightly builds: http://adaptvm01.umiacs.umd.edu:8080/jenkins/job/Warc%20Manager%202/
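For orientation, a WARC file is a sequence of records. Each record begins with a version line such as WARC/1.0, followed by named header fields (for example WARC-Type, WARC-Target-URI, WARC-Date and Content-Length), an empty line, and a content block of Content-Length bytes. Any indexer of WARC data, presumably including the Warc Manager's, has to read these headers. The Java sketch below is a hypothetical illustration of that first step, not the Warc Manager's own code. The class WarcHeaderSketch is invented for this example, and it assumes one uncompressed record already held in a String. A real indexer would read records as raw bytes (many crawls are stored as per-record gzipped .warc.gz files) and use Content-Length to skip each content block.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration, not the Warc Manager's indexer: parses the
 * header block of one uncompressed WARC record into field name -> value.
 */
public class WarcHeaderSketch {

    public static Map<String, String> parseHeader(String recordText) {
        // WARC lines end in CRLF; bare LF is also accepted here for convenience.
        String[] lines = recordText.split("\r?\n", -1);
        if (!lines[0].startsWith("WARC/")) {
            throw new IllegalArgumentException("not a WARC record: " + lines[0]);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("version", lines[0].trim());
        // Named fields run until the first empty line; the content block follows it.
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) { // lines without a colon are skipped; folded continuation lines are not handled
                fields.put(lines[i].substring(0, colon).trim(),
                           lines[i].substring(colon + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "WARC/1.0\r\n"
                + "WARC-Type: response\r\n"
                + "WARC-Target-URI: http://example.org/\r\n"
                + "WARC-Date: 2011-01-01T00:00:00Z\r\n"
                + "Content-Length: 0\r\n"
                + "\r\n";
        Map<String, String> header = parseHeader(record);
        // Fields an index might key on: record type, target URI, capture date.
        System.out.println(header.get("WARC-Type") + " " + header.get("WARC-Target-URI")
                + " " + header.get("WARC-Date"));
    }
}
```

A LinkedHashMap keeps the fields in the order they appear in the record, which makes printed headers easier to compare against the raw file.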


=Installation=
Installation consists of three steps.
1. Create the database and set up permissions. Replace PASSWORD with a password of your choice; you will need it again in step 3.
* mysql> create database webarc;
* mysql> grant all on webarc.* to webarc@localhost identified by 'PASSWORD';
* mysql> use webarc;
* mysql> source schema.sql;
2. Install Tomcat and the MySQL JDBC driver
* $ tar -xf apache-tomcat-6.0.32.tar.gz
* $ cp mysql-connector-java-5.0.7-bin.jar apache-tomcat-6.0.32/lib
3. Install and configure the Warc Manager
* $ cp warc-webapp-1.0-SNAPSHOT.war apache-tomcat-6.0.32/webapps/warc.war
* $ mkdir -p apache-tomcat-6.0.32/conf/Catalina/localhost
* $ cp context.xml apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml
* Edit apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml and make sure the password attribute of the Resource element matches the password you used in the GRANT statement above (the sample below uses webarc):
<pre>
<Resource name="jdbc/warcdb" auth="Container" type="javax.sql.DataSource"
          driverClassName="com.mysql.jdbc.Driver"
          url="jdbc:mysql://localhost/webarc"
          username="webarc" password="webarc"
          maxActive="20" maxIdle="10" maxWait="-1"
          testOnBorrow="true" validationQuery="SELECT 1"/>
</pre>
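A mismatch between the GRANT password from step 1 and the password attribute in warc.xml is an easy mistake to make. The Java sketch below is a hypothetical pre-flight check, not part of the Warc Manager distribution: the class name ContextCheck and its arguments are invented for illustration. It reads warc.xml with the JDK's built-in DOM parser, finds the Resource named jdbc/warcdb, and compares its username, password and url attributes with the values you used in step 1, assuming MySQL runs on localhost as in the commands above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: checks warc.xml against the values used when creating the database. */
public class ContextCheck {

    /** Returns the problems found; an empty list means the settings agree. */
    public static List<String> check(String contextXml, String user, String password, String database) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(contextXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse context XML", e);
        }
        List<String> problems = new ArrayList<>();
        // Find the Resource named jdbc/warcdb, as in the sample configuration.
        Element resource = null;
        NodeList nodes = doc.getElementsByTagName("Resource");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if ("jdbc/warcdb".equals(e.getAttribute("name"))) {
                resource = e;
                break;
            }
        }
        if (resource == null) {
            problems.add("no Resource named jdbc/warcdb");
            return problems;
        }
        // getAttribute returns "" for a missing attribute, so absent values are reported too.
        if (!user.equals(resource.getAttribute("username"))) {
            problems.add("username is '" + resource.getAttribute("username") + "', expected '" + user + "'");
        }
        if (!password.equals(resource.getAttribute("password"))) {
            problems.add("password does not match the one given in the GRANT statement");
        }
        // Assumes MySQL on localhost, matching the grant to webarc@localhost.
        String expectedUrl = "jdbc:mysql://localhost/" + database;
        if (!expectedUrl.equals(resource.getAttribute("url"))) {
            problems.add("url is '" + resource.getAttribute("url") + "', expected '" + expectedUrl + "'");
        }
        return problems;
    }

    // Usage: java ContextCheck <path to warc.xml> <user> <password> <database>
    public static void main(String[] args) throws IOException {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> problems = check(xml, args[1], args[2], args[3]);
        if (problems.isEmpty()) {
            System.out.println("warc.xml agrees with the database settings");
        } else {
            for (String p : problems) {
                System.out.println("problem: " + p);
            }
        }
    }
}
```

For example: java ContextCheck apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml webarc PASSWORD webarc. Checking the file statically avoids starting Tomcat just to discover a failed database login; it does not test whether MySQL actually accepts the credentials.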
==Detailed Installation==
=Indexing Web Content=


=Searching=

Revision as of 14:06, 27 April 2011
