WarcManager: Difference between revisions
From Adapt
Revision as of 14:06, 27 April 2011
Overview
The Warc Manager is a tool that helps archives quickly browse, search, and analyze collections of web crawl data. It is a lightweight, database-backed web application that indexes a collection of WARC data and provides a browsing interface to it.
- Source Code Repository: https://subversion.umiacs.umd.edu/warc-utils/
- Nightly builds: http://adaptvm01.umiacs.umd.edu:8080/jenkins/job/Warc%20Manager%202/
Installation
The installation consists of a few simple steps.
1. Create the database and set up permissions.
- mysql> create database webarc;
- mysql> grant all on webarc.* to webarc@localhost identified by 'PASSWORD';
- mysql> use webarc;
- mysql> source schema.sql;
2. Install Tomcat and the JDBC driver
- $ tar -xf apache-tomcat-6.0.32.tar.gz
- $ cp mysql-connector-java-5.0.7-bin.jar apache-tomcat-6.0.32/lib
3. Install and configure the Warc Manager
- $ cp warc-webapp-1.0-SNAPSHOT.war apache-tomcat-6.0.32/webapps/warc.war
- $ mkdir -p apache-tomcat-6.0.32/conf/Catalina/localhost
- $ cp context.xml apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml
- Edit apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml and make sure the password attribute on the Resource line matches the password you specified in the grant statement above:
<Resource auth="Container" driverClassName="com.mysql.jdbc.Driver" maxActive="20" maxIdle="10" maxWait="-1" name="jdbc/warcdb" password="webarc" testOnBorrow="true" type="javax.sql.DataSource" url="jdbc:mysql://localhost/webarc" username="webarc" validationQuery="SELECT 1"/>
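A mismatch between the password in the grant statement and the one in warc.xml is an easy mistake to make in the last step. The following is a minimal sketch of a check, not part of the Warc Manager itself. The function names context_password and check_context_password are hypothetical, and the sketch assumes the Resource element sits on a single line with a double-quoted password attribute, as in the line above.

```shell
#!/bin/sh
# Print the password attribute of the <Resource> element in a Tomcat context file.
# Assumption: the element is on one line and the attribute is double-quoted.
context_password() {
    sed -n 's/.*<Resource[^>]* password="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Compare the configured password with the one used in the grant statement.
# Prints "match" or "mismatch"; returns non-zero on mismatch.
check_context_password() {
    file=$1
    expected=$2
    actual=$(context_password "$file")
    if [ "$actual" = "$expected" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}
```

After sourcing these functions, one possible call is check_context_password apache-tomcat-6.0.32/conf/Catalina/localhost/warc.xml 'PASSWORD', with PASSWORD replaced by the value used when granting database access.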