Webarc:Main

Revision as of 22:28, 24 September 2008 by Scsong (talk | contribs)

Overview

In the era of digital information, preservation efforts have broadened to include documents, images, audio, and video in their digital form. An unprecedented amount of information, covering almost every facet of human activity across the world, now exists as zeros and ones and is growing at an extremely fast pace. Moreover, the digital representation is often the only form in which such information is recorded.

However, many digital objects pose an additional set of challenges. They are updated dynamically at unknown frequencies, and they are often interrelated, with temporal dependencies among them. For such objects, the data collection process of an archive must be able to determine, whenever it encounters a previously archived object, whether that object has been updated since it was last archived. Otherwise, the archive wastes a significant amount of storage by storing duplicate copies over and over again. An archive also needs to be aware of the links among its archived objects in order to organize and manage its holdings better and, possibly, to speed up access. For example, in a system where small objects are packaged together in containers and accesses are made on a per-container basis, placing heavily interlinked objects in the same container can greatly improve overall access speed.
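The update check described above can be illustrated with a minimal sketch. It assumes that an object counts as unchanged when its SHA-256 content digest matches the digest of the most recently archived version. This is only an illustration, not the duplicate-detection method of [2]; the class name `ToyArchive` and its `ingest` method are hypothetical.

```python
import hashlib


class ToyArchive:
    """Hypothetical in-memory archive that stores a new version only when content changes."""

    def __init__(self):
        # Maps an object identifier (e.g. a URL) to its list of
        # (timestamp, digest, content) versions, oldest first.
        self.versions = {}

    def ingest(self, object_id, content, timestamp):
        """Store `content` if it differs from the latest archived version.

        Returns True when a new version was stored, False for a duplicate.
        """
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(object_id, [])
        # Assumption: equal digests mean the object was not updated.
        if history and history[-1][1] == digest:
            return False
        history.append((timestamp, digest, content))
        return True


archive = ToyArchive()
archive.ingest("http://example.org/", b"first version", 1)   # stored
archive.ingest("http://example.org/", b"first version", 2)   # skipped as a duplicate
archive.ingest("http://example.org/", b"second version", 3)  # stored
```

Comparing digests avoids keeping a second copy of unchanged content, at the cost of having to fetch the object before deciding. Note that this sketch compares only against the latest version, so content that reverts to an older state is stored again.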

Another important issue in long-term preservation pertains to the discovery and delivery of the preserved contents. In essence, the major purpose of preservation is to make the preserved knowledge available to the users who will need it in the future. It is thus vital for any preservation system to provide an easy way to find and access the relevant contents. However, providing a method for finding the requested information that is both effective and affordable is not trivial, mainly because of the large and ever-growing size of the preserved data. Preservation systems that rely solely on a relational database with well-defined schemas may allow their users to find information more easily through well-structured queries, but fitting every type of digital object into a fixed set of schemas is often impossible. Clearly, we need a more general framework to enable effective information discovery and access to the archived contents.

In this research, we address three important problems in long-term preservation. First, we devise a methodology to efficiently store and index inter-related objects [1]. Second, we devise a methodology to temporally locate multi-version archived contents, as well as to detect duplicates [2]. Third, we devise a methodology to discover requested information from the preserved contents [3].

Our Approaches