Long-term preservation of digital objects requires systematic methodologies that address the following requirements:
- Each preserved digital object should encapsulate information about its content, structure, context, provenance, and access, to enable long-term maintenance and lifecycle management of the object.
- Efficient management of technology evolution, both hardware and software, and the appropriate handling of technology obsolescence (for example, format obsolescence).
- Efficient risk management and disaster recovery mechanisms, whether the threat arises from technology degradation and failure, natural disasters such as fires, floods, and hurricanes, human-induced operational errors, or security failures and breaches.
- Efficient mechanisms to ensure the authenticity and integrity of content, context, and structure of archived information throughout the preservation period.
- Ability to discover information and to access and present content, with automatic enforcement of authorization and intellectual property (IP) rights, throughout the lifecycle of each object.
- Scalability in ingestion rate, storage capacity, and processing power, sufficient to manage and preserve large-scale heterogeneous collections of complex objects, together with fast discovery and retrieval of information by users.
- Ability to accommodate changes over time in organizational structure and stewardship, as well as relocation, repurposing, and reclassification.
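The integrity requirement is commonly met with fixity checking: a cryptographic digest is recorded when an object is ingested and recomputed later to detect any change to its bits. The sketch below assumes SHA-256 from Python's standard `hashlib`; the names `FixityRecord`, `record_fixity`, and `verify_fixity` are hypothetical and do not come from this proposal. A bare digest detects accidental or unauthorized change to the bits, but it does not by itself establish authenticity, which also requires trusted provenance records.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sha256_digest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class FixityRecord:
    """Hypothetical fixity record stored alongside an archived object."""
    algorithm: str
    digest: str
    # UTC timestamp of when the digest was computed.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_fixity(data: bytes) -> FixityRecord:
    # Computed once at ingestion and kept with the object's metadata.
    return FixityRecord(algorithm="sha256", digest=sha256_digest(data))


def verify_fixity(data: bytes, record: FixityRecord) -> bool:
    # Re-hash the current bits and compare with the stored digest;
    # a mismatch signals corruption or tampering since ingestion.
    if record.algorithm != "sha256":
        raise ValueError(f"unsupported algorithm: {record.algorithm}")
    return sha256_digest(data) == record.digest
```

In practice such a check would be run periodically over every stored copy, so that a failed verification can trigger repair from an intact replica.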
Our technology approach rests on several premises. The first premise is to encapsulate properties of content, structure, context, presentation, and preservation within a digital object architecture, and to enable the infrastructure to manage and preserve these objects. The digital object must contain the essential features that capture what is being preserved, and it should include behavioral information about its lifecycle management and preservation.
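As a minimal sketch of the first premise, assuming a simple in-memory model, a digital object can be represented as a record that carries its bitstream together with structure, context, presentation, and preservation properties and a log of lifecycle events. The class `DigitalObject`, its field names, and the `requires_migration` method are hypothetical illustrations rather than the proposal's actual object architecture; the method shows how behavioral preservation information (here, a check against a caller-supplied set of obsolete formats) can live with the object itself.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DigitalObject:
    """Hypothetical digital object: the preserved bits plus the
    properties that must travel with them."""
    identifier: str
    content: bytes  # the preserved bitstream
    structure: dict[str, Any] = field(default_factory=dict)     # e.g. component layout
    context: dict[str, Any] = field(default_factory=dict)       # e.g. collection, creator
    presentation: dict[str, Any] = field(default_factory=dict)  # e.g. file format
    preservation: dict[str, Any] = field(default_factory=dict)  # e.g. fixity, retention policy
    events: list[str] = field(default_factory=list)             # lifecycle history

    def log_event(self, description: str) -> None:
        # Lifecycle history is recorded on the object itself, so it moves
        # with the object between storage systems.
        self.events.append(description)

    def requires_migration(self, obsolete_formats: set[str]) -> bool:
        # Behavioral preservation logic: flag the object for format
        # migration if its recorded format is known to be obsolete.
        return self.presentation.get("format") in obsolete_formats


if __name__ == "__main__":
    doc = DigitalObject(
        identifier="obj-0001",
        content=b"example bytes",
        context={"collection": "example-collection"},
        presentation={"format": "WordPerfect 5.1"},
    )
    doc.log_event("ingested")
    print(doc.requires_migration({"WordPerfect 5.1"}))  # True
```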
The second premise of our approach is to separate the management of digital objects into three levels of abstraction, resulting in a well-defined three-layered architecture. The data layer manages the bits that represent the digital object across storage systems (which may themselves change over time and location); the second layer deals with the semantics of the data and the relationships between objects rather than with storage and bits; and the third layer handles the management of preservation processes and the basic security infrastructure.
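The three-layer separation can be sketched, under simplifying assumptions, as three small Python classes: a data layer that replicates raw bytes across interchangeable storage backends (modeled here as plain dictionaries), an information layer that holds descriptive metadata and subject-predicate-object relationships without touching the bits, and a process layer that runs a preservation check behind a basic authorization gate. All class and method names, the triple model for relationships, and the replica-comparison audit are hypothetical choices for illustration, not the proposal's design.

```python
class DataLayer:
    """Layer 1: bits only, replicated across storage backends."""

    def __init__(self, backends: list[dict[str, bytes]]) -> None:
        # Each backend stands in for a storage system; systems may be added
        # or retired over time without affecting the layers above.
        self.backends = backends

    def put(self, object_id: str, data: bytes) -> None:
        for backend in self.backends:
            backend[object_id] = data

    def get(self, object_id: str) -> bytes:
        # Return the first available replica.
        for backend in self.backends:
            if object_id in backend:
                return backend[object_id]
        raise KeyError(object_id)

    def replicas(self, object_id: str) -> list[bytes]:
        return [b[object_id] for b in self.backends if object_id in b]


class InformationLayer:
    """Layer 2: semantics and relationships, independent of storage."""

    def __init__(self) -> None:
        self.metadata: dict[str, dict] = {}
        self.relations: set[tuple[str, str, str]] = set()  # (subject, predicate, target)

    def describe(self, object_id: str, **properties) -> None:
        self.metadata.setdefault(object_id, {}).update(properties)

    def relate(self, subject: str, predicate: str, target: str) -> None:
        self.relations.add((subject, predicate, target))

    def related(self, subject: str, predicate: str) -> list[str]:
        return sorted(t for s, p, t in self.relations if s == subject and p == predicate)


class ProcessLayer:
    """Layer 3: preservation processes and a basic authorization check."""

    def __init__(self, data: DataLayer, info: InformationLayer, authorized: set[str]) -> None:
        self.data = data
        self.info = info
        self.authorized = authorized

    def ingest(self, user: str, object_id: str, payload: bytes, **properties) -> None:
        if user not in self.authorized:
            raise PermissionError(f"{user} may not ingest")
        self.data.put(object_id, payload)
        self.info.describe(object_id, **properties)

    def audit(self, object_id: str) -> bool:
        # Preservation check: every backend holds a copy and all copies agree.
        copies = self.data.replicas(object_id)
        return len(copies) == len(self.data.backends) and len(set(copies)) == 1
```

Because the process layer reaches storage and metadata only through the other two layers' methods, a backend can be added or retired without changing how metadata or preservation processes are expressed, which is the motivation for the layering.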
Our vision of the overall software architecture needed to address long-term preservation and access is shown below.
While our previous work has focused mainly on the ingestion workflow and on bit-level management and preservation, this proposal focuses on the more complex issues of preservation processes, search, and quantitative evaluation.