Toaster:ADAPT Middle Layer Prototype
Version 1 Goals
- Automatic replication of packages
- Create data nodes that support preservation functionality
- Nodes are able to perform preservation actions unassisted (replication, checking, etc.)
- package sets are grouped together to allow for nodes to validate data in the sets
- Scale to 100 nodes
- Focus on data nodes
- internal prototype, document results in TR
Version 2 Goals
- assuming the preservation approach is feasible
- review all v1 components, identify success/failure
- identify which components need to be rewritten
- Secure communication and look at authorization within a distributed archive
- Modularize data transfer
- pluggable low-level data store (tsm, srb, etc)
- releasable beta
External services to provide
- Ingestion of previously created packages according to predefined storage classes.
- Query status of packages (acknowledge replication)
- URL-based access to packages and items contained within packages
- Location service for packages (on manager)
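The URL-based access service could resolve request paths roughly as sketched below. The URL layout (`/package/<uuid>` and `/package/<uuid>/item/<id>`) and the names `AccessTarget` and `parse_access_path` are assumptions made for illustration; these notes do not fix a URL scheme. Python is used for brevity, although the prototype itself is servlet based.

```python
import uuid
from typing import NamedTuple, Optional


class AccessTarget(NamedTuple):
    package_id: uuid.UUID       # global package identifier
    item_id: Optional[int]      # package-local file ID, or None for the whole package


def parse_access_path(path: str) -> AccessTarget:
    """Resolve a hypothetical access path to a package or an item within it."""
    parts = [p for p in path.split("/") if p]
    if len(parts) not in (2, 4) or parts[0] != "package":
        raise ValueError(f"unrecognised access path: {path!r}")
    package_id = uuid.UUID(parts[1])  # raises ValueError on a malformed UUID
    if len(parts) == 2:
        return AccessTarget(package_id, None)
    if parts[2] != "item" or not parts[3].isdecimal():
        raise ValueError(f"unrecognised item reference: {path!r}")
    return AccessTarget(package_id, int(parts[3]))
```

Because internal file IDs are unique only within a package, an item is always addressed through its package UUID and never on its own.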
Technology
- Raw sockets for bulk transfer
- web service for control information (axis)
- security layered in later w/ wss4j as appropriate
- Replication occurs at the entire fileset level.
- All tomcat/servlet based
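A raw-socket bulk transfer needs some framing so the receiver knows where a payload ends. The sketch below assumes a 64-bit big-endian length prefix, which is one possible choice and not something these notes specify. The function names are hypothetical, and Python stands in for the Java the prototype would use.

```python
import socket
import struct
import threading

HEADER = struct.Struct(">Q")  # assumed framing: 64-bit big-endian payload length


def send_blob(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed payload over a connected stream socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed before transfer completed")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_blob(sock: socket.socket) -> bytes:
    """Receive one length-prefixed payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo_transfer(payload: bytes) -> bytes:
    """Push a payload through a local socket pair and return what arrived."""
    left, right = socket.socketpair()
    with left, right:
        sender = threading.Thread(target=send_blob, args=(left, payload))
        sender.start()
        received = recv_blob(right)
        sender.join()
    return received
```

In this split, control messages such as "start replicating fileset X" would travel over the web service, and only the bytes themselves over the socket. A real transfer would stream from disk in chunks and would not hold the whole payload in memory.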
Work Plan
First Steps
- Specify inter-node and external communication (wsdl docs)
- Define Package Layout
- Work out details on fileset-based replication
- Replication and auditing
- Develop manager
- Develop data nodes
Components
Manager
- The manager is disposable.
- all information stored is useful for active preservation
- The manager is not authoritative for any package-level information; it is merely a cache.
- Create storage classes for placement of packages into fileset groups
- Store location of filesets and fileset groups
- Cache location of packages and provide lookup service
- receive updates of package locations from nodes
- receive fileset and fileset group information from fileset group masters
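Since the manager only caches what nodes report, losing it should cost nothing that cannot be rebuilt. A minimal sketch of such a location cache follows. The class name `LocationCache`, its methods, and the use of plain strings for node and package identifiers are all assumptions for illustration, and Python stands in for the prototype's Java.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class LocationCache:
    """Disposable package-location cache; nodes remain the source of truth."""

    def __init__(self) -> None:
        self._locations: Dict[str, Set[str]] = defaultdict(set)

    def apply_update(self, node: str, added: Iterable[str] = (),
                     removed: Iterable[str] = ()) -> None:
        """Record a location update reported by a node."""
        for package_id in added:
            self._locations[package_id].add(node)
        for package_id in removed:
            holders = self._locations.get(package_id)
            if holders is not None:
                holders.discard(node)
                if not holders:
                    del self._locations[package_id]

    def lookup(self, package_id: str) -> Set[str]:
        """Return the nodes believed to hold a package (possibly stale)."""
        return set(self._locations.get(package_id, ()))

    @classmethod
    def rebuild(cls, node_reports: Dict[str, Set[str]]) -> "LocationCache":
        """Recreate a lost cache from full package listings supplied by nodes."""
        cache = cls()
        for node, packages in node_reports.items():
            cache.apply_update(node, added=packages)
        return cache
```

The `rebuild` path is what makes the manager disposable in this sketch: a fresh instance asks every node for its listing and ends up with the same lookup answers.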
Data Nodes
- Nodes are authoritative for package information.
- provide data storage
- self-validate data stored in data stores
- cache package metadata (ID, checksums, etc)
- Access servlet (url) to packages and items in a package
- Allow for external challenge of a package's integrity
- Activity history (purge, add, etc.) on packages
- nodes contain filesets which belong to fileset groups.
- fileset groups actively manage packages stored within.
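Self-validation and external challenge can be sketched as two small checks. The self-audit compares stored bytes against the cached checksums. The challenge uses a nonce-prefixed SHA-256 digest, so a node cannot answer from its checksum cache alone. The nonce scheme, the choice of SHA-256, and all function names are assumptions, not something these notes specify, and the in-memory dict stands in for a real data store.

```python
import hashlib
import hmac
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def audit(store: Dict[str, bytes], cached_checksums: Dict[str, str]) -> List[str]:
    """Return IDs of packages that are missing or no longer match their cached checksum."""
    return sorted(
        package_id
        for package_id, expected in cached_checksums.items()
        if package_id not in store or sha256_hex(store[package_id]) != expected
    )


def answer_challenge(store: Dict[str, bytes], package_id: str, nonce: bytes) -> str:
    """Node side: prove possession of the current bytes by hashing nonce + data."""
    return sha256_hex(nonce + store[package_id])


def verify_challenge(reference_copy: bytes, nonce: bytes, response: str) -> bool:
    """Challenger side: recompute the digest from an independent copy and compare."""
    expected = sha256_hex(nonce + reference_copy)
    return hmac.compare_digest(expected, response)
```

Under this scheme the challenger needs its own copy of the package, which fits the replicated setting: another node holding a replica from the same fileset group could issue the challenge.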
Package
- Compound object made up of many files
- Packages are just storage containers and are not aware of contextual information beyond their own UUID.
- files have minimal types attached (metadata, manifest, data)
- Simple format, similar to tar (id,size,data,id,size,data....)
- contains checksum, uuid.
- package named with a global UUID, but internal file IDs are unique only within the package
- 64-bit size
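The tar-like record stream described above can be written and read back in a few lines. The sketch assumes a 32-bit file ID and big-endian byte order, neither of which is fixed here; only the 64-bit size comes from these notes. The file type tag, the package checksum, and the UUID are left out because the notes do not say where they are stored. All function names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Assumed record header: 32-bit file ID followed by a 64-bit size, big-endian.
RECORD_HEADER = struct.Struct(">IQ")


def write_records(out: BinaryIO, files: Iterable[Tuple[int, bytes]]) -> None:
    """Append (id, size, data) records to a stream, one per file."""
    for file_id, data in files:
        out.write(RECORD_HEADER.pack(file_id, len(data)))
        out.write(data)


def read_records(src: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (id, data) pairs until the stream is exhausted."""
    while True:
        header = src.read(RECORD_HEADER.size)
        if not header:
            return  # clean end of package
        if len(header) < RECORD_HEADER.size:
            raise ValueError("truncated record header")
        file_id, size = RECORD_HEADER.unpack(header)
        data = src.read(size)
        if len(data) != size:
            raise ValueError(f"truncated data for file {file_id}")
        yield file_id, data


def pack_package(files: Iterable[Tuple[int, bytes]]) -> bytes:
    """Serialize files into an in-memory package body."""
    buffer = io.BytesIO()
    write_records(buffer, files)
    return buffer.getvalue()
```

Because each record carries its own size, a reader can seek past the data it does not need and still find the next record. The reader here loads each file fully for simplicity.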
Items that will not be addressed
- Creating metadata and access interfaces outside of demos (PAWN will publish to these)
- Creation of an AIP and organizing metadata/data (it's just data in a package)
- Access control, calls will be secured later with ws-security as time permits
- Versioning and format migration of data.
- Creating peer-to-peer management functionality (research project)
- cryptographically self-naming packages. Packages are identified by SHA-256 digest in this prototype.