Toaster:ADAPT Middle Layer Prototype

Version 1 Goals

Automatic replication of packages
Create data nodes that support preservation functionality
Nodes are able to perform preservation actions unassisted (replication, checking, etc.)
- package sets are grouped together to allow for nodes to validate data in the sets
Scale to 100 nodes
Focus on data nodes
internal prototype, document results in TR

Version 2 Goals

assuming the preservation approach is feasible

review all v1 components, identify success/failure
- identify which components need to be rewritten
Secure communication and look at authorization within a distributed archive
Modularize data transfer
pluggable low-level data store (tsm, srb, etc)
releasable beta

External services to provide

Ingestion of prior-created packages according to pre-defined storage classes.
Query status of packages (acknowledge replication)
URL based access to packages and items contained within packages
Location service for packages (on manager)

Technology

Raw sockets for bulk transfer
web service for control information (axis)
- security layered in later w/ wss4j as appropriate
Replication occurs at the entire fileset level.
All tomcat/servlet based

Terminology

Work Plan

First Steps

Specify inter-node and external communication (wsdl docs)
Define Package Layout
Work out details on fileset based replication
Replication and auditing
Develop manager
Develop data nodes

Components

Manager

The manager is disposable,
- all information stored is useful for active preservation
- Manager is not authoritative for any package-level information; it is merely a cache.
Create storage classes for placement of packages into fileset groups
Store location of filesets and fileset groups
Cache location of packages and provide lookup service
- receive updates of package locations from nodes
- receive fileset and fileset group information from fileset group masters
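The "disposable cache" role of the manager can be illustrated with a minimal sketch. The class name `LocationCache`, its methods, and the idea that each node reports its full package list are assumptions made for illustration; the page does not specify the manager's interface.

```python
from collections import defaultdict


class LocationCache:
    """Hypothetical manager-side cache: package id -> nodes holding it.

    Nothing here is authoritative. The cache can be thrown away and
    rebuilt from the reports that nodes send.
    """

    def __init__(self):
        self._locations = defaultdict(set)

    def node_update(self, node, package_ids):
        """Replace everything known about `node` with its latest report."""
        for holders in self._locations.values():
            holders.discard(node)
        for package_id in package_ids:
            self._locations[package_id].add(node)

    def lookup(self, package_id):
        """Return the nodes believed to hold `package_id` (may be empty)."""
        return sorted(self._locations.get(package_id, ()))
```

Because nodes stay authoritative, a fresh `LocationCache` that receives the same reports ends up answering lookups the same way as the one it replaced.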

Data Nodes

Nodes are authoritative for package information.
provide data storage
- self-validate data stored in data stores
cache package metadata (ID, checksums, etc)
Access servlet (url) to packages and items in a package
Allow for external challenge of a package's integrity
Activity history (purge, add, etc.) on packages
nodes contain filesets which belong to fileset groups.
- fileset groups actively manage packages stored within.
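One way an external integrity challenge could work is sketched below. The nonce-prefixed SHA-256 scheme, the majority comparison, and the names `answer_challenge` and `audit` are all assumptions; the page only says that a package's integrity can be challenged externally. The `replicas` dictionary stands in for remote calls to nodes.

```python
import hashlib
import os


def answer_challenge(package_bytes, nonce):
    """Node side: digest of the nonce followed by the package content.

    The fresh nonce keeps a node from replaying a digest it stored earlier.
    """
    return hashlib.sha256(nonce + package_bytes).hexdigest()


def audit(replicas, nonce=None):
    """Challenger side: return the nodes whose answer disagrees with the majority.

    `replicas` maps a node name to the bytes that node holds for one package.
    """
    nonce = nonce or os.urandom(16)
    answers = {node: answer_challenge(data, nonce) for node, data in replicas.items()}
    counts = {}
    for digest in answers.values():
        counts[digest] = counts.get(digest, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(node for node, digest in answers.items() if digest != majority)
```

A majority vote is only one possible policy and is ambiguous on ties; comparing against a checksum recorded at ingestion would be an alternative.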

Package

Compound object made up of many files
- Packages are just storage containers, and are not aware of contextual information outside of their uuid.
files have minimal types attached (metadata, manifest, data)
Simple format, similar to tar (id,size,data,id,size,data....)
contains checksum, uuid.
package named with global uuid, but internal file ids are unique only within the package
64-bit size
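The tar-like layout above can be sketched as follows. The exact field widths (32-bit file id, 8-bit type tag), the byte order, the position of the uuid and checksum, and the choice of SHA-256 for the checksum are assumptions for illustration; only the id,size,data ordering, the three file types, and the 64-bit size come from the list above.

```python
import hashlib
import struct
import uuid

# Hypothetical record header: 32-bit file id, 8-bit type tag, 64-bit size,
# big-endian with no padding.
RECORD_HEADER = struct.Struct(">IBQ")
FILE_TYPES = {"metadata": 0, "manifest": 1, "data": 2}


def pack_package(files):
    """Serialize (file_id, type_name, payload) tuples into one package.

    Assumed layout: 16-byte uuid, then the records, then a 32-byte
    SHA-256 checksum over everything before it.
    """
    package_id = uuid.uuid4()
    body = bytearray(package_id.bytes)
    for file_id, type_name, payload in files:
        body += RECORD_HEADER.pack(file_id, FILE_TYPES[type_name], len(payload))
        body += payload
    checksum = hashlib.sha256(body).digest()
    return package_id, bytes(body) + checksum


def unpack_package(blob):
    """Verify the trailing checksum, then walk the records in order."""
    body, checksum = blob[:-32], blob[-32:]
    if hashlib.sha256(body).digest() != checksum:
        raise ValueError("package checksum mismatch")
    package_id = uuid.UUID(bytes=body[:16])
    type_names = {code: name for name, code in FILE_TYPES.items()}
    files, offset = [], 16
    while offset < len(body):
        file_id, type_code, size = RECORD_HEADER.unpack_from(body, offset)
        offset += RECORD_HEADER.size
        files.append((file_id, type_names[type_code], body[offset:offset + size]))
        offset += size
    return package_id, files
```

Keeping the size in the header lets a reader skip from record to record without parsing payloads, which is the same property that makes tar simple to scan.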

Items that will not be addressed

Creating metadata and access interfaces outside of demos (PAWN will publish to these)
Creating of an AIP and organizing metadata/data (it's just data in a package)
Access control, calls will be secured later with ws-security as time permits
Versioning and format migration of data.
Creating peer-to-peer management functionality (research project)
cryptographically self-naming packages. Packages are identified by sha-256 digest in this prototype.
