Toaster:ADAPT Middle Layer Prototype

From Adapt

Version 1 Goals

  • Automatic replication of packages
  • Create data nodes that support preservation functionality
  • Nodes are able to perform preservation actions unassisted (replication, checking, etc.)
    • package sets are grouped together so that nodes can validate the data in each set
  • Scale to 100 nodes
  • Focus on data nodes
  • Internal prototype; document results in a technical report (TR)
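Automatic replication within a group of package sets can be reduced to members comparing inventories and copying whatever each one lacks. The following is a minimal sketch, assuming every fileset can report the set of package IDs it holds and that every member of a group should hold a full replica. `FilesetGroup` and `missingPackages` are hypothetical names, not part of the prototype.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a fileset group: every member fileset is assumed
// to hold a full replica of the group's packages.
class FilesetGroup {
    // fileset name -> package IDs currently stored in that fileset
    private final Map<String, Set<String>> inventories = new LinkedHashMap<>();

    void addFileset(String fileset, Set<String> packageIds) {
        inventories.put(fileset, new HashSet<>(packageIds));
    }

    // Union of all package IDs known anywhere in the group.
    Set<String> allPackages() {
        Set<String> all = new HashSet<>();
        for (Set<String> held : inventories.values()) {
            all.addAll(held);
        }
        return all;
    }

    // For each fileset, the packages it must receive to become a full replica.
    // Filesets that are already complete are left out of the plan.
    Map<String, Set<String>> missingPackages() {
        Set<String> all = allPackages();
        Map<String, Set<String>> plan = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : inventories.entrySet()) {
            Set<String> missing = new HashSet<>(all);
            missing.removeAll(e.getValue());
            if (!missing.isEmpty()) {
                plan.put(e.getKey(), Collections.unmodifiableSet(missing));
            }
        }
        return plan;
    }
}
```

Each entry in the resulting plan would then drive a copy; the diff itself says nothing about how the bytes are moved.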

Version 2 Goals

Assuming the preservation approach is feasible:
  • Review all v1 components and identify successes and failures
    • identify which components need to be rewritten
  • Secure communication, and investigate authorization within a distributed archive
  • Modularize data transfer
  • Pluggable low-level data store (TSM, SRB, etc.)
  • Releasable beta
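A pluggable low-level data store usually comes down to a narrow interface that a TSM, SRB, or local-disk backend could each implement. The sketch below assumes such an interface; `PackageStore` and `InMemoryStore` are hypothetical names, and the in-memory backend only stands in for a real one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction: the middle layer talks only to this
// interface, so TSM, SRB, or plain-disk backends could be swapped in.
interface PackageStore {
    void put(String packageId, InputStream data) throws IOException;
    InputStream get(String packageId) throws IOException;
    boolean delete(String packageId) throws IOException;
    Set<String> list() throws IOException;
}

// Trivial in-memory backend, useful only for tests and demos.
class InMemoryStore implements PackageStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    @Override
    public void put(String packageId, InputStream data) throws IOException {
        blobs.put(packageId, data.readAllBytes());
    }

    @Override
    public InputStream get(String packageId) throws IOException {
        byte[] bytes = blobs.get(packageId);
        if (bytes == null) {
            throw new IOException("no such package: " + packageId);
        }
        return new ByteArrayInputStream(bytes);
    }

    @Override
    public boolean delete(String packageId) {
        return blobs.remove(packageId) != null;
    }

    @Override
    public Set<String> list() {
        return new TreeSet<>(blobs.keySet()); // sorted snapshot
    }
}
```

Stream-based `put`/`get` keeps large packages out of memory for real backends; only this demo implementation buffers whole packages.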

External services to provide

  • Ingestion of previously created packages according to pre-defined storage classes
  • Query the status of packages (e.g., to confirm replication)
  • URL-based access to packages and to items contained within packages
  • Location service for packages (on the manager)
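URL access to a package, and to individual items inside it, needs an address scheme that carries the package UUID and, optionally, a package-local item ID. The layout `/package/{uuid}` and `/package/{uuid}/item/{id}` used here is an assumption for illustration, as is the numeric item ID; the page does not define either. `PackageUrl` is a hypothetical helper that an access servlet could apply to `request.getPathInfo()`.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical parser for an ASSUMED URL layout:
//   /package/{uuid}            -> the whole package
//   /package/{uuid}/item/{id}  -> one item inside the package
final class PackageUrl {
    final UUID packageId;
    final Optional<Long> itemId; // empty when the whole package is requested

    private PackageUrl(UUID packageId, Optional<Long> itemId) {
        this.packageId = packageId;
        this.itemId = itemId;
    }

    static Optional<PackageUrl> parse(String path) {
        if (path == null) {
            return Optional.empty();
        }
        // A leading "/" yields an empty first element: ["", "package", uuid, ...]
        String[] parts = path.split("/");
        try {
            if (parts.length == 3 && parts[0].isEmpty() && parts[1].equals("package")) {
                return Optional.of(new PackageUrl(UUID.fromString(parts[2]), Optional.empty()));
            }
            if (parts.length == 5 && parts[0].isEmpty() && parts[1].equals("package")
                    && parts[3].equals("item")) {
                return Optional.of(new PackageUrl(UUID.fromString(parts[2]),
                        Optional.of(Long.parseLong(parts[4]))));
            }
        } catch (IllegalArgumentException e) {
            // Malformed UUID or item ID (NumberFormatException is a subclass).
        }
        return Optional.empty();
    }
}
```

An empty result would map naturally to an HTTP 404 in the servlet; a present one tells it whether to stream the whole package or seek to a single item.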

Technology

  • Raw sockets for bulk transfer
  • Web services for control information (Axis)
    • security layered in later with WSS4J as appropriate
  • Replication occurs at the level of an entire fileset
  • Everything is Tomcat/servlet based
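Carrying bulk data over raw sockets, with the web service handling only control messages, implies some minimal framing on the data connection. The sketch assumes a frame of package ID, 64-bit length, then the raw bytes; the frame layout and the `BulkTransfer` name are hypothetical, not the prototype's wire protocol.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical framing for a raw-socket bulk channel:
//   UTF package ID | 64-bit length | <length> bytes of package data
final class BulkTransfer {
    private static final int BUFFER_SIZE = 64 * 1024;

    private BulkTransfer() {}

    // Writes one framed package; `out` would normally be socket.getOutputStream().
    static void send(OutputStream out, String packageId, long length, InputStream data)
            throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeUTF(packageId);
        dos.writeLong(length);
        byte[] buf = new byte[BUFFER_SIZE];
        long remaining = length;
        while (remaining > 0) {
            int n = data.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                throw new IOException("source ended " + remaining + " bytes early");
            }
            dos.write(buf, 0, n);
            remaining -= n;
        }
        dos.flush(); // do not close: the socket may carry further frames
    }

    // Reads one framed package into `sink` and returns its package ID.
    static String receive(InputStream in, OutputStream sink) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        String packageId = dis.readUTF();
        long remaining = dis.readLong();
        byte[] buf = new byte[BUFFER_SIZE];
        while (remaining > 0) {
            int n = dis.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                throw new IOException("connection closed " + remaining + " bytes early");
            }
            sink.write(buf, 0, n);
            remaining -= n;
        }
        return packageId;
    }
}
```

Over a real connection, `send` would be given `socket.getOutputStream()` and `receive` would be given `socket.getInputStream()`; keeping the methods stream-based lets the framing be tested without a network.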

Work Plan

First Steps

Components

Manager

  • The manager is disposable:
    • all information it stores is useful for active preservation
    • the manager is not authoritative for any package-level information; it is merely a cache
  • Create storage classes for placement of packages into fileset groups
  • Store location of filesets and fileset groups
  • Cache location of packages and provide lookup service
    • receive updates of package locations from nodes
    • receive fileset and fileset group information from fileset group masters
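Because the manager is only a cache, its lookup state can be rebuilt entirely from what nodes and fileset group masters report. The following is a minimal sketch, assuming nodes push `(packageId, nodeId)` add/remove updates and that each storage class maps to a single fileset group; `ManagerCache` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical manager-side cache. Nothing here is authoritative: it can be
// discarded and rebuilt from data node and fileset group master reports.
final class ManagerCache {
    // storage class name -> fileset group that new packages of that class go to
    private final Map<String, String> storageClasses = new ConcurrentHashMap<>();
    // package ID -> nodes that reported holding it
    private final Map<String, Set<String>> locations = new ConcurrentHashMap<>();

    void defineStorageClass(String storageClass, String filesetGroup) {
        storageClasses.put(storageClass, filesetGroup);
    }

    // Placement decision at ingest time; null if the storage class is unknown.
    String groupFor(String storageClass) {
        return storageClasses.get(storageClass);
    }

    // A data node reports that it now holds a package.
    void reportAdded(String packageId, String nodeId) {
        locations.computeIfAbsent(packageId, k -> ConcurrentHashMap.newKeySet()).add(nodeId);
    }

    // A data node reports that it purged a package; drop empty entries.
    void reportRemoved(String packageId, String nodeId) {
        locations.computeIfPresent(packageId, (k, nodes) -> {
            nodes.remove(nodeId);
            return nodes.isEmpty() ? null : nodes;
        });
    }

    // Lookup service: which nodes claim to hold this package?
    Set<String> locate(String packageId) {
        Set<String> nodes = locations.get(packageId);
        return nodes == null ? Collections.emptySet() : Collections.unmodifiableSet(nodes);
    }
}
```

Since nodes remain authoritative, a stale answer from `locate` is only a performance problem: the client asks the node, and the node's answer wins.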

Data Nodes

  • Nodes are authoritative for package information.
  • provide data storage
    • self-validate the data held in their data stores
  • cache package metadata (ID, checksums, etc)
  • Access servlet (URL) for packages and items in a package
  • Allow external challenges of a package's integrity
  • Activity history (purge, add, etc.) for packages
  • nodes contain filesets which belong to fileset groups.
    • fileset groups actively manage packages stored within.
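Self-validation and external challenges both come down to recomputing a digest over the stored bytes. SHA-256 is used here because the page names it for package identification. For the challenge, the sketch assumes a fresh nonce from the challenger is mixed into the digest, so a node that kept only the checksum cannot answer without rereading the data; that nonce scheme is an assumption, not something the page specifies, and the challenger needs another replica (or a precomputed nonce/answer pair) to verify the response. `IntegrityChecker` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical integrity helper for a data node.
final class IntegrityChecker {
    private IntegrityChecker() {}

    // SHA-256 of the stream, optionally prefixed by a challenger-supplied nonce.
    static byte[] digest(byte[] nonce, InputStream data) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        if (nonce != null) {
            md.update(nonce);
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = data.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return md.digest();
    }

    // Self-validation: compare stored bytes against the cached checksum.
    static boolean selfValidate(byte[] cachedChecksum, InputStream data) throws IOException {
        return MessageDigest.isEqual(cachedChecksum, digest(null, data));
    }

    // External challenge: the answer depends on a nonce the node has not seen
    // before, so it must reread the package to produce it.
    static byte[] answerChallenge(byte[] nonce, InputStream data) throws IOException {
        return digest(nonce, data);
    }
}
```

A node could run `selfValidate` on a schedule and record each result in the package's activity history.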

Package

  • Compound object made up of many files
    • Packages are just storage containers and are not aware of contextual information beyond their UUID.
  • Files have minimal types attached (metadata, manifest, data)
  • Simple format, similar to tar (id, size, data, id, size, data, ...)
  • Contains a checksum and a UUID
  • Package is named with a global UUID, but internal file IDs are unique only within the package
  • 64-bit sizes
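The tar-like layout (`id, size, data, id, size, data, ...`) with 64-bit sizes maps directly onto `DataOutputStream`/`DataInputStream`. This is a minimal sketch, assuming a 32-bit item ID, a 64-bit size, and no header or trailer; where the UUID, checksum, and per-file types live in the real format is not specified here, so they are left out. `PackageFormat` is a hypothetical name.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical encoding of the simple package body, repeated per item:
//   int itemId (big-endian) | long size (64-bit) | <size> bytes
final class PackageFormat {
    private PackageFormat() {}

    static void write(OutputStream out, Map<Integer, byte[]> items) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        for (Map.Entry<Integer, byte[]> item : items.entrySet()) {
            dos.writeInt(item.getKey());
            dos.writeLong(item.getValue().length);
            dos.write(item.getValue());
        }
        dos.flush();
    }

    // Reads items until a clean end of stream. IDs are unique only within this package.
    static Map<Integer, byte[]> read(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        Map<Integer, byte[]> items = new LinkedHashMap<>();
        while (true) {
            int first = dis.read();
            if (first == -1) {
                return items; // end of stream on an item boundary
            }
            // Remaining three ID bytes; readUnsignedByte throws EOFException if truncated.
            int id = (first << 24) | (dis.readUnsignedByte() << 16)
                    | (dis.readUnsignedByte() << 8) | dis.readUnsignedByte();
            long size = dis.readLong();
            if (size < 0 || size > Integer.MAX_VALUE) {
                // The format allows 64-bit sizes; this in-memory reader does not.
                throw new IOException("item " + id + " too large to buffer: " + size);
            }
            byte[] data = new byte[(int) size];
            dis.readFully(data); // EOFException if the item body is truncated
            if (items.put(id, data) != null) {
                throw new IOException("duplicate item id " + id);
            }
        }
    }
}
```

A reader for multi-gigabyte items would stream each body to its destination rather than buffer it, which is why the 64-bit size check is confined to this demo reader.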

Items that will not be addressed

  • Creating metadata and access interfaces outside of demos (PAWN will publish to these)
  • Creating an AIP and organizing metadata/data (it's just data in a package)
  • Access control; calls will be secured later with WS-Security as time permits
  • Versioning and format migration of data.
  • Creating peer-to-peer management functionality (research project)
  • Cryptographically self-naming packages. Packages are identified by SHA-256 digest in this prototype.
