
From Adapt

SWAP, the Simple Web-Accessible Preservation system.


=Overview=
SWAP is a web-based distributed storage system for managing multiple collections. It interleaves data across many storage partitions on different nodes while presenting a single, consistent namespace. Clients place and retrieve data using basic HTTP operations. Even though data is distributed, clients may access any stored data through any SWAP node. SWAP has also been designed to be completely disposable: data can still be recovered even if the SWAP software is completely removed.


SWAP provides a simple mechanism to distribute data across multiple partitions or servers, along with an equally simple mechanism to retrieve all of that data.


Below is a set of key design goals for SWAP:
# Data and collections must be easily recoverable in the absence of ANY software (operating system excluded).
# No centralized metadata server.
# Provide HTTP access for data placement and retrieval.

* [[Swap:Overview|Overview]]
* [[Swap:Configuring SWAP|Configuring SWAP]]
* [[Swap:Data Access|Downloading and Uploading data]]
* [[Swap:Command Line Access]] - accessing SWAP files using common Unix utilities
 
SWAP organizes data into individual collections called file groups. A file group can be thought of as a directory with subdirectories and files. A file group stores its files in multiple directories called slices, which are placed on different SWAP servers. Files are spread among the slices according to the digest of their path names: to determine which slice contains a file, compute the MD5 digest of the file's full path, then take that digest modulo the number of slices in the file group. The result is the number of the slice that holds the file.
 
MD5_digest(file_path) % num_slices = slice of file
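The formula above can be sketched in Python as follows. This is a minimal illustration, not SWAP's own code: it assumes the 128-bit MD5 digest of the UTF-8 encoded path is read as a big-endian unsigned integer, and that paths are relative to the file group (like <code>testdata/file1</code>). The helper <code>locate_on_disk</code> and its directory layout (each slice is a plain directory holding files under their relative paths) are hypothetical, included to show how the first design goal could be met without any SWAP software.

```python
import hashlib
from pathlib import Path
from typing import List


def slice_for(file_path: str, num_slices: int) -> int:
    """Return the slice index that holds file_path.

    Assumption: the MD5 digest of the UTF-8 encoded path is interpreted as a
    big-endian unsigned integer before taking the modulus.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    digest = hashlib.md5(file_path.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_slices


def locate_on_disk(slice_dirs: List[str], file_path: str) -> Path:
    """Hypothetical helper: find where a file lives without running SWAP.

    Assumes slice_dirs[i] is the directory for slice i and that files are
    stored under their relative path inside that directory.
    """
    index = slice_for(file_path, len(slice_dirs))
    return Path(slice_dirs[index]) / file_path
```

Because the placement depends only on the path and the slice count, any node (or a person with a shell and an MD5 tool) arrives at the same answer.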
 
The following diagram shows how data is spread among different slices in SWAP:
 
 
After a collection has been distributed among all slices, any file can be accessed through a simple web gateway. A client does not need to know which server holds a given file. If the client requests a file from a server that does not have it, the server returns a 302/redirect pointing the web browser at the correct server.
 
Client request:
# Client contacts server-a and requests testdata/file1
# server-a computes the slice number for "testdata/file1".
# If server-a is responsible for that slice, it will either return the requested data or a 404/not found.
# If the server is not responsible for the slice, it will return a 302/redirect to the client pointing at the correct server.
# The client contacts the correct server, which returns either the data or a 404/not found.
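The request steps above can be sketched as a pure decision function on the server side. This is an illustrative sketch under stated assumptions, not SWAP's actual handler: the <code>slice_owners</code> list (server name for each slice), the in-memory <code>local_files</code> dictionary standing in for a node's slice directories, and the <code>http://server/path</code> redirect URL format are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Response:
    status: int
    location: Optional[str] = None  # set only for 302 redirects
    body: Optional[bytes] = None    # set only for 200 responses


def slice_for(file_path: str, num_slices: int) -> int:
    # Assumed rule: MD5 digest read as a big-endian integer, modulo num_slices.
    digest = hashlib.md5(file_path.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_slices


def handle_get(this_server: str, file_path: str,
               slice_owners: List[str],
               local_files: Dict[str, bytes]) -> Response:
    """Decide how this_server answers a request for file_path.

    slice_owners[i] names the (hypothetical) server responsible for slice i;
    local_files stands in for the files stored on this node's slices.
    """
    # Step 2: compute the slice number for the requested path.
    owner = slice_owners[slice_for(file_path, len(slice_owners))]
    if owner != this_server:
        # Step 4: not our slice, so redirect the client to the owning node.
        return Response(302, location=f"http://{owner}/{file_path}")
    # Step 3: our slice; either we hold the file or it does not exist.
    data = local_files.get(file_path)
    if data is None:
        return Response(404)
    return Response(200, body=data)
```

Note that a 404 from the responsible node is authoritative: since no other slice can hold the path, no further lookup is needed.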
 
Using this method to determine the location of data on any node removes the need for a central catalog that tracks the location of every file name. Nodes must still coordinate, however, so that they all share the same file group and slice metadata.
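Since there is no central catalog, the full contents of a file group can in principle be rebuilt by merging the listings of its slice directories. The sketch below illustrates that idea; it assumes (hypothetically) that each slice is a plain directory storing files under their path relative to the file group root, which may differ from SWAP's real on-disk layout.

```python
import os
from pathlib import Path
from typing import Dict, List


def list_file_group(slice_dirs: List[str]) -> Dict[str, Path]:
    """Map each relative path in a file group to its location on disk.

    Assumption: every slice is a plain directory, and each file is stored
    under its relative path in exactly one slice.
    """
    files: Dict[str, Path] = {}
    for slice_dir in slice_dirs:
        root = Path(slice_dir)
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = Path(dirpath) / name
                rel = full.relative_to(root).as_posix()
                # A path should hash to exactly one slice; flag duplicates.
                if rel in files:
                    raise ValueError(f"{rel} found in more than one slice")
                files[rel] = full
    return files
```

Only the standard directory walk is needed here, which is in the spirit of the design goal that data stay recoverable with nothing but the operating system.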


=Technical Documentation=


* [[Swap:Node Details|Server Metadata]]
* [[Swap:Bugs]]
* [https://scm.umiacs.umd.edu/redmine/adapt/projects/show/swap Bug Tracker]
* [[Swap:Benchmarking|Benchmarks]]
* [[Swap:Custom Protocol]]
* [[Swap:Configuring Authentication]]

Latest revision as of 20:02, 2 June 2010