Swap

From Adapt

Revision as of 04:34, 4 February 2010

SWAP, the Simple Web-Accessible Storage

Goals

SWAP provides a simple mechanism to distribute data across multiple partitions or servers, along with a simple mechanism to retrieve all of that data.

Below is a set of key design goals for SWAP:

  1. Data and collections must be easily recoverable in the absence of ANY software (operating system excluded).
  2. No centralized metadata server.
  3. Provide HTTP access for data placement and retrieval.

Overview

SWAP organizes data into individual collections called file groups. A file group can be thought of as a directory with subdirectories and files. A file group stores its files in multiple directories called slices, which are placed on different SWAP servers. Data is spread across the slices based on the digest of its path name. To determine which slice contains a given file, compute the MD5 digest of the full path to the file, then take that digest modulo the number of slices in the file group. The result is the number of the slice that contains the file.

MD5_digest(file_path) % num_slices = slice of file
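
As a rough illustration of the formula above, the slice calculation might look like the following Python sketch. The function name slice_for_path, the UTF-8 encoding of the path, and the choice to read the hexadecimal digest as one large integer are assumptions made for this example; the page does not say how SWAP itself turns the digest into a number.

```python
import hashlib


def slice_for_path(file_path: str, num_slices: int) -> int:
    """Return the slice number for file_path (hypothetical helper).

    Assumptions: the path is encoded as UTF-8 and the MD5 hex digest
    is interpreted as a single base-16 integer before the modulo.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be a positive integer")
    digest = hashlib.md5(file_path.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_slices


if __name__ == "__main__":
    # The same path always maps to the same slice for a given slice count.
    print(slice_for_path("testdata/file1", 4))
```

Because the result depends only on the path and the slice count, any server (or a person with an md5 tool) can repeat the calculation without consulting a catalog.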

[Diagram: how data is spread among different slices in SWAP]


After a collection has been distributed between all slices, access to any file is provided through a web server that runs on each SWAP server. A client that wishes to retrieve a file may contact any SWAP server. If the client requests a file from a server that does not have it, a 302 redirect is returned, pointing the web client at the correct server.

Client request:

  1. Client contacts server-a and requests testdata/file1
  2. server-a computes the slice number for "testdata/file1".
  3. If server-a is responsible for that slice, it will either return the requested data or a 404/not found.
  4. If the server is not responsible for the slice, it will return a 302/redirect to the client pointing at the correct server.
  5. The client contacts the correct server, which will return either the data or a 404/not found.
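
The request steps above can be sketched as a small decision function. This is only an illustration under stated assumptions: handle_request, the slice_owners list (slice number to server name), the in-memory local_files dictionary, and the http://server/path redirect layout are all hypothetical and are not taken from SWAP's actual code or configuration.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def slice_for_path(file_path: str, num_slices: int) -> int:
    # Assumption: the MD5 hex digest is read as one integer, then reduced.
    digest = hashlib.md5(file_path.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_slices


def handle_request(
    this_server: str,
    file_path: str,
    slice_owners: List[str],       # hypothetical: index = slice number
    local_files: Dict[str, bytes],  # hypothetical stand-in for the slice directory
) -> Tuple[int, Optional[str], Optional[bytes]]:
    """Return (status, redirect_location, body) for one request."""
    owner = slice_owners[slice_for_path(file_path, len(slice_owners))]
    if owner != this_server:
        # Not responsible for this slice: point the client at the owner.
        return 302, f"http://{owner}/{file_path}", None
    data = local_files.get(file_path)
    if data is None:
        # Responsible for the slice, but the file is not stored here.
        return 404, None, None
    return 200, None, data


if __name__ == "__main__":
    owners = ["server-a", "server-b"]
    print(handle_request("server-a", "testdata/file1", owners, {}))
```

Note that every server needs the same slice_owners mapping for the redirects to agree, which is the shared file group and slice metadata mentioned below.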

Using this method to determine the location of data on any node removes the need for a central catalog that tracks the location of every file name. Nodes must still coordinate, however, to ensure they share the same file group and slice metadata.

Technical Documentation

Configuring SWAP
Swap:Bugs
Swap:Command Line Access