Swap:Overview

Goals

SWAP provides a simple mechanism for distributing data across multiple partitions or servers, together with a simple mechanism for retrieving all of that data.

Below is a set of key design goals for SWAP:

  1. Data and collections must be easily recoverable in the absence of ANY software (operating system excluded).
  2. No centralized metadata server.
  3. Provide HTTP access for data placement and retrieval.
  4. Provide a hierarchical namespace for data collections and files.

Overview

SWAP organizes data into individual collections called file groups. A file group can be thought of as a directory with subdirectories and files. A file group stores its files in multiple directories called slices. These slices are placed on different SWAP servers. Data is spread between multiple slices based on the digest of its path name. To determine which slice contains any piece of data, create an MD5 digest from the full path to the file, then mod this hash by the number of slices in that file group. The result will determine which slice contains the file.

MD5_digest(file_path) % num_slices = slice of file
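
As a minimal sketch of the formula above, the slice number could be computed as follows. The function name slice_for_path is hypothetical, and two details are assumptions that the text does not specify: the path is encoded as UTF-8 before hashing, and the full 16-byte digest is read as a big-endian unsigned integer before the modulo is taken. A real SWAP deployment may use a different convention.

```python
import hashlib


def slice_for_path(file_path: str, num_slices: int) -> int:
    """Return the slice index for file_path in a file group with num_slices slices.

    Assumptions (not stated in the source): UTF-8 path encoding and a
    big-endian interpretation of the whole MD5 digest.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be a positive integer")
    digest = hashlib.md5(file_path.encode("utf-8")).digest()
    # Treat the 16 digest bytes as one large unsigned integer, then reduce it.
    return int.from_bytes(digest, "big") % num_slices


if __name__ == "__main__":
    # The same path always maps to the same slice for a fixed slice count.
    print(slice_for_path("/directory/to/file", 4))
```

Because the result depends only on the path and the slice count, any party that knows num_slices for a file group can locate a file without consulting a catalog.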

The following diagram shows how data is spread among different slices in SWAP:


After a collection has been distributed among all slices, access to any file is provided through a web server that runs on each SWAP server. A client that wishes to retrieve a file may contact any SWAP server. If the client requests a file from a server that does not hold it, a 302/redirect is returned, pointing the web client at the correct server.

Client request:

  1. Client contacts server-a and requests /directory/to/file
  2. server-a computes the slice number for "/directory/to/file".
  3. If server-a is responsible for that slice, it will either return the requested data or a 404/not found.
  4. If the server is not responsible for the slice, it will return a 302/redirect to the client pointing at the correct server.
  5. The client contacts the correct server, which will return either the data or a 404/not found.
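
The server-side decision in steps 2 through 4 can be sketched roughly as below. This is an illustration under stated assumptions, not SWAP's actual implementation: handle_request, slice_owners (a list of host names indexed by slice number) and local_files (the set of paths stored locally) are hypothetical names, and the hashing convention (UTF-8 path, big-endian digest) is likewise assumed.

```python
import hashlib
from typing import List, Set, Tuple


def slice_for_path(file_path: str, num_slices: int) -> int:
    """Slice index for a path (assumed convention: UTF-8, big-endian MD5)."""
    digest = hashlib.md5(file_path.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_slices


def handle_request(
    path: str,
    local_server: str,
    slice_owners: List[str],
    local_files: Set[str],
) -> Tuple[int, str]:
    """Return (status, payload) for a request received by local_server.

    slice_owners[i] is the (hypothetical) host name responsible for slice i.
    """
    owner = slice_owners[slice_for_path(path, len(slice_owners))]
    if owner != local_server:
        # Step 4: another server owns the slice, so redirect the client there.
        return 302, f"http://{owner}{path}"
    if path in local_files:
        # Step 3: this server owns the slice and has the file.
        return 200, path
    # Step 3: this server owns the slice but the file does not exist.
    return 404, ""


if __name__ == "__main__":
    owners = ["server-a", "server-b"]
    print(handle_request("/directory/to/file", "server-a", owners, set()))
```

Every server runs the same computation over the same slice table, so a client needs at most one redirect to reach the server responsible for a path.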

Using this method to determine the location of data on any node removes the need for a central catalog that tracks the location of every file name. Coordination is still needed between nodes, however, to ensure that they share the same file group and slice metadata.

Swap-http.png