Toaster:DigArchPackage

Package Format

Each package starts with a 128-byte header that contains the URI http://umiacs.umd.edu/adapt/package/1.0 in UTF-8, padded with null bytes. This header identifies the file format in a way that a human reader can easily recognize and understand.
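A minimal sketch of writing and recognizing this file header in Python follows. The helper names `build_file_header` and `is_package_header` are hypothetical and do not come from the ADAPT code base; only the URI and the 128-byte size are taken from the description above.

```python
# URI and size are from the format description; the function names are illustrative.
PACKAGE_URI = b"http://umiacs.umd.edu/adapt/package/1.0"
FILE_HEADER_SIZE = 128


def build_file_header(uri: bytes = PACKAGE_URI) -> bytes:
    """Return the URI padded on the right with null bytes to 128 bytes."""
    if len(uri) > FILE_HEADER_SIZE:
        raise ValueError("URI does not fit in the 128-byte header")
    return uri.ljust(FILE_HEADER_SIZE, b"\x00")


def is_package_header(header: bytes) -> bool:
    """Check that a 128-byte buffer holds exactly the package URI plus padding."""
    return len(header) == FILE_HEADER_SIZE and header.rstrip(b"\x00") == PACKAGE_URI
```

A reader would typically call `is_package_header(f.read(128))` before attempting to parse any data blocks.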

The header is followed by a series of data blocks. Each data block starts with a block header in the following format:

| Bytes | Field | Description |
| 4 | Magic Number | Always the bytes 0x39 0xc4 0x5a 0x57 (the first four bytes of the MD5 sum of the word ADAPT). Used to verify that a random seek into the file points at a block header. It can also be used to locate headers when salvaging a corrupt file. |
| 4 | Identifier | Unique identifier for this data block, as an unsigned integer. Identifiers are unique only within a package and are assigned sequentially, starting at one for the first block. |
| 4 | Length | Length of the data in bytes, as an unsigned integer. This does not include the CRC-32 checksum that follows the data. |
| 1 | Type | Type of the data within the block, used to easily extract sections of the package. The following values are used: =0x01= - Manifest, =0x02= - Metadata, =0x03= - Data, =0xff= - EOF. The EOF block is a special block and is explained below. |
| 1 | Header Check | CRC-8 checksum of this header, used to quickly verify that the header values are correct. |
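The 14-byte block header described above could be decoded along the following lines. This is only a sketch. The byte order of the integer fields is not stated here, so big-endian is an assumption. The CRC-8 polynomial is not specified either, so the header check byte is returned but not verified. `BlockHeader` and `parse_block_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

MAGIC = bytes([0x39, 0xC4, 0x5A, 0x57])
# 4-byte magic, 4-byte identifier, 4-byte length, 1-byte type, 1-byte header check.
# ">" (big-endian, no padding) is an ASSUMPTION; the byte order is not documented here.
BLOCK_HEADER = struct.Struct(">4sIIBB")
BLOCK_TYPES = {0x01: "manifest", 0x02: "metadata", 0x03: "data", 0xFF: "eof"}


class BlockHeader(NamedTuple):
    identifier: int
    length: int
    block_type: int
    header_check: int  # CRC-8 value, not verified in this sketch


def parse_block_header(raw: bytes) -> BlockHeader:
    """Decode one block header and reject bad magic numbers or unknown types."""
    if len(raw) != BLOCK_HEADER.size:
        raise ValueError("block header must be exactly 14 bytes")
    magic, identifier, length, block_type, check = BLOCK_HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError("bad magic number; not positioned at a block header")
    if block_type not in BLOCK_TYPES:
        raise ValueError(f"unknown block type 0x{block_type:02x}")
    return BlockHeader(identifier, length, block_type, check)
```

The magic-number test is what makes a random seek safe: if the four bytes at the current position do not match, the reader knows it is not at a block boundary.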

The block header is followed by the binary data and then a CRC-32 checksum of that data, which can be used to quickly verify the contents on a read.
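Reading a block's payload and checking its trailing checksum might look like the sketch below. It assumes the CRC-32 is the common variant computed by `zlib.crc32` and that it is stored as four big-endian bytes. Neither detail is confirmed by this page. `read_block_data` is a hypothetical helper name.

```python
import io
import struct
import zlib


def read_block_data(stream, length: int) -> bytes:
    """Read `length` data bytes plus the 4-byte CRC-32 that follows them."""
    data = stream.read(length)
    trailer = stream.read(4)
    if len(data) != length or len(trailer) != 4:
        raise EOFError("truncated data block")
    # ASSUMPTION: checksum stored big-endian, zlib-compatible CRC-32.
    (stored,) = struct.unpack(">I", trailer)
    if zlib.crc32(data) & 0xFFFFFFFF != stored:
        raise ValueError("CRC-32 mismatch in data block")
    return data
```

Here `length` is the value of the Length field from the block header, which excludes the four checksum bytes.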

All manifest entries must appear first in the file, followed by all metadata entries, followed by the data entries. This allows for easy extraction of all entries by type.
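A validator could check this ordering rule on the sequence of Type values read from the block headers (excluding the EOF block). The following is a small sketch. `types_in_required_order` is a hypothetical name.

```python
# Rank of each block type in the required file order: manifest, metadata, data.
TYPE_ORDER = {0x01: 0, 0x02: 1, 0x03: 2}


def types_in_required_order(block_types) -> bool:
    """Return True if the ranks of the given block types never decrease."""
    ranks = [TYPE_ORDER[t] for t in block_types]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

Because the types are grouped, a reader that wants only the manifest can stop at the first block whose type is not =0x01=.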

The last data block must be followed by a block header with the _Identifier_ value set to zero, the _Length_ value set to 32, and the _Type_ value set to EOF. This header is immediately followed by the SHA-256 checksum of the entire package, after which the file ends.
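The EOF block can be checked roughly as sketched below. Two points are assumptions rather than documented facts. The integer fields are taken to be big-endian. The SHA-256 is taken to cover every byte that precedes the EOF block header, since a digest cannot cover itself and the exact range is not spelled out here. `check_eof_block` is a hypothetical name, and the CRC-8 header check is ignored.

```python
import hashlib
import struct

MAGIC = bytes([0x39, 0xC4, 0x5A, 0x57])
EOF_TYPE = 0xFF
BLOCK_HEADER_SIZE = 14
DIGEST_SIZE = 32
TRAILER_SIZE = BLOCK_HEADER_SIZE + DIGEST_SIZE


def check_eof_block(package: bytes) -> bool:
    """Validate the final EOF header and the SHA-256 digest that follows it."""
    if len(package) < TRAILER_SIZE:
        return False
    body, trailer = package[:-TRAILER_SIZE], package[-TRAILER_SIZE:]
    # ASSUMPTION: big-endian fields; the header check byte is not verified.
    magic, identifier, length, block_type, _check = struct.unpack(
        ">4sIIBB", trailer[:BLOCK_HEADER_SIZE]
    )
    if (magic, identifier, length, block_type) != (MAGIC, 0, DIGEST_SIZE, EOF_TYPE):
        return False
    # ASSUMPTION: the digest covers all bytes before the EOF block header.
    return hashlib.sha256(body).digest() == trailer[BLOCK_HEADER_SIZE:]
```

An implementation that hashes a different byte range (for example, one that includes the EOF header) would only need to change the slice passed to `hashlib.sha256`.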

Digarch-block.png

Package Index Format

Each package index starts with a 128 byte header that contains the URI http://umiacs.umd.edu/adapt/package-index/1.0 in UTF-8 padded with null bytes.

The header is followed by three 64-bit numbers that give the package offsets of the block headers for the first manifest entry, the first metadata entry, and the first data entry, in that order.

The next 64-bit number is the package offset of the first file, and it is followed by the offset of each subsequent file in the package. The package offset for the file with an identifier of i can be found at the index offset of 128 + 24 + ((i - 1) * 8).
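As a worked example of the formula, the entry for identifier 3 sits at index offset 128 + 24 + (2 * 8) = 168. The lookup can be sketched as follows. The function names are hypothetical, and reading the 64-bit values as unsigned big-endian integers is an assumption, since the byte order is not stated here.

```python
import io
import struct

INDEX_HEADER_SIZE = 128       # URI header of the index file
SECTION_OFFSETS_SIZE = 3 * 8  # manifest, metadata, and data section offsets


def index_position(identifier: int) -> int:
    """Offset within the index file of the entry for a 1-based identifier."""
    if identifier < 1:
        raise ValueError("identifiers start at one")
    return INDEX_HEADER_SIZE + SECTION_OFFSETS_SIZE + (identifier - 1) * 8


def read_package_offset(index_stream, identifier: int) -> int:
    """Seek to the entry for `identifier` and return the stored package offset."""
    index_stream.seek(index_position(identifier))
    raw = index_stream.read(8)
    if len(raw) != 8:
        raise EOFError("index file is too short for this identifier")
    # ASSUMPTION: unsigned 64-bit, big-endian.
    return struct.unpack(">Q", raw)[0]
```

Because every entry has a fixed width, finding a block in the package takes one seek in the index and one seek in the package, regardless of the number of blocks.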

The file offsets are terminated by a 64-bit value consisting entirely of null bytes, which is followed by the SHA-256 checksum of the index file.
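The null terminator makes it possible to enumerate the file offsets without knowing their count in advance. A zero value cannot be mistaken for a real offset, because offset zero in a package is occupied by its 128-byte header. The sketch below assumes big-endian unsigned 64-bit values and does not verify the trailing SHA-256, since the exact byte range it covers is not spelled out here. `parse_file_offsets` is a hypothetical name.

```python
import struct

FIRST_ENTRY = 128 + 24  # index header plus the three section offsets


def parse_file_offsets(index: bytes) -> list:
    """Collect file offsets from an index until the all-zero terminator."""
    offsets = []
    pos = FIRST_ENTRY
    while True:
        raw = index[pos:pos + 8]
        if len(raw) != 8:
            raise ValueError("index ended before the null terminator")
        # ASSUMPTION: unsigned 64-bit, big-endian.
        (value,) = struct.unpack(">Q", raw)
        if value == 0:
            return offsets
        offsets.append(value)
        pos += 8
```

The entry at list position `i - 1` corresponds to the block with identifier `i`.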

Digarch-block2.png