Swap:Data Access
From Adapt
Latest revision as of 17:08, 19 May 2010
HTTP Access
Data can be uploaded and downloaded over HTTP using a REST-ish mechanism. To construct a URL, you need to know three things:
- The address of any swap server
- The base path of the file group containing the data you want to pull
- The path within the file group to your file
After you know these items, the url is constructed as follows:
http://[server[:port]]/[function]/[group_path]/[file_path]?[function_options]
Suppose a swap server runs on server1.university.edu on port 8080 (the default). The file belongs to a file group with the prefix processdata/webcrawls/2004crawl, and its path within that group is /oct2004/35/crawlfile.arc.gz.
Using the get function, the URL would be http://server1.university.edu:8080/get/processdata/webcrawls/2004crawl/oct2004/35/crawlfile.arc.gz
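The construction above can be sketched in a few lines of Python. This is only an illustration of the URL template: the helper name build_swap_url and its parameters are hypothetical and not part of any swap client library.

```python
from urllib.parse import quote, urlencode


def build_swap_url(server, function, group_path, file_path, port=8080, options=None):
    """Assemble http://server:port/function/group_path/file_path?options.

    Hypothetical helper: it only mirrors the URL template on this page.
    """
    # Drop leading/trailing slashes so the parts join with exactly one "/".
    parts = (function, group_path, file_path)
    path = "/".join(part.strip("/") for part in parts)
    url = f"http://{server}:{port}/{quote(path)}"
    if options:
        # Function options become the query string.
        url += "?" + urlencode(options)
    return url


if __name__ == "__main__":
    print(build_swap_url(
        "server1.university.edu",
        "get",
        "processdata/webcrawls/2004crawl",
        "/oct2004/35/crawlfile.arc.gz",
    ))
```

Run as a script, this prints the same URL as the example above.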
Downloading Files
Whole File
Partial File
Download Arc Files
The arc function assumes that any file you pull consists of concatenated arc entries, each of which has been gzipped individually.
- offset - offset at which to start reading within the compressed file
- contentonly - (optional) set to true to strip out arc HTTP header information (default: false)
For example:
http://server1.university.edu:8080/arc/processdata/webcrawls/2004crawl/oct2004/35/crawlfile.arc.gz?offset=6789&contentonly=true
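To show why an offset into the compressed file is enough to locate one entry, here is a minimal local sketch. It assumes the offset is a byte offset pointing at the start of a gzip member. The helper names arc_url and read_entry_at are hypothetical, and the code illustrates the file layout only, not how the server implements the function.

```python
import gzip
import zlib
from urllib.parse import urlencode


def arc_url(base_url, offset, contentonly=False):
    """Append the arc function options to a base URL (hypothetical helper)."""
    params = {"offset": offset}
    if contentonly:
        params["contentonly"] = "true"
    return base_url + "?" + urlencode(params)


def read_entry_at(compressed, offset):
    """Decompress the single gzip member that starts at a byte offset.

    Assumes `offset` points at the first byte of a gzip member.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Decompression stops at the end of the first member; any bytes after
    # it are left in decompressor.unused_data.
    return decompressor.decompress(compressed[offset:])


if __name__ == "__main__":
    # Two individually gzipped entries, concatenated as described above.
    first = gzip.compress(b"entry one")
    second = gzip.compress(b"entry two")
    blob = first + second
    print(read_entry_at(blob, len(first)))
```

Because each entry is its own gzip member, a reader can start decompressing at the entry's offset without touching the entries before it.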