Hadoop

Deprecation Notice

The szhd Hadoop cluster was retired on 12/4/2013. There are plans for another CBCB Hadoop rollout in the future.

Legacy Details

The szhd cluster has 57 Hadoop compute nodes. Additionally, there are 2 nodes hosting the namenode, secondary namenode and jobtracker services.

You can submit jobs from:

  • flicker01
  • flicker02

The hadoop binary is installed at:

 /usr/bin/hadoop
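A minimal sketch of a typical session from flicker01 or flicker02: stage input in HDFS, submit a job jar, and read the result. The helper names, HDFS directories, jar and main class are hypothetical placeholders rather than anything that existed on szhd. `HADOOP` defaults to the binary above and can be set to `echo` to preview the commands without running them.

```shell
#!/bin/sh
# Sketch of a submit-and-collect session. Helper names, jar, class and
# HDFS paths are hypothetical placeholders; substitute your own job.
HADOOP=${HADOOP:-/usr/bin/hadoop}

# Copy a local file ($1) into an HDFS directory ($2).
stage_input() {
  "$HADOOP" fs -mkdir "$2"
  "$HADOOP" fs -put "$1" "$2/"
}

# Submit a MapReduce job packaged as a jar: run_job JAR MAIN_CLASS INPUT OUTPUT
run_job() {
  "$HADOOP" jar "$1" "$2" "$3" "$4"
}

# Print the reducer output files. The glob is quoted so that Hadoop,
# not the local shell, expands it against HDFS.
fetch_output() {
  "$HADOOP" fs -cat "$1/part-*"
}
```

For example, `stage_input reads.txt input`, then `run_job myjob.jar org.example.MyJob input output`, then `fetch_output output`. The output directory must not exist before the job starts, since Hadoop refuses to overwrite it.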

Hadoop Cluster Configuration

HDFS

  • 3 copies of your data are kept in HDFS (replication factor 3)
  • Permissions are now enforced in HDFS
  • Block size is set to 128 MB
  • Trash is enabled
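To estimate how much raw disk a dataset occupies under these settings, the arithmetic can be sketched in shell. This assumes sizes in whole megabytes, the 128 MB block size and 3 replicas listed above, and ignores the small checksum files HDFS keeps alongside each block. The helper names are illustrative only.

```shell
#!/bin/sh
# Back-of-the-envelope HDFS footprint for the settings above.
# Sizes are whole megabytes; helper names are illustrative.
BLOCK_MB=128
REPLICATION=3

# Number of HDFS blocks a file of $1 MB is split into (ceiling division).
hdfs_blocks() {
  echo $(( ($1 + BLOCK_MB - 1) / BLOCK_MB ))
}

# Raw cluster disk used by a file of $1 MB once all replicas exist.
# The last block is not padded, so this is size * replication,
# not blocks * BLOCK_MB * replication.
hdfs_raw_mb() {
  echo $(( $1 * REPLICATION ))
}
```

For a 10 GB (10240 MB) file, `hdfs_blocks 10240` gives 80 blocks and `hdfs_raw_mb 10240` gives 30720 MB, about 30 GB of cluster disk. Because trash is enabled, deleted files keep occupying that space until the trash is emptied, for example with `hadoop fs -expunge`.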

Map/Reduce

  • 2 Map slots per node
  • 2 Reduce slots per node
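With 57 compute nodes, these per-node settings give 114 map slots and 114 reduce slots cluster-wide. The sketch below turns that into a rough count of task "waves", assuming the job has the cluster to itself and that, as with default input splitting, there is roughly one map task per 128 MB input block. Variable and function names are illustrative.

```shell
#!/bin/sh
# Cluster-wide slot capacity for the szhd layout described on this page.
NODES=57
MAP_SLOTS_PER_NODE=2
REDUCE_SLOTS_PER_NODE=2

total_map_slots() { echo $(( NODES * MAP_SLOTS_PER_NODE )); }
total_reduce_slots() { echo $(( NODES * REDUCE_SLOTS_PER_NODE )); }

# Rounds of map tasks needed to run $1 tasks on an otherwise idle cluster
# (ceiling of tasks / total map slots).
map_waves() {
  slots=$(( NODES * MAP_SLOTS_PER_NODE ))
  echo $(( ($1 + slots - 1) / slots ))
}
```

For example, an input of 300 blocks yields about 300 map tasks, and `map_waves 300` gives 3 waves. Likewise, asking for more than 114 reducers in one job spreads them over more than one reduce wave.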

Please note that MapReduce temporary storage lives on the same file system that HDFS uses. This provides the most flexibility, but if you fill HDFS completely, jobs lose their working space and the whole cluster can be substantially constrained.
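Because a full HDFS also starves MapReduce of temporary space, it can be worth checking usage before launching a large job. This is only a sketch: it assumes the `hadoop dfsadmin -report` summary contains a `DFS Used%:` line ahead of the per-datanode sections (as Hadoop 1.x reports do), that your account is allowed to run the report, and it uses an arbitrary 80% threshold that is not a site policy. The function names are hypothetical.

```shell
#!/bin/sh
HADOOP=${HADOOP:-/usr/bin/hadoop}
# Arbitrary illustrative threshold, not a CBCB policy.
MAX_USED_PCT=${MAX_USED_PCT:-80}

# Read a dfsadmin report on stdin and print the first "DFS Used%" value,
# truncated to an integer. Assumes the cluster summary comes first, so the
# first match is the cluster-wide figure.
dfs_used_pct() {
  awk -F': *' '/^DFS Used%/ { sub(/%.*/, "", $2); printf "%d\n", $2; exit }'
}

# Succeed only if cluster-wide usage is below MAX_USED_PCT.
# If the report cannot be read, no value is found and the check fails.
hdfs_has_headroom() {
  used=$("$HADOOP" dfsadmin -report 2>/dev/null | dfs_used_pct)
  [ -n "$used" ] && [ "$used" -lt "$MAX_USED_PCT" ]
}
```

A job wrapper could then start with `hdfs_has_headroom || exit 1`, failing conservatively whenever usage is high or the report is unavailable.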

Hadoop Cluster Status

You can see the status of the cluster via the following two pages (note that you must be on the UMIACS networks to view them).

If you are connecting from outside UMD, you will need a proxy to access the jobtracker:

1) Create a SOCKS proxy tunnel

$ ssh -D 1234 walnut.umiacs.umd.edu

2) Reconfigure your browser to use the tunnel

In Firefox, open Preferences, go to the Advanced panel and select the Network tab, then click Settings for "Configure how Firefox connects to the Internet". Switch from "Direct connection" to "Manual proxy configuration", then set the SOCKS host to localhost and the port to 1234.

You can also use the FoxyProxy Firefox add-on for automatic proxy switching.

3) When you are done, reset the proxy to "Direct connection" and close the ssh session.
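The three steps above can also be scripted from a terminal. This sketch assumes OpenSSH (for its `-M`/`-S`/`-O exit` control-socket options) and curl. It opens the tunnel in the background, fetches a status page through it in place of the browser, and then closes the tunnel. `STATUS_URL` is a hypothetical placeholder since the status page addresses are not listed here; Hadoop 1.x jobtrackers serve their web UI on port 50030 by default. The control-socket path and function names are likewise illustrative.

```shell
#!/bin/sh
SSH=${SSH:-ssh}
CURL=${CURL:-curl}
PROXY_PORT=${PROXY_PORT:-1234}
GATEWAY=${GATEWAY:-walnut.umiacs.umd.edu}
# Hypothetical values: substitute a real socket path and status page URL.
CTL_SOCKET=${CTL_SOCKET:-$HOME/.ssh/hadoop-socks.ctl}
STATUS_URL=${STATUS_URL:-http://jobtracker.example:50030/jobtracker.jsp}

# Step 1: start the SOCKS tunnel as a background control master.
# -f backgrounds after authentication, -N runs no remote command,
# -D opens the dynamic (SOCKS) forward on PROXY_PORT.
open_tunnel() {
  "$SSH" -M -S "$CTL_SOCKET" -f -N -D "$PROXY_PORT" "$GATEWAY"
}

# Step 2 (command-line alternative to the browser): fetch the page through
# the proxy and print the HTTP status code. --socks5-hostname lets the far
# end resolve internal host names.
check_status_page() {
  "$CURL" --silent --show-error --socks5-hostname "localhost:$PROXY_PORT" \
    --output /dev/null --write-out '%{http_code}\n' "$STATUS_URL"
}

# Step 3: ask the control master to exit, which also closes the SOCKS port.
close_tunnel() {
  "$SSH" -S "$CTL_SOCKET" -O exit "$GATEWAY"
}
```

Run `open_tunnel`, then `check_status_page` (a `200` means the page was reached), and finish with `close_tunnel`. The control socket is what lets the last step find and stop the backgrounded ssh process without looking up its PID.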