A few weeks ago we ran a Hadoop hackathon. ApacheCon participants were invited to use our 10-node Hadoop cluster to explore Hadoop and play with some datasets that we had loaded onto it in advance. One challenge we faced was how to do this securely, because Hadoop does not offer much in the way of security. It provides a rudimentary file permission system on its distributed filesystem, HDFS, but it does not verify that you actually are the user you claim to be. (Whatever username you use to start your local Hadoop client process becomes your HDFS username; that account need not exist on the machines that host the HDFS NameNode or DataNodes.)
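To make the trust problem concrete, here is a toy model in Python. It is not Hadoop code: `ToyNameNode`, `FileEntry`, `create`, and `can_read` are hypothetical names invented for illustration. The model mimics only the behavior described above. The server enforces owner/other permission bits, but it takes the username the client asserts at face value and never checks it against any account database.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    owner: str
    mode: int  # POSIX-style permission bits, e.g. 0o600


class ToyNameNode:
    """Hypothetical filesystem that trusts whatever username the client asserts."""

    def __init__(self):
        self.files = {}

    def create(self, path, claimed_user, mode=0o600):
        # The claimed username is recorded as the owner without verification;
        # no local account with that name needs to exist on the server.
        self.files[path] = FileEntry(owner=claimed_user, mode=mode)

    def can_read(self, path, claimed_user):
        entry = self.files[path]
        if claimed_user == entry.owner:
            return bool(entry.mode & 0o400)  # owner read bit
        return bool(entry.mode & 0o004)  # "other" read bit (groups ignored for brevity)


nn = ToyNameNode()
nn.create("/data/alice/private.csv", claimed_user="alice", mode=0o600)

# The permission check works as long as everyone is honest...
print(nn.can_read("/data/alice/private.csv", claimed_user="mallory"))
# ...but nothing stops mallory from simply claiming to be alice.
print(nn.can_read("/data/alice/private.csv", claimed_user="alice"))
```

The first call prints `False` and the second prints `True`: the permission bits are enforced correctly, but because identity is self-asserted, they only keep out users who choose to identify themselves honestly.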
Even more problematically, anyone who can connect to the JobTracker can submit arbitrary code, which then runs with the privileges of the account used to start the Hadoop TaskTrackers on each node.
While there is no perfect solution to multitenancy in a Hadoop environment, a proxying gateway at least lets you control which users can access your cluster. The rest of this post describes how to set up such a gateway.
Hadoop was created by