This is the documentation for CDH 4.7.1.
Documentation for other versions is available at Cloudera Documentation.

Appendix B - Information about Other Hadoop Security Programs

This section contains information about the following programs:

  • MRv1 and YARN: A binary called jsvc that is in the bigtop-jsvc package and installed in either /usr/lib/bigtop-utils/jsvc or /usr/libexec/bigtop-utils/jsvc depending on the particular Linux flavor. See MRv1 and YARN: The jsvc Program.
  • MRv1 Only: A setuid binary called task-controller that is in the hadoop-0.20-mapreduce package and installed in either /usr/lib/hadoop-0.20-mapreduce/sbin/Linux-amd64-64/task-controller or /usr/lib/hadoop-0.20-mapreduce/sbin/Linux-i386-32/task-controller. See MRv1 Only: The Linux TaskController Program.
  • YARN Only: A setuid binary called container-executor that is in the hadoop-yarn package and installed in /usr/lib/hadoop-yarn/bin/container-executor. See YARN Only: The Linux Container Executor Program.

MRv1 and YARN: The jsvc Program

The jsvc program is used to start the DataNode listening on privileged ports (port numbers below 1024). Its entry point is the SecureDataNodeStarter class, which implements the Daemon interface that jsvc expects. jsvc starts as root and calls the SecureDataNodeStarter.init(...) method while still running as root. Once the SecureDataNodeStarter class has finished initializing, jsvc sets the effective UID to the hdfs user and then calls SecureDataNodeStarter.start(...). SecureDataNodeStarter then calls the regular DataNode entry point, passing in a reference to the privileged resources it obtained during initialization.
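The ordering described above (initialize as root, drop privileges, then start) can be sketched as follows. This is a minimal illustration in Python, not the jsvc or Hadoop implementation: the SecureStarter class, the launch function, and both callbacks are hypothetical names, and the example uses placeholders so that no socket is opened and no UID is changed.

```python
class SecureStarter:
    """Hypothetical stand-in for a class that exposes init() and start()."""

    def __init__(self, acquire_privileged_resources):
        self._acquire = acquire_privileged_resources
        self.resources = None
        self.events = []

    def init(self):
        # Called while the launcher is still root: obtain whatever needs
        # elevated rights (for a DataNode, sockets bound to low ports).
        self.resources = self._acquire()
        self.events.append("init")

    def start(self):
        # Called after privileges were dropped: hand the resources obtained
        # in init() to the regular, unprivileged entry point.
        self.events.append("start")
        return self.resources


def launch(daemon, drop_privileges):
    """Run the init -> drop privileges -> start sequence in order."""
    daemon.init()
    drop_privileges()  # a real launcher would change the effective UID here
    daemon.events.append("dropped")
    return daemon.start()


# Placeholders only: nothing privileged happens in this sketch.
daemon = SecureStarter(lambda: "listening-socket-placeholder")
resources = launch(daemon, drop_privileges=lambda: None)
print(daemon.events)
```

The point of the ordering is that anything requiring root is acquired before the UID switch, so the long-running service code never executes with root privileges.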

MRv1 Only: The Linux TaskController Program

The task-controller program, which is used on MRv1 only, allows the TaskTracker to run tasks under the Unix account of the user who submitted the job. It is a setuid binary that must have a very specific set of permissions and ownership in order to function correctly. In particular, it must:

  1. Be owned by root
  2. Be owned by a group that contains only the user running the MapReduce daemons
  3. Be setuid
  4. Be group readable and executable

This corresponds to the ownership root:mapred and the permissions 4754.

Here is the output of ls on a correctly configured task-controller:

-rwsr-xr-- 1 root mapred 30888 Mar 18 13:03 task-controller

The TaskTracker checks for this configuration on startup and fails to start if the task-controller is not configured correctly.
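A check of this kind can be sketched in Python. This is an illustration only, not the TaskTracker's actual check: the function names permission_problems and check_path are hypothetical, and the expected values come from the requirements above (owner root, that is UID 0, and mode 4754). Verifying the group and its membership is left out of the sketch.

```python
import os
import stat


def permission_problems(mode, uid, expected_mode=0o4754):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if uid != 0:
        problems.append("not owned by root")
    if not mode & stat.S_ISUID:
        problems.append("setuid bit not set")
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    actual = stat.S_IMODE(mode)
    if actual != expected_mode:
        problems.append(f"mode is {actual:04o}, expected {expected_mode:04o}")
    return problems


def check_path(path, expected_mode=0o4754):
    """Apply the same checks to a file on disk."""
    st = os.stat(path)
    return permission_problems(st.st_mode, st.st_uid, expected_mode)


# A correctly configured binary, then one that is missing the setuid bit
# and is owned by an ordinary user (UID 500 is an arbitrary example).
print(permission_problems(stat.S_IFREG | 0o4754, 0))
print(permission_problems(stat.S_IFREG | 0o0754, 500))
```

On a cluster node, check_path could be pointed at the installed task-controller path to get the same report for the real file.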

YARN Only: The Linux Container Executor Program

The container-executor program, which is used on YARN only and supported on GNU/Linux only, runs containers as the user who submitted the application. It requires all user accounts to exist on the cluster nodes where the containers are launched. It is a setuid executable that is included in the Hadoop distribution; the NodeManager uses it to launch and kill containers. To do so, the executable switches to the user who submitted the application and then launches or kills the containers. For maximum security, it sets restricted permissions and user/group ownership on the local files and directories used by the containers, such as shared objects, jars, intermediate files, and log files. As a result, only the application owner and the NodeManager can access these local files and directories, including those localized as part of the distributed cache.

The container-executor program must have a very specific set of permissions and ownership in order to function correctly. In particular, it must:

  1. Be owned by root
  2. Be owned by a group that contains only the user running the YARN daemons
  3. Be setuid
  4. Be group readable and executable

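This corresponds to the ownership root:yarn and the permissions 6050 (the leading 6 sets both the setuid and the setgid bits).

Here is the output of ls on a correctly configured container-executor: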

---Sr-s--- 1 root yarn 91886 2012-04-01 19:54 container-executor
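To see how an octal mode maps to the permission string that ls prints, Python's standard stat.filemode function can be used. The helper name ls_permissions is made up for this sketch, and it assumes a regular file.

```python
import stat


def ls_permissions(octal_mode):
    """Render the permission bits of a regular file the way `ls -l` does."""
    return stat.filemode(stat.S_IFREG | octal_mode)


print(ls_permissions(0o6050))  # ---Sr-s---
print(ls_permissions(0o4754))  # -rwsr-xr--
```

In the first string, the capital S in the owner position means the setuid bit is set while the owner execute bit is not, and the lowercase s in the group position means the setgid bit is set together with group execute.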