This is the documentation for CDH 4.6.0.

Configuring MRv1 Security

If you are using YARN, skip this section and see Configuring YARN Security.

If you are using MRv1, complete the following steps to configure, start, and test secure MRv1.

  1. Step 1: Configure Secure MRv1
  2. Step 2: Start up the JobTracker
  3. Step 3: Start up a TaskTracker
  4. Step 4: Try Running a Map/Reduce Job

Step 1: Configure Secure MRv1

Keep the following important information in mind when configuring secure MapReduce:

  • The properties for the JobTracker and TaskTracker must specify the mapred principal, as well as the path to the mapred keytab file.
  • The Kerberos principals for the JobTracker and TaskTracker are configured in the mapred-site.xml file. The same mapred-site.xml file with both of these principals must be installed on every host machine in the cluster. That is, it is not sufficient to have the JobTracker principal configured on the JobTracker host machine only. This is because Kerberos authentication is bi-directional: for example, the TaskTracker must know the principal name of the JobTracker in order to securely register with it.
  • Do not use ${user.name} in the value of the mapred.local.dir or hadoop.log.dir properties in mapred-site.xml. Doing so can prevent tasks from launching on a secure cluster.
  • Make sure that each user who will be running MRv1 jobs exists on all cluster nodes (that is, on every node that hosts any MRv1 daemon).
  • Make sure the value specified for mapred.local.dir is identical in mapred-site.xml and taskcontroller.cfg. If the values are different, an error message is returned.
  • Make sure the value specified in taskcontroller.cfg for hadoop.log.dir is the same as the one the Hadoop daemons are using, which is /var/log/hadoop-0.20-mapreduce by default and can be configured in mapred-site.xml. If the values are different, an error message is returned.
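
Two of the points above can be checked with a short script on each node. The following is a minimal sketch, not part of CDH: the helper names uses_user_name and user_exists are hypothetical, and the script assumes mapred-site.xml is in /etc/hadoop/conf unless SITE is set.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given file contains the literal
# string ${user.name}. -F makes grep treat the pattern as a fixed string.
uses_user_name() {
    grep -qF '${user.name}' "$1"
}

# Hypothetical helper: succeed if the named account exists on this node.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Assumed location of mapred-site.xml; override by setting SITE.
SITE=${SITE:-/etc/hadoop/conf/mapred-site.xml}

if [ -r "$SITE" ] && uses_user_name "$SITE"; then
    echo "WARNING: $SITE contains \${user.name}"
fi

# Report any user name passed as an argument that is missing on this node.
for u in "$@"; do
    user_exists "$u" || echo "WARNING: user $u does not exist on this node"
done
```

Run the script on every node that hosts an MRv1 daemon, passing the names of the users who will submit jobs as arguments. The grep check is coarse: it flags ${user.name} anywhere in the file, not only in mapred.local.dir or hadoop.log.dir.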

To configure secure MapReduce:

  1. Add the following properties to the mapred-site.xml file on every machine in the cluster:
    <!-- JobTracker security configs -->
    <property>
      <name>mapreduce.jobtracker.kerberos.principal</name>
      <value>mapred/_HOST@YOUR-REALM.COM</value>
    </property>
    <property>
      <name>mapreduce.jobtracker.keytab.file</name>
      <value>/etc/hadoop/conf/mapred.keytab</value> <!-- path to the MapReduce keytab -->
    </property>
    
    <!-- TaskTracker security configs -->
    <property>
      <name>mapreduce.tasktracker.kerberos.principal</name>
      <value>mapred/_HOST@YOUR-REALM.COM</value>
    </property>
    <property>
      <name>mapreduce.tasktracker.keytab.file</name>
      <value>/etc/hadoop/conf/mapred.keytab</value> <!-- path to the MapReduce keytab -->
    </property>
    
    <!-- TaskController settings -->
    <property>
      <name>mapred.task.tracker.task-controller</name>
      <value>org.apache.hadoop.mapred.LinuxTaskController</value>
    </property>
    <property>
      <name>mapreduce.tasktracker.group</name>
      <value>mapred</value>
    </property>
  2. Create a file called taskcontroller.cfg that contains the following information:
    hadoop.log.dir=<Path to Hadoop log directory. Should be same value used to start the TaskTracker. This is required to set proper permissions on the log files so that they can be written to by the user's tasks and read by the TaskTracker for serving on the web UI.>
    mapreduce.tasktracker.group=mapred
    banned.users=mapred,hdfs,bin
    min.user.id=1000 
      Note:

    In the taskcontroller.cfg file, the default setting for the banned.users property is mapred, hdfs, and bin, which prevents jobs from being submitted via those user accounts. The default setting for the min.user.id property is 1000, which prevents jobs from being submitted by accounts with a user ID less than 1000; such IDs are conventionally reserved for system accounts. Note that some operating systems, such as CentOS 5, assign regular user IDs starting at 500, not 1000. If this is the case on your system, change the min.user.id setting to 500. If there are user accounts on your cluster that have a user ID less than the value specified for the min.user.id property, the TaskTracker returns an error code of 255.

  3. Install the taskcontroller.cfg file at the path the task-controller binary expects. This path is determined relative to the location of the binary: <path of task-controller binary>/../../conf/taskcontroller.cfg. If you installed the CDH4 package, this path is always /etc/hadoop/conf/taskcontroller.cfg.
  Note:

For more information about the task-controller program, see Appendix B - Information about Other Hadoop Security Programs.

  Important:

The same mapred-site.xml file and the same hdfs-site.xml file must both be installed on every host machine in the cluster so that the NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker can all connect securely with each other.
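
Before starting the daemons, it can help to confirm that the account you plan to test with satisfies the min.user.id setting described in the note above. The following is a minimal sketch under stated assumptions: the helper names min_user_id and uid_allowed are hypothetical, and the file is assumed to be at the CDH4 package location /etc/hadoop/conf/taskcontroller.cfg unless CFG is set.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value of min.user.id from a
# taskcontroller.cfg-style file (trailing whitespace is ignored).
min_user_id() {
    sed -n 's/^min\.user\.id=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed if user ID $1 is at least the minimum $2.
uid_allowed() {
    [ "$1" -ge "$2" ]
}

# Assumed location of taskcontroller.cfg; override by setting CFG.
CFG=${CFG:-/etc/hadoop/conf/taskcontroller.cfg}

if [ -r "$CFG" ]; then
    min=$(min_user_id "$CFG")
    uid=$(id -u)
    if [ -n "$min" ] && ! uid_allowed "$uid" "$min"; then
        echo "WARNING: user ID $uid is below min.user.id=$min"
    fi
fi
```

Run the script as the user who will submit jobs. It checks only min.user.id; it does not check the banned.users list.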

Step 2: Start up the JobTracker

You are now ready to start the JobTracker.

If you're using the /etc/init.d/hadoop-0.20-mapreduce-jobtracker script, then you can use the service command to run it now:

$ sudo service hadoop-0.20-mapreduce-jobtracker start

You can verify that the JobTracker is working properly by opening http://machine:50030/ in a web browser, where machine is the name of the machine where the JobTracker is running.
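
On a host without a browser, the same check can be sketched with curl. The function name jobtracker_status is hypothetical, and the port defaults to 50030 as in the URL above. If the web UI responds, the function prints the HTTP status code; the exact code you see may depend on whether authentication is enabled for the web consoles.

```shell
#!/bin/sh
# Hypothetical helper: print the HTTP status code returned by the
# JobTracker web UI on host $1 (port $2, default 50030).
jobtracker_status() {
    # -s: no progress output; -o /dev/null: discard the response body;
    # -w '%{http_code}': print only the HTTP status code.
    curl -s -o /dev/null -w '%{http_code}' "http://$1:${2:-50030}/"
}
```

For example, after loading the function into your shell, run jobtracker_status machine, where machine is the JobTracker host.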

Step 3: Start up a TaskTracker

You are now ready to start a TaskTracker.

If you're using the /etc/init.d/hadoop-0.20-mapreduce-tasktracker script, then you can use the service command to run it now:

$ sudo service hadoop-0.20-mapreduce-tasktracker start

Step 4: Try Running a Map/Reduce Job

You should now be able to run Map/Reduce jobs. To confirm, try launching a sleep or a pi job from the provided Hadoop examples (/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar). Note that you will need Kerberos credentials to do so.
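
As a minimal sketch of this test, the following function refuses to submit the pi job unless a valid Kerberos ticket is cached. The function name run_pi_job and the default arguments (2 maps, 1000 samples per map) are illustrative assumptions; the JAR path is the one given above. Obtain a ticket first with kinit for your own principal.

```shell
#!/bin/sh
# Path to the examples JAR from the paragraph above; override if needed.
EXAMPLES_JAR=${EXAMPLES_JAR:-/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar}

# Hypothetical helper: run the pi example with $1 maps and $2 samples
# per map, but only if a valid Kerberos ticket is present.
run_pi_job() {
    # klist -s prints nothing and exits non-zero when the credentials
    # cache cannot be read or the ticket has expired.
    if ! klist -s; then
        echo "No valid Kerberos ticket; run kinit first." >&2
        return 1
    fi
    hadoop jar "$EXAMPLES_JAR" pi "${1:-2}" "${2:-1000}"
}
```

For example, run kinit, enter your password, and then call run_pi_job with no arguments to submit a small job.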

  Important:

Remember that the user who launches the job must exist on every node.