Configuring YARN Security

Step 1: Configure Secure YARN

Before you start:

  • The Kerberos principals for the ResourceManager and NodeManager are configured in the yarn-site.xml file. The same yarn-site.xml file must be installed on every host machine in the cluster.
  • Make sure that each user who will be running YARN jobs exists on all cluster nodes (that is, on every node that hosts any YARN daemon).

To configure secure YARN:

  1. Add the following properties to the yarn-site.xml file on every machine in the cluster:
    <!-- ResourceManager security configs -->
    <property>
      <name>yarn.resourcemanager.keytab</name>
      <value>/etc/hadoop/conf/yarn.keytab</value>
    <!-- path to the YARN keytab -->
    </property>
    <property>
      <name>yarn.resourcemanager.principal</name>
    
      <value>yarn/_HOST@YOUR-REALM.COM</value>
    </property>
    
    <!-- NodeManager security configs -->
    <property>
      <name>yarn.nodemanager.keytab</name>
      <value>/etc/hadoop/conf/yarn.keytab</value>
    <!-- path to the YARN keytab -->
    </property>
    <property>
      <name>yarn.nodemanager.principal</name>
    
      <value>yarn/_HOST@YOUR-REALM.COM</value>
    </property>
    
    <property>
      <name>yarn.nodemanager.container-executor.class</name>
    
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>
    
    <property>
      <name>yarn.nodemanager.linux-container-executor.group</name>
      <value>yarn</value>
    </property>
    
    <!-- To enable SSL -->
    <property>
      <name>yarn.http.policy</name>
      <value>HTTPS_ONLY</value>
    </property>
  2. Add the following properties to the mapred-site.xml file on every machine in the cluster:
    <!-- MapReduce Job History Server security configs -->
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>host:port</value> <!-- Host and port of the MapReduce Job History Server; default port is 10020  -->
    </property>
    <property>
      <name>mapreduce.jobhistory.keytab</name>
      <value>/etc/hadoop/conf/mapred.keytab</value>
    <!-- path to the MAPRED keytab for the Job History Server -->
    </property>
    
    <property>
      <name>mapreduce.jobhistory.principal</name>
    
      <value>mapred/_HOST@YOUR-REALM.COM</value>
    </property>
    
    <!-- To enable SSL -->
    
    <property>
      <name>mapreduce.jobhistory.http.policy</name>
      <value>HTTPS_ONLY</value>
    </property>
  3. Create a file called container-executor.cfg for the Linux Container Executor program that contains the following information:
    yarn.nodemanager.local-dirs=<comma-separated list of paths to local NodeManager directories. Should be same values specified in yarn-site.xml. Required to validate paths passed to container-executor in order.>
    yarn.nodemanager.linux-container-executor.group=yarn
    yarn.nodemanager.log-dirs=<comma-separated list of paths to local NodeManager log directories. Should be same values specified in yarn-site.xml. Required to set proper permissions on the log files so that they can be written to by the user's containers and read by the NodeManager for log aggregation.
    banned.users=hdfs,yarn,mapred,bin
    
    min.user.id=1000
  4. The path to the container-executor.cfg file is determined relative to the location of the container-executor binary. Specifically, the path is <dirname of container-executor binary>/../etc/hadoop/container-executor.cfg. If you installed the CDH 5 package, this path will always correspond to /etc/hadoop/conf/container-executor.cfg.
  5. Verify that the ownership and permissions of the container-executor program corresponds to:
    ---Sr-s--- 1 root yarn 36264 May 20 15:30 container-executor

Step 2: Start up the ResourceManager

You are now ready to start the ResourceManager.

If you're using the /etc/init.d/hadoop-yarn-resourcemanager script, then you can use the service command to run it now:

$ sudo service hadoop-yarn-resourcemanager start

You can verify that the ResourceManager is working properly by opening a web browser to http://host:8088/ where host is the name of the machine where the ResourceManager is running.

Step 3: Start up the NodeManager

You are now ready to start the NodeManager.

If you're using the /etc/init.d/hadoop-yarn-nodemanager script, then you can use the service command to run it now:

$ sudo service hadoop-yarn-nodemanager start

You can verify that the NodeManager is working properly by opening a web browser to http://host:8042/ where host is the name of the machine where the NodeManager is running.

Step 4: Start up the MapReduce Job History Server

You are now ready to start the MapReduce Job History Server.

If you're using the /etc/init.d/hadoop-mapreduce-historyserver script, then you can use the service command to run it now:

$ sudo service hadoop-mapreduce-historyserver start

You can verify that the MapReduce JobHistory Server is working properly by opening a web browser to http://host:19888/ where host is the name of the machine where the MapReduce JobHistory Server is running.

Step 5: Try Running a Map/Reduce YARN Job

You should now be able to run Map/Reduce jobs. To confirm, try launching a sleep or a pi job from the provided Hadoop examples (/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar). Note that you will need Kerberos credentials to do so.

To try running a MapReduce job using YARN, set the HADOOP_MAPRED_HOME environment variable and then submit the job. For example:

$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
$ /usr/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000

(Optional) Step 6: Configuring YARN for long-running applications

Long-running applications such as Spark Streaming jobs will need additional configuration since the default settings only allow the hdfs user's delegation tokens a maximum lifetime of 7 days which is not always sufficient.

You can work around this by configuring the ResourceManager as a proxy user for the corresponding HDFS NameNode so that the ResourceManager can request new tokens when the existing ones are past their maximum lifetime. YARN will then be able to continue performing localization and log-aggregation on behalf of the hdfs user.

Set the following property in yarn-site.xml to true:

<property> 
<name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
<value>true</value>
</property>
Configure the following properties in core-site.xml on the HDFS NameNode. You can use a more restrictive configuration by specifying hosts/groups instead of * as in the example below.
<property> 
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>

<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>