Configuring YARN Security

If you are using MRv1, skip this section and see Configuring MRv1 Security.

If you are using YARN, complete the following steps to configure, start, and test secure YARN.

Step 1: Configure Secure YARN

Before you start:

The Kerberos principals for the ResourceManager and NodeManager are configured in the yarn-site.xml file. The same yarn-site.xml file must be installed on every host machine in the cluster.
Make sure that each user who will be running YARN jobs exists on all cluster nodes (that is, on every node that hosts any YARN daemon).
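
One way to check the user prerequisite on a node is to look up each account with id. The following is a minimal sketch, not part of the product: the function name missing_users and the sample account names are assumptions made for illustration.

```shell
# Print every given account name that does not exist on this host.
# Returns 0 if all accounts exist, 1 if at least one is missing.
missing_users() {
  local rc=0 user
  for user in "$@"; do
    if ! id -u "$user" >/dev/null 2>&1; then
      echo "$user"
      rc=1
    fi
  done
  return "$rc"
}

# Example (hypothetical account names); run on every node that hosts a YARN daemon:
#   missing_users alice bob || echo "create the accounts listed above"
```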

To configure secure YARN:

Add the following properties to the yarn-site.xml file on every machine in the cluster:

<!-- ResourceManager security configs -->
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value> <!-- path to the YARN keytab -->
</property>
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>yarn/_HOST@YOUR-REALM.COM</value>
</property>

<!-- NodeManager security configs -->
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value> <!-- path to the YARN keytab -->
</property>
<property>
  <name>yarn.nodemanager.principal</name>
  <value>yarn/_HOST@YOUR-REALM.COM</value>
</property>

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>

<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>

<!-- To enable SSL -->
<property>
  <name>yarn.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

Add the following properties to the mapred-site.xml file on every machine in the cluster:

<!-- MapReduce Job History Server security configs -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>host:port</value> <!-- Host and port of the MapReduce Job History Server; default port is 10020 -->
</property>
<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/hadoop/conf/mapred.keytab</value> <!-- path to the MAPRED keytab for the Job History Server -->
</property>

<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>mapred/_HOST@YOUR-REALM.COM</value>
</property>

<!-- To enable SSL -->
<property>
  <name>mapreduce.jobhistory.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
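
After editing the site files, a quick textual check can confirm that the expected property names are present on each machine. The following is a rough sketch under stated assumptions: missing_properties is a hypothetical helper, the file location in the usage comment is assumed, and the check only searches for the <name> element, so it does not parse the XML or notice entries that are commented out.

```shell
# Print each property name that has no <name>...</name> entry in the given file.
# Returns 0 if every property is present, 1 otherwise.
missing_properties() {
  local file="$1" rc=0 prop
  shift
  for prop in "$@"; do
    if ! grep -qF "<name>${prop}</name>" "$file"; then
      echo "$prop"
      rc=1
    fi
  done
  return "$rc"
}

# Example, using property names from this step (the file path is an assumption):
#   missing_properties /etc/hadoop/conf/yarn-site.xml \
#     yarn.resourcemanager.keytab yarn.resourcemanager.principal \
#     yarn.nodemanager.keytab yarn.nodemanager.principal
```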

Create a file called container-executor.cfg for the Linux Container Executor program that contains the following information:
```
yarn.nodemanager.local-dirs=<comma-separated list of paths to local NodeManager directories. Should be same values specified in yarn-site.xml. Required to validate paths passed to container-executor in order.>
yarn.nodemanager.linux-container-executor.group=yarn
yarn.nodemanager.log-dirs=<comma-separated list of paths to local NodeManager log directories. Should be same values specified in yarn-site.xml. Required to set proper permissions on the log files so that they can be written to by the user's containers and read by the NodeManager for log aggregation.>
banned.users=hdfs,yarn,mapred,bin

min.user.id=1000
```
Note:
In the container-executor.cfg file, the default setting for the banned.users property is hdfs, yarn, mapred, and bin, which prevents jobs from being submitted from those user accounts. The default setting for the min.user.id property is 1000, which prevents jobs from being submitted by accounts with a user ID below 1000; such IDs are conventionally reserved for system accounts. Some operating systems, such as CentOS 5, assign user IDs starting at 500 rather than 1000. If this is the case on your system, change the min.user.id property to 500. If user accounts on your cluster have a user ID lower than the value specified for the min.user.id property, the NodeManager returns an error code of 255.
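
The two checks described in this note can be approximated ahead of time for a given account. The following sketch only mirrors the rules as stated above; it is not the container-executor's own logic, and the helper names cfg_value and user_allowed are assumptions. It falls back to 1000 when min.user.id is absent, matching the default mentioned above.

```shell
# Read the value of a key=value line from a container-executor.cfg style file.
cfg_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1
}

# Succeed only if the user is not listed in banned.users and the UID is
# at least min.user.id; otherwise print the reason and return 1.
user_allowed() {
  local cfg="$1" user="$2" uid="$3" banned min
  banned=$(cfg_value "$cfg" banned.users)
  min=$(cfg_value "$cfg" min.user.id)
  min=${min:-1000}
  case ",$banned," in
    *",$user,"*) echo "$user is listed in banned.users"; return 1 ;;
  esac
  if [ "$uid" -lt "$min" ]; then
    echo "UID $uid is below min.user.id=$min"
    return 1
  fi
  return 0
}

# Example (hypothetical account name):
#   user_allowed /etc/hadoop/conf/container-executor.cfg alice "$(id -u alice)"
```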
The path to the container-executor.cfg file is determined relative to the location of the container-executor binary. Specifically, the path is <dirname of container-executor binary>/../etc/hadoop/container-executor.cfg. If you installed the CDH 5 package, this path will always correspond to /etc/hadoop/conf/container-executor.cfg.
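
As an illustration of the relative-path rule, the expected location can be derived from the binary's path with dirname. This is a small sketch; the function name cfg_path_for and the binary location in the usage comment are assumptions.

```shell
# Derive the expected container-executor.cfg location from the binary's path.
cfg_path_for() {
  echo "$(dirname "$1")/../etc/hadoop/container-executor.cfg"
}

# Example (hypothetical install location of the binary):
#   cfg_path_for /usr/lib/hadoop-yarn/bin/container-executor
```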
Note:
The container-executor program requires that the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, and every directory in the paths leading up to them, be set to 755 permissions, as shown in the table on permissions on directories.
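
To see which directories along a path do not meet this requirement, you could walk from the configured directory up to the root and print any mode other than 755. The following is a sketch only: the function names are hypothetical, the directory in the usage comment is an assumed example, and it relies on the -c option of GNU stat.

```shell
# Print the given directory and each of its parent directories, one per line.
ancestors() {
  local dir="$1"
  while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    echo "$dir"
    dir=$(dirname "$dir")
  done
  echo "$dir"
}

# Print "<octal mode> <path>" for every directory on the path whose mode
# is not exactly 755. No output means every directory is set to 755.
non_755_paths() {
  local path mode
  ancestors "$1" | while IFS= read -r path; do
    mode=$(stat -c '%a' "$path")
    [ "$mode" = "755" ] || echo "$mode $path"
  done
}

# Example (hypothetical NodeManager local directory):
#   non_755_paths /data/1/yarn/local
```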
Verify that the ownership and permissions of the container-executor program correspond to:
```
---Sr-s--- 1 root yarn 36264 May 20 15:30 container-executor
```
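
The listing above corresponds to octal mode 6050 (setuid and setgid set, group read and execute only) with owner root and group yarn. The same check could be scripted roughly as follows; the function name is hypothetical, the binary path in the usage comment is an assumed install location, and the sketch relies on the -c option of GNU stat.

```shell
# Succeed if the file has mode 6050 (shown by ls as ---Sr-s---) and the
# expected owner and group (root and yarn unless overridden).
check_container_executor() {
  local path="$1" owner="${2:-root}" group="${3:-yarn}" actual
  actual=$(stat -c '%a %U %G' "$path") || return 1
  if [ "$actual" != "6050 $owner $group" ]; then
    echo "unexpected: $actual (want 6050 $owner $group)"
    return 1
  fi
}

# Example (hypothetical install location of the binary):
#   check_container_executor /usr/lib/hadoop-yarn/bin/container-executor
```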
Note:
For more information about the Linux Container Executor program, see Appendix B - Information about Other Hadoop Security Programs.

Step 2: Start up the ResourceManager

You are now ready to start the ResourceManager.

If you're using the /etc/init.d/hadoop-yarn-resourcemanager script, then you can use the service command to run it now:

$ sudo service hadoop-yarn-resourcemanager start

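You can verify that the ResourceManager is working properly by opening a web browser to http://host:8088/, where host is the name of the machine where the ResourceManager is running. If you set yarn.http.policy to HTTPS_ONLY as shown in Step 1, use https://host:8090/ instead (8090 is the default HTTPS port).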

Step 3: Start up the NodeManager

You are now ready to start the NodeManager.

If you're using the /etc/init.d/hadoop-yarn-nodemanager script, then you can use the service command to run it now:

$ sudo service hadoop-yarn-nodemanager start

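You can verify that the NodeManager is working properly by opening a web browser to http://host:8042/, where host is the name of the machine where the NodeManager is running. If you set yarn.http.policy to HTTPS_ONLY as shown in Step 1, use https://host:8044/ instead (8044 is the default HTTPS port).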

Step 4: Start up the MapReduce Job History Server

You are now ready to start the MapReduce Job History Server.

If you're using the /etc/init.d/hadoop-mapreduce-historyserver script, then you can use the service command to run it now:

$ sudo service hadoop-mapreduce-historyserver start

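You can verify that the MapReduce Job History Server is working properly by opening a web browser to http://host:19888/, where host is the name of the machine where the MapReduce Job History Server is running. If you set mapreduce.jobhistory.http.policy to HTTPS_ONLY as shown in Step 1, use https://host:19890/ instead (19890 is the default HTTPS port).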

Step 5: Try Running a MapReduce YARN Job

You should now be able to run MapReduce jobs. To confirm, try launching a sleep or a pi job from the provided Hadoop examples (/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar). Note that you will need Kerberos credentials to do so.

To try running a MapReduce job using YARN, set the HADOOP_MAPRED_HOME environment variable and then submit the job. For example:

$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
$ /usr/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
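
Because the job needs Kerberos credentials, it can help to confirm that a valid ticket exists before submitting. The following wrapper is a sketch under stated assumptions: it uses the -s option of the MIT Kerberos klist command, which reports ticket validity through its exit status, and both the function name submit_pi_job and the HADOOP_BIN override variable are hypothetical.

```shell
# Submit the pi example only when a valid Kerberos ticket is in the cache.
# klist -s prints nothing and signals ticket validity through its exit status.
submit_pi_job() {
  if ! klist -s; then
    echo "No valid Kerberos ticket; run kinit first" >&2
    return 1
  fi
  export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  # HADOOP_BIN is a hypothetical override for the hadoop launcher path.
  "${HADOOP_BIN:-/usr/bin/hadoop}" jar \
    /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
}

# Example:
#   kinit && submit_pi_job
```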

(Optional) Step 6: Configure YARN for Long-Running Applications

Long-running applications, such as Spark Streaming jobs, need additional configuration because the default settings allow the hdfs user's delegation tokens a maximum lifetime of only 7 days, which is not always sufficient.

You can work around this by configuring the ResourceManager as a proxy user for the corresponding HDFS NameNode so that the ResourceManager can request new tokens when the existing ones are past their maximum lifetime. YARN will then be able to continue performing localization and log aggregation on behalf of the hdfs user.

Set the following property in yarn-site.xml to true:

<property>
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>

Configure the following properties in core-site.xml on the HDFS NameNode. You can use a more restrictive configuration by specifying hosts/groups instead of * as in the example below.

<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
