This is the documentation for CDH 4.7.1.

Step 2: Verify User Accounts and Groups in CDH4 Due to Security

  Note:

CDH4 introduces a new version of MapReduce: MapReduce 2.0 (MRv2) built on the YARN framework. In this document, we refer to this new version as YARN. CDH4 also provides an implementation of the previous version of MapReduce, referred to as MRv1 in this document.

Step 2a (MRv1 only): Verify User Accounts and Groups in MRv1

  Note:

If you are using YARN, skip this step and proceed to Step 2b (YARN only): Verify User Accounts and Groups in YARN.

During CDH4 package installation of MRv1, the following Unix user accounts are automatically created to support security:

This User   Runs These Hadoop Programs
hdfs        HDFS: NameNode, DataNodes, Secondary NameNode (or Standby NameNode if you are using HA)
mapred      MRv1: JobTracker and TaskTrackers

The hdfs user also acts as the HDFS superuser.

The hadoop user no longer exists in CDH4. If you currently use the hadoop user to run applications as an HDFS super-user, you should instead use the new hdfs user, or create a separate Unix account for your application such as myhadoopapp.
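You can confirm on each host that the package installation created these accounts. A minimal check, assuming standard Linux user tools:

```shell
# Check that the CDH4-created accounts exist on this host.
# "hdfs" and "mapred" are the account names from the table above.
for u in hdfs mapred; do
  if id "$u" >/dev/null 2>&1; then
    echo "ok: user $u exists"
  else
    echo "missing: user $u was not created" >&2
  fi
done
```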

MRv1: Directory Ownership in the Local File System

Because the HDFS and MapReduce services run as different users, you must configure the correct ownership of the following directories on the local file system of each host:

File System   Directory                                                               Owner           Permissions (see Footnote 1)
Local         dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work)   hdfs:hdfs       drwx------
Local         dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work)   hdfs:hdfs       drwx------
Local         mapred.local.dir                                                        mapred:mapred   drwxr-xr-x

See also Deploying MapReduce v1 (MRv1) on a Cluster.
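As a sketch, the ownership and modes in the table above can be applied with chown and chmod. The paths below are placeholder examples, not defaults; substitute the directories actually configured in your hdfs-site.xml and mapred-site.xml, and run as root:

```shell
# Hypothetical example paths; replace with your configured directories.
mkdir -p /data/1/dfs/nn /data/1/dfs/dn /data/1/mapred/local

# dfs.namenode.name.dir and dfs.datanode.data.dir: hdfs:hdfs, drwx------
chown -R hdfs:hdfs /data/1/dfs/nn /data/1/dfs/dn
chmod 700 /data/1/dfs/nn /data/1/dfs/dn

# mapred.local.dir: mapred:mapred, drwxr-xr-x
chown -R mapred:mapred /data/1/mapred/local
chmod 755 /data/1/mapred/local
```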

You must also configure the following permissions for the HDFS and MapReduce log directories (default locations: /var/log/hadoop-hdfs and /var/log/hadoop-0.20-mapreduce) and for the $MAPRED_LOG_DIR/userlogs/ directory:

File System   Directory                              Owner             Permissions
Local         HDFS_LOG_DIR                           hdfs:hdfs         drwxrwxr-x
Local         MAPRED_LOG_DIR                         mapred:mapred     drwxrwxr-x
Local         userlogs directory in MAPRED_LOG_DIR   mapred:anygroup   permissions are set automatically at daemon start time
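A sketch of applying these log-directory settings, assuming the default package locations; adjust the two variables if you have relocated HDFS_LOG_DIR or MAPRED_LOG_DIR (run as root):

```shell
# Default CDH4 package log locations; adjust to your deployment.
HDFS_LOG_DIR=/var/log/hadoop-hdfs
MAPRED_LOG_DIR=/var/log/hadoop-0.20-mapreduce

chown -R hdfs:hdfs "$HDFS_LOG_DIR"
chmod 775 "$HDFS_LOG_DIR"            # drwxrwxr-x

chown -R mapred:mapred "$MAPRED_LOG_DIR"
chmod 775 "$MAPRED_LOG_DIR"          # drwxrwxr-x

# userlogs/ only needs the mapred owner; the TaskTracker sets its
# permissions automatically at daemon start time.
chown mapred "$MAPRED_LOG_DIR/userlogs" 2>/dev/null || true
```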

MRv1: Directory Ownership on HDFS

The following directories on HDFS must also be configured as follows:

File System   Directory                                                                             Owner           Permissions (see Footnote 2)
HDFS          mapreduce.jobtracker.system.dir (mapred.system.dir is deprecated but will also work)  mapred:hadoop   drwx------
HDFS          / (root directory)                                                                    hdfs:hadoop     drwxr-xr-x

MRv1: Changing the Directory Ownership on HDFS

  • If Hadoop security is enabled, obtain Kerberos credentials for the hdfs user by running the following command before changing the directory ownership on HDFS:
$ sudo -u hdfs kinit -k -t hdfs.keytab hdfs/fully.qualified.domain.name@YOUR-REALM.COM

If kinit does not obtain credentials on the first attempt, run kinit -R after running kinit. (For more information, see Problem 2 in Appendix A - Troubleshooting.) To change the directory ownership on HDFS, run the following commands. Replace the example /mapred/system directory in the commands below with the HDFS directory specified by the mapreduce.jobtracker.system.dir (or mapred.system.dir) property in the conf/mapred-site.xml file:

$ sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod -R 700 /mapred/system
$ sudo -u hdfs hadoop fs -chmod 755 /
  • In addition (whether or not Hadoop security is enabled) create the /tmp directory. For instructions on creating /tmp and setting its permissions, see these instructions.
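For reference, creating /tmp typically amounts to the following sketch; the linked instructions are authoritative:

```shell
# Create /tmp in HDFS with the sticky bit (mode 1777) so every user can
# write to it but only owners can delete their own files.
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
```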

Step 2b (YARN only): Verify User Accounts and Groups in YARN

  Note:

If you are using MRv1, skip this step and proceed to Step 3: If you are Using AES-256 Encryption, install the JCE Policy File.

During CDH4 package installation of MapReduce 2.0 (YARN), the following Unix user accounts are automatically created to support security:

This User   Runs These Hadoop Programs
hdfs        HDFS: NameNode, DataNodes, Standby NameNode (if you are using HA)
yarn        YARN: ResourceManager, NodeManager
mapred      YARN: MapReduce Job History Server

  Important:

The HDFS and YARN daemons must run as different Unix users; for example, hdfs and yarn. The MapReduce Job History server must run as user mapred. Having all of these users share a common Unix group is recommended; for example, hadoop.
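You can verify that the three accounts exist and share the recommended common group. The group name "hadoop" below follows the recommendation above; substitute your own if it differs:

```shell
# Confirm that hdfs, yarn, and mapred exist and that each is a member
# of a common Unix group (assumed here to be "hadoop").
for u in hdfs yarn mapred; do
  id "$u" >/dev/null 2>&1 || { echo "missing: user $u" >&2; continue; }
  if id -nG "$u" | tr ' ' '\n' | grep -qx hadoop; then
    echo "ok: $u is in group hadoop"
  else
    echo "warning: $u is not in group hadoop" >&2
  fi
done
```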

YARN: Directory Ownership in the Local File System

Because the HDFS and MapReduce services run as different users, you must configure the correct ownership of the following directories on the local file system of each host:

File System   Directory                                                               Owner       Permissions (see Footnote 1)
Local         dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work)   hdfs:hdfs   drwx------
Local         dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work)   hdfs:hdfs   drwx------
Local         yarn.nodemanager.local-dirs                                             yarn:yarn   drwxr-xr-x
Local         yarn.nodemanager.log-dirs                                               yarn:yarn   drwxr-xr-x
Local         container-executor                                                      root:yarn   --Sr-s---
Local         conf/container-executor.cfg                                             root:yarn   r--------
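The container-executor rows are the ones most often mis-set. The symbolic mode --Sr-s--- corresponds to octal 6050 (setuid and setgid, group read and execute only). A sketch, run as root; the paths below assume the CDH4 package layout and may differ on your hosts:

```shell
# container-executor binary: root:yarn, mode 6050 (--Sr-s---).
# Path is an assumption based on the CDH4 package layout.
CE=/usr/lib/hadoop-yarn/bin/container-executor
chown root:yarn "$CE"
chmod 6050 "$CE"

# Its configuration file: root:yarn, readable by root only (r--------).
CFG=/etc/hadoop/conf/container-executor.cfg
chown root:yarn "$CFG"
chmod 0400 "$CFG"
```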

You must also configure the following permissions for the HDFS, YARN, and MapReduce log directories (default locations: /var/log/hadoop-hdfs, /var/log/hadoop-yarn, and /var/log/hadoop-mapreduce):

File System   Directory        Owner           Permissions (see Footnote 3)
Local         HDFS_LOG_DIR     hdfs:hdfs       drwxrwxr-x
Local         $YARN_LOG_DIR    yarn:yarn       drwxrwxr-x
Local         MAPRED_LOG_DIR   mapred:mapred   drwxrwxr-x
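One way to spot-check the resulting ownership and modes against the table above (the paths assume the default package locations):

```shell
# Print owner, group, and octal mode for each log directory.
# Expected per the table: <user>:<user> 775 (drwxrwxr-x).
for d in /var/log/hadoop-hdfs /var/log/hadoop-yarn /var/log/hadoop-mapreduce; do
  if [ -d "$d" ]; then
    stat -c '%U:%G %a %n' "$d"
  else
    echo "not present: $d"
  fi
done
```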

YARN: Directory Ownership on HDFS

The following directories on HDFS must also be configured as follows:

File System   Directory                                    Owner           Permissions
HDFS          / (root directory)                           hdfs:hadoop     drwxr-xr-x
HDFS          yarn.nodemanager.remote-app-log-dir          yarn:hadoop     drwxrwxrwxt
HDFS          mapreduce.jobhistory.intermediate-done-dir   mapred:hadoop   drwxrwxrwxt
HDFS          mapreduce.jobhistory.done-dir                mapred:hadoop   drwxr-x---

YARN: Changing the Directory Ownership on HDFS

  • If Hadoop security is enabled, obtain Kerberos credentials for the hdfs user by running the following command before changing the directory ownership on HDFS:
$ sudo -u hdfs kinit -k -t hdfs.keytab hdfs/fully.qualified.domain.name@YOUR-REALM.COM

If kinit does not obtain credentials on the first attempt, run kinit -R after running kinit. (See Problem 2 in Appendix A - Troubleshooting.) To change the directory ownership on HDFS, run the following commands. Replace each bracketed placeholder with the HDFS directory specified by that property:

$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod 755 /
$ sudo -u hdfs hadoop fs -chown yarn:hadoop [yarn.nodemanager.remote-app-log-dir]
$ sudo -u hdfs hadoop fs -chmod 1777 [yarn.nodemanager.remote-app-log-dir]
$ sudo -u hdfs hadoop fs -chown mapred:hadoop [mapreduce.jobhistory.intermediate-done-dir]
$ sudo -u hdfs hadoop fs -chmod 1777 [mapreduce.jobhistory.intermediate-done-dir]
$ sudo -u hdfs hadoop fs -chown mapred:hadoop [mapreduce.jobhistory.done-dir]
$ sudo -u hdfs hadoop fs -chmod 750 [mapreduce.jobhistory.done-dir]
  • In addition (whether or not Hadoop security is enabled), create the /tmp directory. For instructions on creating /tmp and setting its permissions, see these instructions.
  • In addition (whether or not Hadoop security is enabled), change the permissions on the /user/history directory. See these instructions.
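For reference, the /user/history step typically looks like the following sketch; the linked instructions are authoritative, and the 1777 mode and mapred:hadoop ownership here are assumptions based on the Job History Server defaults described above:

```shell
# Create the job-history staging directory, make it world-writable with
# the sticky bit, and hand it to the Job History Server user.
sudo -u hdfs hadoop fs -mkdir -p /user/history
sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history
```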
Footnote 1: In CDH4, package installation and the Hadoop daemons automatically configure the correct permissions for you if you configure the directory ownership correctly as shown in the table above.
Footnote 2: At startup, MapReduce sets the permissions for the mapreduce.jobtracker.system.dir (or mapred.system.dir) directory in HDFS, assuming the user mapred owns that directory.
Footnote 3: In CDH4, package installation and the Hadoop daemons automatically configure the correct permissions for you if you configure the directory ownership correctly as shown in the two tables above. See also Deploying MapReduce v2 (YARN) on a Cluster.