Step 2: Verify User Accounts and Groups in CDH 5 Due to Security

Step 2a (MRv1 only): Verify User Accounts and Groups in MRv1

During CDH 5 package installation of MRv1, the following Unix user accounts are automatically created to support security:

| This User | Runs These Hadoop Programs |
|---|---|
| hdfs | HDFS: NameNode, DataNodes, Secondary NameNode (or Standby NameNode if you are using HA) |
| mapred | MRv1: JobTracker and TaskTrackers |

The hdfs user also acts as the HDFS superuser.

The hadoop user no longer exists in CDH 5. If you currently use the hadoop user to run applications as an HDFS superuser, use the hdfs user instead, or create a separate Unix account for your application, such as myhadoopapp.
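
One way to confirm that the accounts listed above exist on a host is sketched below. The helper name missing_users is an illustrative assumption, not part of CDH; it relies only on the standard id utility.

```shell
# Print each named account that does not exist on this host.
# (missing_users is a hypothetical helper, not a CDH command.)
missing_users() {
  for u in "$@"; do
    id -u "$u" >/dev/null 2>&1 || printf '%s\n' "$u"
  done
}

# On an MRv1 host the text expects hdfs and mapred to exist,
# so this call should print nothing there.
missing_users hdfs mapred
```

Any name printed is an account that package installation did not create on that host.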

MRv1: Directory Ownership in the Local File System

Because the HDFS and MapReduce services run as different users, you must configure the correct ownership of the following directories on the local filesystem of each host:

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | mapred.local.dir | mapred:mapred | drwxr-xr-x |

See also Setting Up MapReduce v1 (MRv1) Using the Command Line.
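
The ownership and permissions in the table above could be checked on each host with something like the following sketch. The helper check_dir and the example paths are assumptions for illustration; the sketch also assumes GNU stat (the -c option), as found on typical Linux hosts.

```shell
# Succeed only when the directory has the expected owner:group and octal mode.
# (check_dir is a hypothetical helper; stat -c is GNU-specific.)
check_dir() {
  dir=$1 want_owner=$2 want_mode=$3
  [ -d "$dir" ] || { echo "MISSING $dir"; return 1; }
  got_owner=$(stat -c '%U:%G' "$dir")
  got_mode=$(stat -c '%a' "$dir")
  if [ "$got_owner" = "$want_owner" ] && [ "$got_mode" = "$want_mode" ]; then
    echo "OK $dir"
  else
    echo "MISMATCH $dir: $got_owner $got_mode (want $want_owner $want_mode)"
    return 1
  fi
}

# Hypothetical paths; substitute the values of dfs.namenode.name.dir,
# dfs.datanode.data.dir and mapred.local.dir from your configuration.
# check_dir /data/1/dfs/nn hdfs:hdfs 700
# check_dir /data/1/dfs/dn hdfs:hdfs 700
# check_dir /data/1/mapred/local mapred:mapred 755
```

If a property lists several directories, each one needs its own check.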

You must also configure the following permissions for the HDFS and MapReduce log directories (by default, /var/log/hadoop-hdfs and /var/log/hadoop-0.20-mapreduce) and for the $MAPRED_LOG_DIR/userlogs/ directory:

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | HDFS_LOG_DIR | hdfs:hdfs | drwxrwxr-x |
| Local | MAPRED_LOG_DIR | mapred:mapred | drwxrwxr-x |
| Local | userlogs directory in MAPRED_LOG_DIR | mapred:anygroup | permissions will be set automatically at daemon start time |

MRv1: Directory Ownership on HDFS

The following directories on HDFS must also be configured as follows:

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| HDFS | mapreduce.jobtracker.system.dir (mapred.system.dir is deprecated but will also work) | mapred:hadoop | drwx------ |
| HDFS | / (root directory) | hdfs:hadoop | drwxr-xr-x |

MRv1: Changing the Directory Ownership on HDFS

  • If Hadoop security is enabled, obtain Kerberos credentials for the hdfs user before changing the directory ownership on HDFS by running the following command:
$ sudo -u hdfs kinit -k -t hdfs.keytab hdfs/fully.qualified.domain.name@YOUR-REALM.COM

If kinit does not work initially, run kinit -R after running kinit to obtain credentials. (For more information, see Error Messages and Various Failures.)

To change the directory ownership on HDFS, run the following commands. Replace the example /mapred/system directory with the HDFS directory specified by the mapreduce.jobtracker.system.dir (or mapred.system.dir) property in the conf/mapred-site.xml file:

$ sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod -R 700 /mapred/system
$ sudo -u hdfs hadoop fs -chmod 755 /
  • In addition (whether or not Hadoop security is enabled), create the /tmp directory. For details on creating /tmp and setting its permissions, see the linked instructions.
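
Rather than copying the system directory by hand, its value could be read from the configuration file, as in this sketch. The helper conf_value is hypothetical. It assumes the usual Hadoop layout in which each name element and each value element sits on a single line, with the name before its value; it is not a general XML parser.

```shell
# Print the <value> of one property from a Hadoop *-site.xml file.
# Usage: conf_value FILE PROPERTY   (conf_value is a hypothetical helper)
conf_value() {
  awk -v want="$2" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    /<value>/ { if (n == want) {
                  v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                  print v; exit } }
  ' "$1"
}

# Possible use with the commands above (not run here):
# sys_dir=$(conf_value conf/mapred-site.xml mapreduce.jobtracker.system.dir)
# sudo -u hdfs hadoop fs -chown mapred:hadoop "$sys_dir"
# sudo -u hdfs hadoop fs -chmod -R 700 "$sys_dir"
```

If the property is absent the function prints nothing, so check that the result is non-empty before passing it to chown or chmod.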

Step 2b (YARN only): Verify User Accounts and Groups in YARN

During CDH 5 package installation of MapReduce 2.0 (YARN), the following Unix user accounts are automatically created to support security:

| This User | Runs These Hadoop Programs |
|---|---|
| hdfs | HDFS: NameNode, DataNodes, Standby NameNode (if you are using HA) |
| yarn | YARN: ResourceManager, NodeManager |
| mapred | YARN: MapReduce JobHistory Server |

YARN: Directory Ownership in the Local Filesystem

Because the HDFS, YARN, and MapReduce services run as different users, you must configure the correct ownership of the following files and directories on the local filesystem of each host:

| File System | Directory | Owner | Permissions (see Footnote 1) |
|---|---|---|---|
| Local | dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | yarn.nodemanager.local-dirs | yarn:yarn | drwxr-xr-x |
| Local | yarn.nodemanager.log-dirs | yarn:yarn | drwxr-xr-x |
| Local | container-executor | root:yarn | --Sr-s--- |
| Local | conf/container-executor.cfg | root:yarn | r-------- |
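
The tables give permissions in symbolic form, while chmod commands usually take octal modes. The following sketch converts one to the other; sym_to_octal is a hypothetical helper, and it accepts strings both with and without the leading file-type character.

```shell
# Convert a symbolic permission string to its octal mode.
# (sym_to_octal is a hypothetical helper for illustration.)
sym_to_octal() {
  s=$1
  # Drop the leading file-type character when all 10 characters are present.
  [ "${#s}" -eq 10 ] && s=${s#?}
  special=0
  out=""
  i=0
  for weight in 4 2 1; do          # setuid, setgid, sticky
    trio=$(printf '%s' "$s" | cut -c"$((i * 3 + 1))-$((i * 3 + 3))")
    d=0
    case $trio in r??) d=$((d + 4)) ;; esac
    case $trio in ?w?) d=$((d + 2)) ;; esac
    case $trio in ??[xst]) d=$((d + 1)) ;; esac          # lowercase s/t imply execute
    case $trio in ??[sStT]) special=$((special + weight)) ;; esac
    out=$out$d
    i=$((i + 1))
  done
  [ "$special" -gt 0 ] && out=$special$out
  printf '%s\n' "$out"
}

sym_to_octal drwxr-xr-x   # prints 755
sym_to_octal --Sr-s---    # prints 6050
sym_to_octal r--------    # prints 400
```

Read this way, the container-executor entry corresponds to mode 6050 (setuid and setgid, with read and execute for the group only) and container-executor.cfg to mode 400.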

You must also configure the following permissions for the HDFS, YARN, and MapReduce log directories (by default, /var/log/hadoop-hdfs, /var/log/hadoop-yarn, and /var/log/hadoop-mapreduce):

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | HDFS_LOG_DIR | hdfs:hdfs | drwxrwxr-x |
| Local | YARN_LOG_DIR | yarn:yarn | drwxrwxr-x |
| Local | MAPRED_LOG_DIR | mapred:mapred | drwxrwxr-x |
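
As a sketch of how the log directory settings might be applied, the function below prints the chown and chmod commands for the default locations named above without running them. The name log_dir_plan is an assumption; 775 is the octal form of drwxrwxr-x.

```shell
# Print, but do not run, the commands that set owner and mode on the
# default log directories. (log_dir_plan is a hypothetical helper.)
log_dir_plan() {
  while read -r dir owner mode; do
    printf 'chown %s %s\n' "$owner" "$dir"
    printf 'chmod %s %s\n' "$mode" "$dir"
  done <<'EOF'
/var/log/hadoop-hdfs hdfs:hdfs 775
/var/log/hadoop-yarn yarn:yarn 775
/var/log/hadoop-mapreduce mapred:mapred 775
EOF
}

log_dir_plan
```

After reviewing the printed commands, they can be run as root. If your log directories are not in the default locations, edit the list first.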

YARN: Directory Ownership on HDFS

The following directories on HDFS must also be configured as follows:

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| HDFS | / (root directory) | hdfs:hadoop | drwxr-xr-x |
| HDFS | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwt |
| HDFS | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwt |
| HDFS | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x--- |

YARN: Changing the Directory Ownership on HDFS

If Hadoop security is enabled, obtain Kerberos credentials for the hdfs user by running the following command:
$ sudo -u hdfs kinit -k -t hdfs.keytab hdfs/fully.qualified.domain.name@YOUR-REALM.COM

If kinit does not work initially, run kinit -R after running kinit to obtain credentials. See Error Messages and Various Failures.

To change the directory ownership on HDFS, run the following commands, replacing each bracketed property name with the HDFS path configured for that property:

$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod 755 /
$ sudo -u hdfs hadoop fs -chown yarn:hadoop [yarn.nodemanager.remote-app-log-dir]
$ sudo -u hdfs hadoop fs -chmod 1777 [yarn.nodemanager.remote-app-log-dir]
$ sudo -u hdfs hadoop fs -chown mapred:hadoop [mapreduce.jobhistory.intermediate-done-dir]
$ sudo -u hdfs hadoop fs -chmod 1777 [mapreduce.jobhistory.intermediate-done-dir]
$ sudo -u hdfs hadoop fs -chown mapred:hadoop [mapreduce.jobhistory.done-dir]
$ sudo -u hdfs hadoop fs -chmod 750 [mapreduce.jobhistory.done-dir]
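
After running the commands above, the result could be checked by comparing a directory listing with the expected values, as sketched here. The helper expect_listing is hypothetical. It assumes that hadoop fs -ls -d prints the permission string in the first column, the owner in the third, and the group in the fourth; confirm that layout on your version before relying on it.

```shell
# Read listing output on stdin and succeed only if its last line shows the
# expected permission string, owner and group.
# Assumed columns: permissions, replication, owner, group, size, date, time, path.
# (expect_listing is a hypothetical helper.)
expect_listing() {
  awk -v p="$1" -v o="$2" -v g="$3" '
    { ok = ($1 == p && $3 == o && $4 == g) }
    END { exit ok ? 0 : 1 }
  '
}

# Possible use (not run here):
# sudo -u hdfs hadoop fs -ls -d / | expect_listing drwxr-xr-x hdfs hadoop
# sudo -u hdfs hadoop fs -ls -d [mapreduce.jobhistory.done-dir] \
#   | expect_listing drwxr-x--- mapred hadoop
```

The function returns a non-zero status on a mismatch or on empty input, so it can be used directly in an if statement or a script run with set -e.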

Footnote 1: In CDH 5, package installation and the Hadoop daemons will automatically configure the correct permissions for you if you configure the directory ownership correctly as shown in the table above.
Footnote 2: When starting up, MapReduce sets the permissions for the mapreduce.jobtracker.system.dir (or mapred.system.dir) directory in HDFS, assuming the user mapred owns that directory.
Footnote 3: In CDH 5, package installation and the Hadoop daemons will automatically configure the correct permissions for you if you configure the directory ownership correctly as shown in the two tables above. See also Deploying MapReduce v2 (YARN) on a Cluster.