Step 1: Install Cloudera Manager and CDH

If you have not already done so, Cloudera strongly recommends that you install and configure the Cloudera Manager Server and Cloudera Manager Agents and CDH to set up a fully-functional CDH cluster before you begin doing the following steps to implement Hadoop security features.

Overview of the User Accounts and Groups in CDH and Cloudera Manager to Support Security

When you install the CDH packages and the Cloudera Manager Agents on your cluster hosts, Cloudera Manager takes some steps to provide system security such as creating the following Unix accounts and setting directory permissions as shown in the following table. These Unix accounts and directory permissions work with the Hadoop Kerberos security requirements.

This User Runs These Roles
hdfs NameNode, DataNodes, and Secondary Node
mapred JobTracker and TaskTrackers (MR1) and Job History Server (YARN)
yarn ResourceManager and NodeManagers (YARN)
oozie Oozie Server
hue Hue Server, Beeswax Server, Authorization Manager, and Job Designer

The hdfs user also acts as the HDFS superuser.

When you install the Cloudera Manager Server on the server host, a new Unix user account called cloudera-scm is created automatically to support security. The Cloudera Manager Server uses this account to create and deploy the host principals and keytabs on your cluster.

If you installed CDH and Cloudera Manager at the Same Time

If you have a new installation and you installed CDH and Cloudera Manager at the same time, when you started the Cloudera Manager Agents on your cluster hosts, the Cloudera Manager Agent on each host automatically configured the directory owners shown in the following table to support security. Assuming the owners are configured as shown, the Hadoop daemons can then automatically set the permissions for each of the directories specified by the properties shown below to make sure they are properly restricted. It's critical that the owners are configured exactly as shown below, so don't change them:

Directory Specified in this Property Owner
dfs.name.dir hdfs:hadoop
dfs.data.dir hdfs:hadoop
mapred.local.dir mapred:hadoop
mapred.system.dir in HDFS mapred:hadoop
yarn.nodemanager.local-dirs yarn:yarn
yarn.nodemanager.log-dirs yarn:yarn
oozie.service.StoreService.jdbc.url (if using Derby) oozie:oozie
[[database]] name hue:hue
javax.jdo.option.ConnectionURL hue:hue

If you Installed and Used CDH Before Installing Cloudera Manager

If you have been using HDFS and running MapReduce jobs in an existing installation of CDH before you installed Cloudera Manager, you must manually configure the owners of the directories shown in the table above. Doing so enables the Hadoop daemons to automatically set the permissions for each of the directories. It's critical that you manually configure the owners exactly as shown above.