How to Configure Clusters to Use Kerberos for Authentication

Cloudera clusters can use Kerberos to authenticate services running on the cluster and the users who need access to those services. This How To guide provides the requirements, prerequisites, and a high-level summary of the steps needed to integrate clusters with Kerberos for authentication.

The following are the general steps for integrating Kerberos with Cloudera clusters without using the Cloudera Manager configuration wizard.

Step 1: Verify Requirements and Assumptions

The steps outlined below assume that:
  • The Kerberos instance has been set up, is running, and is available during the configuration process.
  • The Cloudera cluster has been installed and is operational, with all services fully functional: Cloudera Manager Server, CDH, and Cloudera Manager Agent processes on all cluster nodes.

Hosts Configured for AES-256 Encryption

By default, CentOS and RHEL 5.5 (and higher) use AES-256 encryption for Kerberos tickets. If you use either of these platforms for your cluster, the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy File must be installed on all cluster hosts.

To install the JCE Policy file on the host system at the OS layer:
  1. Download the jce_policy-x.zip file.
  2. Unzip the file.
  3. Follow the steps in the README.txt to install it.

Required Administrator Privileges

Setting up the Cloudera cluster to use Kerberos for authentication requires complete administrator access to the cluster and administrator privileges on the Kerberos instance.

If you do not have administrator privileges on the Kerberos instance, you will need help from the Kerberos administrator before you can complete the process.

Required User (Service Account) Directories

During installation, the cloudera-scm account is created on the host system. When Cloudera Manager and CDH services are installed at the same time, Cloudera Manager creates other accounts as needed to support the service role daemons. However, if the CDH services and Cloudera Manager are installed separately, you may need to specifically set directory permissions for certain Hadoop user (service daemon) accounts for successful integration with Kerberos. The following table shows the accounts used for core service roles. Note that hdfs acts as superuser for the system.

User     Service Roles
hdfs     NameNode, DataNodes, Secondary NameNode (and HDFS superuser)
mapred   JobTracker, TaskTrackers (MR1), Job History Server (YARN)
yarn     ResourceManager, NodeManager (YARN)
oozie    Oozie Server
hue      Hue Server, Beeswax Server, Authorization Manager, Job Designer

These accounts require ownership control over specific directories.
  • For newly installed Cloudera clusters (Cloudera Manager and CDH installed at the same time): The Cloudera Manager Agent process on each cluster host automatically configures the appropriate directory ownership when the cluster launches.
  • For existing CDH clusters using HDFS and running MapReduce jobs prior to Cloudera Manager installation: The directory ownership must be manually configured, as shown in the table below. The directory owners must match those shown in the table so that the service daemons can set permissions as needed on each directory.

Directory Specified in this Property                   Owner
dfs.name.dir                                           hdfs:hadoop
dfs.data.dir                                           hdfs:hadoop
mapred.local.dir                                       mapred:hadoop
mapred.system.dir in HDFS                              mapred:hadoop
yarn.nodemanager.local-dirs                            yarn:yarn
yarn.nodemanager.log-dirs                              yarn:yarn
oozie.service.StoreService.jdbc.url (if using Derby)   oozie:oozie
[[database]] name                                      hue:hue
javax.jdo.option.ConnectionURL                         hue:hue
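
For an existing cluster, the local-directory rows of the table could be applied with a small script along the following lines. This is only a sketch: the function name print_chown_cmds and the /data/1/... paths are placeholders that are not part of the original procedure, and the recursive -R flag is an assumption about how the directories should be treated. The function only prints the commands so that they can be reviewed before anything is changed on disk.

```shell
#!/bin/sh
# Dry run: print one chown command for each "owner path" pair read from stdin.
# Nothing is changed on disk; review the output before running it as root.
print_chown_cmds() {
  while read -r owner path; do
    # Skip blank lines and comment lines.
    case "$owner" in ''|'#'*) continue ;; esac
    printf 'chown -R %s %s\n' "$owner" "$path"
  done
}

# Placeholder paths: substitute the values of dfs.name.dir, dfs.data.dir,
# and mapred.local.dir from your own configuration.
print_chown_cmds <<'EOF'
hdfs:hadoop   /data/1/dfs/nn
hdfs:hadoop   /data/1/dfs/dn
mapred:hadoop /data/1/mapred/local
EOF
```

Directories that live in HDFS rather than on the local file system (mapred.system.dir, for example) would need hadoop fs -chown instead of the local chown command.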

Step 2: Create Principal for Cloudera Manager Server in the Kerberos KDC

Cloudera Manager Server has its own principal to connect to the Kerberos KDC and import user and service principals for use by the cluster.

The steps below summarize the process of adding a principal specifically for Cloudera Manager Server to an MIT KDC and an Active Directory KDC. See documentation from MIT, Microsoft, or the appropriate vendor for more detailed information.

Creating a Principal in Active Directory

Check your Microsoft documentation for specific details for your Active Directory KDC. The general process is as follows:
  1. Create an Organizational Unit (OU) in your Active Directory KDC service that will contain the principals for use by the CDH cluster.
  2. Add a new user account to Active Directory, for example, username@YOUR-REALM.EXAMPLE.COM. Set the password for the user to never expire.
  3. Use the Delegate Control wizard of Active Directory and grant this new user permission to Create, Delete, and Manage User Accounts.

Creating a Principal in an MIT KDC

For MIT Kerberos, user principals that include the instance name admin designate a user account with administrator privileges. For example:
username/admin@YOUR-REALM.EXAMPLE.COM 

Create the Cloudera Manager Server principal as shown in one of the examples below, appropriate for the location of the Kerberos instance and using the correct REALM name for your setup.

For MIT Kerberos KDC on a remote host:

kadmin: addprinc -pw password cloudera-scm/admin@YOUR-REALM.EXAMPLE.COM

For MIT Kerberos KDC on a local host:

kadmin.local: addprinc -pw password cloudera-scm/admin@YOUR-REALM.EXAMPLE.COM

Step 3: Add the Credentials for the Principal to the Cluster

Assuming the principal was successfully added to the Kerberos KDC, it can be added to the cluster as follows:
  1. Log in to the Cloudera Manager Admin Console.
  2. Select Administration > Security.
  3. Click the Kerberos Credentials tab.
  4. Click the Import Kerberos Account Manager Credentials button.
  5. Enter the credentials for the principal added to the Kerberos KDC in the previous step:
    • For Username, enter the primary and realm portions of the Kerberos principal. Enter the realm name in uppercase only (YOUR-REALM.EXAMPLE.COM).
    • Enter the Password for the principal.

  6. Click Import.

Cloudera Manager encrypts the username and password into a keytab and uses it to create new principals in the KDC as needed.

Click Close when complete.
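
Because the realm must be entered in uppercase, it can be useful to check the username string before pasting it into the Username field. The following is a minimal sketch; the function name realm_is_upper is hypothetical and the check is purely textual (it does not contact the KDC).

```shell
#!/bin/sh
# Succeed only if the realm part of a principal (the text after the last "@")
# is non-empty and contains no lowercase letters.
realm_is_upper() {
  case "$1" in
    *@*) realm=${1##*@} ;;
    *)   return 1 ;;   # no realm was given
  esac
  [ -n "$realm" ] &&
    [ "$realm" = "$(printf '%s' "$realm" | tr '[:lower:]' '[:upper:]')" ]
}

if realm_is_upper 'cloudera-scm/admin@YOUR-REALM.EXAMPLE.COM'; then
  echo "realm is uppercase"
fi
```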

Step 4: Identify Default Kerberos Realm for the Cluster

Each host in the cluster must have the default realm property (default_realm) specified in the libdefaults section of its Kerberos configuration file (/etc/krb5.conf).
[libdefaults]
default_realm = FQDN.EXAMPLE.COM
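
One way to confirm the setting on a host is to read it back from the file. The sketch below assumes the simple layout shown above (a [libdefaults] section containing a single default_realm line); the function name default_realm_of is hypothetical, and the parsing is deliberately minimal rather than a full krb5.conf parser.

```shell
#!/bin/sh
# Print the default_realm value from the [libdefaults] section of the
# krb5.conf-style file given as the first argument.
default_realm_of() {
  awk '
    # Track whether the current section is [libdefaults].
    /^\[/ { in_lib = ($0 ~ /^\[libdefaults\]/) }
    in_lib && /^[ \t]*default_realm[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")   # drop everything up to and including "="
      sub(/[ \t]+$/, "")         # drop trailing whitespace
      print
      exit
    }
  ' "$1"
}

# Report the realm configured on this host, if the file is readable.
if [ -r /etc/krb5.conf ]; then
  default_realm_of /etc/krb5.conf
fi
```

Running the same check on every host and comparing the results is a quick way to spot a host whose realm differs from the rest of the cluster.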

After adding the default realm to the configuration file for all hosts in the cluster, configure the same default realm for Cloudera Manager Server.

In the Cloudera Manager Admin Console:

  1. Select Administration > Settings.
  2. Select Kerberos for the Category filter.
  3. In the Kerberos Security Realm field, enter the default realm name set in the Kerberos configuration file (/etc/krb5.conf) of each host in the cluster.

  4. Click Save Changes.

Step 5: Stop All Services

All service daemons in the cluster must be stopped so that they can be restarted at the same time and start as authenticated services in the cluster. Service daemons running without authenticating to Kerberos first will not be able to communicate with other daemons in the cluster that have authenticated to Kerberos, so they must be shut down and restarted at the end of the configuration process, as a unit.

Stop all running services and the Cloudera Management Service as follows:

In the Cloudera Manager Admin Console:

  1. Select Clusters > Cluster-n.
  2. Click the Actions drop-down menu and select Stop to stop all services on the cluster.
  3. Click Stop on the warning message to stop all services on the cluster. The Command Details window displays the progress as each service shuts down. When the message All services successfully stopped displays, close the Command Details window.
  4. Select Clusters > Cloudera Management Service.
  5. Click the Actions drop-down menu and select Stop to stop the Cloudera Management Service. The Command Details window displays the progress as each role instance running on the Cloudera Management Service shuts down. The process is completed when the message Command completed with n/n successful subcommands displays.
  6. Click Close.

Step 6: Specify Kerberos for Security

Kerberos must be specified as the security mechanism for Hadoop infrastructure, starting with the HDFS service. To enable Hadoop security for the cluster, you enable it on an HDFS service. After you do so, the Cloudera Manager Server automatically enables Hadoop security on the MapReduce and YARN services associated with that HDFS service.

In the Cloudera Manager Admin Console:

  1. Select Clusters > HDFS-n.
  2. Click the Configuration tab.
  3. Select HDFS-n for the Scope filter.
  4. Select Security for the Category filter.
  5. Scroll (or search) to find the Hadoop Secure Authentication property.
  6. Click the kerberos button to select Kerberos.

  7. Click the value for the Hadoop Secure Authorization property and select the checkbox to enable service-level authorization on the selected HDFS service. You can specify comma-separated lists of users and groups authorized to use Hadoop services or perform admin operations using the following properties under the Service-Wide Security section:
    • Authorized Users: Comma-separated list of users authorized to use Hadoop services.
    • Authorized Groups: Comma-separated list of groups authorized to use Hadoop services.
    • Authorized Admin Users: Comma-separated list of users authorized to perform admin operations on Hadoop.
    • Authorized Admin Groups: Comma-separated list of groups authorized to perform admin operations on Hadoop.
  8. In the Search field, type DataNode Transceiver to find the DataNode Transceiver Port property.
  9. Click the value for the DataNode Transceiver Port property and specify a privileged port number (below 1024). Cloudera recommends 1004.
  10. In the Search field, type DataNode HTTP to find the DataNode HTTP Web UI Port property and specify a privileged port number (below 1024). Cloudera recommends 1006.
  11. In the Search field, type Data Directory Permissions to find the DataNode Data Directory Permissions property.
  12. Reset the value for the DataNode Data Directory Permissions property to the default value of 700 if not already set to that.
  13. Make sure you have changed the DataNode Transceiver Port, DataNode Data Directory Permissions and DataNode HTTP Web UI Port properties for every DataNode role group.
  14. Click Save Changes to save the configuration settings.

To enable ZooKeeper security:

  1. Go to the ZooKeeper Service > Configuration tab and click View and Edit.
  2. Click the value for the Enable Kerberos Authentication property.
  3. Click Save Changes to save the configuration settings.

To enable HBase security:

  1. Go to the HBase Service > Configuration tab and click View and Edit.
  2. In the Search field, type HBase Secure to show the Hadoop security properties (found under the Service-Wide > Security category).
  3. Click the value for the HBase Secure Authorization property and select the checkbox to enable authorization on the selected HBase service.
  4. Click the value for the HBase Secure Authentication property and select kerberos to enable authentication on the selected HBase service.
  5. Click Save Changes to save the configuration settings.

To enable Solr security:

  1. Go to the Solr Service > Configuration tab and click View and Edit.
  2. In the Search field, type Solr Secure to show the Solr security properties (found under the Service-Wide > Security category).
  3. Click the value for the Solr Secure Authentication property and select kerberos to enable authentication on the selected Solr service.
  4. Click Save Changes to save the configuration settings.

Credentials Generated

After you enable security for any of the services in Cloudera Manager, a command called Generate Credentials is triggered automatically. You can watch the progress of the command in the top-right corner of the screen, which shows the running commands. Wait for this command to finish (indicated by a grey box containing "0").

Step 7: Restart All Services

Start all services on the cluster using the Cloudera Manager Admin Console:

  1. Select Clusters > Cluster-n.
  2. Click the Actions drop-down menu and select Start. The confirmation prompt displays.
  3. Click Start to confirm and continue. The Command Details window displays progress. When All services successfully started displays, close the Command Details window.
  4. Select Clusters > Cloudera Management Service.
  5. Click the Actions drop-down menu and select Start. The Command Details window displays the progress as each role instance running on the Cloudera Management Service starts up. The process is completed when the message Command completed with n/n successful subcommands displays.

Step 8: Deploy Client Configurations

Deploy client configurations for services supported on the cluster using the Cloudera Manager Admin Console:

  1. Select Clusters > Cluster-n.
  2. Click the Actions drop-down menu and select Deploy Client Configuration.

Step 9: Create the HDFS Superuser Principal

To be able to create home directories for users, you will need access to the HDFS superuser account. (CDH automatically created the HDFS superuser account on each cluster host during CDH installation.) When you enabled Kerberos for the HDFS service, you lost access to the default HDFS superuser account using sudo -u hdfs commands. Cloudera recommends you use a different user account as the superuser, not the default hdfs account.

Designating a Non-Default Superuser Group

To designate a different group of superusers instead of using the default hdfs account, follow these steps:

  1. Go to the Cloudera Manager Admin Console and navigate to the HDFS service.
  2. Click the Configuration tab.
  3. Select Scope > HDFS (Service-Wide).
  4. Select Category > Security.
  5. Locate the Superuser Group property and change the value to the appropriate group name for your environment. For example, superuser.
  6. Click Save Changes.
  7. Restart the HDFS service.

To regain access to the superuser account now that Kerberos is enabled, create a Kerberos principal or an Active Directory user whose first component is superuser:

For Active Directory

Add a new user account to Active Directory, superuser@YOUR-REALM.EXAMPLE.COM. The password for this account should be set to never expire.

For MIT KDC

  1. In the kadmin.local or kadmin shell, type the following command to create a Kerberos principal called superuser:
    kadmin:  addprinc superuser@YOUR-REALM.EXAMPLE.COM
    This command prompts you to create a password for the superuser principal. You should use a strong password because having access to this principal provides superuser access to all of the files in HDFS.
  2. To run commands as the HDFS superuser, you must obtain Kerberos credentials for the superuser principal. To do so, run the following command and provide the appropriate password when prompted.
    $ kinit superuser@YOUR-REALM.EXAMPLE.COM
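
After kinit succeeds, the ticket cache can be inspected with klist to confirm which principal is active. The sketch below assumes MIT Kerberos client tools, whose klist output includes a line beginning with "Default principal:"; the function name default_principal is hypothetical, and other Kerberos implementations may format their output differently.

```shell
#!/bin/sh
# Read klist output on stdin and print the default principal, if one is listed.
default_principal() {
  sed -n 's/^Default principal: //p'
}

# Typical use on a host with a Kerberos client installed:
#   klist | default_principal
# The result should be superuser@YOUR-REALM.EXAMPLE.COM after the kinit above.
```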

Step 10: Get or Create a Kerberos Principal for Each User Account

Now that Kerberos is configured and enabled on your cluster, you and every other Hadoop user must have a Kerberos principal or keytab to obtain Kerberos credentials to be allowed to access the cluster and use the Hadoop services. In the next step of this procedure, you need to create your own Kerberos principals to verify that Kerberos security is working on your cluster. If you and the other Hadoop users already have a Kerberos principal or keytab, or if your Kerberos administrator can provide them, you can skip ahead to the next step.

To create Kerberos principals for all users:

Active Directory

Add a new AD user account, <username>@EXAMPLE.COM, for each Cloudera Manager service that should use Kerberos authentication. The password for these service accounts should be set to never expire.

MIT KDC
  1. In the kadmin.local or kadmin shell, use the following command to create a principal for your account by replacing EXAMPLE.COM with the name of your realm, and replacing username with a username:
    kadmin:  addprinc username@EXAMPLE.COM 
  2. When prompted, enter the password twice.

Step 11: Prepare the Cluster for Each User

Before you and other business users can access the cluster, there are a few tasks you must do to prepare the hosts for each user.
  1. Each host in the cluster must have a Unix user account with the same name as the primary component of the user's principal name. For example, the principal jcarlos@YOUR-REALM.EXAMPLE.COM needs the Linux account jcarlos on each host system. Use LDAP (OpenLDAP, Microsoft Active Directory) for this step if possible.
  2. Create a subdirectory under /user on HDFS for each user account (for example, /user/jcarlos). Change the owner and group of that directory to be the user.
    $ hadoop fs -mkdir /user/jcarlos
    $ hadoop fs -chown jcarlos /user/jcarlos
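
When several users need to be prepared, the two commands above can be generated from a list of principals. The following sketch derives the account name from the primary component of each principal and prints the corresponding HDFS commands without running them. The function names primary_of and home_dir_cmds are hypothetical; the printed commands are the same ones shown in the step above and would need to be run with HDFS superuser credentials.

```shell
#!/bin/sh
# Print the primary component of a Kerberos principal:
# everything before the first "/" or "@".
primary_of() {
  p=${1%%@*}                  # drop the realm
  printf '%s\n' "${p%%/*}"    # drop the instance, if present
}

# Dry run: print the HDFS commands that create a home directory for each
# principal passed as an argument. Nothing is executed against the cluster.
home_dir_cmds() {
  for principal in "$@"; do
    user=$(primary_of "$principal")
    printf 'hadoop fs -mkdir /user/%s\n' "$user"
    printf 'hadoop fs -chown %s /user/%s\n' "$user" "$user"
  done
}

home_dir_cmds jcarlos@YOUR-REALM.EXAMPLE.COM
```

Printing the commands first keeps the sketch safe to run anywhere; the output can be reviewed and then executed by a user holding a ticket for the superuser principal.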

Step 12: Verify Successful Kerberos Integration

To verify that Kerberos has been successfully integrated for the cluster, try running one of the sample MapReduce jobs (sleep or pi, for example) provided at:
/usr/lib/hadoop/hadoop-examples.jar

This assumes you have the fully functional cluster as recommended in Step 1: Verify Requirements and Assumptions and that the client configuration files have been generated as detailed in Step 8: Deploy Client Configurations.

To verify that Kerberos security is working:

  1. Obtain Kerberos credentials for your user account.
    $ kinit youruserid@YOUR-REALM.EXAMPLE.COM
  2. Enter a password when prompted.
  3. Submit a sample pi calculation as a test MapReduce job. Use the following command if you use a package-based setup for Cloudera Manager:
    $ hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2*examples.jar pi 10 10000
    Number of Maps = 10
    Samples per Map = 10000
    ...
    Job Finished in 38.572 seconds
    Estimated value of Pi is 3.14120000000000000000
    If you have a parcel-based setup, use the following command instead:
    $ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 10000
    Number of Maps = 10
    Samples per Map = 10000
    ...
    Job Finished in 30.958 seconds
    Estimated value of Pi is 3.14120000000000000000

You have now verified that Kerberos security is working on your cluster.