Integrating Hadoop Security with Active Directory

Several different subsystems are involved in servicing authentication requests, including the Key Distribution Center (KDC), Authentication Service (AS), and Ticket Granting Service (TGS). The more nodes in the cluster and the more services provided, the heavier the traffic between these services and the services running on the cluster.

As a general guideline, Cloudera recommends using a dedicated Active Directory instance (Microsoft Server Domain Services) for every 100-200 nodes in the cluster. However, this is just a loose guideline. Monitor utilization and deploy additional instances as needed to meet the demand.

By default, Kerberos uses UDP for client/server communication which is typically faster at delivering packets than TCP, but does not guarantee delivery. Additionally, using UDP packets that get too large are frequently dropped, as is the case when a user is a member of a large number of groups. To avoid this problem, force Kerberos to use TCP by modifying the Kerberos configuration file (krb5.conf) as follows:
[libdefaults]
udp_preference_limit = 1
...

This is especially useful if the domain controllers are not on the same subnet as the cluster or are separated by firewalls.

In general, Cloudera recommends setting up the Active Directory domain controller (Microsoft Server Domain Services) on the same subnet as the cluster and never over a WAN connection which results in considerable latency and affects cluster performance.

Troubleshooting cluster operations when Active Directory is being used for Kerberos authentication requires administrative access to the Microsoft Server Domain Services instance. Administrators may need to enable Kerberos event logging on the Microsoft Server KDC to resolve issues.

Deleting Cloudera Manager roles or nodes requires manually deleting the associate Active Directory accounts. Cloudera Manager cannot delete entries from Active Directory.

Configuring a Local MIT Kerberos Realm to Trust Active Directory

On the Active Directory Server

  1. Add the local realm trust to Active Directory with this command:
    netdom trust YOUR-LOCAL-REALM.COMPANY.COM /Domain:AD-REALM.COMPANY.COM /add /realm /passwordt:<TrustPassword>
  2. Set the proper encryption type with this command:

    On Windows 2003 RC2:

    ktpass /MITRealmName YOUR-LOCAL-REALM.COMPANY.COM /TrustEncryp <enc_type>

    On Windows 2008:

    ksetup /SetEncTypeAttr YOUR-LOCAL-REALM.COMPANY.COM <enc_type>

    The <enc_type> parameter specifies AES, DES, or RC4 encryption. Refer to the documentation for your version of Windows Active Directory to find the <enc_type> parameter string to use.

  3. Get and verify the list of encryption types set with this command:

    On Windows 2008:

    ksetup /GetEncTypeAttr YOUR-LOCAL-REALM.COMPANY.COM

On the MIT KDC Server

Type the following command in the kadmin.local or kadmin shell to add the cross-realm krbtgt principal. Use the same password you used in the netdom command on the Active Directory Server.

kadmin:  addprinc -e "<enc_type_list>" krbtgt/YOUR-LOCAL-REALM.COMPANY.COM@AD-REALM.COMPANY.COM

where the <enc_type_list> parameter specifies the types of encryption this cross-realm krbtgt principal will support: either AES, DES, or RC4 encryption. You can specify multiple encryption types using the parameter in the command above, what's important is that at least one of the encryption types corresponds to the encryption type found in the tickets granted by the KDC in the remote realm. For example:

kadmin:  addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/YOUR-LOCAL-REALM.COMPANY.COM@AD-REALM.COMPANY.COM

On All of the Cluster Hosts

  1. Verify that both Kerberos realms are configured on all of the cluster hosts. Note that the default realm and the domain realm should remain set as the MIT Kerberos realm which is local to the cluster.
    [realms]
      AD-REALM.CORP.FOO.COM = {
        kdc = ad.corp.foo.com:88
        admin_server = ad.corp.foo.com:749
        default_domain = foo.com
      }
      CLUSTER-REALM.CORP.FOO.COM = {
        kdc = cluster01.corp.foo.com:88
        admin_server = cluster01.corp.foo.com:749
        default_domain = foo.com
      }
  2. To properly translate principal names from the Active Directory realm into local names within Hadoop, you must configure the hadoop.security.auth_to_local setting in the core-site.xml file on all of the cluster machines. The following example translates all principal names with the realm AD-REALM.CORP.FOO.COM into the first component of the principal name only. It also preserves the standard translation for the default realm (the cluster realm).
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[1:$1@$0](^.*@AD-REALM\.CORP\.FOO\.COM$)s/^(.*)@AD-REALM\.CORP\.FOO\.COM$/$1/g
        RULE:[2:$1@$0](^.*@AD-REALM\.CORP\.FOO\.COM$)s/^(.*)@AD-REALM\.CORP\.FOO\.COM$/$1/g
        DEFAULT
      </value>
    </property>

For more information about name mapping rules, see Configuring the Mapping from Kerberos Principals to Short Names.