Enabling Kerberos Authentication for Impala
Impala supports Kerberos authentication. For more information on enabling Kerberos authentication, see the topic on Configuring Hadoop Security in the CDH 5 Security Guide.
When using Impala in a managed environment, Cloudera Manager automatically completes Kerberos configuration. In an unmanaged environment, create a Kerberos principal for each host running impalad or statestored. Cloudera recommends using a consistent format, such as impala/_HOST@Your-Realm, but you can use any three-part Kerberos server principal of the form service/host@REALM.
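To illustrate the naming convention, the following POSIX shell sketch expands the _HOST placeholder into a host's fully qualified domain name. It then checks that the result has the three-part service/host@REALM shape. The functions expand_principal and is_three_part_principal are hypothetical helpers and are not part of Impala or Kerberos. The check is purely syntactic and does not contact the KDC.

```shell
# Hypothetical helper: expand_principal TEMPLATE FQDN
# Replaces the first literal _HOST token in TEMPLATE with FQDN.
expand_principal() {
    template=$1
    fqdn=$2
    case $template in
        *_HOST*)
            prefix=${template%%_HOST*}   # text before the first _HOST
            suffix=${template#*_HOST}    # text after the first _HOST
            printf '%s%s%s\n' "$prefix" "$fqdn" "$suffix"
            ;;
        *)
            printf '%s\n' "$template"    # nothing to expand
            ;;
    esac
}

# Hypothetical helper: is_three_part_principal PRINCIPAL
# Returns 0 if PRINCIPAL looks like service/host@REALM with non-empty parts,
# exactly one "/" before the host and exactly one "@" before the realm.
is_three_part_principal() {
    p=$1
    case $p in
        */*@*) ;;
        *) return 1 ;;
    esac
    service=${p%%/*}
    rest=${p#*/}
    host=${rest%%@*}
    realm=${rest#*@}
    [ -n "$service" ] && [ -n "$host" ] && [ -n "$realm" ] || return 1
    case $host$realm in
        */*|*@*) return 1 ;;             # extra separators are rejected
    esac
    return 0
}
```

For example, `expand_principal 'impala/_HOST@TEST.EXAMPLE.COM' "$(hostname -f)"` prints the principal to create for the current host.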
In Impala 2.0 and later, user() returns the full Kerberos principal string, such as firstname.lastname@example.org, in a Kerberized environment.
As an alternative to Kerberos, you can use LDAP authentication, described in Enabling LDAP Authentication for Impala.
Requirements for Using Impala with Kerberos
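On Red Hat Enterprise Linux 5 and comparable distributions, impala-shell needs some additional packages to connect to a Kerberos-enabled Impala cluster:
sudo yum install python-devel openssl-devel python-pip
sudo pip-python install ssl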
Start all impalad and statestored daemons with the --principal and --keytab_file flags. Set them to the principal and to the full path name of the keytab file that contains the credentials for that principal.
Impala supports the Cloudera ODBC driver and the Kerberos interface it provides. To use Kerberos through the ODBC driver, set the host type according to the ODBC driver version:
- SecImpala for the ODBC 1.0 driver.
- SecBeeswax for the ODBC 1.2 driver.
- Blank for the ODBC 2.0 driver or higher, when connecting to a secure cluster.
- HS2NoSasl for the ODBC 2.0 driver or higher, when connecting to a non-secure cluster.
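If you script ODBC setup across many client machines, you can encode the version-to-host-type rules above in a small lookup. This is a sketch only. odbc_host_type is a hypothetical helper, and the version strings it accepts (1.0, 1.2, 2.x and later) are an assumption about how you record driver versions. It prints an empty line for the "Blank" case.

```shell
# Hypothetical helper: odbc_host_type DRIVER_VERSION MODE
# MODE is "secure" (Kerberized cluster) or anything else (non-secure).
# Prints the host type from the list above; an empty line means "leave blank".
odbc_host_type() {
    version=$1
    mode=$2
    major=${version%%.*}
    case $version in
        1.0|1.0.*) echo SecImpala ;;
        1.2|1.2.*) echo SecBeeswax ;;
        *)
            # A non-numeric major version makes [ fail, so it is reported below.
            if [ "$major" -ge 2 ] 2>/dev/null; then
                if [ "$mode" = secure ]; then echo ""; else echo HS2NoSasl; fi
            else
                echo "unsupported driver version: $version" >&2
                return 1
            fi
            ;;
    esac
}
```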
To enable Kerberos in the Impala shell, start the impala-shell command using the -k flag.
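Because impala-shell -k relies on an existing Kerberos ticket, a small wrapper can check for one first. This sketch assumes MIT or Heimdal klist, whose -s option exits non-zero when no valid ticket cache is available. The wrapper name impala_shell_kerberos and the IMPALA_SHELL override variable are hypothetical.

```shell
# Hypothetical wrapper: impala_shell_kerberos [impala-shell arguments...]
# Refuses to start impala-shell in Kerberos mode (-k) unless klist -s
# reports a usable credential cache, and hints at kinit otherwise.
# IMPALA_SHELL (hypothetical) lets you substitute another command; it
# defaults to impala-shell.
impala_shell_kerberos() {
    if ! klist -s; then
        echo "No valid Kerberos ticket found; run kinit first." >&2
        return 1
    fi
    "${IMPALA_SHELL:-impala-shell}" -k "$@"
}
```

Typical use is `kinit` followed by `impala_shell_kerberos -i impala_host.example.com:21000`. The -i option names the impalad host and port to connect to.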
To enable Impala to work with Kerberos security on your Hadoop cluster, make sure you perform the installation and configuration steps in Authentication in the CDH 5 Security Guide.
Configuring Impala to Support Kerberos Security
Enabling Kerberos authentication for Impala involves steps that can be summarized as follows:
- Creating service principals for Impala and the HTTP service. Principal names take the form: serviceName/fully.qualified.domain.name@KERBEROS.REALM
- Creating, merging, and distributing keytab files for these principals.
- Editing /etc/default/impala (in a cluster not managed by Cloudera Manager), or editing the Security settings in the Cloudera Manager interface, to accommodate Kerberos authentication.
Enabling Kerberos for Impala
- Create an Impala service principal, specifying the name of the OS user that the Impala daemons run under, the fully qualified domain name of each node running impalad, and the realm name. For example:
$ kadmin
kadmin: addprinc -requires_preauth -randkey impala/impala_host.example.com@TEST.EXAMPLE.COM
- Create an HTTP service principal. For example:
kadmin: addprinc -randkey HTTP/impala_host.example.com@TEST.EXAMPLE.COM
- Create a keytab file for each of the two principals. For example:
kadmin: xst -k impala.keytab impala/impala_host.example.com
kadmin: xst -k http.keytab HTTP/impala_host.example.com
kadmin: quit
- Use ktutil to read the contents of the two keytab files and then write those contents to a new file. For example:
$ ktutil
ktutil: rkt impala.keytab
ktutil: rkt http.keytab
ktutil: wkt impala-http.keytab
ktutil: quit
- (Optional) List the entries in the merged keytab file to confirm that it contains keys for both principals, along with their timestamps (-t) and encryption types (-e). For example:
$ klist -e -k -t impala-http.keytab
- Copy the impala-http.keytab file to the Impala configuration directory. Restrict its permissions so that only the file owner can read it, and change the file owner to the impala user. By default, the Impala user and group are both named impala. For example:
$ cp impala-http.keytab /etc/impala/conf
$ cd /etc/impala/conf
$ chmod 400 impala-http.keytab
$ chown impala:impala impala-http.keytab
- Add Kerberos options to the Impala defaults file, /etc/default/impala. Add the options for both the impalad and statestored daemons, using the IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS variables. For example, you might add:
-kerberos_reinit_interval=60 -principal=impala/impala_host.example.com@TEST.EXAMPLE.COM -keytab_file=/etc/impala/conf/impala-http.keytab
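When several hosts need the same flags, you can generate them instead of typing them by hand. The following sketch prints the three Kerberos flags for a given principal and keytab path. kerberos_flags is a hypothetical helper, and its default reinit interval of 60 only mirrors the example value; it is not a recommendation.

```shell
# Hypothetical helper: kerberos_flags PRINCIPAL KEYTAB [REINIT_INTERVAL]
# Prints the Kerberos startup flags for one daemon on a single line.
kerberos_flags() {
    [ "$#" -ge 2 ] || { echo "usage: kerberos_flags PRINCIPAL KEYTAB [INTERVAL]" >&2; return 1; }
    # The format string starts with %s so printf never sees a leading "-".
    printf '%s\n' "-kerberos_reinit_interval=${3:-60} -principal=$1 -keytab_file=$2"
}
```

Paste the output into both IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS, using each host's own principal.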
For more information on changing the Impala defaults specified in /etc/default/impala, see Modifying Impala Startup Options.
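You can also run the ktutil merge step from the procedure above non-interactively. This sketch prints the rkt, wkt, and quit commands so that ktutil can read them from standard input. ktutil_merge_script is a hypothetical helper. Piping commands into ktutil is common practice with MIT Kerberos, but verify that it works with your Kerberos distribution.

```shell
# Hypothetical helper: ktutil_merge_script OUTPUT_KEYTAB INPUT_KEYTAB...
# Emits one "rkt" line per input keytab, then "wkt OUTPUT" and "quit".
ktutil_merge_script() {
    # Check the argument count before shift, which errors when $# is 0.
    [ "$#" -ge 2 ] || { echo "usage: ktutil_merge_script OUT IN..." >&2; return 1; }
    out=$1
    shift
    for kt in "$@"; do
        printf 'rkt %s\n' "$kt"
    done
    printf 'wkt %s\nquit\n' "$out"
}
```

For example, run `ktutil_merge_script impala-http.keytab impala.keytab http.keytab | ktutil`, then inspect the result with `klist -e -k -t impala-http.keytab`.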
Enabling Kerberos for Impala with a Proxy Server
A common configuration for Impala with High Availability is to use a proxy server to submit requests to the actual impalad daemons on different hosts in the cluster. This configuration avoids connection problems in case of machine failure, because the proxy server can route new requests through one of the remaining hosts in the cluster. This configuration also helps with load balancing, because the additional overhead of being the "coordinator node" for each query is spread across multiple hosts.
Although you can set up a proxy server with or without Kerberos authentication, typically users set up a secure Kerberized configuration. For information about setting up a proxy server for Impala, including Kerberos-specific steps, see Using Impala through a Proxy for High Availability.
Enabling Impala Delegation for Kerberos Users
See Configuring Impala Delegation for Hue and BI Tools for details about the delegation feature that lets certain users submit queries using the credentials of other users.
Using TLS/SSL with Business Intelligence Tools
You can use Kerberos authentication, TLS/SSL encryption, or both to secure connections from JDBC and ODBC applications to Impala. See Configuring Impala to Work with JDBC and Configuring Impala to Work with ODBC for details.
Prior to CDH 5.7 / Impala 2.5, the Hive JDBC driver did not support connections that use both Kerberos authentication and SSL encryption. If your cluster runs an older release with this restriction and a JDBC application needs both security features, use the Cloudera JDBC Connector as the JDBC driver.
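On releases where the Hive JDBC driver supports Kerberos and SSL together, you typically request both in the connection URL. The sketch below assembles a jdbc:hive2 URL with the principal and ssl=true properties and an optional sslTrustStore property. impala_jdbc_url is a hypothetical helper. Port 21050 in the example is the usual Impala HiveServer2 port. Depending on your setup, you may also need a trustStorePassword property.

```shell
# Hypothetical helper: impala_jdbc_url HOST PORT IMPALA_PRINCIPAL [TRUSTSTORE]
# Builds a Hive JDBC URL for a Kerberized, SSL-enabled Impala daemon.
impala_jdbc_url() {
    [ "$#" -ge 3 ] || { echo "usage: impala_jdbc_url HOST PORT PRINCIPAL [TRUSTSTORE]" >&2; return 1; }
    url="jdbc:hive2://$1:$2/;principal=$3;ssl=true"
    if [ -n "${4:-}" ]; then
        url="$url;sslTrustStore=$4"      # optional path to a JKS trust store
    fi
    printf '%s\n' "$url"
}
```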
Enabling Access to Internal Impala APIs for Kerberos Users
For applications that need direct access to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can specify a list of Kerberos users who are allowed to call those APIs. By default, the impala and hdfs users are the only ones authorized for this kind of access. Any users not explicitly authorized through the internal_principals_whitelist configuration setting are blocked from accessing the APIs. This setting applies to all the Impala-related daemons, although currently it is primarily used for HDFS to control the behavior of the catalog server.
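The whitelist is set as a startup flag on the Impala daemons. This sketch assumes that the --internal_principals_whitelist flag takes a comma-separated list of short user names (such as hdfs) rather than full principals. Confirm the expected format for your Impala release. whitelist_flag is a hypothetical helper that joins the names and drops duplicates.

```shell
# Hypothetical helper: whitelist_flag USER...
# Assumption: the flag value is a comma-separated list of short user names.
whitelist_flag() {
    [ "$#" -gt 0 ] || { echo "usage: whitelist_flag USER..." >&2; return 1; }
    list=
    for u in "$@"; do
        case ",$list," in
            *",$u,"*) ;;                 # already in the list, skip it
            *) list=${list:+$list,}$u ;;
        esac
    done
    printf '%s\n' "--internal_principals_whitelist=$list"
}
```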