Kudu Security Overview

Kudu includes security features that allow Kudu clusters to be hardened against access from unauthorized users. Kudu uses strong authentication with Kerberos, and communication between Kudu clients and servers can be encrypted with TLS. Kudu also allows you to use HTTPS encryption to connect to the web UI.

The rest of this topic describes the security capabilities of Apache Kudu and how to configure a secure Kudu cluster. Currently, there are a few known limitations in Kudu security that might impact your cluster. For the list, see Security Limitations.

Kudu Authentication with Kerberos

Kudu can be configured to enforce secure authentication among servers, and between clients and servers. Authentication prevents untrusted actors from gaining access to Kudu, and securely identifies connecting users or services for authorization checks. Authentication in Kudu is designed to interoperate with other secure Hadoop components by utilizing Kerberos.

Configure authentication on Kudu servers using the --rpc_authentication flag, which can be set to one of the following options:
  • required - Kudu will reject connections from clients and servers who lack authentication credentials.
  • optional - Kudu will attempt to use strong authentication, but will allow unauthenticated connections.
  • disabled - Kudu will only allow unauthenticated connections.
By default, the flag is set to optional. To secure your cluster, set --rpc_authentication to required.
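
The three settings can be read as a small decision table. The following is a minimal sketch of that table from the server's point of view, not Kudu's actual negotiation code; the function name and the outcome labels are hypothetical and chosen only for illustration.

```python
from enum import Enum


class RpcAuthentication(Enum):
    """Values accepted by --rpc_authentication, as listed above."""
    REQUIRED = "required"
    OPTIONAL = "optional"
    DISABLED = "disabled"


def connection_outcome(mode: RpcAuthentication, peer_has_credentials: bool) -> str:
    """Return 'authenticated', 'unauthenticated', or 'rejected'.

    The labels are hypothetical; they only summarize the option
    descriptions in the list above.
    """
    if mode is RpcAuthentication.DISABLED:
        # Authentication is never attempted in this mode.
        return "unauthenticated"
    if peer_has_credentials:
        return "authenticated"
    # The peer has no credentials: only 'optional' lets it through.
    if mode is RpcAuthentication.OPTIONAL:
        return "unauthenticated"
    return "rejected"


if __name__ == "__main__":
    for mode in RpcAuthentication:
        for has_creds in (True, False):
            print(mode.value, has_creds, connection_outcome(mode, has_creds))
```

The sketch makes the risk of the default visible: with optional, a peer without credentials is still admitted, which is why required is recommended for a secure cluster.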

Internal Public Key Infrastructure (PKI)

Kudu uses an internal PKI to issue X.509 certificates to servers in the cluster. Connections between peers who have both obtained certificates will use TLS for authentication. In such cases, neither peer needs to contact the Kerberos KDC.

X.509 certificates are only used for internal communication among Kudu servers, and between Kudu clients and servers. These certificates are never presented in a public facing protocol. By using internally-issued certificates, Kudu offers strong authentication which scales to huge clusters, and allows TLS encryption to be used without requiring you to manually deploy certificates on every node.

Authentication Tokens

After authenticating to a secure cluster, the Kudu client will automatically request an authentication token from the Kudu master. An authentication token encapsulates the identity of the authenticated user and carries the Kudu master's RSA signature so that its authenticity can be verified. This token will be used to authenticate subsequent connections. By default, authentication tokens are only valid for seven days, so that even if a token were compromised, it could not be used indefinitely. For the most part, authentication tokens should be completely transparent to users. By using authentication tokens, Kudu is able to take advantage of strong authentication, without paying the scalability cost of communicating with a central authority for every connection.

When used with distributed compute frameworks such as Apache Spark, authentication tokens can simplify configuration and improve security. For example, the Kudu Spark connector will automatically retrieve an authentication token during the planning stage, and distribute the token to tasks. This allows Spark to work against a secure Kudu cluster where only the planner node has Kerberos credentials.
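
The idea of a signed, time-limited token can be illustrated with a short sketch. This is not Kudu's token format or code. In particular, it substitutes an HMAC over a shared secret for the master's RSA signature, because the Python standard library has no RSA support; the function names and the JSON payload layout are hypothetical.

```python
import hashlib
import hmac
import json
import time

SEVEN_DAYS_SECS = 7 * 24 * 60 * 60  # default validity stated above


def issue_token(user, signing_key, now=None):
    """Issue a token for `user`. HMAC stands in for the RSA signature."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"user": user, "expires_at": now + SEVEN_DAYS_SECS}, sort_keys=True
    ).encode()
    signature = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_token(token, signing_key, now=None):
    """Return the user name if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    expected = hmac.new(signing_key, token["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return None  # payload was altered or signed with a different key
    claims = json.loads(token["payload"])
    if now >= claims["expires_at"]:
        return None  # a stolen token stops working after the validity window
    return claims["user"]
```

Verification here needs no call to a central authority, only the key material, which is the property that lets token-based authentication scale.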

Client Authentication to Secure Kudu Clusters

Users running client Kudu applications must first run the kinit command to obtain a Kerberos ticket-granting ticket. For example:

kinit admin@EXAMPLE-REALM.COM

Once authenticated, you use the same client code to read from and write to Kudu servers with and without the Kerberos configuration.

Scalability

Kudu authentication is designed to scale to thousands of nodes, which means it must avoid unnecessary coordination with a central authentication authority (such as the Kerberos KDC) for each connection. Instead, Kudu servers and clients use Kerberos to establish initial trust with the Kudu master, and then use alternate credentials for subsequent connections. As described previously, the Kudu master issues internal X.509 certificates to tablet servers on startup, and temporary authentication tokens to clients on first contact.

Coarse-grained Authorization

Kudu supports coarse-grained authorization checks for client requests based on the client's authenticated Kerberos principal (user or service). Access levels are granted based on whitelist-style Access Control Lists (ACLs), one for each level. Each ACL specifies a comma-separated list of users, or may be set to '*' to indicate that all authenticated users have access rights at the specified level.

The two levels of access which can be configured are:

  • Superuser - Principals authorized as a superuser can perform certain administrative functions such as using the kudu command line tool to diagnose and repair cluster issues.
  • User - Principals authorized as a user are able to access and modify all data in the Kudu cluster. This includes the ability to create, drop, and alter tables, as well as read, insert, update, and delete data. The default value for the User ACL is '*', which allows all users access to the cluster. However, if authentication is enabled, this will restrict access to only those users who are able to successfully authenticate using Kerberos. Unauthenticated users on the same network as the Kudu servers will be unable to access the cluster.
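
The ACL semantics described above can be sketched in a few lines. This is a simplified model under stated assumptions, not Kudu's implementation: the function names are hypothetical, and an unauthenticated caller is represented by a principal of None on a cluster where authentication is enabled.

```python
from typing import Optional


def parse_acl(acl: str) -> set:
    """Split a comma-separated ACL string into a set of user names."""
    return {entry.strip() for entry in acl.split(",") if entry.strip()}


def is_authorized(acl: str, principal: Optional[str]) -> bool:
    """Check an authenticated principal against a whitelist-style ACL."""
    if principal is None:
        # With authentication enabled, unauthenticated callers never pass,
        # even when the ACL is '*'.
        return False
    entries = parse_acl(acl)
    if "*" in entries:
        return True  # '*' admits every authenticated user
    return principal in entries


if __name__ == "__main__":
    print(is_authorized("*", "alice"))
    print(is_authorized("impala,etl_service_account", "bob"))
```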

Fine-Grained Authorization

As of Kudu 1.10.0, Kudu can be configured to enforce fine-grained authorization across servers. This ensures that users can see only the data they are explicitly authorized to see. Kudu currently supports this by leveraging policies defined in Apache Sentry 2.2 and later.

Apache Sentry

Apache Sentry models tabular objects in the following hierarchy:

  • Server: indicated by the Kudu configuration flag --server_name. Everything stored in a Kudu cluster falls within the given "server".
  • Database: indicated as a prefix of table names with the format <database>.<table>.
  • Table: a single Kudu table.
  • Column: a column within a Kudu table.

Each level of this hierarchy defines a scope on which the privileges can be granted. Privileges granted on a higher scope imply privileges on a lower scope. For example, if a user has a SELECT privilege on a database, then that user implicitly has the SELECT privileges on every table belonging to that database.

Privileges are also associated with specific actions. Access to Kudu tables may rely on privileges on the following actions:
  • ALTER
  • CREATE
  • DELETE
  • DROP
  • INSERT
  • UPDATE
  • SELECT

Additionally, there are three special actions recognized by Kudu: ALL, OWNER, and METADATA. If a user has the ALL or OWNER privilege on a given table, that user has all of the above privileges on the table. METADATA is not an actual privilege; rather, it is a conceptual privilege that Kudu uses to model the presence of any privilege. A privilege granted on any action on a table implies that the user has the METADATA privilege on that table.
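
The implication rules above (higher scopes imply lower ones, ALL and OWNER imply every action, and any privilege implies METADATA) can be expressed as a small model. This is an illustrative sketch rather than Sentry's or Kudu's code; the Grant class and function names are hypothetical, and only downward implication through the hierarchy is modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Grant:
    """A privilege at some scope. A field left as None means the grant
    stops at a higher scope (all fields None = server scope)."""
    action: str
    database: Optional[str] = None
    table: Optional[str] = None
    column: Optional[str] = None


def covers_scope(grant, database, table, column=None):
    """True if the grant's scope is the requested object or a higher scope."""
    pairs = ((grant.database, database), (grant.table, table), (grant.column, column))
    for granted, requested in pairs:
        if granted is None:
            return True  # grant ends at a higher scope, implying everything below
        if granted != requested:
            return False
    return True


def implies(grant, action, database, table, column=None):
    if not covers_scope(grant, database, table, column):
        return False
    if action == "METADATA":
        return True  # any privilege on the object implies METADATA
    return grant.action in ("ALL", "OWNER") or grant.action == action


def has_privilege(grants: Iterable[Grant], action, database, table, column=None):
    return any(implies(g, action, database, table, column) for g in grants)
```

For example, a single Grant("SELECT", database="db1") satisfies a SELECT or METADATA check on any table in db1, but not an INSERT check.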

For more details about Sentry privileges, see the Apache Sentry documentation.

When a Kudu master receives a request, it consults Sentry to determine what privileges a user has. If the user is not authorized to perform the requested action, the request is rejected. Kudu leverages the authenticated identity of a user to decide whether to perform or reject a request.

Authorization Tokens

Rather than having every tablet server communicate directly with Sentry, privileges are propagated and checked via authorization tokens. These tokens encapsulate what privileges a user has on a given table. Tokens are generated by the master and returned to Kudu clients upon opening a Kudu table. Kudu clients automatically attach authorization tokens when sending requests to tablet servers.

Authorization tokens limit the number of nodes that access Sentry directly to retrieve privileges. Because a cluster is expected to have many more tablet servers than Kudu masters, tokens are used only to authorize requests sent to tablet servers. Kudu masters fetch privileges directly from Sentry or from their cache. See Caching for more details of Kudu's privilege cache.

Similar to the validity interval for authentication tokens, to limit the window of potential unwanted access if a token becomes compromised, authorization tokens are valid for five minutes by default. The acquisition and renewal of a token is hidden from the user, as Kudu clients automatically retrieve new tokens when existing tokens expire.

When a tablet server that has been configured to enforce fine-grained access control receives a request, it checks the privileges in the attached token. The tablet server rejects the request if the privileges are not sufficient to perform the requested operation, or if the token is invalid (for example, expired).
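
The tablet server check can be pictured as follows. This is a simplified sketch, assuming a token that carries a user, a table, a set of privileges, and an expiry time; the class, the function names, and the result strings are hypothetical, and signature verification is omitted for brevity.

```python
from dataclasses import dataclass

AUTHZ_TOKEN_VALIDITY_SECS = 5 * 60  # five-minute default stated above


@dataclass(frozen=True)
class AuthzToken:
    user: str
    table: str
    privileges: frozenset
    expires_at: float


def issue_authz_token(user, table, privileges, now):
    """Master side: bundle the user's privileges on one table into a token."""
    return AuthzToken(user, table, frozenset(privileges), now + AUTHZ_TOKEN_VALIDITY_SECS)


def check_request(token, table, required_privilege, now):
    """Tablet server side: decide from the token alone, without calling Sentry."""
    if now >= token.expires_at:
        return "rejected: token expired"
    if token.table != table:
        return "rejected: token is for a different table"
    if required_privilege not in token.privileges:
        return "rejected: insufficient privileges"
    return "ok"
```

In this model, a client whose request is rejected for an expired token would ask the master for a fresh token and retry, which matches the automatic renewal described above.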

Trusted Users

It may be desirable to allow certain users to view and modify any data stored in Kudu. Such users can be specified via the --trusted_user_acl master configuration. Trusted users can perform any operation that would otherwise require fine-grained privileges, without Kudu consulting Sentry.

Additionally, some services that interact with Kudu may authorize requests on behalf of their end users. For example, Apache Impala authorizes queries on behalf of its users, and sends requests to Kudu as the Impala service user, commonly "impala". Since Impala authorizes requests on its own, to avoid extraneous communication between Sentry and Kudu, the Impala service user should be listed as a trusted user.

Configuring the Integration with Apache Sentry

Sentry is often configured with Kerberos authentication. In order to enable integration with Sentry, a cluster must first be integrated with the Apache Hive Metastore. See the Enabling the Hive Metastore Integration section to configure Kudu to synchronize its internal catalog with the Hive Metastore.

The following configurations must be set on the master:
--sentry_service_rpc_addresses=<Sentry RPC address>
--server_name=<value of HiveServer2's hive.sentry.server configuration>
--kudu_service_name=kudu
--sentry_service_kerberos_principal=sentry
--sentry_service_security_mode=kerberos

# This example ACL setup allows the 'impala' user to access all data stored in
# Kudu, assuming Impala will authorize requests on its own. The 'hadoopadmin'
# user is also granted access to all Kudu data, which may facilitate testing
# and debugging.
--trusted_user_acl=impala,hadoopadmin

The following configurations must be set on the tablet servers:
--tserver_enforce_access_control=true

Caching

To avoid overwhelming Sentry with requests to fetch user privileges, the Kudu master can be configured to cache user privileges. A by-product of this caching is that when privileges are changed in Sentry, they may not be reflected in Kudu for a configurable amount of time, defined by the following Kudu master configurations:
--sentry_privileges_cache_ttl_factor * --authz_token_validity_interval_secs
The default value is fifty minutes. If privilege updates need to be reflected in Kudu sooner than this, the Kudu CLI tool can be used to invalidate the cached privileges to force Kudu to fetch new ones from Sentry:
kudu master authz_cache reset <master-addresses>
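
The staleness window is the product of the two settings named above. The arithmetic below is a worked example, not output from Kudu. The five-minute token validity is the default stated earlier; the factor of 10 is an assumption inferred from the fifty-minute default, so check the configuration reference before relying on it.

```python
# Five-minute default for --authz_token_validity_interval_secs, as stated earlier.
AUTHZ_TOKEN_VALIDITY_INTERVAL_SECS = 5 * 60

# Assumption: inferred from 50 minutes / 5 minutes, not quoted from the flag reference.
SENTRY_PRIVILEGES_CACHE_TTL_FACTOR = 10


def privilege_cache_ttl_secs(ttl_factor, token_validity_secs):
    """Upper bound on how long a cached privilege may lag behind Sentry."""
    return ttl_factor * token_validity_secs


ttl = privilege_cache_ttl_secs(
    SENTRY_PRIVILEGES_CACHE_TTL_FACTOR, AUTHZ_TOKEN_VALIDITY_INTERVAL_SECS
)
print(f"{ttl} seconds = {ttl / 60:.0f} minutes")  # 3000 seconds = 50 minutes
```

Lowering either setting shortens the window, at the cost of more frequent requests to Sentry or more frequent token renewals.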

Policy for Kudu Masters

The following authorization policy is enforced by Kudu masters:

Authorization Policy for Masters
Operation | Required Privilege
CreateTable | CREATE ON DATABASE
CreateTable with a different owner specified than the requesting user | ALL ON DATABASE with the Sentry GRANT OPTION. See GRANT <Privilege> ... WITH GRANT OPTION.
DeleteTable | DROP ON TABLE
AlterTable (with no rename) | ALTER ON TABLE
AlterTable (with rename) | ALL ON TABLE <old-table> and CREATE ON DATABASE <new-database>
IsCreateTableDone | METADATA ON TABLE
IsAlterTableDone | METADATA ON TABLE
ListTables | METADATA ON TABLE
GetTableLocations | METADATA ON TABLE
GetTableSchema | METADATA ON TABLE
GetTabletLocations | METADATA ON TABLE

Policy for Kudu Tablet Servers

The following authorization policy is enforced by Kudu tablet servers:

Authorization Policy for Tablet Servers
Operation | Required Privilege
Scan | SELECT ON TABLE, or METADATA ON TABLE and SELECT ON COLUMN for each projected column and each predicate column
Scan (no projected columns, equivalent to COUNT(*)) | SELECT ON TABLE, or SELECT ON COLUMN for each column in the table
Scan (with virtual columns) | SELECT ON TABLE, or SELECT ON COLUMN for each column in the table
Scan (in ORDERED mode) | <privileges required for a Scan> and SELECT ON COLUMN for each primary key column
Insert | INSERT ON TABLE
Update | UPDATE ON TABLE
Upsert | INSERT ON TABLE and UPDATE ON TABLE
Delete | DELETE ON TABLE
SplitKeyRange | SELECT ON COLUMN for each primary key column and SELECT ON COLUMN for each projected column
Checksum | User must be configured in --superuser_acl
ListTablets | User must be configured in --superuser_acl
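
The Scan rows of the tablet server policy combine table-level and column-level privileges, which is easier to follow as code. The sketch below covers only the plain Scan and the no-projected-columns case; virtual columns and ORDERED mode are not modeled. The function name and argument shapes are hypothetical.

```python
def can_scan(table_privileges, column_privileges, projected, predicates, all_columns):
    """Decide whether a scan is allowed.

    table_privileges:  set of privileges held on the table, e.g. {"METADATA"}
    column_privileges: dict mapping column name -> set of privileges
    projected:         columns returned by the scan
    predicates:        columns referenced by scan predicates
    all_columns:       every column in the table
    """
    if "SELECT" in table_privileges:
        return True  # SELECT ON TABLE is always sufficient
    if projected:
        # Column-level path: METADATA ON TABLE plus SELECT on each
        # projected column and each predicate column.
        if "METADATA" not in table_privileges:
            return False
        needed = set(projected) | set(predicates)
    else:
        # No projection (COUNT(*)-style): SELECT on every column in the table.
        needed = set(all_columns)
    return all("SELECT" in column_privileges.get(col, set()) for col in needed)
```

Note that a predicate on a column the user cannot read is enough to reject the scan, even if that column is not projected.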

Encryption

Kudu allows you to use TLS to encrypt all communications among servers, and between clients and servers. Configure TLS encryption on Kudu servers using the --rpc_encryption flag, which can be set to one of the following options:
  • required - Kudu will reject unencrypted connections.
  • optional - Kudu will attempt to use encryption, but will allow unencrypted connections.
  • disabled - Kudu will not use encryption.
By default, the flag is set to optional. To secure your cluster, set --rpc_encryption to required.

Web UI Encryption

The Kudu web UI can be configured to use secure HTTPS encryption by providing each server with TLS certificates. Use the --webserver_certificate_file and --webserver_private_key_file properties to specify the certificate and private key to be used for communication.

Alternatively, you can choose to completely disable the web UI by setting the --webserver_enabled flag to false on the Kudu servers.

Web UI Redaction

To prevent sensitive data from being included in the web UI, all row data is redacted. Table metadata, such as table names, column names, and partitioning information, is not redacted. Alternatively, you can choose to completely disable the web UI by setting the --webserver_enabled flag to false on the Kudu servers.

Log Redaction

To prevent sensitive data from being included in Kudu server logs, all row data will be redacted. You can turn off log redaction using the --redact flag.

Configuring a Secure Kudu Cluster using Cloudera Manager

Enabling Kerberos Authentication and RPC Encryption

To enable Kerberos authentication for Kudu:
  1. Go to the Kudu service.
  2. Click the Configuration tab.
  3. Select Category > Main.
  4. In the Search field, type Kerberos to show the relevant properties.
  5. Edit the following properties according to your cluster configuration:
    Field | Usage Notes
    Kerberos Principal | Set to the default principal, kudu. Currently, Kudu does not support configuring a custom service principal for Kudu processes.
    Enable Secure Authentication And Encryption | Select this checkbox to enable authentication and RPC encryption between all Kudu clients and servers, as well as between individual servers. Only enable this property after you have configured Kerberos.
  6. Click Save Changes.
  7. You will see an error message that tells you the Kudu keytab is missing. To generate the keytab, go to the top navigation bar and click Administration > Security.
  8. Go to the Kerberos Credentials tab. On this page you will see a list of the existing Kerberos principals for services running on the cluster.
  9. Click Generate Missing Credentials. Once the Generate Missing Credentials command has finished running, you will see the Kudu principal added to the list.

Configuring Coarse-grained Authorization with ACLs

  1. Go to the Kudu service.
  2. Click the Configuration tab.
  3. Select Category > Security.
  4. In the Search field, type ACL to show the relevant properties.
  5. Edit the following properties according to your cluster configuration:
    Field | Usage Notes
    Superuser Access Control List | Add a comma-separated list of superusers who can access the cluster. By default, this property is left blank. '*' indicates that all authenticated users will be given superuser access.
    User Access Control List | Add a comma-separated list of users who can access the cluster. By default, this property is set to '*'. The default value of '*' allows all users access to the cluster. However, if authentication is enabled, this will restrict access to only those users who are able to successfully authenticate using Kerberos. Unauthenticated users on the same network as the Kudu servers will be unable to access the cluster. Add the impala user to this list to allow Impala to query data in Kudu. You might choose to add any other relevant usernames if you want to give access to Spark Streaming jobs.

  6. Click Save Changes.

Configuring HTTPS Encryption for the Kudu Master and Tablet Server Web UIs

Use the following steps to enable HTTPS for encrypted connections to the Kudu master and tablet server web UIs.
  1. Go to the Kudu service.
  2. Click the Configuration tab.
  3. Select Category > Security.
  4. In the Search field, type TLS/SSL to show the relevant properties.
  5. Edit the following properties according to your cluster configuration:
    Field | Usage Notes
    Master TLS/SSL Server Private Key File (PEM Format) | Set to the path containing the Kudu master host's private key (PEM-format). This is used to enable TLS/SSL encryption (over HTTPS) for browser-based connections to the Kudu master web UI.
    Tablet Server TLS/SSL Server Private Key File (PEM Format) | Set to the path containing the Kudu tablet server host's private key (PEM-format). This is used to enable TLS/SSL encryption (over HTTPS) for browser-based connections to Kudu tablet server web UIs.
    Master TLS/SSL Server Certificate File (PEM Format) | Set to the path containing the signed certificate (PEM-format) for the Kudu master host's private key (set in Master TLS/SSL Server Private Key File). The certificate file can be created by concatenating all the appropriate root and intermediate certificates required to verify trust.
    Tablet Server TLS/SSL Server Certificate File (PEM Format) | Set to the path containing the signed certificate (PEM-format) for the Kudu tablet server host's private key (set in Tablet Server TLS/SSL Server Private Key File). The certificate file can be created by concatenating all the appropriate root and intermediate certificates required to verify trust.
    Enable TLS/SSL for Master Server | Enables HTTPS encryption on the Kudu master web UI.
    Enable TLS/SSL for Tablet Server | Enables HTTPS encryption on the Kudu tablet server web UIs.
  6. Click Save Changes.

Enabling Sentry Authorization

To enable Kudu’s integration with Sentry:

  1. First, ensure that Kudu has been configured to synchronize its catalog with the Hive Metastore. See the steps described in Enabling the Hive Metastore Integration.
  2. Go to the Kudu service.
  3. Click the Configuration tab.
  4. Select the Sentry Service with which Kudu should authorize requests.

Configuring a Secure Kudu Cluster using the Command Line

The following configuration parameters should be set on all servers (master and tablet servers) to ensure that a Kudu cluster is secure:

# Connection Security
#--------------------
--rpc_authentication=required
--rpc_encryption=required
--keytab_file=<path-to-kerberos-keytab>

# Web UI Security
#--------------------
--webserver_certificate_file=<path-to-cert-pem>
--webserver_private_key_file=<path-to-key-pem>
# optional
--webserver_private_key_password_cmd=<password-cmd>

# If you prefer to disable the web UI entirely:
--webserver_enabled=false

# Coarse-grained authorization
#--------------------------------

# This example ACL setup allows the 'impala' user as well as the
# 'etl_service_account' principal access to all data in the
# Kudu cluster. The 'hadoopadmin' user is allowed to use administrative
# tooling. Note that by granting access to 'impala', other users
# may access data in Kudu via the Impala service subject to its own
# authorization rules.
--user_acl=impala,etl_service_account
--superuser_acl=hadoopadmin

More information about these flags can be found in the configuration reference documentation.
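
Before rolling out a flag list like the one above, it can help to check it mechanically. The following is a minimal sketch of such a check, not a Kudu tool: the helper names and the messages are hypothetical, and the rules encode only the recommendations given in this topic.

```python
def parse_flags(text):
    """Parse '--name=value' lines; blank lines and '#' comments are skipped."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("--"):
            continue
        name, _, value = line[2:].partition("=")
        flags[name] = value
    return flags


def audit(flags):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if flags.get("rpc_authentication") != "required":
        problems.append("--rpc_authentication should be set to required")
    if flags.get("rpc_encryption") != "required":
        problems.append("--rpc_encryption should be set to required")
    if not flags.get("keytab_file"):
        problems.append("--keytab_file is not set")
    web_ui_disabled = flags.get("webserver_enabled") == "false"
    has_https = bool(
        flags.get("webserver_certificate_file")
        and flags.get("webserver_private_key_file")
    )
    if not (web_ui_disabled or has_https):
        problems.append("web UI is enabled without a certificate and private key")
    return problems
```

Calling audit(parse_flags(text)) on the contents of a flag file returns the list of findings. The check is deliberately conservative: it treats a missing flag the same as an insecure value, because the defaults for --rpc_authentication and --rpc_encryption are optional.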

See Configuring the Integration with Apache Sentry for an example of how to enable fine-grained authorization via Apache Sentry.