Configuring Apache Kafka Security

This topic describes additional steps you can take to ensure the safety and integrity of your data stored in Apache Kafka, using features available in CDK 2.0.0 and higher Powered By Apache Kafka.

Deploying SSL for Kafka

Kafka allows clients to connect over SSL. By default, SSL is disabled, but can be turned on as needed.

Step 1. Generating Keys and Certificates for Kafka Brokers

First, generate the key and the certificate for each machine in the cluster using the Java keytool utility. See Creating Certificates.

In the following command, keystore is the keystore file that stores your certificate, and validity is the number of days for which the certificate is valid.

$ keytool -keystore {tmp.server.keystore.jks} -alias localhost -validity {validity} -genkey

Make sure that the common name (CN) matches the fully qualified domain name (FQDN) of your server. The client compares the CN with the DNS domain name to ensure that it is connecting to the correct server.
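
As a rough sketch, the keystore can also be generated non-interactively by passing the distinguished name on the command line, which makes it harder to mistype the CN. The helper names, the OU and O values, the RSA key settings, the JKS store type, and the test1234 password below are assumptions made for illustration; they are not values that Kafka requires.

```shell
# Sketch: create a broker keystore whose CN is the FQDN passed in.
# gen_broker_keystore and show_keystore_owner are hypothetical helper names;
# the OU/O values and the password are placeholders for this example.
gen_broker_keystore() {
  local keystore="$1"   # path of the keystore file to create
  local fqdn="$2"       # fully qualified domain name of the broker
  local validity="$3"   # certificate validity in days
  keytool -genkeypair -keystore "$keystore" -storetype JKS \
    -alias localhost -validity "$validity" \
    -keyalg RSA -keysize 2048 \
    -dname "CN=$fqdn, OU=org, O=org" \
    -storepass test1234 -keypass test1234
}

# Print the Owner line so the CN can be compared with the broker's FQDN.
show_keystore_owner() {
  keytool -list -v -keystore "$1" -storepass test1234 | grep 'Owner:'
}
```

For example, gen_broker_keystore server.keystore.jks "$(hostname -f)" 365 followed by show_keystore_owner server.keystore.jks prints the distinguished name stored in the keystore.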

Step 2. Creating Your Own Certificate Authority

You have generated a public-private key pair for each machine, and a certificate to identify the machine. However, the certificate is unsigned, so an attacker can create a certificate and pretend to be any machine. Sign certificates for each machine in the cluster to prevent unauthorized access.

A Certificate Authority (CA) is responsible for signing certificates. A CA is similar to a government that issues passports. A government stamps (signs) each passport so that the passport becomes difficult to forge. Similarly, the CA signs the certificates, and the cryptography guarantees that a signed certificate is computationally difficult to forge. If the CA is a genuine and trusted authority, the clients have high assurance that they are connecting to the authentic machines.
Generate the CA with the following command:
openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
The generated CA is a public-private key pair and certificate used to sign other certificates.
Add the generated CA to the client truststores so that clients can trust this CA:
keytool -keystore {client.truststore.jks} -alias CARoot -import -file {ca-cert}
The keystore created in step 1 stores each machine’s own identity. In contrast, the truststore of a client stores all the certificates that the client should trust. Importing a certificate into a truststore means trusting all certificates that are signed by that certificate. This attribute is called the chain of trust. It is particularly useful when deploying SSL on a large Kafka cluster. You can sign all certificates in the cluster with a single CA, and have all machines share the same truststore that trusts the CA. That way, all machines can authenticate all other machines.

Step 3. Signing the Certificate

Now you can sign all certificates generated by step 1 with the CA generated in step 2.
  1. Export the certificate from the keystore:
    keytool -keystore server.keystore.jks -alias localhost -certreq -file cert-file
  2. Sign it with the CA:
    openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days {validity} -CAcreateserial -passin pass:{ca-password}
  3. Import both the certificate of the CA and the signed certificate into the keystore:
    keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert
    keytool -keystore server.keystore.jks -alias localhost -import -file cert-signed
    The definitions of the variables are as follows:
    • keystore: the location of the keystore
    • ca-cert: the certificate of the CA
    • ca-key: the private key of the CA
    • ca-password: the passphrase of the CA
    • cert-file: the exported, unsigned certificate of the server
    • cert-signed: the signed certificate of the server
The following Bash script demonstrates the steps described above. One of the commands assumes a password of test1234, so either use that password or edit the command before running it.
#!/bin/bash
#Step 1
keytool -keystore server.keystore.jks -alias localhost -validity 365 -genkey
#Step 2
openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
#Step 3
keytool -keystore server.keystore.jks -alias localhost -certreq -file cert-file
openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days 365 -CAcreateserial -passin pass:test1234
keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert
keytool -keystore server.keystore.jks -alias localhost -import -file cert-signed  
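
After signing, it can be useful to confirm that the signed certificate really chains back to the CA before relying on it. The following is a minimal sketch using standard openssl subcommands; the helper names are hypothetical, and the file names are the ones used in the steps above.

```shell
# Sketch: confirm that the signed certificate was issued by the CA.
# verify_signed_cert and show_cert_names are hypothetical helper names.
verify_signed_cert() {
  local ca_cert="$1"      # certificate of the CA (ca-cert)
  local signed_cert="$2"  # signed certificate of the server (cert-signed)
  # Exits non-zero if the chain of trust cannot be established.
  openssl verify -CAfile "$ca_cert" "$signed_cert"
}

# Print who the certificate identifies and who signed it.
show_cert_names() {
  openssl x509 -in "$1" -noout -subject -issuer
}
```

For example, verify_signed_cert ca-cert cert-signed checks the chain, and show_cert_names cert-signed shows the subject, whose CN should be the FQDN of the broker.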

Step 4. Configuring Kafka Brokers

Kafka Brokers support listening for connections on multiple ports. If SSL is enabled for inter-broker communication (see below for how to enable it), both PLAINTEXT and SSL ports are required.

To configure the listeners from Cloudera Manager, perform the following steps:
  1. In Cloudera Manager, click Kafka > Instances, and then click on "Kafka Broker" > Configurations > Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties. Enter the following information:
    listeners=PLAINTEXT://<kafka-broker-host-name>:9092,SSL://<kafka-broker-host-name>:9093
    advertised.listeners=PLAINTEXT://<kafka-broker-host-name>:9092,SSL://<kafka-broker-host-name>:9093

    where kafka-broker-host-name is the FQDN of the broker that you selected from the Instances page in Cloudera Manager. The sample configuration above uses the PLAINTEXT and SSL protocols for the SSL-enabled brokers. For information about other supported security protocols, see Using Kafka Supported Protocols.

  2. Repeat the above step for all the brokers. The advertised.listeners configuration above is needed so that external clients can connect to the brokers.
  3. Deploy the above client configurations and perform a rolling restart of the Kafka service from Cloudera Manager.
Kafka CSD auto-generates listeners for Kafka brokers, depending on your SSL and Kerberos configuration. To enable SSL for Kafka installations, do the following:
  1. Turn on SSL for the Kafka service by turning on the ssl_enabled configuration for the Kafka CSD.
  2. Set security.inter.broker.protocol to SSL if Kerberos is disabled; otherwise, set it to SASL_SSL.
The following SSL configurations are required on each broker. Each of these values can be set in Cloudera Manager. See Modifying Configuration Properties Using Cloudera Manager:
ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234
ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
ssl.truststore.password=test1234

Other configuration settings might also be needed, depending on your requirements:

  • ssl.client.auth=none: The other options for client authentication are required and requested. With requested, clients without certificates can still connect. The use of requested is discouraged, because it provides a false sense of security and misconfigured clients can still connect.
  • ssl.cipher.suites: A cipher suite is a named combination of authentication, encryption, MAC, and a key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. This list is empty by default.
  • ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1: Provide a list of SSL protocols that your brokers accept from clients.
  • ssl.keystore.type=JKS
  • ssl.truststore.type=JKS
To enable SSL for inter-broker communication, add the following line to the broker properties file. The default value is PLAINTEXT. See Using Kafka Supported Protocols.
security.inter.broker.protocol=SSL
Due to import regulations in some countries, the Oracle implementation limits the strength of cryptographic algorithms available by default. If you need stronger algorithms (for example, AES with 256-bit keys), you must obtain the JCE Unlimited Strength Jurisdiction Policy Files and install them in the JDK/JRE. For more information, see the JCA Providers Documentation.
Once you start the broker, you should see the following message in the server.log:
with addresses: PLAINTEXT -> EndPoint(192.168.64.1,9092,PLAINTEXT),SSL -> EndPoint(192.168.64.1,9093,SSL)
To check whether the server keystore and truststore are set up properly, run the following command:
openssl s_client -debug -connect localhost:9093 -tls1
In the output of this command, you should see the server certificate:
-----BEGIN CERTIFICATE-----
{variable sized random bytes}
-----END CERTIFICATE-----
subject=/C=US/ST=CA/L=Santa Clara/O=org/OU=org/CN=John Smith
issuer=/C=US/ST=CA/L=Santa Clara/O=org/OU=org/CN=kafka/emailAddress=test@test.com
If the certificate does not appear, or if there are any other error messages, your keystore is not set up properly.
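
When the s_client output is long, a small filter can make the relevant lines easier to find. The sketch below works on output that has been saved to a file; the helper names and the s_client.out file name are assumptions made for illustration.

```shell
# Sketch: pull the subject and issuer lines out of saved s_client output.
# extract_cert_names and has_certificate are hypothetical helper names.
extract_cert_names() {
  # Reads s_client output on stdin; prints only subject= and issuer= lines.
  grep -E '^[[:space:]]*(subject|issuer)='
}

# Succeeds only if the saved output contains a PEM certificate block.
has_certificate() {
  grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}
```

One way to use it is to save the output with openssl s_client -connect localhost:9093 </dev/null > s_client.out 2>&1, then run has_certificate s_client.out && extract_cert_names < s_client.out.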

Step 5. Configuring Kafka Clients

SSL is supported only for the new Kafka Producer and Consumer APIs. The configurations for SSL are the same for both the producer and consumer.

If client authentication is not required in the broker, the following shows a minimal configuration example:

security.protocol=SSL
ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks
ssl.truststore.password=test1234

If client authentication is required, a keystore must be created as in step 1, and you must also configure the following properties:

ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234
Other configuration settings might also be needed, depending on your requirements and the broker configuration:
  • ssl.provider (Optional). The name of the security provider used for SSL connections. Default is the default security provider of the JVM.
  • ssl.cipher.suites (Optional). A cipher suite is a named combination of authentication, encryption, MAC, and a key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol.
  • ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1. This property should list at least one of the protocols configured on the broker side.
  • ssl.truststore.type=JKS
  • ssl.keystore.type=JKS
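
The minimal client configuration above can be written to a file with a small helper, as in the following sketch. The helper name and the client-ssl.properties file name are assumptions; the property names and values come from the minimal example in this step.

```shell
# Sketch: write a minimal SSL client configuration file.
# write_ssl_client_properties is a hypothetical helper name.
write_ssl_client_properties() {
  local out="$1"         # properties file to create
  local truststore="$2"  # path to the client truststore
  local password="$3"    # truststore password
  cat > "$out" <<EOF
security.protocol=SSL
ssl.truststore.location=$truststore
ssl.truststore.password=$password
EOF
  # Keep the file private because it contains a password.
  chmod 600 "$out"
}
```

For example, write_ssl_client_properties client-ssl.properties /var/private/ssl/kafka.client.truststore.jks test1234 creates a file that can be passed to the console tools with --producer.config or --consumer.config when connecting to the SSL port.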

Using Kafka Supported Protocols

Kafka can expose multiple communication endpoints, each supporting a different protocol. Supporting multiple communication endpoints enables you to use different communication protocols for client-to-broker communications and broker-to-broker communications. Set the Kafka inter-broker communication protocol using the security.inter.broker.protocol property. Use this property primarily for the following scenarios:
  • Enabling SSL encryption for client-broker communication but keeping broker-broker communication as PLAINTEXT. Because SSL has performance overhead, you might want to keep inter-broker communication as PLAINTEXT if your Kafka brokers are behind a firewall and not susceptible to network snooping.
  • Migrating from a non-secure Kafka configuration to a secure Kafka configuration without requiring downtime. Use a rolling restart and keep security.inter.broker.protocol set to a protocol that is supported by all brokers until all brokers are updated to support the new protocol.
    For example, if you have a Kafka cluster that needs to be configured to enable Kerberos without downtime, follow these steps:
    1. Set security.inter.broker.protocol to PLAINTEXT.
    2. Update the Kafka service configuration to enable Kerberos.
    3. Perform a rolling restart.
    4. Set security.inter.broker.protocol to SASL_PLAINTEXT.
CDK 2.0 and higher Powered By Apache Kafka supports the following combinations of protocols.
  Protocol         SSL   Kerberos
  PLAINTEXT        No    No
  SSL              Yes   No
  SASL_PLAINTEXT   No    Yes
  SASL_SSL         Yes   Yes
These protocols can be defined for broker-to-client interaction and for broker-to-broker interaction. security.inter.broker.protocol allows the broker-to-broker communication protocol to be different than the broker-to-client protocol. It was added to ease the upgrade from non-secure to secure clusters while allowing rolling upgrades.

In most cases, set security.inter.broker.protocol to the protocol you are using for broker-to-client communication. Set security.inter.broker.protocol to a protocol different than the broker-to-client protocol only when you are performing a rolling upgrade from a non-secure to a secure Kafka cluster.
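
The supported combinations can be expressed as a small lookup, which may help when scripting configuration changes. This is only a sketch; protocol_for is a hypothetical helper name, and the yes/no arguments are a convention chosen for this example.

```shell
# Sketch: map the SSL and Kerberos choices to a protocol name,
# following the supported combinations listed in this section.
# protocol_for is a hypothetical helper name.
protocol_for() {
  local ssl="$1"       # "yes" or "no"
  local kerberos="$2"  # "yes" or "no"
  case "$ssl:$kerberos" in
    no:no)   echo PLAINTEXT ;;
    yes:no)  echo SSL ;;
    no:yes)  echo SASL_PLAINTEXT ;;
    yes:yes) echo SASL_SSL ;;
    *)       echo "usage: protocol_for yes|no yes|no" >&2; return 1 ;;
  esac
}
```

For example, protocol_for yes yes prints SASL_SSL, the value to use when both SSL and Kerberos are enabled.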

Enabling Kerberos Authentication

CDK 2.0 and higher Powered By Apache Kafka supports Kerberos authentication, but it is supported only for the new Kafka Producer and Consumer APIs. If you already have a Kerberos server, you can add Kafka to your current configuration. If you do not have a Kerberos server, install it before proceeding. See Enabling Kerberos Authentication Using the Wizard.

If you have already configured the mapping from Kerberos principals to short names using the hadoop.security.auth_to_local HDFS configuration property, configure the same rules for Kafka by adding the sasl.kerberos.principal.to.local.rules property to the Kafka Broker Advanced Configuration Snippet using Cloudera Manager. Specify the rules as a comma-separated list.

To enable Kerberos authentication for Kafka:

  1. From Cloudera Manager, navigate to Kafka > Configurations. Set SSL client authentication to none. Set Inter Broker Protocol to SASL_PLAINTEXT.
  2. Click Save Changes.
  3. Restart the Kafka service.
  4. Make sure that listeners = SASL_PLAINTEXT is present in the Kafka broker logs /var/log/kafka/server.log.
  5. Create a jaas.conf file with the following contents to use with cached Kerberos credentials. You can modify this file to use keytab files instead of cached credentials. To generate keytabs, see Step 6: Get or Create a Kerberos Principal for Each User Account.

    If you use kinit first, use this configuration:

    KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=true;
    };
    If you use keytab, use this configuration:
    KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/etc/security/keytabs/kafka_server.keytab"
    principal="kafka/kafka1.hostname.com@EXAMPLE.COM";
    };
  6. Create the client.properties file containing the following properties.
    security.protocol=SASL_PLAINTEXT
    sasl.kerberos.service.name=kafka
  7. Test with the Kafka console producer and consumer. To obtain a Kerberos ticket-granting ticket (TGT):
    $ kinit <user>
  8. Verify that your topic exists. (This does not use security features, but it is a best practice.)
    $ kafka-topics --list --zookeeper <zkhost>:2181
  9. Verify that the jaas.conf file is used by setting the environment.
    $ export KAFKA_OPTS="-Djava.security.auth.login.config=/home/user/jaas.conf"
  10. Run a Kafka console producer.
    $ kafka-console-producer --broker-list <anybroker>:9092 --topic test1 \
      --producer.config client.properties
  11. Run a Kafka console consumer.
    $ kafka-console-consumer --new-consumer --topic test1 --from-beginning \
      --bootstrap-server <anybroker>:9092 --consumer.config client.properties
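
The client-side files from the steps above can be created in one pass, as in this sketch for the ticket-cache case. The helper name is hypothetical, and the target directory is whatever location you choose; the file contents are the ones shown in the steps above.

```shell
# Sketch: create the two client-side files for Kerberos with a ticket cache.
# write_kerberos_client_files is a hypothetical helper name.
write_kerberos_client_files() {
  local dir="$1"  # directory that receives jaas.conf and client.properties
  cat > "$dir/jaas.conf" <<'EOF'
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=true;
};
EOF
  cat > "$dir/client.properties" <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
EOF
  # Print the export line for the caller to copy or evaluate.
  echo "export KAFKA_OPTS=\"-Djava.security.auth.login.config=$dir/jaas.conf\""
}
```

For example, write_kerberos_client_files /home/user writes both files and prints the export command to run before starting the console producer or consumer.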

Enabling Encryption at Rest

Data encryption is increasingly recognized as an optimal method for protecting data at rest.

Perform the following steps to encrypt Kafka data that is not in active use.
  1. Stop the Kafka service.
  2. Archive the Kafka data to an alternate location, using TAR or another archive tool.
  3. Unmount the affected drives.
  4. Install and configure Navigator Encrypt.
  5. Expand the TAR archive into the encrypted directories.
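
The archive and restore steps might look like the following sketch. It covers only the tar portion; stopping the service, unmounting drives, and configuring Navigator Encrypt are not shown. Both helper names and all paths passed to them are hypothetical.

```shell
# Sketch: archive a Kafka data directory and restore it elsewhere.
# Both helper names and all paths passed to them are hypothetical.
archive_kafka_data() {
  local data_dir="$1"  # Kafka data directory to archive
  local archive="$2"   # archive file to create, outside data_dir
  # -C keeps paths in the archive relative to the data directory.
  tar -czf "$archive" -C "$data_dir" .
}

restore_kafka_data() {
  local archive="$1"     # archive created by archive_kafka_data
  local target_dir="$2"  # encrypted directory to restore into
  # -p preserves file permissions when extracting.
  tar -xzpf "$archive" -C "$target_dir"
}
```

Keeping the archive outside the data directory avoids tar trying to include the archive in itself.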

Using Kafka with Sentry Authorization

Starting with CDK 2.1.x on CDH 5.9.x and higher Powered By Apache Kafka, Apache Sentry includes a Kafka binding that you can use to enable authorization in Kafka with Sentry. For more information, see Authorization With Apache Sentry.

Configuring Kafka to Use Sentry Authorization

The following steps describe how to configure Kafka to use Sentry authorization. These steps assume you have installed Kafka and Sentry on your cluster.

Sentry requires that your cluster include HDFS. After you install and start Sentry with the correct configuration, you can stop the HDFS service.

For more information, see Installing or Upgrading CDK Powered By Apache Kafka® and Installing and Upgrading the Sentry Service.

To configure Sentry authorization for Kafka:

  1. Go to Kafka > Configuration.
  2. Select the checkbox Enable Kerberos Authentication.
  3. Select a Sentry service in the Kafka service configuration.
  4. Add Super users. Super users can perform any action on any resource in the Kafka cluster. The kafka user is added as a super user by default. Super user requests are authorized without going through Sentry, which provides enhanced performance.
  5. Select the checkbox Enable Sentry Privileges Caching to enhance performance.

Authorizable Resources

Authorizable resources are resources or entities in a Kafka cluster that require special permissions for a user to be able to perform actions on them. Kafka has four authorizable resources.

  • Cluster, which controls who can perform cluster-level operations such as creating or deleting a topic. This can only have one value, kafka-cluster, as one Kafka cluster cannot have more than one cluster resource.
  • Topic, which controls who can perform topic-level operations such as producing and consuming topics. Its value must match exactly the topic name in the Kafka cluster. With CDK 3.1.0 and CDH 5.14.2 and later, wildcards (*) can be used to refer to any topic in the privilege.
  • Consumergroup, which controls who can perform consumergroup-level operations such as joining or describing a consumergroup. Its value must exactly match the group.id of a consumergroup. With CDK 3.1.0 and CDH 5.14.2 and later, you can use a wildcard (*) to refer to any consumer groups in the privilege. This is useful when used with Spark Streaming, where a generated group.id may be needed.

  • Host, which controls from where specific operations can be performed. Think of this as a way to achieve IP filtering in Kafka. You can set the value of this resource to the wildcard (*), which represents all hosts.

Authorized Actions

You can perform multiple actions on each resource. The following operations are supported by Kafka, though not all actions are valid on all resources.

  • ALL, a wildcard action that represents all possible actions on a resource.
  • read
  • write
  • create
  • delete
  • alter
  • describe
  • clusteraction

Authorizing Privileges

Privileges define what actions are allowed on a resource. A privilege is represented as a string in Sentry. The following rules apply to a valid privilege.

  • Can have at most one Host resource. If you do not specify a Host resource in your privilege string, Host=* is assumed.
  • Must have exactly one non-Host resource.
  • Must have exactly one action specified at the end of the privilege string.

For example, the following are valid privilege strings:

Host=*->Topic=myTopic->action=ALL
Topic=test->action=ALL
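
The three rules can be checked mechanically before a privilege string is passed to Sentry. The following sketch checks structure only: it does not verify that the action is valid for the resource. The helper name is hypothetical, and matching the keys with the exact capitalization used in this topic is an assumption, not documented Sentry behavior.

```shell
# Sketch: check a privilege string against the three rules above.
# is_valid_privilege is a hypothetical helper; key spellings follow this
# topic (Host, Cluster, Topic, Consumergroup, action) as an assumption.
is_valid_privilege() {
  printf '%s\n' "$1" | awk -F'->' '
    {
      hosts = 0; resources = 0; actions = 0
      for (i = 1; i <= NF; i++) {
        n = split($i, kv, "=")
        if (n != 2 || kv[2] == "") exit 1      # each part must be key=value
        if (kv[1] == "Host") hosts++
        else if (kv[1] == "action") actions++
        else if (kv[1] == "Cluster" || kv[1] == "Topic" || kv[1] == "Consumergroup") resources++
        else exit 1                            # unknown key
      }
      split($NF, kv, "=")
      # At most one Host, exactly one other resource, one action at the end.
      if (hosts <= 1 && resources == 1 && actions == 1 && kv[1] == "action") exit 0
      exit 1
    }'
}
```

For example, is_valid_privilege "Topic=test->action=ALL" succeeds, while a string with two non-Host resources or with the action in the middle fails.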

Granting Privileges to a Role

The following examples grant privileges to the role test, so that users in testGroup can create a topic named testTopic and produce to it.

The user executing these commands must be added to the Sentry parameter sentry.service.allow.connect and also be a member of a group defined in sentry.service.admin.group.

Before you can assign the test role, you must first create it. To create the test role:

$ kafka-sentry -cr -r test

To confirm that the role was created, list the roles:

$ kafka-sentry -lr

If Sentry privileges caching is enabled, as recommended, the new privileges you assign take some time to appear in the system. The delay is the time-to-live interval of the Sentry privileges cache, which is set using sentry.kafka.caching.ttl.ms. By default, this interval is set to 30 seconds. On test clusters, it is beneficial to have changes appear as quickly as possible. Cloudera therefore recommends that you either use a lower time interval or disable caching with sentry.kafka.caching.enable.

  • Allow users in testGroup to write to testTopic from localhost, which allows users to produce to testTopic. They need both write and describe permissions.
    $ kafka-sentry -gpr -r test -p "Host=127.0.0.1->Topic=testTopic->action=write"
    $ kafka-sentry -gpr -r test -p "Host=127.0.0.1->Topic=testTopic->action=describe"
  • Assign the test role to the group testGroup:
    $ kafka-sentry -arg -r test -g testGroup
  • Verify that the test role is part of the group testGroup:
    $ kafka-sentry -lr -g testGroup
  • Create testTopic.
    $ kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 \
      --partitions 1 --topic testTopic
    $ kafka-topics --list --zookeeper localhost:2181
    testTopic
  • Produce to testTopic. Note that you have to pass a configuration file, producer.properties, with information on JAAS configuration and other Kerberos authentication related information. See SASL Configuration for Kafka Clients.
    $ kafka-console-producer --broker-list localhost:9092 --topic testTopic \
      --producer.config producer.properties
    This is a message
    This is another message
  • Grant the create privilege to the test role.
    $ kafka-sentry -gpr -r test -p "Host=127.0.0.1->Cluster=kafka-cluster->action=create"   
  • Allow users in testGroup to describe testTopic from localhost, which the user creates and uses.
    $ kafka-sentry -gpr -r test -p "Host=127.0.0.1->Topic=testTopic->action=describe"
  • Grant the describe privilege to the test role.
    $ kafka-sentry -gpr -r test -p "Host=127.0.0.1->Consumergroup=testconsumergroup->action=describe"
  • Allow users in testGroup to read from a consumer group, testconsumergroup, that it will start and join.
    $ kafka-sentry -gpr -r test -p "Host=127.0.0.1->Consumergroup=testconsumergroup->action=read"
  • Allow users in testGroup to read from testTopic from localhost and to consume from testTopic.
    $ kafka-sentry -gpr -r test -p "Host=127.0.0.1->Topic=testTopic->action=read"
  • Consume from testTopic. Note that you have to pass a configuration file, consumer.properties, with information on JAAS configuration and other Kerberos authentication related information. The configuration file must also specify group.id as testconsumergroup.
    $ kafka-console-consumer --new-consumer --topic testTopic --from-beginning \
      --bootstrap-server <anybroker>:9092 --consumer.config consumer.properties
    This is a message
    This is another message
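
Because each privilege is granted with a separate kafka-sentry -gpr call, a loop can reduce repetition. The following is only a sketch: grant_privileges is a hypothetical helper, and the DRY_RUN variable is a convention invented for this example so that the commands can be reviewed before they are run.

```shell
# Sketch: grant several privileges to one role in a loop.
# grant_privileges is a hypothetical helper. Set DRY_RUN=1 to print the
# kafka-sentry commands instead of running them.
grant_privileges() {
  local role="$1"
  shift
  local privilege
  for privilege in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kafka-sentry -gpr -r $role -p \"$privilege\""
    else
      # Stop at the first grant that fails.
      kafka-sentry -gpr -r "$role" -p "$privilege" || return 1
    fi
  done
}
```

For example, with DRY_RUN=1 set, grant_privileges test "Host=127.0.0.1->Topic=testTopic->action=write" "Host=127.0.0.1->Topic=testTopic->action=describe" prints the two grant commands needed to produce to testTopic.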

Troubleshooting

If Kafka requests are failing due to authorization, the following steps can provide insight into the error:

  • Make sure you have run kinit as a user who has privileges to perform the operation.
  • Identify which broker is hosting the leader of the partition you are trying to produce to or consume from, because this leader authorizes your request against Sentry. One easy way to debug is to run a single Kafka broker. Change the log level of the Kafka broker to debug and restart the broker.
  • Run the Kafka client or Kafka CLI with the required arguments and capture the Kafka log, which should be something like /var/log/kafka/kafka-broker-<HOST_ID>.log on the Kafka broker's host.
  • There will be many Jetty log messages, and filtering them out usually helps reduce noise. Look for log messages from org.apache.sentry.
  • Look for the following information in the filtered logs:
    • Groups of the user that the Kafka client or CLI is running as.
    • Required privileges for the operation.
    • Retrieved privileges from Sentry.
    • Required and retrieved privileges comparison result.

This log information can provide insight into which privilege is not assigned to a user, causing a particular operation to fail.
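
Filtering the broker log down to Sentry messages can be done with grep, as in this sketch. The helper name is hypothetical, and the log path is whatever file you captured in the steps above.

```shell
# Sketch: keep only Sentry-related lines from a captured broker log.
# filter_sentry_log is a hypothetical helper name.
filter_sentry_log() {
  # -F matches the package name literally rather than as a regex.
  grep -F 'org.apache.sentry' "$1"
}
```

For example, filter_sentry_log /var/log/kafka/kafka-broker-<HOST_ID>.log, with <HOST_ID> replaced by the value for your host, prints only the lines that mention org.apache.sentry.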