Installing or Upgrading Kafka

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

Kafka is distributed as a parcel, separate from the CDH parcel. It is also distributed as a package. The steps to install Kafka vary, depending on whether you choose to install from a parcel or a package.

General Information Regarding Installation and Upgrade

Cloudera Manager 5.4 or higher includes the Kafka service. To install, download Kafka using Cloudera Manager, distribute Kafka to the cluster, activate the new parcel, and add the service to the cluster. For a list of available parcels and packages, see Cloudera Distribution of Apache Kafka Version and Packaging Information.

You can colocate the Kafka and ZooKeeper services on the same host. However, for optimal performance, Cloudera recommends using dedicated hosts, especially in large production environments.

Graceful Shutdown of Kafka Brokers

If the Kafka brokers do not shut down gracefully, subsequent restarts may take longer than expected. This can happen when the brokers take longer than 30 seconds to clear their backlog while stopping the Kafka service, stopping the Kafka Broker role, or stopping a cluster where the Kafka service is running. The Kafka brokers are also shut down as part of performing an upgrade. There are two configuration properties you can set to control whether Cloudera Manager waits for the brokers to shut down gracefully:
Kafka Shutdown Properties

Property: Enable Controlled Shutdown
Description: Enables controlled shutdown of the broker. If enabled, the broker moves all leaders on it to other brokers before shutting itself down. This reduces the unavailability window during shutdown.
Default Value: Enabled

Property: Graceful Shutdown Timeout
Description: The timeout in milliseconds to wait for graceful shutdown to complete.
Default Value: 30000 milliseconds (30 seconds)

To configure these properties, go to Clusters > Kafka Service > Configuration and search for "shutdown".

If Kafka is taking a long time for controlled shutdown to complete, consider increasing the value of Graceful Shutdown Timeout. Once this timeout is reached, Cloudera Manager issues a forced shutdown, which interrupts the controlled shutdown and could cause subsequent restarts to take longer than expected.
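
For a package-based broker that is not managed by Cloudera Manager, one quick way to see whether controlled shutdown is configured is to read the broker properties file. The following is a minimal sketch, not Cloudera's procedure. It assumes that the Enable Controlled Shutdown setting corresponds to the broker property controlled.shutdown.enable and that the file uses simple key=value lines. The helper name get_property is hypothetical.

```shell
# Sketch: print the value of a key from a Java-style properties file.
# Assumes simple "key=value" lines with no line continuations.
get_property() {
    # $1 = properties file, $2 = key
    # Escape dots so the key is matched literally by sed.
    key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    # Print the value of the last uncommented line that sets the key.
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Example (path taken from the package installation steps):
#   get_property /etc/kafka/conf/server.properties controlled.shutdown.enable
```

An empty result means the property is not set in the file, in which case the broker's built-in default applies.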

Disks and Filesystem

Cloudera recommends that you use multiple drives to get good throughput. To ensure good latency, do not share the drives used for Kafka data with application logs or other OS filesystem activity. You can either use RAID to combine these drives into a single volume, or format and mount each drive as its own directory. Because Kafka replicates data, the redundancy that RAID provides can also be provided at the application level. This choice has several tradeoffs.

If you configure multiple data directories, partitions are assigned round-robin to data directories. Each partition is stored entirely in one of the data directories. This can lead to load imbalance between disks if data is not well balanced among partitions.
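
To spot this kind of imbalance, you can count how many partition directories each data directory holds. The sketch below assumes that every partition is stored as a subdirectory directly under a data directory. The function name count_partitions and the example paths are hypothetical, and a count does not reflect how much data each partition contains.

```shell
# Sketch: count partition directories in each Kafka data directory so that
# an uneven spread across disks is easy to spot.
count_partitions() {
    for dir in "$@"; do
        # Count immediate subdirectories only; strip padding that some wc versions add.
        n=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
        printf '%s %s\n' "$dir" "$n"
    done
}

# Example (hypothetical mount points):
#   count_partitions /data/1/kafka /data/2/kafka /data/3/kafka
```

Running du -s on the same directories gives a complementary view by size rather than by partition count.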

RAID can potentially balance load between disks better because it balances load at a lower level. The primary downside of RAID is that it usually reduces write throughput significantly, and it reduces the available disk space.

Another potential benefit of RAID is the ability to tolerate disk failures. However, rebuilding the RAID array is so I/O intensive that it can effectively disable the server, so this does not provide much improvement in availability.

The following table summarizes these pros and cons for RAID10 versus JBOD.

RAID10                             JBOD
Can survive single disk failure    Single disk failure kills the broker
Single log directory               More available disk space
Lower total I/O                    Higher write throughput
                                   Broker is not smart about balancing partitions across disks

Installing or Upgrading Kafka from a Parcel

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

  1. In Cloudera Manager, select Hosts > Parcels.
  2. If you do not see Kafka in the list of parcels, you can add the parcel to the list.
    1. Find the parcel for the version of Kafka you want to use on Cloudera Distribution of Apache Kafka Versions.
    2. Copy the parcel repository link.
    3. On the Cloudera Manager Parcels page, click Configuration.
    4. In the field Remote Parcel Repository URLs, click + next to an existing parcel URL to add a new field.
    5. Paste the parcel repository link.
    6. Save your changes.
  3. On the Cloudera Manager Parcels page, download the Kafka parcel, distribute the parcel to the hosts in your cluster, and then activate the parcel. See Managing Parcels. After you activate the Kafka parcel, Cloudera Manager prompts you to restart the cluster. You do not need to restart the cluster after installing Kafka. Click Close to ignore this prompt.
  4. Add the Kafka service to your cluster. See Adding a Service.

Installing or Upgrading Kafka from a Package

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

You install the Kafka package from the command line.

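  1. Navigate to the package repository directory for your operating system, for example /etc/yum.repos.d on RHEL-compatible systems, /etc/zypp/repos.d on SLES, or /etc/apt/sources.list.d on Ubuntu or Debian.
  2. Use wget to download the Kafka repository file. See Cloudera Distribution of Apache Kafka Version and Packaging Information.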
  3. Install Kafka using the appropriate commands for your operating system.
    Kafka Installation Commands

    RHEL-compatible:
    $ sudo yum clean all
    $ sudo yum install kafka
    $ sudo yum install kafka-server

    SLES:
    $ sudo zypper clean --all
    $ sudo zypper install kafka
    $ sudo zypper install kafka-server

    Ubuntu or Debian:
    $ sudo apt-get update
    $ sudo apt-get install kafka
    $ sudo apt-get install kafka-server
  4. Edit /etc/kafka/conf/server.properties to ensure that broker.id is unique for each node and broker in the Kafka cluster, and that zookeeper.connect points to the same ZooKeeper for all nodes and brokers.
  5. Start the Kafka server with the following command:

    $ sudo service kafka-server start
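
The edit in step 4 can also be scripted. The following is a rough sketch that assumes the properties file uses simple key=value lines and that you run it with enough privileges to write to the file. The helper name set_property, the broker ID, and the ZooKeeper host names are hypothetical.

```shell
# Sketch: set key=value in a properties file, replacing an existing
# uncommented entry for the key or appending a new one.
set_property() {
    # $1 = file, $2 = key, $3 = value
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every line whose key (text before the first "=") differs from the target key.
    awk -v k="$key" -F= '{ name = $1; gsub(/[ \t]/, "", name) } name != k' "$file" > "$tmp"
    # Append the new setting.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the original file's ownership and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (hypothetical values; use a different broker.id on every broker):
#   set_property /etc/kafka/conf/server.properties broker.id 1
#   set_property /etc/kafka/conf/server.properties zookeeper.connect zk1.example.com:2181
```

Commented-out entries such as #broker.id=0 are left untouched, so the appended line is the only active setting for the key.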

To verify that all nodes are correctly registered to the same ZooKeeper, connect to ZooKeeper using zookeeper-client, and then list the registered broker IDs at the client prompt.

$ zookeeper-client
$ ls /brokers/ids

You should see all of the IDs for the brokers you have registered in your Kafka cluster.

To discover which node a particular ID is assigned to, run the following command in the same zookeeper-client session:

$ get /brokers/ids/<ID>

This command returns the host name of the node assigned the ID you specify.
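
If you capture that output in a script, the host name can be extracted with a small filter. This sketch assumes the registration record is a single-line JSON object containing a "host" string field. The exact record layout depends on the Kafka version, and the function name broker_host and the sample record in the comment are illustrative only.

```shell
# Sketch: pull the "host" value out of a broker registration record.
broker_host() {
    # Reads the record on standard input and prints the host name, if present.
    sed -n 's/.*"host"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example with an illustrative record:
#   echo '{"host":"broker1.example.com","port":9092}' | broker_host
```

Lines that do not contain a "host" field produce no output, so the filter can be applied to the raw client output.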

Special Considerations When Upgrading from Kafka 1.x to Kafka 2.x

If you upgrade to Kafka 2.0, Cloudera recommends taking the cluster offline because it is a major upgrade with incompatible protocol changes. The upgrade steps are the same even if a cluster is offline.

If taking the cluster offline is not an option, use the following steps to perform a rolling upgrade:
  1. In Cloudera Manager, go to the Kafka Configuration page and add inter.broker.protocol.version=0.8.2.X to the Kafka Advanced Configuration Snippet (Safety Valve). See Custom Configuration.
  2. Upgrade your parcel or package as described in the steps above.
  3. Perform a rolling restart.
  4. After the entire cluster is upgraded and restarted, remove the property you added in step 1.
  5. To have the new protocol take effect, perform another rolling restart.
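
In Cloudera Manager, the rolling restart in steps 3 and 5 is an action on the Kafka service. For a package-based installation without Cloudera Manager, it might be approximated by restarting brokers one host at a time, as in the sketch below. This is an illustration under several assumptions: SSH access to each broker host, an init script that accepts restart (the steps above show only start), and a fixed pause standing in for a real health check. The names rolling_restart, DRY_RUN, and PAUSE_SECONDS and the host names are hypothetical.

```shell
# Sketch: restart brokers one at a time, pausing between hosts.
# DRY_RUN=1 (the default here) only prints the commands.
rolling_restart() {
    for host in "$@"; do
        cmd="ssh $host sudo service kafka-server restart"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $cmd"
        else
            # Stop at the first failure so a broken broker is not followed by more restarts.
            $cmd || return 1
            # Placeholder pause; a real check would wait until the broker has rejoined.
            sleep "${PAUSE_SECONDS:-60}"
        fi
    done
}

# Example (hypothetical hosts):
#   rolling_restart broker1.example.com broker2.example.com broker3.example.com
```

Restarting only one broker at a time keeps the remaining brokers available to serve partitions while each host is down.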

Upgrade Considerations
  • Always upgrade your Kafka cluster before upgrading your clients.
  • If using MirrorMaker, upgrade your downstream Kafka clusters first. Otherwise, incompatible messages might be sent downstream.

Special Considerations When Upgrading to Kafka 2.1.x

You must upgrade your Kafka 2.0.x brokers to Kafka 2.1.x before you upgrade your Kafka 2.0.x clients to Kafka 2.1.x.
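
A deployment script can guard against upgrading clients too early by comparing version strings first. The following is a minimal sketch that assumes a sort command with version ordering support (the -V option, available in GNU coreutils). The function name version_ge is hypothetical, and how you obtain the broker and client versions is outside its scope.

```shell
# Sketch: succeed only if the first version is at least the second.
version_ge() {
    # $1 >= $2 exactly when $2 comes first (or ties) under version ordering.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (hypothetical variables holding the deployed and target versions):
#   version_ge "$broker_version" "$client_version" || echo "Upgrade the brokers first."
```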