CDH 6 includes Apache Kafka as part of the core package. The documentation includes improved content on how to set up, install, and administer your Kafka ecosystem. For more information, see the Cloudera Enterprise 6.0.x Apache Kafka Guide. We look forward to your feedback on both the existing and new documentation.

What's New in CDK Powered By Apache Kafka?

New Features in CDK 4.1.0 Powered By Apache Kafka

  • Rebase on Kafka 2.2.1

    CDK 4.1.0 Powered By Apache Kafka is a minor release based on Apache Kafka 2.2.1. For upstream release notes, see Apache Kafka version 2.2.0 and 2.2.1 release notes.

  • Kafka Topics Tool Able to Connect Directly to Brokers

    The kafka-topics command line tool can now connect directly to brokers with the --bootstrap-server option instead of going through ZooKeeper. The old --zookeeper option is still available for now. For more information, see KIP-377.
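
To illustrate the difference between the two connection options of kafka-topics, the following Python sketch builds the argument list for listing topics either way. The host names, ports, and the helper name topics_list_command are placeholders invented for this example. Only the --bootstrap-server, --zookeeper, and --list options belong to the tool itself.

```python
def topics_list_command(endpoint, use_zookeeper=False):
    """Return the argument list for listing topics with kafka-topics.

    endpoint is a hypothetical host:port string. By default the tool is
    pointed at a broker; with use_zookeeper=True it is pointed at ZooKeeper
    through the older option.
    """
    option = "--zookeeper" if use_zookeeper else "--bootstrap-server"
    return ["kafka-topics", option, endpoint, "--list"]


# Placeholder addresses; substitute the hosts of your own cluster.
direct = topics_list_command("broker1.example.com:9092")
legacy = topics_list_command("zk1.example.com:2181", use_zookeeper=True)

print(" ".join(direct))
print(" ".join(legacy))
```

The resulting lists can be passed to subprocess.run on a host where the Kafka command line tools are installed.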

New Features in CDK 4.0.0 Powered By Apache Kafka

  • Rebase on Kafka 2.1.0

    CDK 4.0.0 Powered By Apache Kafka is a major release based on Apache Kafka 2.1.0. For upstream release notes, see Apache Kafka version 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1 and 2.1.0 release notes.

  • JBOD Support

    As of CDK 4.0.0, Cloudera officially supports Kafka clusters with nodes using JBOD configurations. JBOD support introduces a new command line tool and improves an existing tool:

    • A new tool, kafka-log-dirs, is added. The tool allows users to query partition assignment information.
    • The kafka-reassign-partitions tool is extended with new functionality that allows users to reassign partitions between log directories. Users can move partitions to a different log directory on the same broker as well as to log directories on other brokers.

  • Kafka Streams

    Starting with CDK 4.0.0, Cloudera officially supports Kafka Streams. For information about how to use Kafka Streams, see the Apache Kafka website.

  • Exactly Once Semantics

    Starting with CDK 4.0.0, Cloudera officially supports idempotent and transactional capabilities in the producer. This feature ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer.
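
The idempotent and transactional producer capabilities mentioned above are switched on through producer configuration. The following Python sketch assembles such a configuration and renders it as properties lines. The property names (enable.idempotence, acks, transactional.id) are standard Apache Kafka producer settings. The helper to_properties and the identifier orders-producer-1 are made up for this illustration.

```python
def to_properties(config):
    """Render a dict as Java-style .properties lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


# Idempotent producer: the broker discards duplicates caused by retries.
idempotent = {
    "enable.idempotence": "true",
    "acks": "all",  # idempotence requires acknowledgement from all in-sync replicas
}

# Transactional producer: additionally needs a stable transactional.id.
# "orders-producer-1" is a hypothetical identifier chosen for this example.
transactional = {**idempotent, "transactional.id": "orders-producer-1"}

print(to_properties(transactional))
# acks=all
# enable.idempotence=true
# transactional.id=orders-producer-1
```

This is only a sketch of the client-side settings. A real application passes them to the producer client and, for transactions, also calls the client's transaction API.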

New Features in CDK 3.1.0 Powered By Apache Kafka

  • Rebase on Kafka 1.0.1

    CDK 3.1.0 Powered By Apache Kafka is a minor release based on Apache Kafka 1.0.1.

    For upstream release notes, see Apache Kafka version 1.0.0 and 1.0.1 release notes.

  • Kafka uses HA-capable Sentry client

    This functionality enables automatic failover in the event that the primary Sentry host goes down or is unavailable.

  • Wildcard usage for Kafka-Sentry components

    You can specify an asterisk (*) in a Kafka-Sentry command for the TOPIC component of a privilege to refer to any topic in the privilege. Supported with CDH 5.14.2.

    You can also use an asterisk (*) in a Kafka-Sentry command for the CONSUMERGROUPS component of a privilege to refer to any consumer groups in the privilege. This is useful when used with Spark Streaming, where a generated group.id may be needed. Supported with CDH 5.14.2.

  • Health Tests in Cloudera Manager

    Two new Kafka Broker Health Tests have been added to Cloudera Manager:

    • Kafka Broker Swap Memory Usage
    • Kafka Broker Unexpected Exits

    These health tests are available when Kafka is managed by Cloudera Manager version 5.14 and later. For details, see Kafka Broker Health Tests.

New Features in CDK 3.0.0 Powered By Apache Kafka

  • Rebase on Kafka 0.11.0.0

    CDK 3.0.0 Powered By Apache Kafka is a major release based on Apache Kafka 0.11.0.0. See https://archive.apache.org/dist/kafka/0.11.0.0/RELEASE_NOTES.html.

  • Health test for offline and lagging partitions

    New health tests set the controller broker's health to BAD if the broker hosts at least one offline partition, and set the leader broker's health to CONCERNING if it hosts any lagging partitions. Supported with Cloudera Manager 5.14.0.

New Features in CDK 2.2.0 Powered By Apache Kafka

New Features in CDK 2.1.0 Powered By Apache Kafka

New Features in CDK 2.0.0 Powered By Apache Kafka

  • Rebase on Kafka 0.9

    CDK 2.0.0 Powered By Apache Kafka is rebased on Apache Kafka 0.9. See https://archive.apache.org/dist/kafka/0.9.0.0/RELEASE_NOTES.html.

  • Kerberos

    CDK 2.0.0 Powered By Apache Kafka supports Kerberos authentication of connections from clients and other brokers, including connections to ZooKeeper.

  • SSL

    CDK 2.0.0 Powered By Apache Kafka supports wire encryption of communications from clients and other brokers using SSL.

  • New Consumer API

    CDK 2.0.0 Powered By Apache Kafka includes a new Java API for consumers.

  • MirrorMaker

    MirrorMaker is enhanced to help prevent data loss and improve reliability of cross-data center replication.

  • Quotas

    You can use per-user quotas to throttle producer and consumer throughput in a multitenant cluster. See Quotas.

New Features in CDK 1.4.0 Powered By Apache Kafka

New Features in CDK 1.3.2 Powered By Apache Kafka

New Features in CDK 1.3.0 Powered By Apache Kafka

  • Metrics Reporter

    Cloudera Manager now displays Kafka metrics. Use the values to identify current performance issues and plan enhancements to handle anticipated changes in workload. See Viewing Apache Kafka Metrics.

  • MirrorMaker configuration

    Cloudera Manager allows you to configure the Kafka MirrorMaker cross-cluster replication service. You can add a MirrorMaker role and use it to replicate to a machine in another cluster. See Kafka MirrorMaker.

New Features in CDK 1.1.0 Powered By Apache Kafka

  • New producer

    The producer added in CDK 1.1.0 Powered By Apache Kafka combines features of the existing synchronous and asynchronous producers. Send requests are batched, allowing the new producer to perform as well as the asynchronous producer under load. Every send request returns a response object that can be used to retrieve status and exceptions.

  • Ability to delete topics

    You can now delete topics using the kafka-topics --delete command.

  • Offset management

    In previous versions, consumers that wanted to keep track of which messages were consumed did so by updating the offset of the last consumed message in ZooKeeper. With this new feature, Kafka itself tracks the offsets. Using offset management can significantly improve consumer performance.

  • Automatic leader rebalancing

    Each partition starts with a randomly selected leader replica that handles requests for that partition. When a cluster first starts, the leaders are evenly balanced among hosts. When a broker restarts, leaders from that broker are distributed to other brokers, which results in an unbalanced distribution. With this feature enabled, leadership returns to the original replica after the broker restarts.

  • Connection quotas

    Kafka administrators can limit the number of connections allowed from a single IP address. By default, this limit is 10 connections per IP address. This prevents misconfigured or malicious clients from destabilizing a Kafka broker by opening a large number of connections and using all available file handles.
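
The effect of a per-IP connection limit can be pictured with a small model. The Python sketch below is a toy illustration and not the broker's implementation: the class ConnectionLimiter and the address 192.0.2.15 are invented for this example, and the limit of 10 is taken from the default stated above. In Apache Kafka the corresponding broker setting is max.connections.per.ip.

```python
from collections import Counter


class ConnectionLimiter:
    """Toy model of a per-IP connection quota (hypothetical helper)."""

    def __init__(self, max_per_ip=10):
        self.max_per_ip = max_per_ip
        self.open = Counter()  # number of open connections per IP address

    def try_connect(self, ip):
        """Accept the connection unless the IP has reached its limit."""
        if self.open[ip] >= self.max_per_ip:
            return False
        self.open[ip] += 1
        return True

    def disconnect(self, ip):
        """Release one connection slot for the IP, if any is in use."""
        if self.open[ip] > 0:
            self.open[ip] -= 1


limiter = ConnectionLimiter(max_per_ip=10)
# A misbehaving client opens 12 connections from the same address.
results = [limiter.try_connect("192.0.2.15") for _ in range(12)]
print(results.count(True), results.count(False))  # prints "10 2"
```

Connections from other addresses are counted separately, so a single misconfigured client cannot use up the slots available to everyone else.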