Managing Topics across Multiple Kafka Clusters

You may have more than one Kafka cluster to support:
  • Geographic distribution
  • Disaster recovery
  • Organizational requirements

You can distribute messages across multiple clusters. It can be useful to make a copy of one or more topics from other Kafka clusters available to a client on one cluster. Mirror Maker is a tool bundled with Kafka that automates mirroring, or publishing, messages from one cluster to another. "Mirroring" occurs between clusters, whereas "replication" distributes messages within a cluster.

Mirror Maker Makes Topics Available on Multiple Clusters

While the diagram shows copying to one topic, Mirror Maker’s main mode of operation is running continuously, copying one or more topics from the source cluster to the destination cluster.

Keep in mind the following design notes when configuring Mirror Maker:

  • Mirror Maker runs as a single process.
  • Mirror Maker can run with multiple consumers that read from multiple partitions in the source cluster.
  • Mirror Maker uses a single producer to copy messages to the matching topic in the destination cluster.
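
The design notes above can be sketched as a launch configuration: one process, several consumer streams reading from the source cluster, and one producer writing to the destination cluster. This is a minimal sketch, not the configuration that Cloudera Manager generates. The broker host names, group ID, file names, and topic names are hypothetical, and the command name and flags are those of the legacy command-line Mirror Maker tool, which may differ by release.

```python
# Sketch: assemble a launch command for the legacy command-line Mirror Maker.
# All host names, file names, and topic names are hypothetical.

def render_properties(props):
    """Render a dict as Java-style .properties text."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"

# The consumers read from the source cluster.
consumer_props = {
    "bootstrap.servers": "source-broker1:9092",  # hypothetical host
    "group.id": "mirror-maker-group",            # hypothetical group ID
}

# The single producer writes to the destination cluster.
producer_props = {
    "bootstrap.servers": "dest-broker1:9092",    # hypothetical host
}

def mirror_maker_command(consumer_file, producer_file, num_streams, whitelist):
    """Build the argument list for one Mirror Maker process."""
    return [
        "kafka-mirror-maker",                # may be kafka-mirror-maker.sh in some installs
        "--consumer.config", consumer_file,  # source cluster settings
        "--producer.config", producer_file,  # destination cluster settings
        "--num.streams", str(num_streams),   # number of consumer streams
        "--whitelist", whitelist,            # topics to mirror
    ]

command = mirror_maker_command(
    "consumer.properties", "producer.properties", 4, "orders|payments"
)
print(render_properties(consumer_props), end="")
print(render_properties(producer_props), end="")
print(" ".join(command))
```

Raising the stream count adds consumers only. In this sketch there is still one producer configuration, which matches the single-producer design described above.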

Consumer/Producer Compatibility

The Mirror Maker consumer needs to be client compatible with the source cluster. The Mirror Maker producer needs to be client compatible with the destination cluster.

See Client/Broker Compatibility Across Kafka Versions for more details about what it means to be "compatible."

Topic Differences between Clusters

Because messages are copied from the source cluster to the destination cluster (potentially through many consumers funneling into a single producer), there is no guarantee that offsets or timestamps are identical between the two clusters. In addition, because these copies occur over the network, the two copies of a topic can diverge: retries can produce duplicate messages, and dropped messages can leave gaps.
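
The following toy model illustrates why offsets can differ. It is not Kafka client code. It assumes, purely for illustration, a source partition whose surviving messages start at offset 100, a newly created destination partition, and one message that is written twice because of a retry.

```python
# Sketch: a toy model of offset drift between clusters (not Kafka client code).

# Illustrative assumption: the source partition's surviving messages
# start at offset 100.
source_partition = [(100, "a"), (101, "b"), (102, "c")]

# A newly created destination partition starts at offset 0.
destination_partition = []

def produce(log, value):
    """Append a value and return the offset the destination assigns to it."""
    offset = len(log)
    log.append((offset, value))
    return offset

offset_map = {}
for source_offset, value in source_partition:
    offset_map[source_offset] = produce(destination_partition, value)
    if value == "b":
        # Simulate a retry: the message is written a second time,
        # which shifts the offsets of every later message.
        produce(destination_partition, value)

print(offset_map)                   # {100: 0, 101: 1, 102: 3}
print(len(destination_partition))   # 4
```

A client that switches from the source cluster to the destination cluster therefore cannot reuse its committed offsets as they are.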

Optimize Mirror Maker Producer Location

Because Mirror Maker uses a single producer, and producers typically have more difficulty with high-latency or unreliable connections, run the producer “closer” to the destination cluster, meaning in the same data center or on the same rack.

Destination Cluster Configuration

Before starting Mirror Maker, make sure that the destination cluster is configured correctly:
  • Make sure there is sufficient disk space to copy the topic from the source cluster to the destination cluster.
  • Make sure the topic exists in the destination cluster or use the kafka-configs command to set the property auto.create.topics.enable=true. See Command Line Tools.

Kerberos and Mirror Maker

As mentioned earlier, Mirror Maker runs as a single process. The resulting consumers and producers rely on a single configuration setup. Mirror Maker requires that the source cluster and the destination cluster belong to the same Kerberos realm.

Setting up Mirror Maker in Cloudera Manager

Where Cloudera Manager is managing the destination cluster:

  1. In Cloudera Manager, select the Kafka service.
  2. Choose Action > Add Role Instances.
  3. Under Kafka Mirror Maker, click Select hosts.
  4. Select the host where Mirror Maker will run and click Continue.
  5. Fill in the Destination Broker List and the Source Broker List with the brokers of your destination and source Kafka clusters, respectively.

    Use host name, IP address, or fully qualified domain name.

  6. Fill out the Topic Whitelist.

    The whitelist is required.

  7. Fill out the TLS/SSL sections if security needs to be enabled.
  8. Start the Mirror Maker instance.
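
For the Topic Whitelist in step 6, the sketch below shows how a whitelist pattern might select topics. It assumes the behavior of the legacy command-line Mirror Maker, where the whitelist is a Java-style regular expression, commas are treated as alternation, and the pattern must match the whole topic name. Python's re module stands in for Java regular expressions here, which is adequate only for simple patterns like these. The topic names are hypothetical.

```python
import re

def topics_matching(whitelist, topics):
    """Return the topics that a whitelist pattern would select."""
    # Assumption: commas act like "|" and the whole topic name must match.
    pattern = re.compile(whitelist.replace(",", "|"))
    return [topic for topic in topics if pattern.fullmatch(topic)]

# Hypothetical topic names on the source cluster.
topics = ["orders", "orders-retry", "payments", "audit"]

print(topics_matching("orders,payments", topics))  # ['orders', 'payments']
print(topics_matching("orders.*", topics))         # ['orders', 'orders-retry']
```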

Settings to Avoid Data Loss

The Avoid Data Loss option from earlier releases has been removed; the following properties are now set automatically instead. If you set these properties yourself in the configuration snippet, enter the numeric values shown below rather than "max integer" for retries and "max long" for max.block.ms. Mirror Maker starts correctly with the numeric values.

Producer settings
  • acks=all
  • retries=2147483647
  • max.block.ms=9223372036854775807
Consumer setting
  • auto.commit.enable=false
Mirror Maker setting
  • abort.on.send.failure=true
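
The numeric values listed above are the largest 32-bit signed integer ("max integer") and the largest 64-bit signed integer ("max long"). The short sketch below derives them and renders the settings as property lines. The render helper is hypothetical and exists only for this illustration.

```python
# Sketch: derive the numeric limits and render the settings listed above.
INT_MAX = 2**31 - 1    # "max integer": largest 32-bit signed value
LONG_MAX = 2**63 - 1   # "max long": largest 64-bit signed value

producer_settings = {
    "acks": "all",
    "retries": INT_MAX,
    "max.block.ms": LONG_MAX,
}
consumer_settings = {"auto.commit.enable": "false"}
mirror_maker_settings = {"abort.on.send.failure": "true"}

def render(props):
    """Render settings as key=value lines (hypothetical helper)."""
    return "\n".join(f"{key}={value}" for key, value in props.items())

print(render(producer_settings))
print(render(consumer_settings))
print(render(mirror_maker_settings))
```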