Migrating from Apache Kafka to Cloudera Distribution of Kafka

This topic describes the required steps to migrate an existing Apache Kafka instance to Cloudera Distribution of Apache Kafka.

Assumptions

  • This guide assumes you are migrating to a Kafka cluster managed by Cloudera Manager.
  • This guide assumes you can plan a maintenance window for your migration.
  • This guide assumes you are migrating from a compatible release version, as shown in the table below:

    From Apache Kafka    To Cloudera Distribution of Kafka
    0.8.x                1.x

Migration Steps

Cloudera recommends that you migrate your system using the following procedure. The order in which you migrate components is significant: migrate brokers first, then migrate clients.

Before You Begin

  1. Shut down all existing producers, consumers, MirrorMaker instances, and Kafka brokers.
  2. If not already installed, install Cloudera Manager. See Installing Cloudera Manager, CDH, and Managed Services.
    1. Add the CDH and Kafka Parcels at installation time.
    2. Do not add any services yet: those instructions are below. Skip the install page by clicking the Cloudera Manager Home button.

Step 1. Migrating ZooKeeper

Kafka stores its metadata in ZooKeeper. As part of migrating to Cloudera Distribution of Kafka, you must also migrate your ZooKeeper instance to the supported version, included with CDH.
  1. Shut down your existing ZooKeeper cluster.
  2. Back up your dataDir and dataLogDir by copying them to another location or machine.
  3. Add the ZooKeeper service to the cluster where you want to run Cloudera Kafka. See Adding a Service.
  4. Add the ZooKeeper role to all machines that were running ZooKeeper.
  5. Set any custom configuration from your old zoo.cfg file in Cloudera Manager.
  6. Make sure dataDir and dataLogDir match your old configuration. This is important because this is where all your data is stored.
  7. Make sure the zookeeper user owns the files in the dataDir and dataLogDir. For example:
    chown -R zookeeper /var/lib/zookeeper
  8. Start the new ZooKeeper service.
  9. Use the zookeeper-client CLI to validate that some data exists. You should see nodes such as brokers, consumers, and configs. You might need to adjust your chroot. For example:
    zookeeper-client -server hostname:port
    ls /
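
The backup in step 2 and the ownership check in step 7 can be scripted. The following is a minimal sketch and not part of the Cloudera procedure: backup_dir and not_owned_by are hypothetical helper names, and the paths in the comments are examples only. It assumes tar and find are available and that ZooKeeper is stopped while the archive is written.

```shell
#!/bin/sh
# Sketch only: helper names and paths are illustrative assumptions.

# backup_dir SRC DEST_DIR
# Writes DEST_DIR/<basename of SRC>-backup.tar.gz and prints its path.
backup_dir() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/$(basename "$src")-backup.tar.gz"
    # -C keeps paths inside the archive relative to the parent of SRC.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# not_owned_by USER DIR
# Prints every file under DIR whose owner is not USER.
# No output means ownership is as expected.
not_owned_by() {
    find "$2" ! -user "$1" -print
}

# Example (hypothetical paths):
#   backup_dir /var/lib/zookeeper /backup/zookeeper
#   not_owned_by zookeeper /var/lib/zookeeper
```

If dataDir and dataLogDir have the same directory name, give each its own destination so that one archive does not overwrite the other.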
                        

Step 2. Migrating Kafka Brokers

  1. Confirm that all producers, consumers, and Kafka brokers are still shut down.
  2. Back up your log.dirs from the old broker machines by copying them to another location or machine.
  3. Add the Kafka service to the cluster where you migrated ZooKeeper. See Adding a Service.
  4. Add the broker role to all machines that were running brokers.
  5. Make sure the kafka user owns the log.dirs files. For example:
    chown -R kafka /var/local/Kafka/data
  6. Set any custom configuration from your old server.properties file in Cloudera Manager.
    • Make sure to override the broker.id on each node to match the configured value in your old configurations. This is important: if these values do not match, Kafka treats your brokers as new brokers and not your existing ones.
    • Make sure log.dirs and zookeeper.chroot match your old configuration. This is important, because this is where all of your data and state information is stored.
  7. Start the Kafka brokers using Cloudera Manager.
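
Before entering values in Cloudera Manager, it can help to read the settings from step 6 out of each old server.properties file. The following is a minimal sketch, assuming a plain key=value properties file and assuming the old chroot, if any, was given as a path suffix on zookeeper.connect, which is the Apache Kafka convention. get_prop and zk_chroot are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# get_prop KEY FILE
# Prints the value of the last uncommented "KEY=value" line in FILE.
get_prop() {
    # Escape dots so that "broker.id" matches literally.
    key=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$2" \
        | sed 's/[[:space:]]*$//' \
        | tail -n 1
}

# zk_chroot CONNECT_STRING
# Prints the chroot path from a zookeeper.connect value, or an empty
# line if the value has no path suffix.
zk_chroot() {
    case $1 in
        */*) echo "/${1#*/}" ;;
        *)   echo "" ;;
    esac
}

# Example (hypothetical path to the old configuration):
#   get_prop broker.id /opt/kafka/config/server.properties
#   get_prop log.dirs  /opt/kafka/config/server.properties
#   zk_chroot "$(get_prop zookeeper.connect /opt/kafka/config/server.properties)"
```

Run the helpers on every broker machine, because broker.id differs from node to node.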

Step 3. Migrating MirrorMaker

  1. Add the MirrorMaker role to all machines that were running MirrorMaker before.
  2. Set any custom configuration from your old producer.properties and consumer.properties files in Cloudera Manager.
  3. Start the MirrorMaker instances using Cloudera Manager.
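
To see which settings in the old producer.properties and consumer.properties files need to be carried over in step 2, you can list only the lines that are actually in effect. The following is a minimal sketch: active_props is a hypothetical helper name, and the file paths in the comments are examples only.

```shell
#!/bin/sh
# Sketch only: the helper name is an illustrative assumption.

# active_props FILE
# Prints the lines of FILE that are neither blank nor comments.
# Java properties files treat lines starting with "#" or "!" as comments.
active_props() {
    grep -v -E '^[[:space:]]*(#|!|$)' "$1"
}

# Example (hypothetical paths to the old configuration):
#   active_props /opt/kafka/config/producer.properties
#   active_props /opt/kafka/config/consumer.properties
```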

Step 4. Migrating Kafka Clients

Although Kafka might function with your existing clients, you must also upgrade all of your producers and consumers to receive all Cloudera patches and bug fixes and to have a fully supported system.

Migration requires that you change your Kafka dependencies from the Apache versions to the Cloudera versions, recompile your classes, and redeploy them. Use the Maven repository locations as described in Maven Artifacts for Kafka.
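
A first step in changing dependencies is finding the build files that declare them. The following is a minimal sketch, assuming Maven projects whose Kafka dependencies use the org.apache.kafka group ID. find_kafka_poms is a hypothetical helper name, and the source directory in the comment is an example only.

```shell
#!/bin/sh
# Sketch only: the helper name is an illustrative assumption.

# find_kafka_poms DIR
# Prints each pom.xml under DIR that mentions the org.apache.kafka group ID.
find_kafka_poms() {
    find "$1" -type f -name pom.xml -exec grep -l 'org\.apache\.kafka' {} +
}

# Example (hypothetical source tree):
#   find_kafka_poms "$HOME/src"
```

Each file listed is a candidate for the dependency and repository changes described in Maven Artifacts for Kafka.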