CDH 6 includes Apache Kafka as part of the core package. The documentation has been expanded to cover how to set up, install, and administer your Kafka ecosystem. For more information, see the Cloudera Enterprise 6.0.x Apache Kafka Guide. We look forward to your feedback on both the existing and new documentation.
Migrating from Apache Kafka to CDK Powered By Apache Kafka
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
This topic describes the steps required to migrate an existing Apache Kafka instance to CDK Powered By Apache Kafka. It assumes the following:
- You are migrating to a Kafka cluster managed by Cloudera Manager.
- You can plan a maintenance window for your migration.
- You are migrating from a compatible release version, as shown in the table below:
|From Apache Kafka||To CDK Powered By Apache Kafka|
Steps for Migrating from Apache Kafka to Cloudera Distribution of Apache Kafka
Cloudera recommends the following migration procedure. You must migrate brokers first, and then clients.
Before You Begin
- Shut down all existing producers, consumers, MirrorMaker instances, and Kafka brokers.
- If not already installed, install Cloudera Manager. See Installing Cloudera Manager, CDH, and Managed Services.
- Add the CDH and Kafka parcels at installation time.
- Do not add any services yet. Skip the install page by clicking the Cloudera Manager icon in the top navigation bar.
Step 1. Migrating ZooKeeper
Kafka stores its metadata in ZooKeeper. When migrating to Cloudera Distribution of Kafka, you must also migrate your ZooKeeper instance to the supported version included with CDH.
- Shut down your existing ZooKeeper cluster.
- Back up your dataDir and dataLogDir by copying them to another location or machine.
- Add the ZooKeeper service to the cluster where you will run Cloudera Kafka. See Adding a Service.
- Add the ZooKeeper role to all machines that were running ZooKeeper.
- Set any custom configuration from your old zoo.cfg file in Cloudera Manager.
- Make sure dataDir and dataLogDir match your old configuration. These directories hold all of your existing ZooKeeper data.
- Make sure the zookeeper user owns the files in the dataDir and dataLogDir. For example:
chown -R zookeeper /var/lib/zookeeper
- Start the new ZooKeeper service.
- Use the zookeeper-client CLI to validate that data exists. You should see nodes such as brokers, consumers, and config. If your Kafka metadata is stored under a chroot, list that path instead of the root. For example:
zookeeper-client -server hostname:port ls /
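The backup and ownership steps above can be scripted. The following is a minimal sketch, not a Cloudera-provided tool: the function names backup_dir and not_owned_by are hypothetical, and the script assumes bash and that it runs on each ZooKeeper host after the old ZooKeeper cluster has been shut down.

```shell
#!/usr/bin/env bash

# Copy a directory into backup_root, preserving modes and timestamps
# (and ownership, when run as root). Prints the path of the copy.
# Refuses to overwrite an existing backup.
backup_dir() {
    local src=$1 backup_root=$2
    local dest
    dest="$backup_root/$(basename "$src")"
    if [ ! -d "$src" ]; then
        echo "backup_dir: not a directory: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "backup_dir: already exists: $dest" >&2
        return 1
    fi
    mkdir -p "$backup_root" && cp -Rp "$src" "$dest" && echo "$dest"
}

# List everything under dir that is NOT owned by user.
# No output means the ownership is already correct.
not_owned_by() {
    local user=$1 dir=$2
    find "$dir" ! -user "$user"
}
```

For example, backup_dir /var/lib/zookeeper /backup/zk-pre-migration copies the data directory (the backup path here is only an illustration), and not_owned_by zookeeper /var/lib/zookeeper should print nothing once the chown step has been applied. The same two functions can be reused for the Kafka log.dirs in Step 2.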
Step 2. Migrating Kafka Brokers
All producers, consumers, and Kafka brokers should still be shut down.
- Back up your log.dirs from the old broker machines by copying them to another location or machine.
- Add the Kafka service to the cluster where you migrated ZooKeeper. See Adding a Service.
- Add the broker role to all machines that were running brokers.
- Make sure the kafka user owns the log.dirs files. For example:
chown -R kafka /var/local/kafka/data
- Set any custom configuration from your old server.properties file in Cloudera Manager.
- Override broker.id on each node so that it matches the value in that broker's old configuration. If the values do not match, Kafka treats the brokers as new brokers rather than as your existing ones.
- Make sure log.dirs and zookeeper.chroot match your old configuration. These settings identify where all of your data and state information is stored.
- Start the Kafka brokers using Cloudera Manager.
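To carry broker.id, log.dirs, and the ZooKeeper chroot over correctly, it can help to read them straight from each old server.properties file rather than copying them by eye. The sketch below is illustrative only: get_prop and zk_chroot are hypothetical helper names, the parser handles only simple key=value lines (no ":" separators or line continuations), and it assumes the old cluster's chroot, if any, was given as a path suffix on zookeeper.connect, as in Apache Kafka.

```shell
#!/usr/bin/env bash

# Print the value of one key from a simple key=value properties file.
# Comment lines (# or !) are skipped; if the key appears more than once,
# the last occurrence is printed. Returns non-zero if the key is absent.
get_prop() {
    local key=$1 file=$2
    awk -v k="$key" '
        /^[ \t]*[#!]/ { next }
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            idx = index(line, "=")
            if (idx == 0) next
            name = substr(line, 1, idx - 1)
            sub(/[ \t]+$/, "", name)
            if (name == k) {
                val = substr(line, idx + 1)
                sub(/^[ \t]+/, "", val)
                sub(/[ \t]+$/, "", val)
                found = 1
            }
        }
        END { if (found) print val; else exit 1 }
    ' "$file"
}

# Print the chroot suffix of a zookeeper.connect value, or "/" if none is set.
zk_chroot() {
    case $1 in
        */*) printf '/%s\n' "${1#*/}" ;;
        *)   printf '/\n' ;;
    esac
}
```

For example, get_prop broker.id server.properties prints the ID to enter for that host in Cloudera Manager, and zk_chroot "$(get_prop zookeeper.connect server.properties)" prints the value to compare against zookeeper.chroot.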
Step 3. Migrating MirrorMaker
These are the steps for migrating the MirrorMaker role. To avoid compatibility issues, migrate downstream clusters first.
- Add the MirrorMaker role to all machines that were running MirrorMaker before.
- Set any custom configuration from your old producer.properties and consumer.properties files in Cloudera Manager.
- Start the MirrorMaker instances using Cloudera Manager.
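Before entering custom settings in Cloudera Manager, it can be useful to list exactly which properties the old MirrorMaker files set. A minimal sketch, assuming bash and standard grep and sort; active_props is a hypothetical helper name:

```shell
#!/usr/bin/env bash

# Print the active lines of a properties file (everything except blank
# lines and lines starting with # or !), sorted for easy comparison.
active_props() {
    grep -Ev '^[[:space:]]*([#!]|$)' "$1" | sort
}
```

Running active_props producer.properties and active_props consumer.properties gives two short lists to work through when filling in the MirrorMaker configuration.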
Step 4. Migrating Kafka Clients
Although Kafka might function with your existing clients, you must also upgrade all of your producers and consumers. Only the Cloudera client versions include all Cloudera patches and bug fixes, and only they give you a fully supported system.
Migration requires that you change your Kafka dependencies from the Apache versions to the Cloudera versions, recompile your classes, and redeploy them. Use the Maven repository locations as described in Maven Artifacts for CDK Powered By Apache Kafka.
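After switching the dependencies, one way to confirm that no Apache-built Kafka artifacts remain is to filter the output of mvn dependency:tree. The sketch below is an assumption-laden illustration: kafka_deps_without_marker is a hypothetical helper, and the marker you pass must be a substring that appears in the Cloudera version strings for your release (check Maven Artifacts for CDK Powered By Apache Kafka for the exact versions).

```shell
#!/usr/bin/env bash

# Read `mvn dependency:tree` output on stdin and print every
# org.apache.kafka coordinate whose text does not contain the marker.
# marker: substring expected in Cloudera-built version strings
#         (ASSUMPTION: verify it against the Maven Artifacts page).
# No output means every Kafka dependency carries the marker.
kafka_deps_without_marker() {
    local marker=$1
    grep -o 'org\.apache\.kafka:[^ ]*' | grep -vF -- "$marker" || true
}
```

A possible invocation from the project directory is mvn dependency:tree -Dincludes=org.apache.kafka | kafka_deps_without_marker '-kafka-', where '-kafka-' stands in for whatever marker matches your target release.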