CDH 6 includes Apache Kafka as part of the core package. The documentation includes improved content on how to set up, install, and administer your Kafka ecosystem. For more information, see the Cloudera Enterprise 6.0.x Apache Kafka Guide. We look forward to your feedback on both the existing and new documentation.
Using Apache Kafka with Apache Spark Streaming
In CDH 5.7 and higher, the Spark connector to Kafka works only with Kafka 2.0 and higher.
Validating Kafka Integration with Spark Streaming
To validate your Kafka integration with Spark Streaming, run the KafkaWordCount example. If you installed Spark using parcels, use the following command:
/opt/cloudera/parcels/CDH/lib/spark/bin/run-example streaming.KafkaWordCount <zkQuorum> <group> <topics> <numThreads>
If you installed Spark using packages, use the following command:
/usr/lib/spark/bin/run-example streaming.KafkaWordCount <zkQuorum> <group> <topics> <numThreads>
where:
- <zkQuorum> - ZooKeeper quorum used by Kafka, given as a comma-separated list of host:port pairs (for example, zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181).
- <group> - Consumer group used by the application.
- <topics> - Kafka topic, or comma-separated list of topics, containing the data for the application.
- <numThreads> - Number of consumer threads reading the data from each topic. If this is higher than the number of partitions in the Kafka topic, some threads will be idle.
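The argument rules above can be sketched in Python. This is a minimal illustration, not part of CDH or Spark: the helper names build_kafka_wordcount_cmd, topic_thread_map, and idle_threads, as well as the sample group name wordcount-group and topic name words, are hypothetical. The sketch assumes that <topics> is split on commas and that every topic gets <numThreads> threads, as in the upstream Spark KafkaWordCount example, and the idle-thread estimate assumes a single consumer instance in the consumer group. The two run-example paths are the ones given in the commands above.

```python
import shlex

# run-example locations from the parcel and package commands above.
RUN_EXAMPLE = {
    "parcels": "/opt/cloudera/parcels/CDH/lib/spark/bin/run-example",
    "packages": "/usr/lib/spark/bin/run-example",
}


def build_kafka_wordcount_cmd(install, zk_quorum, group, topics, num_threads):
    """Hypothetical helper: return the argv list for streaming.KafkaWordCount.

    install is "parcels" or "packages"; topics is a list of topic names that
    is joined into the comma-separated <topics> argument.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be a positive integer")
    return [
        RUN_EXAMPLE[install],
        "streaming.KafkaWordCount",
        zk_quorum,
        group,
        ",".join(topics),
        str(num_threads),
    ]


def topic_thread_map(topics_arg, num_threads):
    """Hypothetical helper: split <topics> on commas (ignoring empty entries)
    and assign num_threads consumer threads to each topic."""
    return {t: num_threads for t in topics_arg.split(",") if t}


def idle_threads(num_threads, partitions):
    """Hypothetical helper: threads left without a partition for one topic,
    assuming this application is the only consumer in its group."""
    return max(0, num_threads - partitions)


if __name__ == "__main__":
    cmd = build_kafka_wordcount_cmd(
        "parcels",
        "zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181",
        "wordcount-group",  # hypothetical consumer group
        ["words"],  # hypothetical topic
        4,
    )
    print(shlex.join(cmd))
    print(topic_thread_map("words,logs", 4))
    print(idle_threads(4, partitions=3))
```

On a cluster host, the returned list could be passed to subprocess.run to launch the example. With 4 threads and a 3-partition topic, the sketch reports 1 idle thread, which matches the note on <numThreads>.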