Cloudera Tutorials

NOTICE

 

As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.

Introduction

While the demo application runs, you will learn how Kafka receives data from a producer and stores it in specific topics.

Environment Setup

If you have the latest Cloudera DataFlow (CDF) Sandbox installed, then the demo comes pre-installed.

Open a terminal on your local machine and access the sandbox through the shell-in-a-box method. Please visit Learning the Ropes of the HDP Sandbox to review this method.

Before we can perform Kafka operations on the data, we must first have data in Kafka, so let's run the NiFi DataFlow Application. Follow the steps in the module Run NiFi in the Trucking IoT Demo; you will then be ready to explore Kafka.

If the Kafka component is not already running, turn it on through Ambari.

Persist Data Into Kafka Topics

A NiFi simulator generates two types of data, TruckData and TrafficData, each as a CSV string. NiFi preprocesses the data so that it can be split and sent by NiFi's Kafka producers to two separate Kafka topics: trucking_data_truck_enriched and trucking_data_traffic.

List Kafka Topics

From the terminal, we can list the Kafka topics that have been created, including the two that NiFi writes to:

/usr/hdf/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper localhost:2181

Results:

trucking_data_driverstats
trucking_data_joined
trucking_data_traffic
trucking_data_truck_enriched
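
As a quick sanity check, the listing can be compared against the topics this tutorial reads from. The sketch below is not part of the demo: `missing_topics` is a hypothetical function name introduced here, and it reads a topic listing on standard input so that it can be fed by the `kafka-topics.sh --list` command shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the demo): report which expected topics
# are absent from a topic listing supplied on standard input.
missing_topics() {
  local listing topic
  listing=$(cat)                      # capture the listing once
  for topic in "$@"; do
    # -x matches the whole line, -F treats the name as a literal string
    if ! printf '%s\n' "$listing" | grep -qxF -- "$topic"; then
      printf '%s\n' "$topic"
    fi
  done
}

# Usage on the sandbox (commented out because it needs a running broker):
# /usr/hdf/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper localhost:2181 \
#   | missing_topics trucking_data_truck_enriched trucking_data_traffic
```

If the function prints nothing, every expected topic is present; any name it prints is missing from the listing, which usually means the NiFi flow has not run yet.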

View Data in Kafka Topics

As the producer persists messages into the Kafka topics, you can watch them appear in each topic by running the following commands:

View Data for Kafka Topic: trucking_data_truck_enriched:

/usr/hdf/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server sandbox-hdf.hortonworks.com:6667 --topic trucking_data_truck_enriched --from-beginning

View Data for Kafka Topic: trucking_data_traffic:

/usr/hdf/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server sandbox-hdf.hortonworks.com:6667 --topic trucking_data_traffic --from-beginning
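
Because `--from-beginning` replays the whole topic and the consumer then keeps waiting for new messages, it can be handy to stop after a few records. The following is a minimal sketch, assuming the console consumer shipped with the sandbox supports the `--max-messages` option (it is a standard option of `kafka-console-consumer.sh`, but check `--help` on your version). The function name `peek_topic` and the `KAFKA_BIN` and `BOOTSTRAP` variables are introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the first N messages of a topic, then exit.
# KAFKA_BIN and BOOTSTRAP default to the values used in this tutorial.
peek_topic() {
  local topic=$1
  local count=${2:-5}    # default to 5 messages when no count is given
  local bin=${KAFKA_BIN:-/usr/hdf/current/kafka-broker/bin}
  local server=${BOOTSTRAP:-sandbox-hdf.hortonworks.com:6667}

  "$bin/kafka-console-consumer.sh" \
    --bootstrap-server "$server" \
    --topic "$topic" \
    --from-beginning \
    --max-messages "$count"
}

# Usage on the sandbox (needs a running broker):
# peek_topic trucking_data_traffic 10
```

Reading the paths from overridable variables keeps the wrapper usable if your installation places the Kafka scripts or the broker somewhere else.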

As you can see, Kafka acts as a robust queue that receives data and makes it available to other systems.

Note: You may notice that the data is encoded in a format we cannot read. This format is required by Schema Registry, which we use because Streaming Analytics Manager (SAM) needs it to pull data from Kafka.
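
To confirm that the unreadable output is binary serialization rather than corruption, the raw bytes can be inspected. This is a minimal sketch, assuming the console consumer writes message bytes unmodified to standard output. `dump_bytes` is a hypothetical name introduced here; `od` is the standard POSIX dump utility.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the first N bytes of standard input as hex,
# with decimal offsets in the left column.
dump_bytes() {
  local count=${1:-64}           # number of bytes to show (default 64)
  od -A d -t x1 -N "$count"
}

# Usage on the sandbox (needs a running broker; od stops after N bytes):
# /usr/hdf/current/kafka-broker/bin/kafka-console-consumer.sh \
#   --bootstrap-server sandbox-hdf.hortonworks.com:6667 \
#   --topic trucking_data_traffic --from-beginning | dump_bytes 128
```

Non-printable byte values mixed in with recognizable text fragments would be consistent with a binary header or binary-encoded fields in each message.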

Next: Learn Basic Operations of Kafka

You have already become familiar with some Kafka operations through the command line, so let's explore basic operations to see how those topics were created, how they can be deleted and how we can use tools to monitor Kafka.


