NOTICE

 

As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.

Introduction

Let's walk through NiFi's place in the demo.

Environment Setup

We will be working on the trucking-IoT project. If you have the latest Cloudera DataFlow (CDF) Sandbox installed, then the demo comes pre-installed.

Deploy the NiFi DataFlow

Let's activate the NiFi dataflow so that it processes the simulated data and pushes it into Kafka topics. Open NiFi at http://sandbox-hdf.hortonworks.com:9090/nifi/. If NiFi does not open, or you do not already have the demo set up, refer to Setup Demo on existing CDF Sandbox.

The Trucking IoT component template should appear on the NiFi canvas by default as seen below.

dataflow

To add the Trucking IoT template manually do the following:

1. Drag and drop the template icon from the components toolbar onto the NiFi canvas. Select Trucking IoT, then click ADD. Deselect the dataflow by clicking anywhere on the canvas.

2. In the Operate Palette (identified by an upward-pointing hand icon; expand it if it is closed), click the gear icon, then click Controller Services. Check that the state of each controller service is "Enabled", as shown in the image below.

controller-services-lightning-bolt

If it is not "Enabled" follow the steps below:

3. Click the lightning bolt to the right of HortonworksSchemaRegistry.

4. For Scope, select Service and referencing componen..., press ENABLE, then CLOSE.

controller-services-scope

5. All the Controller Services should now be "Enabled", as shown in step 2.

Note: If any of your services are disabled, you can enable them by clicking on the lightning bolt symbol on the far right of the table. Controller Services are required to be enabled to successfully run the dataflow.
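The same check can be sketched in code for readers who prefer scripting over the UI. The snippet below is a minimal sketch, not part of the original demo: it assumes the sandbox exposes NiFi's REST API without authentication under /nifi-api on the same host and port as the UI, and that the controller services are defined on the root process group. The function names are hypothetical helpers introduced for illustration.

```python
import json
from typing import List
from urllib.request import urlopen

# Assumption: REST base URL derived from the tutorial's NiFi UI address.
NIFI_API = "http://sandbox-hdf.hortonworks.com:9090/nifi-api"


def services_not_enabled(payload: dict) -> List[str]:
    """Return the names of controller services whose state is not ENABLED."""
    names = []
    for entity in payload.get("controllerServices", []):
        component = entity.get("component", {})
        if component.get("state") != "ENABLED":
            names.append(component.get("name", "<unnamed>"))
    return names


def fetch_controller_services(base_url: str = NIFI_API) -> dict:
    """Fetch the controller services visible from the root process group."""
    url = f"{base_url}/flow/process-groups/root/controller-services"
    with urlopen(url, timeout=10) as response:
        return json.load(response)
```

With the sandbox running, `services_not_enabled(fetch_controller_services())` should return an empty list once every service has been enabled.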

Let's select the entire dataflow. Press Command+A (macOS) or Ctrl+A to select the whole dataflow. In the Operate Palette, click the start button and let the flow run for 1 minute. The red stop symbol at the corner of each component in the dataflow will turn into a green play symbol. You should see the numbers in the connection queues change from 0 to a higher number, indicating that data is being processed.

You should see an image similar to the one below:

dataflow
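Starting the flow can also be scripted. The sketch below builds the request that, to the best of our knowledge, NiFi's REST API accepts for scheduling every component in a process group (a PUT to /flow/process-groups/{id} with a "state" of RUNNING or STOPPED). The base URL, the unauthenticated access, and the helper names are assumptions; verify the endpoint against the REST API documentation for the NiFi version in your sandbox. The process group id is a placeholder you would read from the NiFi UI or API.

```python
import json
from urllib.request import Request, urlopen

# Assumption: REST base URL derived from the tutorial's NiFi UI address.
NIFI_API = "http://sandbox-hdf.hortonworks.com:9090/nifi-api"


def build_schedule_request(group_id: str, state: str,
                           base_url: str = NIFI_API) -> Request:
    """Build a PUT request that sets all components in a group to `state`."""
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be 'RUNNING' or 'STOPPED'")
    body = json.dumps({"id": group_id, "state": state}).encode("utf-8")
    return Request(
        f"{base_url}/flow/process-groups/{group_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def start_group(group_id: str) -> int:
    """Send the start request and return the HTTP status code."""
    request = build_schedule_request(group_id, "RUNNING")
    with urlopen(request, timeout=10) as response:
        return response.status
```

Building the request separately from sending it keeps the payload easy to inspect before anything is changed on the running instance.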

Let's analyze what actions the processors are taking on the data using NiFi's Data Provenance:

Deselect the entire dataflow, then right-click the GetTruckingData processor, which generates data of two types: TruckData and TrafficData. Click View Data Provenance.

GetTruckingData

A table of provenance events will appear. Each event shows what type of action the processor took on the data. GetTruckingData creates sensor data in two categories as a single stream. Choose an event of 20 bytes to see TrafficData, or one of 98 bytes or more to see TruckData.

data-provenance
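The size rule of thumb above can be written down as a small helper. This is only an illustrative sketch: the function name is hypothetical, and the thresholds are taken directly from the sizes quoted in this tutorial rather than from any NiFi API.

```python
# Sizes quoted in the tutorial for the two record types.
TRAFFIC_DATA_BYTES = 20
TRUCK_DATA_MIN_BYTES = 98


def guess_record_type(size_bytes: int) -> str:
    """Guess which kind of record a provenance event holds from its size."""
    if size_bytes == TRAFFIC_DATA_BYTES:
        return "TrafficData"
    if size_bytes >= TRUCK_DATA_MIN_BYTES:
        return "TruckData"
    # Sizes outside the documented values are not covered by the tutorial.
    return "unknown"
```

For example, `guess_record_type(20)` yields "TrafficData" and `guess_record_type(105)` yields "TruckData".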

To view TruckData or TrafficData sensor data, select the i icon to the left of the row you want to inspect. Go to the CONTENT tab, then click VIEW.

  • TruckData: Data simulated by sensors onboard each truck.

TruckData

  • TrafficData: Data simulated from traffic congestion on a particular trucking route.

TrafficData

You can check the data provenance at each processor to get a more in-depth look at the steps NiFi is performing to process and transform the two types of simulated data. Here is a flow chart to show the steps:

nifi-flow-chart

Next: Building a NiFi DataFlow

Now that we know how NiFi fits into the demo's data pipeline and what kinds of transformations it performs on the data, let's dive into configuring processors to see how the dataflow is constructed.


