As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.
Please visit recommended tutorials:
- How to Create a CDP Private Cloud Base Development Cluster
- All Cloudera Data Platform (CDP) related tutorials
Visit the Storm tutorial to learn about the Trucking IoT Use Case.
What is NiFi's role in this Stream Processing Application?
- NiFi acts as the producer that ingests data from the truck and traffic IoT devices, does simple event processing on the data, so that it can be split into TruckData and TrafficData that can be sent as messages to two Kafka topics.
To learn about what NiFi is, visit What is Apache NiFi? from our Analyze Transit Patterns with Apache NiFi tutorial.
At a high level, our data pipeline looks as follows:
MiNiFi Simulator -----> NiFi ----> Kafka
There is a data simulator that replicates MiNiFi's place in the data flow on IoT edge, MiNiFi is embedded on the vehicles, so the simulator generates truck and traffic data. NiFi ingests this sensor data. NiFi's flow performs preprocessing on the data to prepare it to be sent to Kafka.
Guaranteed Delivery: Achieved by persistent write-ahead log and content repository allow for very high transaction rates, effective load-spreading, copy-on-write, and play to the strengths of traditional disk read/writes.
Data Buffering with Back Pressure and Pressure Release: If data being pushed into the queue reaches a specified limit, then NiFi will stop the process send data into that queue. Once data reaches a certain age, NiFi will terminate the data.
Prioritized Queuing: A setting for how data is retrieved from a queue based on largest, smallest, oldest or other custom prioritization scheme.
Flow Specific QoS: Flow specific configuration for critical data that is loss intolerant and whose value becomes of less value based on time sensitivity.
Ease of Use
Visual Command and Control: Enables visual establishment of data flow in real-time, so any changes made in the flow will occur immediately. These changes are isolated to only the affected components, so there is not a need to stop an entire flow or set of flows to make a modification.
Flow Templates: A way to build and publish flow designs for benefitting others and collaboration.
Data Provenance: Taking automatic records and indexes of the data as it flows through the system.
Recovery/Recording a rolling buffer of fine-grained history: Provides click to content, download of content and replay all at specific points in an object's life cycle.
System to System: Offers secure exchange through use of protocols with encryption and enables the flow to encrypt and decrypt content and use shared-keys on either side of the sender/recipient equation.
User to System: Enables 2-Way SSL authentication and provides pluggable authorization, so it can properly control a user's access and particular levels (read-only, data flow manager, admin).
Multi-tenant Authorization: Allows each team to manage flows with full awareness of the entire flow even parts they do not have access.
Extension: Connects data systems no matter how different data system A is from system B, the data flow processes execute and interact on the data to create a uni-line or bidirectional line of communication.
Classloader Isolation: NiFi provides a custom class loader to guarantee each extension bundle is as independent as possible, so component-based dependency problems do not occur as often. Therefore, extension bundles can be created without worry of conflict occurring with another extension.
Site-to-Site Communication Protocol: Eases transferring of data from one NiFi instance to another easily, efficiently and securely. So devices embedded with NiFi can communicate with each other via S2S, which supports a socket based protocol and HTTP(S) protocol.
Flexible Scaling Model
Scale-out (Clustering): Clustering many nodes together. So if each node is able to handle hundreds of MB per second, then a cluster of nodes could be able to handle GB per second.
Scale-up & down: Increase the number of concurrent tasks on a processor to allow more processes to run concurrently or decrease this number to make NiFi suitable to run on edge devices that have limited hardware resources. View MiNiFi Subproject to learn more about solving this small footprint data challenge.
We have become familiar with NiFi's role in the use case, next let's move onto seeing NiFi in action while the demo application runs.