Cloudera Tutorials

NOTICE

 

As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.


Introduction

Apache NiFi is an integrated platform that solves the real-time challenges of collecting and transporting data from a multitude of sources, and it provides interactive command and control of live flows with full, automated data provenance. NiFi provides the data acquisition, simple event processing, transport, and delivery mechanisms designed to accommodate the diverse dataflows generated by a world of connected people, systems, and things.

For the purposes of this tutorial, assume that a city planning board is evaluating the need for a new highway. This decision is dependent on current traffic patterns, particularly as other roadwork initiatives are under way. Integrating live data poses a problem because traffic analysis has traditionally been done using historical, aggregated traffic counts. To improve traffic analysis, the city planner wants to leverage real-time data to get a deeper understanding of traffic patterns. NiFi was selected for this real-time data integration.

Goals and Objectives

The goal of this tutorial is to give you an opportunity to interact with Apache NiFi features while building a dataflow. You do not need programming experience or prior knowledge of flow-based programming syntax and features to complete this tutorial.

The learning objectives of this tutorial are to:

  • Understand Apache NiFi fundamentals
  • Introduce NiFi’s HTML user interface
  • Introduce NiFi processor configuration, relationships, data provenance, and documentation
  • Create dataflows
  • Incorporate APIs into a NiFi dataflow
  • Learn about NiFi templates
  • Create Process Groups

Prerequisites

Outline

In this tutorial, we work with San Francisco MUNI transit agency data gathered from the NextBus XML Live Feed, which reports vehicle locations, speeds, and other variables.

The tutorial consists of seven sections:

  1. NiFi DataFlow Automation Concepts - Explore the fundamentals of Data Flow Management with NiFi: Core Concepts, Architecture, etc.
  2. Launch NiFi HTML UI - Get NiFi up and running on the CDF Sandbox and launch your NiFi HTML User Interface (UI).
  3. Build a NiFi Process Group to Simulate NextBus API - Simulate the NextBus API live feed with a data seed and check the data generated by the simulator.
  4. Build a NiFi Process Group to Parse Transit Events - Parse the XML file for transit observations (vehicle location, speed, vehicle ID, etc.).
  5. Build a NiFi Process Group to Validate the GeoEnriched Data - Integrate the Google Places API to add more meaningful geographic context, then validate the enriched data.
  6. Build a NiFi Process Group to Store Data As JSON - Convert the XML data to JSON format and store it in a file on the local file system.
  7. Ingest Live Vehicle Routes via NextBus API - Ingest NextBus's live stream data for the San Francisco MUNI agency.
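In the tutorial, the parsing (section 4) and JSON conversion (section 6) steps are built from NiFi processors, not code. As a rough mental model of what those flows do to each record, here is a minimal sketch in standard-library Python. The sample XML, its attribute names (`id`, `routeTag`, `lat`, `lon`, `speedKmHr`, `secsSinceReport`), and the output field names are assumptions loosely modeled on the NextBus vehicle-location feed. They are not real feed data or the tutorial's processor configuration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample resembling a NextBus vehicle-location response.
# All values are made up; attribute names are an assumption.
SAMPLE_XML = """<body>
  <vehicle id="1426" routeTag="N" lat="37.76" lon="-122.43"
           secsSinceReport="12" heading="265" speedKmHr="23"/>
  <vehicle id="8817" routeTag="38" lat="37.78" lon="-122.46"
           secsSinceReport="40" heading="90" speedKmHr="0"/>
  <lastTime time="1500000000000"/>
</body>"""


def parse_vehicles(xml_text):
    """Extract one dict per <vehicle> element (the 'parse transit events' idea)."""
    root = ET.fromstring(xml_text)
    vehicles = []
    for v in root.iter("vehicle"):  # non-vehicle elements such as <lastTime> are skipped
        vehicles.append({
            "vehicle_id": v.get("id"),
            "route": v.get("routeTag"),
            "lat": float(v.get("lat")),
            "lon": float(v.get("lon")),
            # Missing optional attributes default to 0 in this sketch.
            "speed_kmh": float(v.get("speedKmHr", "0")),
            "secs_since_report": int(v.get("secsSinceReport", "0")),
        })
    return vehicles


def to_json_lines(vehicles):
    """Serialize each observation as one JSON object per line."""
    return "\n".join(json.dumps(v, sort_keys=True) for v in vehicles)


def store_json(vehicles, path):
    """Write the observations to a local file as a JSON array (the 'store as JSON' idea)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vehicles, f, indent=2)


if __name__ == "__main__":
    print(to_json_lines(parse_vehicles(SAMPLE_XML)))
```

Emitting one JSON object per observation mirrors how a NiFi flow typically handles each vehicle as its own record. Swapping `print` for `store_json(vehicles, "vehicles.json")` would write the observations to the local file system instead.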

Each tutorial provides step-by-step instructions so that you can complete its learning objectives and tasks. Each tutorial also comes with a dataflow template that you can use for verification, and each one builds on the previous one.


