X

Cloudera Tutorials

Optimize your time with detailed tutorials that clearly explain the best way to deploy, use, and manage Cloudera products. Login or register below to access all Cloudera tutorials.

Cloudera named a leader in 2022 Gartner® Magic Quadrant™ for Cloud Database Management Systems Get the report

Ready to Get Started?

 

NOTICE

 

As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.

Please visit recommended tutorials:

 

Introduction

On the previous tutorial we collected data from sensors mounted on our smart vehicle and built a pipeline to mode the data to be used in training of a Machine Learning (ML) model. This section we will showcase the flow of data streaming from the edge to CDF. The data is in the form of images and metadata associated with each image collected by the CSDV (e.g. IMU information, steering angle, and location), we will direct the flow of data towards a CDH cluster where the data will be stored and curated with the purpose of using it to train a model.

Prerequisites

Outline

Concepts

Upload Hadoop HDFS Location to NiFi

Build NiFi Flow to Load Data into HDFS

Start NiFi Flow

Summary

Further Reading

Concepts

We will use Cloudera Edge Manager (CEM) to build a NiFi dataflow in the interactive UI running in the cloud on an aws ec2 instance. This dataflow will be used to extract data from the MiNiFi agent, transform the data for routing csv and image data to HDFS running on another ec2 instance.

  • Cloudera Flow Manager runs on port: 8080/nifi/

<cem-ec2-public-dns>:8080/nifi/

Upload Hadoop HDFS Location to NiFi

SSH into EC2 instance running NiFi:

ssh -i /path/to/pem_file <os-name>@<public-dns-ipv4>
# download hdfs core-site.xml
mkdir -p /tmp/service/hdfs/
cd /tmp/service/hdfs/
wget https://raw.githubusercontent.com/hortonworks/data-tutorials/dev/tutorials/cdf/edge2ai-autonomous-car/assets/services/hadoop_hdfs/core-site.xml

Enter your CDH public host name in these field of core-site.xml:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://{CDP Public DNS}:8020</value>
  </property>

Save core-site.xml.

Build NiFi Flow to Load Data into HDFS

Add Input Port for CSV Data Ingest from MiNiFi Agent

We will use the input port created on the previous section as an entry point for our flow onto NiFi:

input-port-csv

Note: Take note of input port ID under port details since we will need it for CEM UI to connect the MiNiFi processors to the NiFi RPG.

Save CSV Input Port ID for MiNiFi CEM Flow

input-port-csv-id

Note: if you haven't added inport port id for csv data in your minifi flow, take this id above to your minifi flow.

Connect and Load CSV to HDFS

Add a PutHDFS processor onto canvas to store driving log data. Update processor name to PutCsvHDFS.

Update the following processor properties:

Table 5: update PutCsvHDFS Properties

Property Value
Hadoop Configuration Resources /tmp/service/hdfs/core-site.xml
Directory /tmp/data/input/racetrack/image/

Connect the AWS_MiNiFi_CSV input port to PutCsvHDFS processor:

connect-csv-to-hdfs

Add Input Port for Image Data Ingest from MiNiFi Agent

If you haven't already Add an input port to extract image data from MiNiFi:

input-port-img

Take note of input port ID under port details since we will need it for CEM UI.

Save Image Input Port ID for MiNiFi CEM Flow

input-port-img-id

Note: if you haven't added inport port id for image data in your minifi flow, take this id above to your minifi flow.

Connect and Load Images to HDFS

Add a PutHDFS processor onto canvas to store driving log data. Update processor name to PutImgHDFS.

Table 6: Update the following processor properties:

Property Value
Hadoop Configuration Resources /tmp/service/hdfs/core-site.xml
Directory /tmp/data/input/racetrack/image/logitech

Connect the AWS_MiNiFi_IMG input port to PutImgHDFS processor:

connect-img-to-hdfs

Start NiFi Flow

Highlight all components on NiFi canvas with ctrl+A or cmd+A, then in the operate panel, press the start button:

started-nifi-flow

You should see data flowing from NiFi to HDFS as above.

Note: if you don't see data flowing, go back to the CEM UI, make sure you have your flow connected to this NiFi remote instance. Also make sure MiNiFi Agent is runnining. Potential error you may see cannot be ignored, it most likely means you have the wrong core-site.xml. You should make sure if you need to do a search for core-site.xml on CDH that it comes from the client, example cdsw-client, and the following error should go away for PutHDFS:

puthdfs-error-ignore

Summary

This tutorial explained in greater detail what Cloudera DataFlow is and how it’s components are indispensable tools when building a bridge from edge to AI. In the section of the tutorial we will review the benefits of Cloudera Data Science Workbench and use it to build a model which will be deployed back into our car using CDF and the concepts we learned today.

Further Reading

Cloudera Edge Management

Apache NiFi MiNiFi

Apache NiFi



Your form submission has failed.

This may have been caused by one of the following:

  • Your request timed out
  • A plugin/browser extension blocked the submission. If you have an ad blocking plugin please disable it and close this message to reload the page.