As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.
Please visit recommended tutorials:
- How to Create a CDP Private Cloud Base Development Cluster
- All Cloudera Data Platform (CDP) related tutorials
In this tutorial, you will verify that your sandbox IP is mapped to your desired hostname, that your admin password is set up, and that the required services are running.
Map the HDP Sandbox IP to a hostname. If you need help, see Map Sandbox IP To Your Desired Hostname In The Hosts File, under ENVIRONMENT SETUP in the Learning the Ropes of the HDP Sandbox tutorial.
Map the CDF Sandbox IP to a hostname. If you need help, see Map Sandbox IP To Your Desired Hostname In The Hosts File, under ENVIRONMENT SETUP in the Learning the Ropes of the CDF Sandbox tutorial.
Set the Ambari admin password for HDP. If you need help, see the Admin Password Reset section of the Learning the Ropes of the HDP Sandbox tutorial.
Set the Ambari admin password for CDF. If you need help, see the Learning the Ropes of CDF Sandbox section of the Learning the Ropes of the CDF Sandbox tutorial.
Data must be present in Druid. Refer to the Real-Time Event Processing In NiFi, SAM, Schema Registry and SuperSet tutorial to set up the SAM data pipeline that stores data into Druid. You only need to complete steps 1 through 3.
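To double-check the hostname mapping from the prerequisites, a small script can help. The following is a minimal sketch, not part of the original tutorial: the function names are hypothetical, and the IP address `127.0.0.1` in the usage note is an assumption that depends on how your sandbox is deployed. It formats a hosts-file line, looks up a hostname in hosts-file text, and asks the operating system whether a hostname currently resolves.

```python
import socket

# Hostnames used throughout this tutorial.
SANDBOX_HOSTS = ["sandbox-hdp.hortonworks.com", "sandbox-hdf.hortonworks.com"]


def hosts_line(ip, hostnames):
    """Format one hosts-file line that maps `ip` to every name in `hostnames`."""
    return ip + " " + " ".join(hostnames)


def mapped_ip(hosts_text, hostname):
    """Return the IP that `hosts_text` maps `hostname` to, or None if unmapped."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        fields = line.split()
        if hostname in fields[1:]:
            return fields[0]
    return None


def resolves(hostname):
    """Return True if the operating system can resolve `hostname` right now."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        return False
```

For example, `hosts_line("127.0.0.1", SANDBOX_HOSTS)` produces a line you could add to your hosts file (with the IP adjusted to your environment), and `resolves("sandbox-hdp.hortonworks.com")` reports whether the mapping is already in effect.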
If unsure, log in to the Ambari admin Dashboard:
- for HDF at http://sandbox-hdf.hortonworks.com:8080, and verify that Zookeeper, Storm, Kafka, NiFi, Schema Registry and Streaming Analytics Manager are started; otherwise, start them with Maintenance Mode turned off.
- for HDP at http://sandbox-hdp.hortonworks.com:8080, and verify that HDFS, YARN, Druid and Superset are started; otherwise, start them with Maintenance Mode turned off.
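The same check can also be scripted against Ambari's REST API instead of the dashboard. The following is a rough sketch under several assumptions that you should verify on your own sandbox: the cluster name (`Sandbox` in the usage note), the Ambari service identifiers in `HDP_SERVICES`, and the helper names are all illustrative and do not come from this tutorial. It lists the services that are not in the `STARTED` state and builds the request that asks Ambari to start one of them.

```python
import base64
import json
import urllib.request

# Assumed Ambari identifiers for the HDP services listed above.
HDP_SERVICES = ["HDFS", "YARN", "DRUID", "SUPERSET"]


def _headers(user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    # Ambari expects an X-Requested-By header on modifying requests.
    return {"Authorization": "Basic " + token, "X-Requested-By": "ambari"}


def fetch_service_states(base_url, cluster, user, password):
    """GET the state of every service in `cluster` (performs a network call)."""
    url = f"{base_url}/api/v1/clusters/{cluster}/services?fields=ServiceInfo/state"
    req = urllib.request.Request(url, headers=_headers(user, password))
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def not_started(payload, wanted):
    """Map each wanted service that is not STARTED to its reported state."""
    states = {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
        for item in payload.get("items", [])
    }
    return {
        name: states.get(name, "MISSING")
        for name in wanted
        if states.get(name) != "STARTED"
    }


def build_start_request(base_url, cluster, service, user, password):
    """Build, without sending, the PUT request that asks Ambari to start `service`."""
    body = {
        "RequestInfo": {"context": f"Start {service} via REST"},
        "Body": {"ServiceInfo": {"state": "STARTED"}},
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/clusters/{cluster}/services/{service}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers=_headers(user, password),
    )
```

A possible usage: call `fetch_service_states("http://sandbox-hdp.hortonworks.com:8080", "Sandbox", "admin", your_password)`, pass the result to `not_started(..., HDP_SERVICES)`, and send a request from `build_start_request` with `urllib.request.urlopen` for each service it reports. Ambari starts services asynchronously, so progress should still be tracked in Background Operations.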
For example, to start Druid, you would do the following.
After starting Druid and Superset, your Background Operations should look similar to this:
We include reference images of what needs to be started in the data pipeline to get the data into Druid. You should have already done this step, which was pointed out in the prerequisites.
1. In the NiFi canvas http://sandbox-hdf.hortonworks.com:9090/nifi, start the NiFi DataFlow by pressing the green start button in the operate panel.
2. In the SAM canvas http://sandbox-hdf.hortonworks.com:7777/, start the SAM topology by pressing the green start button at the bottom right of the canvas.
3. In the Superset UI http://sandbox-hdp.hortonworks.com:9089, log in with credentials admin/admin and wait about 5 to 10 minutes for the Kafka data to be consumed. Then, periodically select the Sources dropdown and click Refresh Druid Metadata. Eventually, the two Druid data sources will appear.
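While waiting in step 3, it can be useful to know whether Druid itself has registered the data sources yet. The following is a minimal polling sketch that queries the Druid Coordinator's datasource listing endpoint. The Coordinator address in the usage note (port 8081 on the HDP sandbox) is an assumption, as are the helper names and the retry settings. It does not replace the Refresh Druid Metadata action, which Superset still needs in order to display the sources.

```python
import json
import time
import urllib.request


def fetch_datasources(coordinator_url):
    """Return the list of datasource names known to the Druid Coordinator."""
    url = coordinator_url + "/druid/coordinator/v1/datasources"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def wait_for_datasources(fetch, expected_count=2, attempts=20,
                         delay_seconds=30, sleep=time.sleep):
    """Call `fetch` until it reports at least `expected_count` datasources.

    Returns the last list seen, which may be shorter than expected if
    every attempt is used up.
    """
    names = []
    for attempt in range(attempts):
        names = fetch()
        if len(names) >= expected_count:
            return names
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give Kafka ingestion more time
    return names
```

A possible usage is `wait_for_datasources(lambda: fetch_datasources("http://sandbox-hdp.hortonworks.com:8081"))`. With the defaults it polls for up to roughly ten minutes, which matches the wait described in step 3. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running cluster.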
Congratulations! Data is now in Druid, and we can see the datasources in Superset. We are ready to start creating visualizations of the data.
- An introduction to Druid
- How Superset and Druid Power Real-Time Analytics at Airbnb | DataEngConf SF '17