Cloudera Tutorials

 

Introduction

 

Learn to quickly visualize datasets and create stunning dashboards using Cloudera Data Visualization on Cloudera on cloud.

Cloudera Data Visualization enables data engineers, business analysts, and data scientists to quickly and easily explore data, collaborate, and share insights across the data lifecycle.

 

 

Prerequisites

 

 

 

Download Assets

 

There are two (2) options for getting the assets for this tutorial:

  1. Download a ZIP file

It contains only the files needed for this tutorial. Unzip tutorial-files.zip and note its location.

  2. Clone our GitHub repository

It provides the assets used in this and other tutorials, organized by tutorial title.

 

Using the AWS CLI, copy the following data file to your S3 bucket, which is defined by your environment’s storage.location.base attribute:

shipping-data.csv

For example, if storage.location.base has the value s3a://usermarketing-cdp-demo, copy the file with the following command. Note that the AWS CLI uses the s3:// scheme rather than s3a://:

aws s3 cp shipping-data.csv s3://usermarketing-cdp-demo/tutorial-data/shipping-data.csv
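Because storage.location.base is written with the s3a:// scheme while the AWS CLI expects s3://, it can be handy to derive the target path instead of retyping it. The following is a minimal sketch, not part of the original tutorial; the helper name to_s3_uri is hypothetical, and the bucket name is the example value used above.

```shell
# Hypothetical helper: turn the s3a:// URI from storage.location.base
# into the s3:// form that the AWS CLI expects.
to_s3_uri() {
  # Rewrite only a leading "s3a://"; any other input passes through unchanged.
  printf '%s\n' "$1" | sed 's|^s3a://|s3://|'
}

# Example value taken from the tutorial text; replace it with your own.
base="s3a://usermarketing-cdp-demo"
target="$(to_s3_uri "$base")/tutorial-data/shipping-data.csv"

# Print the command rather than running it, so nothing is uploaded by accident.
echo "aws s3 cp shipping-data.csv ${target}"
```

With the example value, this prints the same copy command shown above. Review the printed command before running it.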

 

Note: The shipping dataset is publicly available on Kaggle.

 

output-aws-cli

 

Create Virtual Warehouse

 

Before we create a virtual warehouse, we need to make sure the environment is activated and running.

Beginning from the Cloudera Home Page, select Data Warehouse.

 

cdp-homepage-data-warehouse

Locate your environment by using the filter. If the icon next to the environment name shows that it is already activated and running, there is no need to activate it.

Otherwise, click the activate icon to activate the environment. This will create the default database catalog, environment_name-default.

 

data-warehouse-activate-env

 

Now that the environment has been activated, in the Virtual Warehouse section, select the create icon to create a virtual warehouse with the following settings:

  1. Name: Shipping-VW
  2. Type: HIVE
  3. Database Catalog: environment_name-default
  4. Size: xsmall - 2 Executor Nodes
  5. AutoSuspend Timeout: 300 seconds
  6. Nodes: Min: 2, Max: 4
  7. Install Data Visualization
  8. CREATE

 

create-virtual-warehouse

 

Create and Populate Table

 

Open HUE from your virtual warehouse:

  1. Filter using your Virtual Warehouse, Shipping-VW
  2. Click on 
  3. Open HUE

 

virtual-warehouse-open-hue

 

Now that HUE is open, select </> Editor, paste the following SQL statements into the editor, provide a value for the dataset_location variable as described in the note that follows the statements, and execute them:

DROP TABLE IF EXISTS default.shipping;


CREATE EXTERNAL TABLE IF NOT EXISTS default.shipping (
    ID                  integer,
    Warehouse_block     string,
    Mode_of_Shipment    string,
    Customer_care_calls integer,
    Customer_rating     integer,
    Cost_of_the_Product integer,
    Prior_purchases     integer,
    Product_importance  string,
    Gender              string,
    Discount_offered    string,
    Weight_in_gms       integer,
    Arrive_on_time      integer
  )
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION ${dataset_location}
  tblproperties("skip.header.line.count"="1");


SELECT * FROM default.shipping;

 

 

IMPORTANT: You need to provide a value for the variable named dataset_location. Set it to the S3 location to which you copied the dataset. For example, 's3a://usermarketing-cdp-demo/tutorial-data/'.
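The table definition above declares 12 columns and skips one header line. Before running it, it may help to confirm that the local CSV header has the same number of fields. This is a minimal sketch under the assumption that shipping-data.csv is in the current directory; the helper name count_header_fields is hypothetical.

```shell
# Hypothetical helper: count the comma-separated fields in a CSV header line.
count_header_fields() {
  head -n 1 "$1" | awk -F',' '{ print NF }'
}

# The CREATE EXTERNAL TABLE statement declares 12 columns.
expected=12
csv="shipping-data.csv"   # assumed to be in the current directory

if [ -f "$csv" ]; then
  actual="$(count_header_fields "$csv")"
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: header has $actual fields"
  else
    echo "Mismatch: header has $actual fields, table expects $expected"
  fi
else
  echo "Skipping check: $csv not found"
fi
```

The check splits on plain commas, which mirrors FIELDS TERMINATED BY ',' in the table definition; it does not handle quoted fields that contain commas.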

 

hue-create-table

 

Visualize Data

 

Open Data Visualization from your virtual warehouse.

  1. Filter using your Virtual Warehouse, Shipping-VW
  2. Click on 
  3. Open Data Visualization

 

virtual-warehouse-open-dataviz

 

Create Dataset using Table Data

 

Starting from the Cloudera Data Visualization home page, select DATA.

  1. Select Default Hive VW connection
  2. Select Datasets tab
  3. Select NEW DATASET

Create a new dataset from a table source:

Dataset Title: Shipping

Dataset Source: From Table

Select Database: default

Select Table: shipping

Edit dataset to adjust Dimensions/Measures:

  1. Select dataset named Shipping
  2. Select Fields
  3. Select EDIT FIELDS

Modify product_importance as a Measure

Modify discount_offered as a Measure

Modify id as a Dimension

Click on SAVE

 

dataviz-create-dataset

 

Let’s build the Dashboard.

  1. Select VISUALS
  2. Select NEW DASHBOARD

 

viz-new-dashboard

 

We will use:

Title: Shipping Dashboard Example

Subtitle: Visually appealing representation of shipping data

 

In Dashboard Designer, select the Visuals tab. We are going to create four (4) new visuals using the Default Hive VW connection and Shipping dataset.

 

dataviz-create-new-visual

 

Visual for Shipping per Warehouse

 

Title: Shipping per Warehouse

Build Tab:

Graph Type: Bars

X Axis: warehouse_block

Y Axis: Record Count

Visual Settings Tab:

Bar size range: 25-103

Refresh Visual
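To cross-check the bar heights, the record count per warehouse block can be computed directly from the CSV. This is a minimal sketch, not part of the original tutorial; the helper name count_by_column is hypothetical, and the column position (2) is taken from the order of columns in the table definition.

```shell
# Hypothetical helper: count data rows per distinct value of one CSV column.
# $1 = CSV file, $2 = 1-based column number. The header row is skipped,
# mirroring skip.header.line.count=1 in the table definition.
count_by_column() {
  awk -F',' -v col="$2" '
    NR > 1 { n[$col]++ }
    END    { for (k in n) print k "," n[k] }
  ' "$1" | sort
}

csv="shipping-data.csv"   # assumed to be in the current directory
if [ -f "$csv" ]; then
  # Warehouse_block is the 2nd column in the table definition.
  count_by_column "$csv" 2
else
  echo "Skipping: $csv not found"
fi
```

Passing 3 instead of 2 gives the counts per Mode_of_Shipment, which corresponds to the pie chart built in the next visual.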

 

dataviz-shipping-per-warehouse

 

Visual for Mode of Shipment

 

Title: Mode of Shipment

Build Tab:

Graph Type: Pie

Dimensions: mode_of_shipment

Measures: Record Count

Refresh Visual

 

dataviz-mode-of-shipment

 

Visual for Average Customer Rating

 

Title: Average Customer Rating

Build Tab:

Graph Type: Gauge

Measure: avg(customer_rating)

Refresh Visual
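The title of this visual refers to the average customer rating. As a sanity check on the value the gauge displays, that average can be computed directly from the CSV. This is a minimal sketch; the helper name column_average is hypothetical, and the column position (5) is taken from the order of columns in the table definition.

```shell
# Hypothetical helper: average of a numeric CSV column, header row skipped.
# $1 = CSV file, $2 = 1-based column number.
column_average() {
  awk -F',' -v col="$2" '
    NR > 1 { sum += $col; n++ }
    END    { if (n > 0) printf "%.2f\n", sum / n }
  ' "$1"
}

csv="shipping-data.csv"   # assumed to be in the current directory
if [ -f "$csv" ]; then
  # Customer_rating is the 5th column in the table definition.
  column_average "$csv" 5
else
  echo "Skipping: $csv not found"
fi
```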

 

dataviz-average-customer-rating

 

Visual for Weight Distribution

 

Title: Shipped Weight Distribution

Build Tab:

Graph Type: Histogram

Measure: weight_in_gms

Visual Settings Tab:

Bar size range: 20-110

Refresh Visual
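A histogram groups a numeric measure into buckets and counts the rows in each. The sketch below illustrates that idea on the CSV with fixed-width bins. The helper name histogram and the bin width of 1000 grams are assumptions made for illustration; Cloudera Data Visualization chooses its own buckets, so the counts will not necessarily match the visual bar for bar.

```shell
# Hypothetical helper: bucket a numeric CSV column into fixed-width bins.
# $1 = CSV file, $2 = 1-based column number, $3 = bin width.
# Output lines are "bin_lower_bound,row_count", sorted numerically by bin.
histogram() {
  awk -F',' -v col="$2" -v width="$3" '
    NR > 1 { bin = int($col / width) * width; n[bin]++ }
    END    { for (b in n) print b "," n[b] }
  ' "$1" | sort -t',' -k1,1n
}

csv="shipping-data.csv"   # assumed to be in the current directory
if [ -f "$csv" ]; then
  # Weight_in_gms is the 11th column in the table definition.
  histogram "$csv" 11 1000
else
  echo "Skipping: $csv not found"
fi
```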

 

dataviz-shipped-weight-distribution

 

We are done creating our dashboard. To save it, click on Save.

The final dashboard should look like the following:

 

dataviz-final-dashboard

 

Summary

 

Congratulations on completing the tutorial.

As you have seen, it is easy to visualize datasets using Cloudera Data Visualization in Cloudera's data platform. 

You are encouraged to be creative and develop other stunning dashboards.

 

 

