Cloudera Tutorials

Optimize your time with detailed tutorials that clearly explain the best way to deploy, use, and manage Cloudera products. Login or register below to access all Cloudera tutorials.

Cloudera named a leader in 2022 Gartner® Magic Quadrant™ for Cloud Database Management Systems Get the report




In this tutorial you’ll learn how to analyze datasets using Data Analytics Studio (DAS) on Cloudera Data Platform (CDP) - Public Cloud. We will also visualize data using a common Business Intelligence (BI) tool - Tableau.




  • Have access to Cloudera Data Platform (CDP) Public Cloud with a Data Lake running.
  • Basic AWS CLI skills
  • (optionally) Installed Tableau. For this tutorial, we're using Tableau Desktop version 2020.2.0 for Windows.


Watch Video


The video below provides a brief overview of what is covered in this tutorial:



Become Familiar with Data Analytics Studio (DAS)


There are 3 tabs to interact with in Data Analytics Studio:

Queries: Allows you to view, search and run previously executed queries. Clicking on query provides additional information such as query details, visual explain, configurations and timeline.



Compose: Worksheets are used to write SQL statements. You can execute SQL statements, view results and get visual explain for queries. You can also save worksheets for re-running later or download results as a CSV file.



Database: Provides detailed information about the structure and metadata of tables within databases.



Create Virtual Warehouse


Before we create a virtual warehouse, we need to make sure the environment is activated and running.

If you see next to the environment name, no need to activate it because it's already been activated and running.

Otherwise, click on to activate the environment. This will create the default database catalog; environment_name-default.

In the Virtual Warehouse section, click on to create a virtual warehouse:

  1. Name: usermarketing-demo
  2. Type: HIVE
  3. Virtual Warehouse Catalog: usermarketing-default
  4. Virtual Warehouse Size: XSMALL
  5. AutoSuspend Timeout: 300 seconds
  6. Nodes: Min: 2, Max: 4
  7. WaitTime: 30 seconds
  8. Click CREATE


Create and Populate Table


Open DAS from your virtual warehouse.

Beginning from CDP home page > Data Warehouse:

  1. Click on Overview
  2. Search for your Virtual Warehouse
  3. Click on 
  4. Open DAS


Now that we have DAS opened, click on Compose, copy-paste the following DDL into the Worksheet, make one modification, then execute:

  id            INT,
  year          INT,
  manufacturer  STRING,
  model         STRING,
  price         DECIMAL(12,2),
  odometer      INT,
  color         STRING,
  transmission  STRING,
  cylinders     STRING,
  fuel          STRING,
  condition     STRING,
  title_status  STRING,
  region        STRING,
  url           STRING,
  description   STRING

-- Modify <storage-location> with Hive Metastore attribute:
--                      hive.metastore.warehouse.external.dir
LOCATION '<storage-location>/usedcars'
tblproperties ("skip.header.line.count"="1")

NOTE: The modification needed above is to specify the storage location for the CSV file. Download dataset and place it in the storage location.

For example:

  • Hive Metastore attribute hive.metastore.warehouse.external.dir is s3a://usermarketing-cdp-demo/warehouse/tablespace/external/hive
  • modify <storage-location> with s3a://usermarketing-cdp-demo/warehouse/tablespace/external/hive
  • After I downloaded/decompressed dataset, I used AWS CLI to copy the file using command:
    • aws s3 cp usedcars.csv s3://usermarketing-cdp-demo/warehouse/tablespace/external/hive/usedcars/usedcars.csv


Let's run a simple query to get the average price of cars, grouped by the model year. Copy-Paste the following query into the Worksheet and execute:

SELECT year, CAST(AVG(price) AS DECIMAL(12,2)) AS avg_price, count(*) AS num_cars
  FROM usedcars
  WHERE year > 1900
    AND year <= 2021
  GROUP BY year




Visualize Data


Two things are needed to visualize data using Tableau.

  • JDBC Driver (*.jar)

Both of these may be obtained by our virtual warehouse. Starting from CDP home page > Data Warehouse:

  1. Click on Overview
  2. Search for your Virtual Warehouse
  3. Click on 
  4. select Copy JDBC URL or Download JDBC Jar


We will use Tableau for our BI tool. According to Tableau's website, the JDBC driver must be stored in specific location:

  • Windows: C:\Program Files\Tableau\Drivers
  • Mac: ~/Library/Tableau/Drivers
  • Linux: /opt/tableau/tableau_driver/jdbc


To begin visualizing your data, Open Tableau:

  1. Select Other Databases (JDBC)
  2. URL: Past the URL you copied from your virtual warehouse
  3. Dialect: SQL92
  4. Username/Password: use credentials used to connect to DAS
  5. Sign In


In the Data Source tab:

  1. select default schema
  2. select and drag table usedcars to the top-right


Select Sheet 1 tab:

  1. select and drag Year into Columns
  2. select and drag Price into Rows
  3. modify measure for Price to use Average


Finally, for our finishing touches:

  1. click on Show Me and select area charts
  2. right-click on Avg. Price > Edit Axis > Uncheck Include zero




Congratulations on completing the tutorial.

As you have seen, it is easy to analyze datasets using Data Analytics Studio (DAS) on Cloudera Data Platform (CDP). I'm sure you will agree that connecting a BI tool, in this case Tableau, was easy and painless.


Further Reading




Your form submission has failed.

This may have been caused by one of the following:

  • Your request timed out
  • A plugin/browser extension blocked the submission. If you have an ad blocking plugin please disable it and close this message to reload the page.