Cloudera Tutorials





See for yourself how easy it is to use Cloudera Machine Learning (CML) on Cloudera Data Platform Public Cloud (CDP-PC).

In this tutorial, we will create a linear regression model using housing data, deploy the model within CML, and test it using a web application.





Prerequisites

  • Access to Cloudera Data Platform (CDP) Public Cloud
  • A CDP workload user
  • The proper CML role access:
    • MLUser - ability to run workloads
    • MLAdmin - ability to create and delete workspaces
  • npm (Node Package Manager) installed



Download Assets


Download and unzip the tutorial files, and remember where you saved them.

For now, this is all we need to do. We will use these files later in the tutorial.

Note: The house price dataset is publicly available from Kaggle.



Setup ML Environment


Create Workspace


If you don’t already have a machine learning workspace provisioned for you, let’s create one.

Select Machine Learning from the Cloudera Data Platform (CDP) home page.




In the ML Workspaces section, select Provision Workspace:




Only two pieces of information are needed to provision a workspace: Workspace Name and Environment. For example:

  1. Workspace Name: cml-tutorial
  2. Environment: usermarketing
  3. Select Provision Workspace




Create Project


Beginning from the ML Workspaces section:

  1. Open your workspace by selecting its name: cml-tutorial

  2. Select New Project




Complete the New Project form using:

  1. Project Name: Predicting House Prices

  2. Project Visibility: Public

  3. Initial Setup: Local
    Upload or drag and drop the cml-project-houseprice.zip file you downloaded earlier

  4. Create Project




Create Model


Train Model


Now that we have a working environment, let’s create a session in our project. We will use the housing data to train a linear regression model. 


Beginning from the Projects section, select the project name, Predicting House Prices.
Select New Session and complete the session form:

  1. Session Name: Home Price Prototype

  2. Editor: Workbench

  3. Kernel: Python 3

  4. Engine Image: Default

  5. Resource Profile: Default (1 vCPU / 2 GiB Memory)

  6. Start Session




Let’s open a terminal window by selecting >_ Terminal Access, then type:

sh cdsw-build.sh

This will install the dependent libraries needed for the project (sklearn, pandas and numpy). Once it completes, close the terminal window.



NOTE: You only need to install dependent libraries once - this step can be skipped in future sessions.


Select the file train-model.py and run the entire program.

Using house_data.csv, the program creates a linear regression model and saves it in a new file called housePredictor.pickle.
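To make the train-then-save step more concrete, here is a minimal sketch of the same workflow using only the Python standard library. It is not the contents of train-model.py: the tutorial's script uses sklearn on several columns of house_data.csv, while this sketch fits a single feature with the closed-form least-squares formula. The function names, the dictionary layout of the saved model, and the sample values are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def train_and_save(xs, ys, path):
    """Fit the line and pickle its coefficients to `path`."""
    slope, intercept = fit_line(xs, ys)
    model = {"slope": slope, "intercept": intercept}  # hypothetical layout
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return model


def load_and_predict(path, x):
    """Reload the pickled coefficients and predict a price for `x`."""
    with open(path, "rb") as fh:
        model = pickle.load(fh)
    return model["slope"] * x + model["intercept"]


# Synthetic, illustrative values only (not taken from house_data.csv).
SQFT = [1000, 1500, 2000, 2500]
PRICE = [200000, 300000, 400000, 500000]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "housePredictor.pickle"
        train_and_save(SQFT, PRICE, path)
        print(load_and_predict(path, 1800))
```

Saving the fitted model to a file is what lets a later step load it and serve predictions without retraining.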

Now that we’ve created our model, we no longer need this session - select Stop to terminate the session.




Add Model to Project


In the Models section, select New Model to add the model we’ve just created to the project. Complete the form as follows:

Name: HousePredictor

Description: Predicts the price of a home

File: model-wrapper.py

Function: PredictFunc

Example Input:

{
  "bathrooms": "2",
  "bedrooms": "3",
  "sqft_living": "1800",
  "sqft_lot": "2200",
  "floors": "1",
  "waterfront": "1",
  "condition": "3"
}

Example Output: { "result" : 100000 }

Kernel: Python 3

Engine Profile: Default

Replicas: 1

Select Deploy Model
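The form above points CML at a function named PredictFunc in model-wrapper.py. The sketch below shows one plausible shape for such a function; it is not the file shipped with the tutorial. It assumes the function receives the example input as a dictionary of strings and returns a dictionary like the example output. The feature order and the StandInModel class are hypothetical placeholders; a real wrapper would load housePredictor.pickle and use the column order from training.

```python
# Assumed feature order, taken from the Example Input in the form.
FEATURES = ["bathrooms", "bedrooms", "sqft_living", "sqft_lot",
            "floors", "waterfront", "condition"]


class StandInModel:
    """Placeholder with a scikit-learn-style predict(); NOT the trained model."""

    def predict(self, rows):
        # Hypothetical rule for illustration: 100 per square foot of living space.
        idx = FEATURES.index("sqft_living")
        return [100.0 * row[idx] for row in rows]


# A real wrapper would unpickle the trained model here instead.
model = StandInModel()


def PredictFunc(args):
    """Convert the string inputs to numbers, predict, and wrap the result."""
    row = [float(args[name]) for name in FEATURES]
    prediction = model.predict([row])[0]
    return {"result": round(prediction)}


if __name__ == "__main__":
    example = {"bathrooms": "2", "bedrooms": "3", "sqft_living": "1800",
               "sqft_lot": "2200", "floors": "1", "waterfront": "1",
               "condition": "3"}
    print(PredictFunc(example))
```

Converting each value with float() matters because the example input sends every field as a string.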




Setup and Run Web Application


We need two pieces of information before we can run the web application.

Beginning from the Projects section, select the project name, Predicting House Prices. Next, select the model name, HousePredictor.




Under the Overview and Shell tabs, note the values of hostURL and accessKey.


The downloaded assets include a folder named cml-webapp-houseprice. Using your favorite editor, modify cml-webapp-houseprice/src/App.js by replacing:

<accessKey> with accessKey

<hostURL> with hostURL
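Before starting the web application, it can be useful to check the two values from Python. The sketch below assumes the deployed model accepts a JSON POST whose body holds the access key under "accessKey" and the model input under "request", and that hostURL is the full model endpoint URL. Compare this with the sample request shown on the model's Overview page, and adjust if it differs. The function names are hypothetical.

```python
import json
import urllib.request


def build_request(host_url, access_key, features):
    """Build (but do not send) the POST request for the deployed model."""
    # Assumed payload shape: {"accessKey": ..., "request": {...model input...}}
    body = json.dumps({"accessKey": access_key, "request": features})
    return urllib.request.Request(
        host_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_price(host_url, access_key, features, timeout=10):
    """Send the request and return the decoded JSON response."""
    req = build_request(host_url, access_key, features)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling predict_price with your own hostURL, accessKey, and the example input from the model form should return the model's JSON response if both values are correct.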




From the command line, change into the cml-webapp-houseprice folder and run the following commands:

npm install

npm start




A new browser window or tab should automatically open at http://localhost:3000. You are encouraged to try different home configurations and see the predicted price for each.



Congratulations on completing the tutorial.

While playing with the web application, you may have noticed some surprising price values being predicted. If you are in the mood for a good challenge, modify train-model.py and improve the model.
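To judge whether a change to train-model.py actually helps, compare predictions with known prices on rows the model did not see during training. The helpers below are a minimal, standard-library sketch of two common measures, root mean squared error and R². They are not part of the tutorial files, and the numbers in the usage example are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction miss."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only, not taken from house_data.csv.
    actual = [100, 200, 300]
    predicted = [110, 190, 300]
    print(rmse(actual, predicted), r_squared(actual, predicted))
```

A lower RMSE and an R² closer to 1 on held-out rows suggest the modified model predicts better.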

As you have seen, it is easy to use Cloudera Machine Learning (CML) to deploy your machine learning projects. This is only the beginning - there is so much more to learn.


