
 

Introduction

 

See for yourself how easy it is to use Cloudera AI on Cloudera on cloud

In this tutorial, we will create a linear regression model using housing data, deploy the model within Cloudera AI, and test it using a web application.

 

 

Prerequisites

 

  • Have access to Cloudera on Cloud
  • Have created a Cloudera workload User
  • Ensure proper Cloudera AI role access
    • MLUser - ability to run workloads
    • MLAdmin - ability to create and delete workspaces
  • Have npm (Node Package Manager) installed

 


 

 

Download Assets

 

Download and unzip the tutorial files, and note where you saved them.

For now, this is all we need to do. We will use these files later in the tutorial.

Note: The house price dataset is publicly available from Kaggle.

 

 

Setup ML Environment

 

Create Workspace

 

If you don’t already have a machine learning workspace provisioned for you, let’s create it.

Select AI from the Cloudera home page:

 


 

In the ML Workspaces section, select Provision Workspace:

 


 

Two pieces of information are needed to provision a workspace: Workspace Name and Environment. For example:

  1. Workspace Name: cml-tutorial
  2. Environment: usermarketing
  3. Select Provision Workspace

 


 

Create Project

 

Beginning from the ML Workspaces section:

  1. Open your workspace by selecting its name: cml-tutorial

  2. Select New Project

 


 

Complete the New Project form using:

  1. Project Name: Predicting House Prices

  2. Project Visibility: Public

  3. Initial Setup: Local
    Upload or drag and drop the cml-project-houseprice.zip file you downloaded earlier

  4. Create Project

 


 

Create Model

 

Train Model

 

Now that we have a working environment, let’s create a session in our project. We will use the housing data to train a linear regression model. 

 

Beginning from the Projects section, select the project name, Predicting House Prices.
Select New Session and complete the session form:

  1. Session Name: Home Price Prototype

  2. Editor: Workbench

  3. Kernel: Python 3

  4. Engine Image: Default

  5. Resource Profile: Default (1 vCPU / 2 GiB Memory)

  6. Start Session

 


 

Let’s open a terminal window by selecting >_ Terminal Access, then type:

sh cdsw-build.sh

This will install the dependent libraries needed for the project (sklearn, pandas and numpy). Once it completes, close the terminal window.

 


NOTE: You only need to install dependent libraries once - this step can be skipped in future sessions.
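
If you are unsure whether the libraries are already installed in a later session, a quick check from the Python prompt can tell you. The sketch below is only an illustration and is not part of the tutorial assets; the helper name `missing_packages` is hypothetical, and it assumes the import names `sklearn`, `pandas`, and `numpy`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the libraries the tutorial says cdsw-build.sh installs.
required = ["sklearn", "pandas", "numpy"]
missing = missing_packages(required)

if missing:
    print("Missing packages, run 'sh cdsw-build.sh':", ", ".join(missing))
else:
    print("All required packages are available.")
```

If the list of missing packages is empty, the build step can be skipped for that session.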

 

Select the file train-model.py and click the run button to run the entire program.

Using house_data.csv, a linear regression model will be created and saved in a new file called housePredictor.pickle.
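
The contents of train-model.py are not reproduced in this tutorial. As a rough illustration of the fit-then-save flow it follows, the standard-library sketch below fits a one-feature least-squares line and pickles the coefficients. The sample rows, the single `sqft_living` feature, and the helper names `fit_line`, `serialize_model`, and `save_model` are all assumptions made for illustration; the tutorial's script trains on house_data.csv with sklearn and uses more features.

```python
import pickle


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def serialize_model(slope, intercept):
    """Pickle the fitted coefficients into bytes."""
    return pickle.dumps({"slope": slope, "intercept": intercept})


def save_model(path, slope, intercept):
    """Write the pickled coefficients to a file."""
    with open(path, "wb") as f:
        f.write(serialize_model(slope, intercept))


# Made-up training rows (NOT taken from house_data.csv).
sqft_living = [1000.0, 1500.0, 2000.0, 2500.0]
price = [200000.0, 300000.0, 400000.0, 500000.0]

slope, intercept = fit_line(sqft_living, price)
print("price is roughly", slope, "* sqft_living +", intercept)

# Hypothetical file name, chosen so the tutorial's own pickle is not overwritten:
# save_model("example-model.pickle", slope, intercept)
```

The same pattern (fit, then serialize to a pickle file) is what lets a later step load the trained model without retraining it.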

Now that we’ve created our model, we no longer need this session - select Stop to terminate the session.

 


 

Add Model to Project

 

In the Models section, select New Model to add the model we’ve just created to the project. Complete the form as follows:

Name: HousePredictor

Description: Predicts the price of a home

File: model-wrapper.py

Function: PredictFunc

Example Input:

{
  "bathrooms": "2",
  "bedrooms": "3",
  "sqft_living": "1800",
  "sqft_lot": "2200",
  "floors": "1",
  "waterfront": "1",
  "condition": "3"
}

Example Output: { "result" : 100000 }

Kernel: Python 3

Engine Profile: Default

Replicas: 1

Select Deploy Model
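
For orientation, the Function field names a Python function in the chosen file that receives the JSON input as a dictionary and returns a JSON-serializable result. The real model-wrapper.py ships with the tutorial assets and is not shown here, so the sketch below is a hypothetical stand-in rather than the tutorial's code. The feature order, the `StandInModel` class, its fake pricing rule, and the name `predict_func_sketch` are assumptions; the real wrapper presumably loads housePredictor.pickle instead.

```python
# Assumed feature order; the real wrapper may order or select features differently.
FEATURES = [
    "bedrooms", "bathrooms", "sqft_living", "sqft_lot",
    "floors", "waterfront", "condition",
]


class StandInModel:
    """Placeholder exposing a predict(rows) method shaped like scikit-learn's."""

    def predict(self, rows):
        # Fake rule for illustration only: 100 per square foot of living space.
        living_index = FEATURES.index("sqft_living")
        return [100.0 * row[living_index] for row in rows]


model = StandInModel()


def parse_features(args):
    """Convert the string values from the JSON input into floats, in a fixed order."""
    return [float(args[name]) for name in FEATURES]


def predict_func_sketch(args):
    """Take the request dictionary and return a JSON-serializable result."""
    row = parse_features(args)
    prediction = model.predict([row])[0]
    return {"result": prediction}


example_input = {
    "bathrooms": "2", "bedrooms": "3", "sqft_living": "1800",
    "sqft_lot": "2200", "floors": "1", "waterfront": "1", "condition": "3",
}
print(predict_func_sketch(example_input))
```

Note that the Example Input passes every value as a string, which is why the sketch converts each one to a number before predicting.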

 


 

Setup and Run Web Application

 

There are two pieces of information we need before we can deploy the web application.

Beginning from the Projects section, select the project name, Predicting House Prices. Next, select the model name, HousePredictor.

 


 

Under the Overview and Shell tabs, capture hostURL and accessKey.

 

As part of the downloaded assets, we provided a folder named cml-webapp-houseprice. Using your favorite editor, modify cml-webapp-houseprice/src/App.js by replacing:

<accessKey> with accessKey

<hostURL> with hostURL
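
Before wiring the values into the web application, it can help to confirm that they work by calling the deployed model directly. The following is a minimal sketch, not part of the tutorial assets. It assumes that hostURL is the full model endpoint URL and that the endpoint accepts a JSON body of the form `{"accessKey": ..., "request": {...}}`; the helper names `build_request` and `call_model` are hypothetical. Some deployments may also require an authorization header, which this sketch does not send.

```python
import json
import urllib.request


def build_request(host_url, access_key, features):
    """Build a POST request carrying the access key and the model input."""
    body = json.dumps({"accessKey": access_key, "request": features}).encode("utf-8")
    return urllib.request.Request(
        host_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_model(host_url, access_key, features, timeout=30):
    """Send the request and return the parsed JSON reply."""
    req = build_request(host_url, access_key, features)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the captured values substituted for the placeholders:
# reply = call_model("<hostURL>", "<accessKey>", {
#     "bathrooms": "2", "bedrooms": "3", "sqft_living": "1800",
#     "sqft_lot": "2200", "floors": "1", "waterfront": "1", "condition": "3",
# })
# print(reply)
```

If the call fails, check the captured values again before editing App.js.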

 


 

On the command line, change into the cml-webapp-houseprice folder and run the following commands:

npm install

npm start

 


 

A new browser window or tab should open automatically at http://localhost:3000. You are encouraged to try different home configurations and see their predicted values.

 


Congratulations on completing the tutorial.

While playing with the web application, you may have noticed some interesting price values being predicted. If you are in the mood for a good challenge, modify train-model.py and improve the model.

As you have seen, it is easy to use Cloudera Machine Learning (CML) to deploy your machine learning projects. This is only the beginning - there is so much more to learn.

 

 


 
