Course Description

This course introduces Cloudera Data Science Workbench (CDSW) and shows how data scientists can use it to run data science and machine learning workflows on the cluster using the Python and R languages.


This course is intended for current or aspiring data scientists who want to learn how to do data science on the cluster using Cloudera Data Science Workbench.

Course Objectives

In this course, you will learn:

  • What Cloudera Data Science Workbench (CDSW) is and how it works
  • How to use CDSW to run data science and machine learning workflows on the cluster using the Python and R languages

Course Contents

1. Introduction to the Course

2. Overview of CDSW

  • Introduction to Cloudera Data Science Workbench
  • Who Can Use CDSW
  • How to Access CDSW
  • Navigating around CDSW
  • User Settings
  • Hadoop Authentication

3. Projects in CDSW

  • Creating a New Project
  • Navigating around a Project
  • Project Settings

4. The CDSW Workbench Interface

  • The Workbench Interface
  • Using the Sidebar
  • Using the Code Editor
  • Engines and Sessions

5. Running Python and R Code in CDSW

  • Running Code
  • Using the Session Prompt
  • Using the Terminal
  • Installing Packages
  • Using Markdown in Comments

6. Using Apache Spark 2 in CDSW

  • Scenario and Dataset
  • Copying Files to HDFS
  • Introducing PySpark (Python track)
  • Introducing sparklyr (R track)
  • Connecting to Spark
  • Reading Data
  • Inspecting Data

7. Data Science and Machine Learning in CDSW

  • Transforming Data (Python track)
  • Transforming Data Using dplyr Verbs (R track)
  • Using SQL Queries
  • Spark DataFrame Functions (R track)
  • Visualizing Data from Spark
  • Machine Learning with MLlib
  • Session History

8. Experiments and Models in CDSW

  • Machine Learning Workflow
  • Running Experiments
  • Using Packages in Experiments
  • Deploying Models
  • Calling Models
  • Using Packages in Models

9. Teams and Collaboration in CDSW

  • Collaboration in CDSW
  • Teams in CDSW
  • Cloning a Git Repository with SSH
  • Using Git for Collaboration

10. Conclusion
