This course includes video lectures, assessments, and access to hands-on exercises. It provides an introduction to machine learning, covering collaborative filtering, clustering, classification, and the relationship between algorithms and data volume.

Immersive Training

Through instructor-led discussion, as well as hands-on exercises, participants will learn topics including:

  • Data types, statistics support, feature extraction, transforming vectors, using the StandardScaler class
  • An overview of dimensionality reduction
  • Machine learning models, regression, linear regression support, and regularization
  • Machine learning with Spark ML, including using DataFrames, transformers and estimators, an introduction to pipelines, using pipelines to generate models, and regularization

Audience and prerequisites

Introduction to Machine Learning has no formal prerequisites, but students must know Python or Scala to follow the material covered.

Please note that this course does not teach big data concepts, nor does it cover how to use Cloudera software. Instead, it is meant as a follow-up to our Developer Training for Spark and Hadoop course.


Take the foundational course - Spark and Hadoop Developer Training

Scala and Python developers will learn key concepts and gain the expertise needed to ingest and process data, and develop high-performance applications using Apache Spark 2.

Course Contents

Machine Learning Overview

  • Introduction
  • Collaborative Filtering
  • Clustering
  • Classification
  • Relationship of Algorithms and Data Volume

Machine Learning with Spark MLlib

  • Introduction
  • Data Types
  • Basic Statistics
  • Feature Extraction
  • Dimensionality Reduction
  • Models
  • Regression

Machine Learning with Spark ML

  • Overview of Spark ML
  • DataFrames
  • Transformers and Estimators
  • Pipelines
  • Decision Tree Classifiers
  • k-Means Clustering

Cloudera Developer Training was great. I believe Cloudera is the best vendor evangelizing the big data movement. Thanks for all your help getting me started on this journey.

Cisco Systems

CCA Spark and Hadoop Developer Certification

This course, along with the Developer Training for Spark and Hadoop, is excellent preparation for the CCA Spark and Hadoop Developer exam. Although we recommend further training and hands-on experience before attempting the exam, both courses provide a foundation for passing it.

Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

Advance your career

Big data developers hold some of the world's most in-demand and highly compensated technical roles. Check out some of the job opportunities currently listed that match the professional profile, many of which seek CCA qualification.

Private training

We also provide private training at your site, at your pace, and tailored to your needs.
