Apache Spark is easy to develop with and fast to run. Understand how to use K-means to cluster data and then identify anomalies, points that deviate from the typical patterns, for applications such as fraud detection and network intrusion detection. See how Spark takes advantage of Resilient Distributed Datasets (RDDs), which support parallel transformations on data in stable storage.
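The clustering-then-anomaly idea can be illustrated without Spark. Below is a minimal single-machine sketch in plain Python. It assumes Euclidean distance, seeds the centroids from the first k points for simplicity, and uses a user-chosen distance threshold. The function names `kmeans`, `anomaly_scores`, and `find_anomalies` are hypothetical and are not part of Spark's API. Each point's anomaly score is its distance to the nearest centroid, and points whose score exceeds the threshold are flagged.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain-Python K-means (hypothetical helper, not Spark's API).
    Assumption: centroids are seeded from the first k points for simplicity."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        # In Spark, this per-point step would typically run in parallel across partitions.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids

def anomaly_scores(points, centroids):
    """Score each point by its distance to the nearest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

def find_anomalies(points, centroids, threshold):
    """Return points whose nearest-centroid distance exceeds the (assumed) threshold."""
    return [p for p, s in zip(points, anomaly_scores(points, centroids)) if s > threshold]
```

For example, with two tight groups of points near (0, 0) and (10, 10) plus one distant point such as (5, 30), clustering with k=2 and a threshold of 10 flags only the distant point. Choosing the threshold is itself a judgment call; it is fixed here only to keep the sketch short.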
