

Easy, Productive Development

Simple, yet rich, APIs for Java, Scala, and Python open up data for interactive discovery and iterative application development. By sharing common code, data scientists and developers can increase productivity with rapid prototyping for batch and streaming applications, using the languages and third-party tools they already rely on.

Learn why Spark is a delight for developers

Explore the Developer Guide for Spark

Fast Processing

Take advantage of Spark’s distributed in-memory storage for high-performance processing across a variety of use cases, including batch processing, real-time streaming, and advanced modeling and analytics. With significant performance improvements over MapReduce, Spark is the tool of choice for data scientists and analysts who want to turn their data into results.

Learn why Apache Spark is a hit for data scientists

How-to: Analyze Fantasy Sports using Spark and SQL

Integrated across the platform

As an integrated part of Cloudera’s platform, Spark benefits from unified resource management (through YARN), simple administration (through Cloudera Manager), and compliance-ready security and governance (through Apache Sentry and Cloudera Navigator) — all critical for running in production.

The One Platform Initiative

Apache Spark is well-positioned to replace MapReduce as the default data-processing engine in the Hadoop ecosystem, but for customers to fully embrace Spark for all production workloads, there is still work to be done to make it enterprise-grade. The One Platform Initiative is the driving force behind the community goal of reaching that objective.

To achieve this vision, Cloudera's committers, working alongside the community, will specifically address the issues shown in the accompanying diagram, some of which are already complete.

The Cloudera difference for Apache Spark

As the first distribution to ship and support Spark, Cloudera has the most experience, with production customers across industries. It has also built the deepest engineering integration between Spark and the rest of the ecosystem. This work includes bringing Spark to YARN and adding the necessary security and management integrations, with more than 500 patches contributed to date.

Cloudera also has multiple Spark committers on staff, so you get direct access to, and influence on, the roadmap based on your needs and use cases.

Concur case study

Video: Spark in the Enterprise, 2 Years Later

Partnered with the ecosystem

Seamlessly integrate with the tools your data scientists and developers already use by leveraging Cloudera’s ecosystem of more than 1,700 partners. Through a robust partner certification program and a dedicated Spark Partner Accelerator program, we continuously build production-hardened integrations between Spark and the most popular third-party tools.

Expert support for Apache Spark

Cloudera has Spark experts around the globe, providing world-class support 24/7. With more experience, more customers, and more use cases, Cloudera is the leader in Spark support, so you can focus on results.