Innovating in open source
Some vendors consume the open source community’s activity; others help drive it. Cloudera leads in influencing Hadoop platform evolution by creating, contributing, and supporting new capabilities that meet customer requirements for security, scale, and usability.
Curation of open standards
Cloudera has a long and proven track record of identifying, curating, and supporting open standards (including Apache HBase, Apache Spark, and Apache Kafka) that provide the mainstream, long-term architecture upon which new customer use cases are built.
Read about our commitment to open source
Highest enterprise requirements
To ensure the best customer experience, Cloudera invests significant resources in multi-dimensional testing on real workloads before releases, as well as in supportability of the entire platform via extensive involvement in the open source community.
Open source big data: An ecosystem of projects
Apache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop services provide for data storage, data processing, data access, data governance, security, and operations.
Data processing
Apache Ambari
A completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters.
Apache Crunch
A Java library provides a framework for writing, testing, and running MapReduce pipelines
Security, governance, and metadata
Data warehouse
Apache Phoenix
An open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase
Apache Sqoop
Efficiently transfers bulk data between Apache Hadoop and structured datastores
Data engineering
Apache Druid
An open-source analytics data store designed for business intelligence (OLAP) queries on event data.
Apache Oozie
The blueprint for Enterprise Hadoop includes Apache Hadoop’s original data storage and data processing layers
Apache Spark
Spark adds in-Memory Compute for ETL, Machine Learning and Data Science Workloads to Hadoop
Operations
Apache Impala
The open source, analytic MPP database for Apache Hadoop that provides the fastest time-to-insight.
Apache Zeppelin
A completely open web-based notebook that enables interactive data analytics
Cloudera's open source credentials
- Cloudera is the first and original source of a supported, 100% open source Hadoop distribution (CDH)—which has been downloaded more than all others combined.
- Cloudera has contributed more code and features to the Hadoop ecosystem, not just the core, and shipped more of them, than any competitor.
- Cloudera employs the most contributors and committers across open standards in the ecosystem, not just the core.
- Cloudera employees have founded more (20+) successful Hadoop ecosystem projects than any competitor, including Apache Hadoop itself.
- For components that are supported by multiple vendors (i.e., standards), more than half of all Apache JIRAs that are assigned to platform vendor employees are closed/resolved by Cloudera employees.
- Cloudera engineers currently occupy more than 100 Apache committer seats (and more than 80 project management committee seats) across all projects that we support.

The best support for customer success
Cloudera’s global support team of project committers across the Hadoop ecosystem represents the largest, most experienced engineering resource dedicated full-time to customer success.