We taught the world the value of big data with open source, and our strong belief in the value of open source, open standards, and open markets is driving the next wave of innovation.

Innovating in open source

Some vendors consume the open source community’s activity; others help drive it. Cloudera leads in influencing Hadoop platform evolution by creating, contributing, and supporting new capabilities that meet your requirements for security, scale, and usability.

Curation of open standards

Cloudera has a long and proven track record of identifying, curating, and supporting open standards (including Apache HBase, Apache Spark, and Apache Kafka) that provide the mainstream, long-term architecture upon which new customer use cases are built.

Highest enterprise requirements

To ensure the best customer experience, Cloudera invests significant resources in multi-dimensional testing on real workloads before releases, as well as in supportability of the entire platform via extensive involvement in the open source community.

Our contributions to the open source community ensure we receive the latest innovations in return



Apache committer seats


PMC seats across 22 projects



Our open source ecosystem 

Apache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop services are foundational to data storage, data processing, data access, data governance, security, and operations.

Apache Accumulo

A sorted, distributed key-value store with cell-based access control.

Apache Atlas

Agile enterprise regulatory compliance through metadata.

Apache Flink

A real-time stream processing framework for big data analytics and applications.

Apache Hadoop

A distributed storage and processing framework for large-scale data processing tasks.

Apache HBase

A non-relational (NoSQL) database that runs on top of HDFS.

Apache Hive

The de facto standard for SQL queries in Hadoop.

Apache Iceberg

An open table format for large-scale analytics, delivering the reliability and simplicity of SQL tables.

Apache Impala

The open source, analytic MPP database for Apache Hadoop that provides the fastest time-to-insight.

Apache Kafka 

A fast, scalable, fault-tolerant messaging system.

Apache Knox Gateway 

A secure entry point for Hadoop clusters.

Apache Kudu

Storage for use cases that require fast analytics on rapidly changing data.

Apache NiFi

A real-time integrated data logistics and simple event processing platform.

Apache Oozie

A workflow scheduler for managing Apache Hadoop jobs.

Apache Phoenix

A massively parallel relational database engine supporting OLTP for Hadoop using Apache HBase.

Apache Ranger

Comprehensive security for enterprise Hadoop.

Apache Solr

Rapid indexing & search on Hadoop.

Apache Spark

Adds in-memory compute for ETL, machine learning, and data science workloads to Hadoop.

Apache Sqoop

Efficiently transfers bulk data between Apache Hadoop and structured datastores.

Apache Tez 

A framework for YARN-based data processing applications in Hadoop.

Apache YARN

The architectural center of enterprise Hadoop.

Apache Zeppelin

A completely open web-based notebook that enables interactive data analytics.

Apache ZooKeeper 

An open source server that reliably coordinates distributed processes.


A distributed file system designed for storing and managing very large volumes of data.


An open source SQL Workbench for Data Warehouses.
