We taught the world the value of big data with open source, and our strong belief in the value of open source, open standards, and open markets is driving the next wave of innovation.

Innovating in open source

Some vendors consume the open source community’s activity; others help drive it. Cloudera leads in influencing Hadoop platform evolution by creating, contributing, and supporting new capabilities that meet your requirements for security, scale, and usability.

Curation of open standards

Cloudera has a long and proven track record of identifying, curating, and supporting open standards (including Apache HBase, Apache Spark, and Apache Kafka) that provide the mainstream, long-term architecture upon which new customer use cases are built.

Highest enterprise requirements

To ensure the best customer experience, Cloudera invests significant resources in multi-dimensional testing on real workloads before releases, as well as in supportability of the entire platform via extensive involvement in the open source community.

Our contributions to the open source community ensure we receive the latest innovations in return

200+

Apache committer seats

65

PMC seats across 22 projects

>35

projects

Our open source ecosystem

Apache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop services are foundational to data storage, data processing, data access, data governance, security, and operations.

Apache Accumulo

A sorted, distributed key-value store with cell-based access control.

Apache Atlas

Agile enterprise regulatory compliance through metadata.

Apache Flink

A real-time stream processing framework for big data analytics and applications.

Apache Hadoop

A distributed storage and processing framework for large-scale data processing tasks.

Apache HBase

A non-relational (NoSQL) database that runs on top of HDFS.

Apache Hive

The de facto standard for SQL queries in Hadoop.

Apache Iceberg

An open table format for large-scale analytics, delivering the reliability and simplicity of SQL tables.

Apache Impala

The open source, analytic MPP database for Apache Hadoop that provides the fastest time-to-insight.

Apache Kafka

A fast, scalable, fault-tolerant messaging system.

Apache Knox Gateway

A secure entry point for Hadoop clusters.

Apache Kudu

Storage for use cases that require fast analytics on rapidly changing data.

Apache NiFi

A real-time integrated data logistics and simple event processing platform.

Apache Oozie

A workflow scheduler for managing Apache Hadoop jobs.

Apache Phoenix

A massively parallel relational database engine supporting OLTP for Hadoop using Apache HBase.

Apache Ranger

Comprehensive security for enterprise Hadoop.

Apache Solr

Rapid indexing & search on Hadoop.

Apache Spark

Adds in-memory compute for ETL, machine learning, and data science workloads to Hadoop.

Apache Sqoop

Efficiently transfers bulk data between Apache Hadoop and structured datastores.

Apache Tez

A framework for YARN-based data processing applications in Hadoop.

Apache YARN

The architectural center of enterprise Hadoop.

Apache Zeppelin

A completely open web-based notebook that enables interactive data analytics.

Apache ZooKeeper

An open source server that reliably coordinates distributed processes.

HDFS

A distributed file system designed for storing and managing vast data.

Hue

An open source SQL workbench for data warehouses.

Logos for Apache Airflow, Apache Flink, Apache HBase, Apache Hive, Apache Iceberg, Apache Impala, Apache Kafka, Apache Kudu, Apache NiFi, Apache Ozone, Apache Phoenix, Apache Ranger, Apache Spark, TensorFlow, Apache Accumulo, Apache Arrow, Apache Atlas

Misa Amane

Open source & open standards

Capitalize on the collective wisdom of the open source community.