Current production version: 5.14.0
CDH is Cloudera's software distribution containing Apache Hadoop and related projects. All components are 100% open source (Apache License); see Release Notes. Unless otherwise specified, use these installation instructions for all CDH components.
Data serialization: rich data structures, a fast/compact binary format, and RPC.
Java library for more easily writing, testing, and running MapReduce pipelines. Only in CDH!
Release: 1.1.0 (incubating)
Library of useful statistical UDFs for doing large-scale analysis. Only in CDH!
Collects/aggregates event data and streams it into HDFS or HBase in real time.
Infinitely scalable storage, resource management, and processing.
Scalable record and table storage for Hadoop with random read/write access.
SQL framework for doing batch transformation (ETL) of Hadoop data.
For high-concurrency, low-latency SQL queries across HDFS, S3, or HBase.
Kafka is a distributed, resilient publish-subscribe messaging service.
Libraries for clustering, classification and collaborative filtering.
A workflow scheduler for managing all your Hadoop jobs efficiently.
Provides compressed, efficient columnar data representation in Hadoop.
Offers a framework for batch analysis of large data sets using a high-level language.
Provides granular, role-based access control for Hadoop users.
Does in-memory processing to make jobs faster and easier to write.
Release: 1.4.7 / 1.99.5
Moves data between relational databases and HDFS in a highly scalable way.
Highly reliable distributed coordination service used in HBase, among other places.
Free-text, Google-style search of Hadoop data for business users. Only in CDH!
Web-based GUI that makes it easy for users to work with Hadoop data.
APIs, examples, and docs for building apps on top of Hadoop. Only in CDH!
Completes Hadoop's storage layer to enable fast analytics on fast data.