Current production version: 5.12.x
CDH is Cloudera's software distribution containing Apache Hadoop and related projects. All components are 100% open source (Apache License); see Release Notes. Unless otherwise specified, use these installation instructions for all CDH components.
Data serialization: rich data structures, a fast/compact binary format, and RPC.
Java library for more easily writing, testing, and running MR pipelines. Only in CDH!
Release: 1.1.0 (incubating)
Library of useful statistical UDFs for doing large-scale analysis. Only in CDH!
Collects/aggregates event data and streams it into HDFS or HBase in real time.
Infinitely scalable storage, resource management, and processing.
Scalable record and table storage for Hadoop with random read/write access.
SQL framework for doing batch transformation (ETL) of Hadoop data.
Release: 2.9.0 (incubating)
For high-concurrency, low-latency SQL queries across HDFS, S3, or HBase.
Kafka is a distributed, resilient, publish-subscribe messaging service.
Libraries for clustering, classification and collaborative filtering.
A workflow scheduler for managing all your Hadoop jobs efficiently.
Provides compressed, efficient columnar data representation in Hadoop.
Offers a framework for batch analysis of large data sets using a high-level language.
Provides granular, role-based access control for Hadoop users.
Does in-memory processing to make jobs faster and easier to write.
Release: 1.4.7 / 1.99.5
Moves data between relational databases and HDFS in a highly scalable way.
Highly reliable distributed coordination service used in HBase, among other places.
Free-text, Google-style search of Hadoop data for business users. Only in CDH!
Web-based GUI that makes it easy for users to work with Hadoop data.
APIs, examples, and docs for building apps on top of Hadoop. Only in CDH!