CDH is Cloudera's software distribution containing Apache Hadoop and related projects. All components are 100% open source (Apache License); see Release Notes. Unless otherwise specified, use these installation instructions for all CDH components.
Data serialization: rich data structures, a fast/compact binary format, and RPC.
Completes Hadoop's storage layer to enable fast analytics on fast data.
APIs, examples, and docs for building apps on top of Hadoop. Only in CDH!
Collects/aggregates event data and streams it into HDFS or HBase in real time.
Infinitely scalable storage, resource management, and processing.
Scalable record and table storage for Hadoop with random read/write access.
SQL framework for doing batch transformation (ETL) of Hadoop data.
For high-concurrency, low-latency SQL queries across HDFS, S3, or HBase.
Kafka is a distributed, resilient publish-subscribe messaging service.
A workflow scheduler for managing all your Hadoop jobs efficiently.
Provides compressed, efficient columnar data representation in Hadoop.
Release: 1.4.7 / 1.99.5
Moves data between relational databases and HDFS in a highly scalable way.
Offers a framework for batch analysis of large data sets using a high-level language.
Provides granular, role-based access control for Hadoop users.
Uses in-memory processing to make jobs faster and easier to write.
Web-based GUI that makes it easy for users to work with Hadoop data.
Highly reliable distributed coordination service used in HBase, among other places.
Free-text, Google-style search of Hadoop data for business users. Only in CDH!