For data serialization: rich data structures, a fast/compact binary format, RPC, and more.
Unifies data storage (HDFS) and processing (MapReduce or Spark) for horizontal scalability.
For interactive, low-latency SQL queries across HDFS or HBase (at high concurrency).
A workflow scheduler for managing all your Hadoop jobs efficiently.
Release: 1.4.0 (Incubating)
Provides granular, role-based access control for Hadoop users.
Java library that makes it easier to write, test, and run MapReduce pipelines. Only in CDH!
Scalable record and table storage for Hadoop data with real-time read/write access.
APIs, examples, and docs for building apps on top of Hadoop. Only in CDH!
Provides compressed, efficient columnar data representation in Hadoop.
Processes data in memory, making jobs faster to run and easier to write.
Release: 1.1.0 (Incubating)
Library of useful statistical UDFs for doing large-scale analysis. Only in CDH!
Enables high-latency SQL queries for batch transformation of data.
Mediates cluster management and monitoring between Impala and YARN.
Offers a framework for batch analysis of large data sets using a high-level language.
Release: 1.5 / 1.99.3
Moves data between relational databases and HDFS in a highly scalable way.
Collects/aggregates event data and streams it into HDFS or HBase in real time.
Web-based GUI that makes it easy for users to work with Hadoop data.
Libraries for clustering, classification, and collaborative filtering of Hadoop data.
Offers free-text, Google-style search of Hadoop data for business users. Only in CDH!
Highly reliable distributed coordination service used in HBase, among other places.