For data serialization: rich data structures, a fast/compact binary format, RPC, and more.
Unifies data storage (HDFS) and processing (MapReduce or Spark) for unlimited scalability.
For interactive, low-latency SQL queries across HDFS or HBase, even at high concurrency.
A workflow scheduler for managing all your Hadoop jobs efficiently.
Release: 1.2.0 (Incubating)
Provides granular, role-based access control for Hadoop users. Only in CDH!
Java library that makes it easier to write, test, and run MapReduce pipelines. Only in CDH!
Scalable record and table storage for Hadoop data with real-time read/write access.
APIs, examples, and docs for building apps on top of Hadoop. Only in CDH!
Provides compressed, efficient columnar data representation in Hadoop.
Processes data in memory to make jobs faster and easier to write. Only in CDH!
Release: 1.1.0 (Incubating)
Library of useful statistical UDFs for doing large-scale analysis. Only in CDH!
Enables high-latency SQL queries for batch transformation of data.
Mediates cluster management and monitoring between Impala and YARN.
Offers a framework for batch analysis of large data sets using a high-level language.
Release: 1.4.4 / 1.99.3
Moves data between relational databases and HDFS in a highly scalable way.
Collects and aggregates event data and streams it into HDFS or HBase in real time.
Web-based GUI that makes it easy for users to work with Hadoop data.
Libraries for clustering, classification, and collaborative filtering of Hadoop data.
Offers free-text, Google-style search of Hadoop data for business users. Only in CDH!
Highly reliable distributed coordination service used in HBase, among other places.