For data serialization: rich data structures, a fast/compact binary format, RPC, and more.
Unifies data storage (HDFS) and processing (MapReduce or Spark) for unlimited scalability.
Web-based GUI that makes it easy for users to work with Hadoop data.
Libraries for clustering, classification and collaborative filtering of Hadoop data.
Offers free-text, Google-style search of Hadoop data for business users. Only in CDH!
Highly reliable distributed coordination service used in HBase, among other places.
Java library for more easily writing, testing, and running MapReduce pipelines. Only in CDH!
Scalable record and table storage for Hadoop data with real-time read/write access.
For interactive, low-latency SQL queries across HDFS or HBase (at high concurrency).
A workflow scheduler for managing all your Hadoop jobs efficiently.
Release: 1.4.0 (Incubating)
Provides granular, role-based access control for Hadoop users.
Release: 1.1.0 (Incubating)
Library of useful statistical UDFs for doing large-scale analysis. Only in CDH!
Enables high-latency SQL queries for batch transformation of data.
APIs, examples, and docs for building apps on top of Hadoop. Only in CDH!
Provides compressed, efficient columnar data representation in Hadoop.
Does in-memory processing to make jobs faster and easier to write.
Collects/aggregates event data and streams it into HDFS or HBase in real time.
Kafka is a distributed, resilient publish-subscribe messaging service.
Mediates cluster management and monitoring between Impala and YARN.
Offers a framework for batch analysis of large data sets using a high-level language.
Release: 1.4.5 / 1.99.5
Moves data between relational databases and HDFS in a highly scalable way.