Platform features
Data management and analytics functions | Projects & components | CDP Private Cloud Base Edition | Enterprise Data Hub | HDP Enterprise Plus |
Distributed batch processing of large data sets | Apache Hadoop | ||||
Database for structured data storage of large tables | Apache HBase +conn, +indx | ||||
Data warehouse summarization & ad hoc querying | Apache Hive | ||||
Metadata store for Hive tables | Hive Metastore (HMS) | ||||
Workflow scheduler to manage Hadoop jobs | Apache Oozie | ||||
Columnar storage format for Hadoop ecosystem | Apache Parquet | ||||
Fast compute engine for ETL, ML, stream processing | Apache Spark | ||||
Bulk data transfer between Hadoop and structured datastores | Apache Sqoop | ||||
Job scheduling and cluster resource management | YARN | ||||
Coordination service for distributed applications | Apache ZooKeeper | ||||
Store and manage large data sets across a cluster | Apache Accumulo | ||||
Metadata management, governance & data catalog | Apache Atlas | ||||
OLTP and real-time SQL access of large datasets | Apache Phoenix | ||||
Manage data security across the Hadoop ecosystem | Apache Ranger | ||||
Smallest, fastest columnar storage for Hadoop | Apache ORC | ||||
Data-flow framework for batch and interactive use cases | Apache Tez | ||||
Fast analytical queries on event-driven data | Apache Druid | ||||
Perimeter security governing access to Hadoop | Apache Knox | ||||
Easy interaction with Spark clusters via REST interface | Apache Livy | ||||
Cryptographic key management | Ranger KMS | ||||
Notebook for interactive analytics | Apache Zeppelin | ||||
Data serialization system | Apache Avro | ||||
Manage and control Hadoop ecosystem functions | Cloudera Manager | ||||
SQL workbench for data warehouses | Hue | ||||
Distributed MPP SQL query engine for Hadoop | Apache Impala | ||||
Cryptographic key management | Key Trustee Server | ||||
Column-oriented data store for fast data analytics | Apache Kudu | ||||
Enterprise search platform | Apache Solr | ||||
Key Trustee Server hardware security integration | Key HSM | ||||
Transparently encrypts and secures data at rest | Navigator Encrypt | ||||
Real-time streaming data pipelines and apps | Apache Kafka | ||||
Distributed object store for Hadoop | Apache Ozone | ||||
Streams Messaging for data ingestion and buffering | Apache Kafka | ||||
Monitoring and management of Kafka clusters | Streams Messaging Manager | ||||
Replication of cross-cluster Kafka data | Streams Replication Manager | ||||
Integrate with data sources from Kafka | Kafka Connect | ||||
Governance and management of metadata and schemas | Schema Registry | ||||
Auto-balancing of Kafka clusters | Cruise Control | ||||
Light-weight stream processing engine for Kafka | Kafka Streams |