Deploy a broad range of analytics in the public cloud quickly and easily.
CDP Data Hub is an analytics service on Cloudera Data Platform (CDP) Public Cloud that makes it easier and faster to deliver high-value analytics from the Edge to AI, using a familiar cluster model in the cloud. It supports a wide range of analytical workloads, including streaming, ETL, data marts, databases, and machine learning, and lets you move existing workloads from on premises to the cloud or build directly in the cloud.
The cloud-based solution is powered by Cloudera Runtime, a suite of integrated open-source technologies, and built on Cloudera Shared Data Experience (SDX). It offers extensive choices in cluster shapes, workload types, pre-built templates, and configuration options, giving users who are comfortable with traditional architectures an intuitive, customizable experience.
Real-time data mart
Data engineering for complex pipelines
Streaming on hybrid cloud
Real-time data mart
Enable analytics on high volumes of fast-arriving data.
The Real-time Data Mart template in Data Hub lets you ingest millions of records per second, with in-place updates as needed. The data is immediately available in an optimal format for querying. This pattern is ideal for time-series applications, event analytics, change data capture (CDC) reconciliation, and real-time data processing pipelines. The template features the Apache Kudu analytic storage engine, Apache Impala for fast SQL execution, Hue for SQL development and analysis, and Apache Spark Streaming for stream processing and analytics.
Data engineering for complex pipelines
Enrich, transform, and load data.
Data Hub lets you enrich, transform, and cleanse data, and create, execute, and manage end-to-end data pipelines with a high degree of flexibility and customization. The Data Engineering template runs a wide range of data processing workloads, including batch and real-time stream processing, using Apache Spark and Apache Hive.
Streaming on hybrid cloud
Collect, process, and build real-time analytics.
DataFlow for CDP Data Hub is a comprehensive edge-to-cloud streaming data platform that addresses streaming data challenges across hybrid environments with Apache NiFi and Apache Kafka. It lets users extend the on-premises streaming experience of Cloudera DataFlow to the cloud without spending enormous resources to develop, configure, and maintain it.
Operational database
Build highly reliable enterprise-class applications.
Data Hub lets you run high-performance NoSQL databases with support for ANSI SQL. Built on Apache HBase, it provides scale and performance for business-critical operational applications. Operational Database provides evolutionary schema support, so developers can leverage the power of data while preserving flexibility in application design. It also auto-scales based on cluster workload utilization to optimize infrastructure use and cost.