Apache Druid
Druid is an open-source analytics data store designed for business intelligence (OLAP) queries on event data. Druid provides low-latency (real-time) data ingestion, flexible data exploration, and fast data aggregation.
How Druid Works
Druid is fast because it converts data into a heavily indexed columnar format that suits typical OLAP query patterns. You can query Druid through Hive SQL, using the Druid-to-Hive connector included in HDP, or through its native REST API.
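
As a rough illustration of the native REST API path, the sketch below builds a simple timeseries query in Python and shows how it could be posted to a Druid broker as JSON. The datasource name `events`, the time interval, and the broker address `http://localhost:8082/druid/v2/` are assumptions made for this example and do not come from any particular deployment; the helper names `build_timeseries_query` and `run_query` are likewise hypothetical.

```python
import json
import urllib.request


def build_timeseries_query(datasource, interval, granularity="hour"):
    """Return a native Druid timeseries query that counts rows per time bucket."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,          # assumed datasource name
        "granularity": granularity,
        "intervals": [interval],           # ISO 8601 interval, start/end
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def run_query(broker_url, query, timeout=10):
    """POST the query as JSON to the broker and return the decoded response."""
    request = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build the query body; "events" and the interval are placeholders.
query = build_timeseries_query("events", "2019-01-01/2019-01-02")
print(json.dumps(query, indent=2))

# To run it against a live cluster (hypothetical broker address):
# results = run_query("http://localhost:8082/druid/v2/", query)
```

Only the query body is built and printed here; the network call is left commented out because it needs a running broker. The same aggregation could instead be written in Hive SQL when going through the connector.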

What Druid Does
Feature | Description |
---|---|
Sub-Second Queries | Druid delivers sub-second queries, even when you have terabytes of data and dozens of dimensions. |
Real-Time Data Ingestion | Druid makes real-time a reality. Query data seconds after it arrives. Native integration with Apache Kafka makes it simple to enable real-time analytics. |
Integrated with Apache Hive | Build OLAP cubes and run sub-second SQL queries using any Hive-compatible tool. |
Apache Ambari Integration | Apache Ambari makes deploying, configuring, and monitoring Druid a breeze. |
Focus for Druid
Cloudera focuses on enabling fast, scalable analytics that seamlessly combine historical and real-time data.
- Real-Time Analytics: The Druid / Hive connector lets you build OLAP cubes using SQL or tap into existing Druid cubes. You can also take advantage of Hive’s powerful SQL support to perform deep analytics on your Druid data.
- Management: Apache Ambari makes it easy to deploy, configure, monitor and manage Druid clusters.
- Security: Druid now fully supports Kerberos and secure Hadoop, and Apache Ambari manages all the heavy lifting of securing your Druid cluster.
