Fast analytics on fast data
Kudu provides fast inserts and updates together with efficient columnar scans, enabling the Apache Hadoop™ ecosystem to tackle new analytic workloads.
Kudu fills the gap between HDFS and Apache HBase that was formerly bridged with complex hybrid architectures, easing the burden on both architects and developers.
Kudu was specifically built for the Hadoop ecosystem, allowing Apache Spark™, Apache Impala, and MapReduce to process and analyze data natively. Additionally, Kudu tables can be joined with data in HDFS or HBase.
Common use cases
Kudu is designed to excel at use cases that require the combination of random reads/writes and the ability to do fast analytic scans—which previously required the creation of complex Lambda architectures. When combined with the broader Hadoop ecosystem, Kudu enables a variety of use cases, including:
- IoT and time series data
- Machine data analytics (network security, health, etc.)
- Online reporting
Integrated across the ecosystem
Designed to work alongside the Apache Hadoop ecosystem, Kudu integrates tightly with Impala, Spark, and MapReduce. Data can be streamed in from real-time sources and processed immediately on arrival by any of those engines. The integration with Impala for BI and SQL analytics makes it possible to build an updateable, open-source data warehouse, while the Spark integration provides an easy blueprint for real-time applications.
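As a rough illustration of the random-write path described above, the sketch below uses the kudu-python client to create a small time-series table, insert a row, and update it in place via an upsert. The master address, table name, and schema are illustrative assumptions, not part of this page, and a running Kudu cluster is required to execute the guarded section.

```python
# Minimal sketch with the kudu-python client. The host
# 'kudu-master.example.com', the 'metrics' table, and its schema are
# hypothetical; adjust them for a real cluster.

def metric_row(host, ts_micros, value):
    """Build a row dict in the shape Kudu's session API expects."""
    return {'host': host, 'ts': ts_micros, 'value': value}

try:
    import kudu
    from kudu.client import Partitioning
except ImportError:  # kudu-python not installed; keep the sketch importable
    kudu = None

if __name__ == '__main__' and kudu is not None:
    client = kudu.connect(host='kudu-master.example.com', port=7051)

    # Time-series style schema: (host, ts) primary key, hash-partitioned by host.
    builder = kudu.schema_builder()
    builder.add_column('host').type(kudu.string).nullable(False)
    builder.add_column('ts').type(kudu.unixtime_micros).nullable(False)
    builder.add_column('value').type(kudu.double)
    builder.set_primary_keys(['host', 'ts'])
    client.create_table(
        'metrics', builder.build(),
        Partitioning().add_hash_partitions(['host'], num_buckets=4))

    # Inserts and updates are applied immediately and visible to scans.
    table = client.table('metrics')
    session = client.new_session()
    session.apply(table.new_insert(metric_row('web01', 1, 0.5)))
    session.apply(table.new_upsert(metric_row('web01', 1, 0.7)))  # update in place
    session.flush()
```

The same table is equally reachable from Impala SQL or Spark, which is the point of the integration: one storage engine serving both the streaming writers and the analytic scanners.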
Cloudera’s continuing open source innovation
Cloudera continues to be a driving force of innovation within the Apache Hadoop ecosystem, due in large part to the insights our large user base provides. Kudu is the result of listening to users who were forced to build Lambda architectures to get the functionality their use cases demanded. With Kudu, Cloudera has addressed the long-standing gap between HDFS and HBase: the need for fast analytics on fast data.