Head-to-head Benchmark Shows Apache Impala Delivers Greater Cloud-Native Capabilities as well as Better Price Performance on AWS Compared to Amazon Redshift
PALO ALTO, Calif., September 22, 2016 — Cloudera, the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop and the latest open source technologies, today released benchmark results that validate that Cloudera’s modern analytic database solution, powered by Apache Impala, not only delivers unprecedented capabilities for cloud-native workloads but does so at better price performance than alternatives. Impala uniquely offers elastic scalability, better flexibility, and direct Amazon S3 queryability unavailable from traditionally architected systems such as Redshift. With a modern design, Impala decouples data and compute to provide the same high-performance SQL analytics whether cloud-natively over data in S3 or across a wide range of on-premise and cloud storage options. Furthermore, Impala enables all these capabilities while also delivering up to 275% more cost-efficiency and up to 10x greater performance compared to Amazon’s analytic database, Redshift, equating to more value, all within an open platform.
Using queries from the TPC-DS industry standard benchmark, Cloudera compared Impala running on the cloud (both cloud-natively over S3 and over local EBS storage) to Amazon Redshift (only able to run over its own storage on dedicated AWS instances). Results from the benchmark show:
- Impala is over 200% less costly and over 10x faster on S3 compared to Redshift tuned for general-purpose use
- Impala is still 8% less costly and 90% faster on S3 compared to Redshift pre-tuned for specific fixed reporting queries
- Impala is 28-275% less costly and 42-400% faster on EBS compared to either pre-tuned or general-purpose-tuned Redshift
“Increasingly our customers are looking to move BI and analytic workloads to cloud environments to tap into the cost-effectiveness of elastic scale and greater flexibility. But they still require the high-performance analytics and big data agility they’re used to on-premises,” said Charles Zedlewski, Vice President, Products, at Cloudera. “Impala brings all its advantages it has over traditional, on-premise analytic databases to the cloud with a modern architecture that enables unprecedented agility no matter where the data lives. This comparison is clear evidence that Impala is unmatched for these BI and analytic workloads in the cloud.”
As businesses look to bring in more data from new sources, actively adjust models based on changing needs, and iteratively design for a variety of use cases, they need a modern analytic database that is built to address these requirements without hindering business productivity. The rigid design and inelastic scale of traditionally architected, monolithic systems, whether on-premise or in the cloud, simply cannot keep up with today’s ever-changing business needs. Cloudera’s analytic database, powered by Impala as the interactive SQL engine, is purpose-built to bring high-performance SQL analytics to big data, with elastic scalability for cloud and on-premise deployments as and when it is needed.
Impala works natively with data stored on a number of storage engines, including Amazon S3 object store, eliminating the need to move or load data specifically into Impala clusters. Especially for cloud deployments, this translates to cost-savings and efficiencies as transient clusters can be spun up as needed for BI and reporting workloads and, with cost-effective storage from S3, more data is quickly and readily available for analysis.
Advancing Impala’s performance, concurrency, and scalability is a consistent area of focus for Cloudera. The company has widened the performance gap between Impala’s analytic database architecture and other alternatives for both single and multi-user workloads. The latest release delivers 12x better performance on secure workloads compared to its two prior versions. Cloudera plans to continue expanding Impala’s value and price performance benefits by adding support in the future for other object stores in the public cloud.
Cloudera is revolutionizing enterprise data management by offering the first unified Platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,700 partners and a seasoned professional services team help deliver greater time to value. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.
Connect With Cloudera
About Cloudera: cloudera.com/content/cloudera/en/about/company-profile.html
Follow us on Twitter: twitter.com/cloudera
Visit us on Facebook: facebook.com/cloudera
Join the Cloudera Community: cloudera.com/community
Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.