Cloudera Enterprise 5.7 Improves Data Processing with Hive-on-Spark Support and Provides Visibility into Multi-Tenant Usage
PALO ALTO, Calif., April 7, 2016 - Cloudera, the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop and the latest open source technologies, today announced the general availability of Cloudera Enterprise 5.7. This new release provides leading performance across key workloads - including an average 3x improvement for data processing with added support for Hive-on-Spark, and an average 2x improvement for business intelligence analytics with updates to Apache Impala. Additionally, this release adds visibility into multi-tenant usage across these workloads for management efficiency and optimal resourcing. Cloudera Enterprise 5.7 is another leap forward for Hadoop as it grows to support new and changing use cases, and is indicative of Cloudera’s leadership in ensuring modern enterprises can fully embrace the platform across the business.
“Hadoop has evolved significantly in the past ten years, and with every advancement, we see the potential for new applications and use cases, while improving what’s already being done,” said Charles Zedlewski, vice president, Products at Cloudera. “The advancement of data engineering and ETL development with Hive-on-Spark marks a critical milestone in this evolution - further solidifying Spark’s status as the standard data processing engine in Hadoop. Data engineering is only a part of the story in today’s business though and, with the 5.7 release, our customers can better enable a wide range of users across the platform, all while maintaining fast performance, easy management, and compliance-ready security.”
ETL development and batch processing remain among the most common use cases for Hadoop. Apache Hive has long played a key role in these workloads, traditionally with MapReduce as the underlying execution engine. However, with its easier development and faster performance compared to MapReduce, Apache Spark is playing an increasingly important role and is primed to replace MapReduce for these workloads. Last year Cloudera launched the One Platform Initiative as the roadmap to complete the transition from MapReduce to Spark, and the company is leading development to better integrate Spark with Hadoop - ensuring it meets the enterprise requirements of even the largest-scale production workloads. The release of Hive-on-Spark in Cloudera 5.7 brings that transition one step closer: developers can now leverage the data processing capabilities of Spark while continuing to use familiar Hive, with a 3x performance improvement on average. Hive-on-Spark is a community-driven initiative launched by Cloudera, IBM, Intel, MapR, and others, and it involved customers across a range of industries - including advertising, financial services, and insurance - as part of an early access program for further development.
For further consistency, Cloudera has worked with its ecosystem of more than 2,300 partners to ensure customers can continue to use the leading data integration and preparation tools with Hive-on-Spark without disrupting the business. Partners such as BMC, ClearStory Data, Elastic, NGDATA, Solix, Trillium Software, Zementis, and others are working with Cloudera to certify their technologies for a seamless transition. (See below for their supporting statements.)
Supporting multiple use cases across the same shared data within a single cluster is a key benefit of Hadoop. With Cloudera Enterprise, administrators can provide users and applications with the right resources to run and to meet critical Service Level Agreements (SLAs). With this release, administrators get full visibility into historical usage and efficiency across users, tenants, and applications. The new Cluster Utilization Reporting feature, built into Cloudera Manager, ensures efficient operations and proper resource allocation between groups and workload types; helps guarantee SLAs are being met; and provides simple troubleshooting of job and query performance issues.
Additional features in Cloudera 5.7 include:
- 2x performance improvements for BI analytics: Impala continues to maintain its performance lead as the fastest analytic SQL engine for Hadoop through dynamic partition pruning, faster query startup, runtime filters, and more
- Simplified path to production: Cloudera Manager includes cluster templates that provide a simple workflow to replicate configuration settings to new clusters - making it easy to move from a well-tuned test environment to production, scale out across regions, or quickly revert to a known good configuration when problems occur
- Optimized data governance: Cloudera Navigator opens up data management and governance to the business user with simplified lineage for establishing trust and provenance of data, and adds managed metadata for improved discoverability and consistency across systems
Cloudera 5.7 is now available at www.cloudera.com/downloads.
Partners Support Cloudera 5.7
“Cloudera’s investment in Hive-on-Spark is of significant value to existing Hive technologies and users. We are delighted to support this new innovation with the industry-leading Control-M for Hadoop. Along with Hive-on-Spark job scheduling, Control-M customers also get support for Spark SQL, Spark Streaming, Shell Scripting, and much more. Our partnership with Cloudera continues to bring new value to Hadoop users worldwide.”
-- Tim Eusterman, senior director of Solutions Marketing for BMC Workload Automation
“We are excited to further solidify our commitment to Spark with support of Hive-on-Spark in our cloud-based data analytics solution. Hive integration with the Spark execution engine enables a seamless ingest, query and data inference experience for fast cycle analysis when blending and harmonizing diverse, large-scale data to reach amplified business insights.”
-- Tim Howes, Chief Technology Officer, ClearStory Data
"Elastic enables a real-time search option for the latest Cloudera Enterprise innovation, Cloudera's Hive-on-Spark, benefiting users transitioning from MapReduce to Spark who wish to use Elasticsearch. This certified integration extends ongoing collaboration between Elastic and Cloudera for Elasticsearch Hadoop and Spark deployments."
-- Costin Leau, Elastic Engineering Lead
“NGDATA provides a complete solution for customer analytics and CX optimization to drive next-best-offer scenarios for banks, media companies, and telcos. As our NBO pipeline is designed and optimized for real-time execution on customer behavioral data, business managers have similar performance expectations towards ad-hoc reporting. Thanks to Hive-on-Spark, they can now experience a dramatic speed increase of these reporting jobs with virtually no reconfiguration required. We are excited to work with Cloudera on advancing the adoption of Spark in the ecosystem, as it provides tangible business benefits to our customers.”
-- Steven Noels, CTO and Co-Founder, NGDATA
“Certification against Cloudera 5.7 means improved performance and usability for Solix Big Data Suite customers. With Hive-on-Spark, most any structured data workload may now be run on Apache Hadoop.”
-- Sai Gundavelli, CEO, Solix
“As businesses increasingly rely on Hadoop to process high volume, complex data, they also want to accelerate time to value of data-driven initiatives. The release of Cloudera Enterprise 5.7 demonstrates Cloudera’s commitment to providing innovative solutions that optimize speed and efficiency for data migration, data integration, and operationalized data processing with Hadoop while also simplifying Hadoop management and oversight. With Hive-on-Spark powering Trillium Refine™, enterprises can speed data preparation and processing to better enable analytics and make faster business decisions that drive growth.”
-- Keith Kohl, vice president, product management, Trillium Software
“By adding Hive-on-Spark support to its Hadoop ecosystem, Cloudera enables users to apply data science more efficiently. As data science becomes a key differentiator for smarter business applications, we are proud to partner with Cloudera in delivering the Zementis Universal PMML Plug-in (UPPI) as a common, standards-based execution engine to operationalize machine learning and advanced predictive analytics across Hive, Spark and Storm.”
-- Dr. Michael Zeller, CEO, Zementis
Cloudera delivers the modern data management and analytics platform built on Apache Hadoop and the latest open source technologies. The world’s leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest and most secure data platform available for the modern world. Our customers efficiently capture, store, process and analyze vast amounts of data, empowering them to use advanced analytics to drive business decisions quickly, flexibly and at lower cost than has been possible before. To ensure our customers are successful, we offer comprehensive support, training and professional services. Learn more at cloudera.com.
Connect with Cloudera
About Cloudera: http://www.cloudera.com/about-cloudera.html
Follow us on Twitter: twitter.com/cloudera
Visit us on Facebook: facebook.com/cloudera
Join the Cloudera Community: community.cloudera.com
Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition, Cloudera Navigator Optimizer and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trademarks of their respective owners.