Cloudera and EMC Greenplum Team Up to Expand the Way Companies Collect, Process and Store Data


Integration of Cloudera’s Distribution for Hadoop With Greenplum Provides New Opportunities for Analysis of Structured and Complex Data

PALO ALTO, CA – September 22, 2010 – Cloudera, a leading provider of Hadoop-based data management software and services, and the EMC Data Computing Division announced an alliance that will enable the integration of Cloudera’s Distribution for Hadoop (CDH) and Greenplum technology. Integrating CDH, which collects, consolidates and analyzes data, with EMC Greenplum’s massively parallel processing database and enterprise data cloud platform will provide a robust architecture for collaborative analysis of large amounts of structured (e.g. online databases) and unstructured (e.g. log files, sensor data, documents) data.

As part of the alliance, Cloudera will build a connector between CDH and Greenplum technologies. The connector will enable high-speed bi-directional data transfer between the systems and will be jointly supported by Cloudera and Greenplum. Additionally, the Greenplum sales team will be trained on Cloudera’s suite of Apache Hadoop-based products and services.

The alliance between EMC Greenplum and Cloudera will change the way customers collect, process and store data. Today, customers use a combination of database and archive storage products to collect, process and store complex and structured data. They are required to shuttle the data between systems, transforming and structuring it before they can analyze it. As data volumes and types grow, there is no single place to store and process all of this data.

Hadoop is becoming an increasingly popular solution to this problem. Customers can stage their data in a single Hadoop-based repository, which stores both complex and structured data inexpensively. They can then iterate over the data using MapReduce to process and analyze it, create metadata layers, and transform it for loading into a Greenplum database. Additionally, customers can combine long-term historical data with new data, enabling deeper insight and the detection of patterns not visible over short time periods.

“Together EMC and Cloudera have a real opportunity to help companies change the way they collect, process and store data,” said Michael Olson, CEO of Cloudera. “Organizations can use CDH to inexpensively capture complex and structured data, while Greenplum Chorus utilizes its cloud-based platform to discover data from a variety of sources and enables collaborative analysis for end users.”

“EMC is building the data system of the future, a system that brings together all of your data, all of your tools, and all of your people,” said Bill Cook, President and General Manager of EMC’s Data Computing Division. “EMC and Cloudera represent a powerful combination of what we can deliver to customers. By bringing together our solutions, our customers have a powerful tool for collaborative data analysis and can more quickly and effectively analyze data from a variety of sources.”

CDH is the most comprehensive and broadly adopted Hadoop-based platform on the market, lowering the barrier to Hadoop adoption by making it simple to install and easy to integrate into the data center. It consists of core Apache Hadoop and eight additional open source projects, all tested and integrated into a single platform, making it the most complete Hadoop-based distribution. For more information about CDH, visit

EMC will be exhibiting and presenting on its relationship with Cloudera at the annual Hadoop World conference taking place in New York City on October 12. Attend Hadoop World 2010 for additional examples of Hadoop in the enterprise.

About Cloudera

Cloudera, the leader in Apache Hadoop-based software and services, enables data-driven enterprises to easily derive business value from all their structured and unstructured data. Cloudera's Distribution including Apache Hadoop (CDH), available to download for free at, is the most comprehensive, tested, stable and widely deployed distribution of Hadoop in commercial and non-commercial environments. For the fastest path to reliably using this completely open source technology in production for Big Data analytics and answering previously unaddressable big questions, organizations can subscribe to Cloudera Enterprise, composed of Cloudera Manager software and Cloudera Support. Cloudera also offers training and certification on Apache technologies, as well as consulting services. As the top contributor to the Apache open source community and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas and gaming, Cloudera's depth of experience and commitment to sharing expertise are unrivaled.
