First Distribution for Hadoop Adds Easy Installation, Simple Configuration and Commercial Support to the Open Source Technology Powering the World’s Largest Web Companies
BURLINGAME, CA–(Marketwire – March 16, 2009) – Cloudera, the commercial Hadoop™ company, today announced the general availability of the Cloudera Distribution for Hadoop, an open source product used to store and process complex, large-scale data: petabytes of information, often distributed across thousands of servers. Hadoop is in production use at most of the world’s largest Web companies, including Facebook, Google, and Yahoo!. Cloudera, with the financial backing of Accel Partners, is the first company to develop technology to bring Hadoop into enterprise data centers.
“After working with large Hadoop deployments at companies like Facebook, Google and Yahoo!, we came to realize that people needed Hadoop installation, configuration, and management to be much easier,” said Christophe Bisciglia, Cloudera founder and former manager of Google’s Hadoop cluster. “Cloudera is advancing Hadoop technology to make it easier for everyone to store and process the same types of complex, large-scale data that large Web companies are successfully using in their businesses.”
The Cloudera Distribution for Hadoop is freely available for download and immediate use. The product is distributed as a pre-packaged RPM bundle for Red Hat Linux systems or as an Amazon EC2 image. To make Hadoop easy to install and use, Cloudera is launching a new portal, http://my.cloudera.com, where people can use a Web-based configuration tool to create custom packages that are optimized for their specific needs. Settings for the cluster can also be saved on the portal to enable automatic updates. There is no charge to use http://my.cloudera.com. The RPM packages and EC2 images are freely distributed under the Apache 2 software license.
“Since we use Hadoop to help run our business, we are excited that Cloudera is offering commercial support for Hadoop and is making the technology more accessible to businesses,” said David C Peterson, SVP Technology at ContextWeb, Inc., a leading contextual advertising company and operator of the ADSDAQ Exchange. “Businesses need to feel confident that there is a company like Cloudera to stand behind Hadoop in order for this great open source technology to become widely used by companies.”
Cloudera is also making a pre-configured VMware image freely available for evaluation and for use with its free online training. People who want to test the Cloudera Distribution for Hadoop or learn more about Hadoop and Cloudera’s online training can download the image and run it on their Linux, Mac or Windows desktop. The image ships with example code and all the components needed to use the Cloudera Distribution for Hadoop, including a master server and single node.
The Cloudera Distribution for Hadoop is a complete system to handle the processing and storage of big data. Major components include:
– HDFS – Hadoop Distributed File System, a distributed and fault-tolerant file system designed to run on commodity hardware. HDFS assumes that hardware failure is normal and provides quick detection and automatic recovery. HDFS can support tens of millions of files in a single instance;
– MapReduce – an implementation that divides applications into many small blocks of work for automatic parallelization and execution on large clusters. Cloudera’s implementation of MapReduce takes care of partitioning of input data, scheduling program execution across distributed machines, and the handling of machine failure;
– Hive – a data warehousing infrastructure built on top of Hadoop that provides tools for easy data summary generation, ad hoc querying, and analysis. Hive comes with Hive QL, a simple query language based on SQL;
– Pig – a platform for analyzing large data sets in Hadoop using a high-level language for expressing data analysis programs, PigLatin.
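To illustrate the MapReduce model described in the component list, the following is a minimal, single-process sketch in Python. It is not Cloudera's or Hadoop's implementation and does not use any Hadoop API; the function names (`partition`, `map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen only to show how input is partitioned, processed in small blocks of work, and combined.

```python
from collections import defaultdict


def partition(records, num_splits):
    """Divide the input into num_splits roughly equal splits (round-robin)."""
    return [records[i::num_splits] for i in range(num_splits)]


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(mapped_pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records, num_splits=2):
    """Run the hypothetical word-count job in a single process.

    On a real cluster each split would be scheduled on a different machine
    and re-run on failure; here the splits are simply processed in a loop.
    """
    mapped = []
    for split in partition(records, num_splits):
        for record in split:
            mapped.extend(map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data big clusters", "data on commodity hardware"]))
```

In this sketch the per-split loop stands in for the scheduling and failure handling that the distribution performs across machines; only the map and reduce functions would be written by the application developer.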
Additional information about:
– Cloudera Distribution for Hadoop with free access to a web configuration system, downloadable software, VMware image and documentation;
– The Story of the Cloudera Distribution for Hadoop – video featuring CEO and founder (http://www.youtube.com/watch?v=Y3eL6DfNkTw);
– Screencast on configuring the Cloudera Distribution for Hadoop:
Cloudera is revolutionizing enterprise data management by offering the first unified Platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,700 partners and a seasoned professional services team help deliver greater time to value. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.
Connect With Cloudera
Follow us on Twitter: http://twitter.com/cloudera
Visit us on Facebook: http://www.facebook.com/cloudera
Join the Cloudera Community: http://cloudera.com/community
Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.