Developer Center
Cloudera Blog · Pig Posts
CDH2: “Testing” Heading Towards “Stable”

In September 2009, we announced the first release of CDH2, our current testing repository. Packages in our testing repository are recommended for people who want more features and are willing to upgrade as bugs are worked out. Our testing packages pass unit and functional tests but will not have the same “soak time” as our stable packages. A testing release represents a work in progress that will eventually be promoted to stable. It’s a long road of feedback, bug fixes, QA and testing to move from testing to stable. As someone who tracks the maturity of a testing build throughout its life cycle, I’m pleased to say we’ve put a lot of polish into this release.
(more…)

CDH2: Testing Release now with Pig, Hive, and HBase

At the beginning of September, we announced the first release of CDH2, our current testing repository. Packages in our testing repository are recommended for people who want more features and are willing to upgrade as bugs are worked out. Our testing packages pass unit and functional tests but will not have the same “soak time” as our stable packages. A testing release represents a work in progress that will eventually be promoted to stable.

We plan on pushing new packages into the testing repository every 3 to 6 weeks.  And it just so happens it is just about 3 weeks after we announced the first testing release. So it must be time for a new one. Here are some of the highlights:

CDH2: Cloudera’s Distribution for Hadoop 2

In March of this year, we released our distribution for Hadoop.  Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time:0.18.3. We packaged up Apache Hadoop, Pig and Hive into RPMs and Debian packages to make managing Hadoop installations easier.  For the first time ever, Hadoop cluster managers were able to bring up a deployment by running one of the following commands depending on your Linux distribution:

# yum install hadoop
# apt-get install hadoop

As proof of this, our easy-to-use Hadoop Amazon Machine Images (AMIs) use these commands at boot to install the latest release of CDH1 whenever a Hadoop cluster is launched on ec2.

Analyzing Apache logs with Pig

(guest blog post by Dmitriy Ryaboy)

A number of organizations donate server space and bandwidth to the Apache Foundation; when you download Hadoop, Tomcat, Maven, CouchDB, or any of the other great Apache projects, the bits are sent to you from a large list of mirrors. One of the ways in which Cloudera supports the open source community is to host such a mirror.

In this blog post, we will use Pig to examine the download logs recorded on our server, demonstrating several features that are often glossed over in introductory Pig tutorials—parameter substitution in PigLatin scripts, Pig Streaming, and the use of custom loaders and user-defined functions (UDFs). It’s worth mentioning here that, as of last week, the Cloudera Distribution for Hadoop includes a package for Pig version 0.2 for both Red Hat and Ubuntu, as promised in an earlier post. It’s as simple as apt-get install pig or yum install hadoop-pig.

Pig Training Now Available Online

Today I did a web search for “pig training” using my favorite search engine. I was wildly entertained by the results, and have embedded my favorite for your viewing pleasure.

However, when I stopped laughing, I realized that this probably isn’t what most people reading this blog would have hoped to find. To that end, I am happy to announce that Cloudera’s Online Hadoop Training now includes two sessions on Apache Pig.