Cloudera Blog · Training Posts

Hadoop and the Cloudera Data Platform.

Cloudera’s Hadoop Training Programs Expand Internationally

It’s been over a year since we started offering Hadoop training in the Bay Area. Since then, we’ve put many of our introductory materials online for free and now offer in-person public classes in cities around the US (the full list of sessions is on our website). The response has been incredible, but one thing is painfully obvious: we’re not doing enough to meet the needs of the growing worldwide Hadoop community.

To that end, we’ve invested in translating our materials into new languages and in planning how to scale our training programs internationally.

As a first step, we’ll offer our three-day developer training session outside the US later this spring. We’ll announce cities and dates in the EU soon, but we’re happy to announce our first two sessions in Asia now:

Hadoop World: NYC 2009

To say we were surprised by the quality and quantity of submissions we received for Hadoop World: NYC 2009 would be an understatement. We were amazed at how many “normal” companies have come to use Hadoop for everything ranging from business intelligence to protein alignment. It’s truly exciting to see how a system originally designed to process and index the web has evolved to support the data-driven workloads of so many industries.

It’s with great joy that we invite you to come learn about what the following companies have done with Hadoop: About.com, Booz Allen Hamilton, China Mobile, ContextWeb, eBay, Facebook, IBM, Intel, JPMC, Microsoft, The New York Times, NexR, Rackspace, Vertica, Visa, Visible Measures, Yale, and Yahoo!

If you have ever wondered what Hadoop might be able to do for you, this is your chance to learn both from leaders in the web space and from companies in your own industry.

Running the Cloudera Training VM in VirtualBox

Cloudera’s Training VM is one of the most popular resources on our website. It was created with VMware Workstation and works well with VMware Player on Windows, Linux, and Mac. But VMware isn’t for everyone. Thomas Lockney has managed to get our VM image running in VirtualBox and has written a step-by-step guide for the community. Thanks, Thomas! – Christophe

I was quite pleased when I discovered that Cloudera had created a virtual machine image for working through their training material. It would simplify the process, and it looked like a potentially useful environment for general Hadoop experimentation. However, their VM is built for VMware, which I stopped using a while back. As a heavy VirtualBox user, though, I knew it would not be hard to get it running in my preferred desktop virtualization environment.

Here’s a step-by-step guide for getting Cloudera’s virtual machine image up and running. I’ll include screenshots for most of the steps to make it as clear as possible. I’ll assume you already have at least some familiarity with running VirtualBox (if not, there are plenty of good tutorials and references available online) and some experience with Ubuntu or some other fairly modern Linux desktop system.
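The guide itself works through the VirtualBox GUI. As a hedged command-line alternative (not the method the guide uses), the same setup can be sketched with VirtualBox's `VBoxManage` tool. The Python below only builds and prints the commands; it assumes the training VM's disk is a single `.vmdk` file, and the VM name, disk path, 1024 MB of RAM, and `IDE Controller` label are hypothetical choices, not values taken from Cloudera's image.

```python
import shlex
import sys


def build_commands(vm_name, vmdk_path, memory_mb=1024, ostype="Ubuntu"):
    """Return VBoxManage argument lists that wrap an existing VMware disk in a new VM."""
    controller = "IDE Controller"  # hypothetical label; any controller name works
    return [
        # Create and register an empty VM definition.
        ["VBoxManage", "createvm", "--name", vm_name, "--ostype", ostype, "--register"],
        # Assign RAM to the guest (1024 MB is an assumed default, not a Cloudera value).
        ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)],
        # Add an IDE storage controller for the disk to hang off.
        ["VBoxManage", "storagectl", vm_name, "--name", controller, "--add", "ide"],
        # Attach the VMware disk image directly; VirtualBox can read VMDK files.
        ["VBoxManage", "storageattach", vm_name, "--storagectl", controller,
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", vmdk_path],
        # Boot the new VM.
        ["VBoxManage", "startvm", vm_name],
    ]


if __name__ == "__main__":
    # Hypothetical defaults; pass "<vm-name> <path-to-vmdk>" to override them.
    if len(sys.argv) == 3:
        name, disk = sys.argv[1], sys.argv[2]
    else:
        name, disk = "cloudera-training", "cloudera-training.vmdk"
    for cmd in build_commands(name, disk):
        print(shlex.join(cmd))  # print only; nothing is executed
```

Printing rather than executing lets you review each command before pasting it into a shell. Attaching the existing VMDK instead of converting it leaves the original VMware files untouched.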

Announcing Cloudera Certification for Hadoop

As Hadoop continues to turn heads at startups and big enterprises alike, Cloudera has received several requests to offer certification in addition to our popular training programs.

Certification is a critical component of any software ecosystem, and especially so for open source projects with quickly expanding user bases. Certification allows developers to ensure their skills are up to date, and allows employers and customers to confidently identify individuals that are up for the challenge of solving problems with Hadoop.

To that end, we are happy to announce Cloudera Certification for Hadoop.

Pig Training Now Available Online

Today I did a web search for “pig training” using my favorite search engine. I was wildly entertained by the results, and have embedded my favorite for your viewing pleasure.

However, when I stopped laughing, I realized that this probably isn’t what most people reading this blog would have hoped to find. To that end, I am happy to announce that Cloudera’s Online Hadoop Training now includes two sessions on Apache Pig.

Configuring Eclipse for Hadoop Development (a screencast)

One of the perks of using Java is the availability of functional, cross-platform IDEs. I use vim for my daily editing needs, but when it comes to navigating, debugging, and coding large Java projects, I fire up Eclipse.

Typically, when you’re developing Map-Reduce applications, you simply point Eclipse at the Hadoop jar file, and you’re good to go. (Cloudera’s Hadoop training VM has a fully configured example.) However, when you want to dig deeper to explore, and modify, Hadoop’s internals themselves, you’ll want to configure Eclipse to build Hadoop. Because there’s generated code and a complicated ant build.xml file, this takes some tinkering. Now that I have the full Hadoop Eclipse experience going (it took me a few tries), I’ve prepared a screencast that guides you through it, from downloading Eclipse to debugging one of Hadoop’s unit tests. You’ll also want to reference the EclipseEnvironment Hadoop wiki page, which has more details.



Cloudera’s Basic Hadoop Training: Now Free Online

Exciting news: we’re providing our basic Hadoop training for free online. We’ll still host basic courses live, but for folks who can’t make it to the Bay Area, or who want to attend a more advanced training course first, we hope this proves useful.

There are 6 lectures, 2 hands-on activities, and 1 tutorial. We provide a virtual machine for the activities and tutorials so new users can get up and running right away. Topics include: