New to Hadoop
Are you a technical end-user learning Apache Hadoop? Consider taking the following steps.
Knowing where to start can be difficult, but the structured, curated list of resources below should help.
1. Read Up on Background
Getting a bit of background information first is always a good idea.
- Ask Bigger Questions: A Round Table Discussion
- Hadoop: What It Is, How It Works, and What it Can Do (via O'Reilly)
- Hadoop FAQ - Getting Started Version (via Gwen Shapira)
- Google's original MapReduce paper
- Cloudera Glossary
- Video: Overview of Hadoop Platform Components
CDH is Cloudera's 100% open-source, enterprise-ready distribution of Apache Hadoop and related projects. Install it directly for the best Hadoop experience, or test-drive it in VM form first (or spin up a cluster in the cloud).
- Download the QuickStart VM
- Install Cloudera Standard to get started for free
- How-to: Install CDH and Impala on EC2 using Cloudera Manager Free Edition
- How-to: Deploy a CDH Cluster in Skytap Cloud
- How-to: Create a Hadoop Cluster POC using CDH on EC2 (via Randy Zwitch)
- How-to: Deploy CDH on Windows Azure Virtual Machines using Cloudera Manager (via Thomas Conte)
It's the quickest way to become dangerous.
When "being dangerous" isn't good enough, it's time to train with Cloudera University.
Reading books, or at least keeping them around for reference, is a good way to progressively deepen your knowledge.
- Hadoop: The Definitive Guide - by Tom White
- Hadoop Operations - by Eric Sammer
- HBase: The Definitive Guide - by Lars George
- HBase in Action - by Nick Dimiduk & Amandeep Khurana
Make an impact on the quality and direction of the Hadoop stack by reporting bugs or becoming an active contributor to a project.
- See the Community page