This is the documentation for CDH 5.1.x.

Launching a Cluster

To launch a cluster:

$ whirr launch-cluster --config hadoop.properties

As the cluster starts up, messages are displayed in the console. You can see debug-level log messages in a file named whirr.log in the directory where you ran the whirr command. After the cluster has started, a message appears in the console showing the URL you can use to access the web UI for Whirr.
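If the launch appears to stall, the log file is the first place to look. The helper below is a minimal sketch and not part of Whirr: the function name whirr_log_problems is hypothetical, and it assumes the log marks problems with the conventional level names WARN and ERROR.

```shell
# Hypothetical helper: print WARN/ERROR lines from a Whirr log file.
# Assumes the log uses the conventional level names WARN and ERROR.
# Usage: whirr_log_problems [log-file]   (defaults to ./whirr.log)
whirr_log_problems() {
    log_file="${1:-whirr.log}"
    if [ ! -f "$log_file" ]; then
        echo "no log file at $log_file" >&2
        return 1
    fi
    # grep exits non-zero when nothing matches, which here means "no problems found".
    grep -E 'WARN|ERROR' "$log_file"
}
```

Run it from the directory where you ran the whirr command. To follow the log while the cluster starts, tail -f whirr.log in a second terminal window does the same job interactively.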

Running a Whirr Proxy

For security reasons, traffic from the network where your client is running is proxied through the master node of the cluster using an SSH tunnel (a SOCKS proxy on port 6666). A script to launch the proxy is created when you launch the cluster, and may be found in ~/.whirr/<cluster-name>.

To launch the Whirr proxy:

  1. Run the following command in a new terminal window:
    $ . ~/.whirr/myhadoopcluster/hadoop-proxy.sh
  2. To stop the proxy, kill the process by pressing Ctrl-C.
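Once the proxy is running, client tools must be told to send traffic through it. As a quick check, the sketch below fetches a page through the SOCKS proxy on port 6666. It assumes curl is installed on the client; the function name fetch_via_whirr_proxy is hypothetical, and the URL argument stands for the web UI address that Whirr printed at launch.

```shell
# Hypothetical helper: fetch a URL through the Whirr SOCKS proxy.
# Assumes curl is installed and the proxy is listening on localhost port 6666.
# Usage: fetch_via_whirr_proxy <url>
fetch_via_whirr_proxy() {
    url="${1:-}"
    if [ -z "$url" ]; then
        echo "usage: fetch_via_whirr_proxy <url>" >&2
        return 1
    fi
    # --socks5-hostname makes curl resolve the host name on the proxy side,
    # which helps when cluster host names only resolve inside the cloud network.
    curl --silent --socks5-hostname localhost:6666 "$url"
}
```

If the call returns HTML, the tunnel is working. A connection error usually means the proxy terminal has been closed or the cluster is not yet up.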

Running a MapReduce Job

After you launch a cluster, a hadoop-site.xml file is automatically created in the directory ~/.whirr/<cluster-name>. You need to update the local Hadoop configuration to use this file.

To update the local Hadoop configuration to use hadoop-site.xml:

  1. On all systems, type the following commands:
    $ sudo cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.whirr
    $ sudo rm -f /etc/hadoop/conf.whirr/*-site.xml
    $ sudo cp ~/.whirr/myhadoopcluster/hadoop-site.xml /etc/hadoop/conf.whirr
  2. If you are using an Ubuntu, Debian, or SLES system, type these commands:
    $ sudo update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.whirr 50
    $ update-alternatives --display hadoop-conf
  3. If you are using a Red Hat system, type these commands:
    $ sudo alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.whirr 50
    $ alternatives --display hadoop-conf
  4. You can now browse HDFS:
    $ hadoop fs -ls /
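The copy steps in step 1 can be wrapped in a small function so that they work for any cluster name and can be tried against a scratch directory first. This is a sketch under stated assumptions: the function name stage_whirr_conf and its optional arguments are hypothetical, and the guard against an existing conf.whirr directory is an addition that the steps above do not include.

```shell
# Hypothetical helper: build a conf.whirr directory from conf.empty plus the
# hadoop-site.xml that Whirr generated.
# Usage: stage_whirr_conf <cluster-name> [conf-root] [whirr-dir]
stage_whirr_conf() {
    cluster="${1:-}"
    conf_root="${2:-/etc/hadoop}"
    whirr_dir="${3:-$HOME/.whirr}"
    site_file="$whirr_dir/$cluster/hadoop-site.xml"

    if [ -z "$cluster" ] || [ ! -f "$site_file" ]; then
        echo "cannot find hadoop-site.xml for cluster '$cluster'" >&2
        return 1
    fi
    # cp -r into an existing directory would nest conf.empty inside it,
    # so stop rather than produce a broken layout.
    if [ -e "$conf_root/conf.whirr" ]; then
        echo "$conf_root/conf.whirr already exists" >&2
        return 1
    fi
    # Same three steps as above: copy the template, drop its *-site.xml
    # files, then add the Whirr-generated site file.
    cp -r "$conf_root/conf.empty" "$conf_root/conf.whirr" &&
        rm -f "$conf_root"/conf.whirr/*-site.xml &&
        cp "$site_file" "$conf_root/conf.whirr"
}
```

When the target is /etc/hadoop, the function needs root privileges, for example by saving it in a script that is run with sudo. The alternatives commands in steps 2 and 3 are still required afterwards.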

To run a MapReduce job, run these commands:

  • For MRv1:
    $ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
    $ hadoop fs -mkdir input
    $ hadoop fs -put $HADOOP_MAPRED_HOME/CHANGES.txt input
    $ hadoop jar $HADOOP_MAPRED_HOME/hadoop-examples.jar wordcount input output
    $ hadoop fs -cat output/part-* | head
  • For YARN:
    $ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
    $ hadoop fs -mkdir input
    $ hadoop fs -put $HADOOP_MAPRED_HOME/CHANGES.txt input
    $ hadoop jar $HADOOP_MAPRED_HOME/hadoop-mapreduce-examples.jar wordcount input output
    $ hadoop fs -cat output/part-* | head
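The two variants differ only in the MapReduce home directory and the name of the examples JAR. The sketch below captures that difference in one place. The function names examples_jar and run_wordcount are hypothetical; the paths are the ones used in the commands above.

```shell
# Hypothetical helper: print the examples JAR path for a framework.
# Usage: examples_jar mrv1|yarn
examples_jar() {
    case "${1:-}" in
        mrv1) echo "/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar" ;;
        yarn) echo "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar" ;;
        *)
            echo "unknown framework: ${1:-} (expected mrv1 or yarn)" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper: run the wordcount example.
# Usage: run_wordcount mrv1|yarn [input-dir] [output-dir]
run_wordcount() {
    jar=$(examples_jar "${1:-}") || return 1
    hadoop jar "$jar" wordcount "${2:-input}" "${3:-output}"
}
```

For example, run_wordcount yarn input output submits the same job as the YARN commands above. Note that a MapReduce job fails if its output directory already exists, so remove it before a second run, for example with hadoop fs -rm -r output.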

Destroying a Cluster

When you are finished using a cluster, you can terminate the instances and clean up the resources using the commands shown in this section.

WARNING: All data will be deleted when you destroy the cluster.

To destroy a cluster:

  1. Run the following command:
    $ whirr destroy-cluster --config hadoop.properties
  2. If you started the SSH proxy earlier, shut it down by pressing Ctrl-C in the terminal window where it is running.
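
Because destroying a cluster deletes all of its data, you may want a confirmation step in front of the command. The wrapper below is a minimal sketch and not part of Whirr: the function name destroy_whirr_cluster is hypothetical, and it only calls the destroy-cluster command shown above.

```shell
# Hypothetical helper: ask for confirmation before destroying a cluster.
# Usage: destroy_whirr_cluster [properties-file]   (defaults to hadoop.properties)
destroy_whirr_cluster() {
    config="${1:-hadoop.properties}"
    # Prompt on stderr so that stdout carries only the output of whirr.
    printf 'Destroy the cluster defined in %s? All data will be deleted. [y/N] ' "$config" >&2
    read -r answer
    case "$answer" in
        y|Y) whirr destroy-cluster --config "$config" ;;
        *)
            echo "aborted" >&2
            return 1
            ;;
    esac
}
```

Any answer other than y or Y leaves the cluster running and returns a non-zero status, so the wrapper can be used safely inside larger scripts.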