Using Whirr to Launch Cloudera Manager
Cloudera Manager provides an installation wizard that installs Cloudera Manager, CDH, and Impala on a cluster of Amazon Web Services (AWS) EC2 instances; see Installing Cloudera Manager and CDH on EC2. Alternatively, you can follow the instructions here to use Whirr to start a cluster on Amazon Elastic Compute Cloud (EC2) that runs Cloudera Manager. Cloudera Manager allows you to install, run, and manage a Hadoop cluster.
At present, this method can launch and run only an MRv1 cluster; YARN is not supported.
This method uses Whirr to start a cluster with:
- one node running the Cloudera Manager Admin Console, and
- a user-selectable number of nodes for the Hadoop cluster itself.
Once Whirr has started the cluster, you use Cloudera Manager in the usual way.
Step 1: Set your AWS credentials as environment variables
Run the following commands from your local machine:
$ export AWS_ACCESS_KEY_ID=...
$ export AWS_SECRET_ACCESS_KEY=...
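Before moving on, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch and not part of Whirr; the function name require_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report every listed variable that is unset or empty,
# and return non-zero if any of them is.
# Usage: require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
require_env() {
  missing=0
  for name in "$@"; do
    # eval gives indirect expansion in plain POSIX sh (no ${!name} needed).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it in the same shell where you exported the variables, since exported values are not visible to other terminal sessions.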
Step 2: Install Whirr
Install the CDH repository for your CDH version; for CDH4, for example, see the CDH4 Installation Guide.
Install the whirr package; for CDH4, for example, see the Installing Whirr heading in the Whirr Installation topic in the CDH4 Installation Guide.
Set the following environment variables:
$ export WHIRR_HOME=/usr/lib/whirr
$ export PATH=$WHIRR_HOME/bin:$PATH
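A quick way to confirm that the PATH change took effect is to check which whirr launcher the shell resolves. This sketch assumes, as in the commands above, that the launcher lives at $WHIRR_HOME/bin/whirr; check_whirr_path is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the shell resolves "whirr" to the launcher
# under $WHIRR_HOME/bin rather than some other copy earlier on PATH.
# Usage: check_whirr_path
check_whirr_path() {
  if [ -z "${WHIRR_HOME:-}" ]; then
    echo "WHIRR_HOME is not set" >&2
    return 1
  fi
  expected="$WHIRR_HOME/bin/whirr"
  # command -v prints the path the shell would execute, or fails if none.
  if ! found=$(command -v whirr); then
    echo "whirr is not on PATH" >&2
    return 1
  fi
  if [ "$found" != "$expected" ]; then
    echo "PATH resolves whirr to $found, expected $expected" >&2
    return 1
  fi
  echo "using $found"
}
```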
Step 3: Create a password-less SSH Key Pair
Create a password-less SSH key pair for Whirr to use:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_cm
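Whirr locates the key pair through the whirr.private-key-file and whirr.public-key-file properties. The sketch below checks that ssh-keygen produced both files and prints the two property lines so you can compare them with what cm-ec2.properties already contains. The helper name key_properties is hypothetical, and it assumes the public key sits next to the private key with a .pub suffix, which is where ssh-keygen -f writes it.

```shell
#!/bin/sh
# Hypothetical helper: check that both halves of the key pair exist and print
# the Whirr properties that point at them.
# Usage: key_properties            (defaults to ~/.ssh/id_rsa_cm)
#        key_properties PATH_TO_PRIVATE_KEY
key_properties() {
  key="${1:-$HOME/.ssh/id_rsa_cm}"
  for f in "$key" "$key.pub"; do
    if [ ! -f "$f" ]; then
      echo "missing key file: $f" >&2
      return 1
    fi
  done
  printf 'whirr.private-key-file=%s\n' "$key"
  printf 'whirr.public-key-file=%s.pub\n' "$key"
}
```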
Step 4: Get your Whirr-Cloudera-Manager Configuration
You can download a sample Whirr EC2 Cloudera Manager configuration as follows:
$ curl -O https://raw.github.com/cloudera/whirr-cm/master/cm-ec2.properties
To upload a Cloudera Manager license as part of the installation (Cloudera can provide one if you do not have one), place the license in a file named cm-license.txt on the Whirr classpath (for example, in $WHIRR_HOME/conf), using a command such as the following:
$ mv -v eval_acme_20120925_cloudera_enterprise_license.txt $WHIRR_HOME/conf/cm-license.txt
To upload a Cloudera Manager configuration as part of the installation, place the configuration in a file named cm-config.json on the Whirr classpath (for example, in $WHIRR_HOME/conf). The file should use the same JSON format as a configuration downloaded from the Cloudera Manager UI. For example:
$ curl -O https://raw.github.com/cloudera/whirr-cm/master/cm-config.json
$ mv -v cm-config.json $WHIRR_HOME/conf/cm-config.json
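The license and configuration steps above both come down to placing a file in the Whirr conf directory under a fixed name. A minimal sketch, assuming WHIRR_HOME is set as in Step 2 (it falls back to /usr/lib/whirr otherwise); the function stage_cm_files is hypothetical. Unlike the mv commands above, it copies, so the original files stay where they are.

```shell
#!/bin/sh
# Hypothetical helper: copy a license and/or an exported configuration into a
# Whirr conf directory under the file names the installation looks for.
#   $1  license file, or "" to skip
#   $2  JSON configuration file, or "" to skip
#   $3  target directory (defaults to $WHIRR_HOME/conf)
# Usage: stage_cm_files eval_license.txt cm-config.json
stage_cm_files() {
  license="$1"
  config="$2"
  conf_dir="${3:-${WHIRR_HOME:-/usr/lib/whirr}/conf}"
  if [ ! -d "$conf_dir" ]; then
    echo "not a directory: $conf_dir" >&2
    return 1
  fi
  if [ -n "$license" ]; then
    # Refuse a missing or empty file rather than staging it silently.
    [ -s "$license" ] || { echo "empty or missing: $license" >&2; return 1; }
    cp "$license" "$conf_dir/cm-license.txt" || return 1
    echo "staged $conf_dir/cm-license.txt"
  fi
  if [ -n "$config" ]; then
    [ -s "$config" ] || { echo "empty or missing: $config" >&2; return 1; }
    cp "$config" "$conf_dir/cm-config.json" || return 1
    echo "staged $conf_dir/cm-config.json"
  fi
}
```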
Step 5: Launch a Cloudera Manager Cluster
With the sample configuration, the following command starts a cluster with five Hadoop nodes:
$ whirr launch-cluster --config cm-ec2.properties
- To change the number of nodes, edit the whirr.instance-templates line in the cm-ec2.properties file. For example, to launch a cluster with 20 Hadoop nodes:
  whirr.instance-templates=1 cmserver,20 cmagent
- To add a no-op node to use as a gateway node:
  whirr.instance-templates=1 cmserver,20 cmagent,1 noop
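Editing the instance templates by hand works fine; if you resize clusters often, a small script can rewrite the line instead. This sketch assumes the property is written as a single whirr.instance-templates=... line with no spaces around the =; the function set_instance_templates and its yes/no gateway argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the whirr.instance-templates line for a given
# number of cmagent nodes, optionally adding one noop gateway node.
# Usage: set_instance_templates cm-ec2.properties 20 yes
set_instance_templates() {
  file="$1"
  agents="$2"
  gateway="${3:-no}"
  case "$agents" in
    ''|*[!0-9]*) echo "agent count must be a number: $agents" >&2; return 1 ;;
  esac
  if ! grep -q '^whirr\.instance-templates=' "$file"; then
    echo "no whirr.instance-templates line in $file" >&2
    return 1
  fi
  templates="1 cmserver,$agents cmagent"
  if [ "$gateway" = "yes" ]; then
    templates="$templates,1 noop"
  fi
  # Write to a temporary file first so a failed edit leaves the original intact.
  sed "s/^whirr\.instance-templates=.*/whirr.instance-templates=$templates/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```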
Whirr reports progress to the console as it runs. The command exits when the cluster is ready to be used.
Using the Cluster
Once the Hadoop cluster is up and running, you can run jobs from any Cloudera Manager Agent machine or from a Cloudera Manager gateway node.
Using a Gateway Node (Optional)
In most cases you will not need a gateway node, but consider using one if you want to run jobs from a machine that is not also running the CDH TaskTracker and DataNode processes. In that case, edit whirr.instance-templates to add the noop node shown in the previous section, launch the cluster, and then follow the Cloudera Manager instructions to add a gateway role on the no-op node. You can find these instructions in the documentation for your version of Cloudera Manager, for example at Adding Role Instances.
Then SSH to the gateway machine. Now you can interact with the cluster; for example, to list files in HDFS:
$ hadoop fs -ls /tmp
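To find the gateway's address, you can look it up in the instance list Whirr keeps for the cluster. The sketch below rests on an assumption you should verify for your Whirr version: that the list is a tab-separated file (for example ~/.whirr/<cluster-name>/instances, where <cluster-name> is a placeholder) whose columns are instance ID, comma-separated roles, public IP, and private IP. The function gateway_ip is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the public IP of the first instance whose role
# list contains "noop". ASSUMED input format, one instance per line:
#   instance-id <TAB> role1,role2 <TAB> public-ip <TAB> private-ip
# Usage (path is an assumption): gateway_ip ~/.whirr/<cluster-name>/instances
gateway_ip() {
  awk -F '\t' '
    {
      n = split($2, roles, ",")
      for (i = 1; i <= n; i++)
        if (roles[i] == "noop") { print $3; found = 1; exit }
    }
    # Exit non-zero when no noop instance was listed.
    END { if (!found) exit 1 }
  ' "$1"
}
```

The printed address is what you would pass to ssh, together with the private key from Step 3 (ssh -i ~/.ssh/id_rsa_cm).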
Shutting Down the Cluster
To shut down the cluster, run the following command.
All data and state stored on the cluster will be lost.
$ whirr destroy-cluster --config cm-ec2.properties
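Because the command cannot be undone, you may prefer to wrap it in a confirmation prompt. A minimal sketch; destroy_cluster is a hypothetical name, and it defaults to the cm-ec2.properties file used throughout this page.

```shell
#!/bin/sh
# Hypothetical wrapper: ask before running destroy-cluster, because all data
# and state on the cluster is lost. Anything other than y/yes aborts.
# Usage: destroy_cluster cm-ec2.properties
destroy_cluster() {
  config="${1:-cm-ec2.properties}"
  # Prompt on stderr so stdout carries only the result.
  printf 'Destroy the cluster defined in %s? All data will be lost. [y/N] ' "$config" >&2
  # Treat end of input (no answer at all) as "no".
  read -r answer || answer=""
  case "$answer" in
    y|Y|yes|YES) whirr destroy-cluster --config "$config" ;;
    *) echo "aborted"; return 1 ;;
  esac
}
```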