This is the documentation for Cloudera Manager 5.0.0.
Documentation for other versions is available at Cloudera Documentation.

Installation Path B - Manual Installation Using Cloudera Manager Packages

Before You Begin

Install the Oracle JDK

Install the Oracle Java Development Kit (JDK) on each of your cluster hosts where you want to run Hadoop before installing Cloudera's packages. Cloudera Manager can manage both CDH 4 and CDH 5 hosts, and the required JDK version varies accordingly.

Install and Configure External Databases

Cloudera Manager stores its configuration, along with other monitoring and management information, in databases. If you intend to use external databases for this information, install and configure them following the instructions in MySQL Database, Oracle Database, and External PostgreSQL Database.

(CDH 5 only) Install Python 2.6 or 2.7

Python 2.6 or 2.7 is required to run Hue. RHEL 5 and CentOS 5, in particular, require the EPEL repository package.

To install packages from the EPEL repository, first download the appropriate repository RPM package to your machine, and then install Python using yum. For example, use the following commands for RHEL 5 or CentOS 5:
$ su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
...
$ yum install python26
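
After the install, it can help to confirm that the interpreter reports a 2.6 or 2.7 release. The following is a minimal sketch rather than part of the Cloudera procedure: the function name python_version_ok is hypothetical, and it only inspects a version string that you pass in.

```shell
# Hypothetical helper: succeed only for Python 2.6.x or 2.7.x version strings.
python_version_ok() {
    case "$1" in
        2.6|2.6.*|2.7|2.7.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage, assuming the interpreter is on the PATH as python26:
#   ver=$(python26 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
#   python_version_ok "$ver" && echo "Python $ver is suitable for Hue"
```

The check is deliberately a plain pattern match so that it behaves the same in any POSIX shell.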

Establish Your Cloudera Manager Repository Strategy

Cloudera recommends installing products using package management tools such as yum for Red Hat compatible systems, zypper for SLES, or apt-get for Debian/Ubuntu. These tools depend on access to repositories to install software. For example, Cloudera maintains Internet-accessible repositories for CDH and Cloudera Manager installation files. Strategies for installing Cloudera Manager include:

  • Standard Cloudera repositories. For this method, ensure you have added the required repository information to your systems. For Cloudera repository locations and client .repo files, see Cloudera Manager Version and Download Information.
  • Internally hosted repositories. You might use internal repositories for environments where hosts do not have access to the Internet. In such a case, ensure your environment is properly prepared. For more information, see Understanding Custom Installation Solutions.
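
The pairing of platform and package tool described above can be written down as a small lookup. This is only an illustrative sketch: the function name pkg_tool_for and the platform labels it accepts are assumptions, not anything defined by Cloudera Manager.

```shell
# Hypothetical helper: print the package tool this guide uses for a platform family.
pkg_tool_for() {
    case "$1" in
        rhel|centos|oracle) echo "yum" ;;
        sles)               echo "zypper" ;;
        debian|ubuntu)      echo "apt-get" ;;
        *) echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Example usage:
#   tool=$(pkg_tool_for sles) && echo "Use $tool on this host"
```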

Red Hat-compatible

Navigate to the Cloudera Manager repo file (cloudera-manager.repo) for your system and save it in the /etc/yum.repos.d/ directory.

SLES

  1. Run the following command:
    $ sudo zypper addrepo -f http://archive.cloudera.com/cm5/sles/11/x86_64/cm/cloudera-manager.repo
  2. Update your system package index by running:
    $ sudo zypper refresh

Ubuntu or Debian

  1. Click the entry in the table at Cloudera Manager Version and Download Information that matches your Ubuntu or Debian system.
  2. Navigate to the Cloudera Manager list file (cloudera.list).
  3. Copy the content of that file and append it to the cloudera.list file in the /etc/apt/sources.list.d/ directory.
  4. Update your system package index by running:
    $ sudo apt-get update
For example, to install Cloudera Manager for 64-bit Ubuntu Lucid, your cloudera.list file should look like:
deb http://archive.cloudera.com/cm5/ubuntu/lucid/amd64/cm lucid-cm5 contrib
deb-src http://archive.cloudera.com/cm5/ubuntu/lucid/amd64/cm lucid-cm5 contrib

Install the Cloudera Manager Server Packages

Install the Cloudera Manager Server packages either on the host where the database is installed, or on a host that has access to the database. This host need not be a host in the cluster that you want to manage with Cloudera Manager. The Cloudera Manager Server does not require CDH to be installed on the same host. On the Cloudera Manager Server host, type the following commands to install the Cloudera Manager packages.

RHEL, if you have a yum repo configured:
$ sudo yum install cloudera-manager-daemons cloudera-manager-server
RHEL, if you're manually transferring RPMs:
$ sudo yum --nogpgcheck localinstall cloudera-manager-daemons-*.rpm
$ sudo yum --nogpgcheck localinstall cloudera-manager-server-*.rpm
SLES
$ sudo zypper install cloudera-manager-daemons cloudera-manager-server
Debian/Ubuntu
$ sudo apt-get install cloudera-manager-daemons cloudera-manager-server

Configure a Database for the Cloudera Manager Server

Prepare the database as described in Cloudera Manager Server Database.

Install the Cloudera Manager Agent Packages

On every Cloudera Manager Agent host (including those that will run one or more of the Cloudera Manager Services: Service Monitor, Activity Monitor, Event Server, Alert Publisher, Reports Manager), use the following commands to install the Cloudera Manager packages:

RHEL, if you have a yum repo configured:
$ sudo yum install cloudera-manager-agent cloudera-manager-daemons
RHEL, if you're manually transferring RPMs:
$ sudo yum --nogpgcheck localinstall cloudera-manager-agent-package.*.x86_64.rpm cloudera-manager-daemons
SLES
$ sudo zypper install cloudera-manager-agent cloudera-manager-daemons
Debian/Ubuntu
$ sudo apt-get install cloudera-manager-agent cloudera-manager-daemons

Configure Cloudera Manager Agents

On every Cloudera Manager Agent host, configure the Cloudera Manager Agent to point to the Cloudera Manager Server by setting the following properties in the /etc/cloudera-scm-agent/config.ini configuration file:
Property      Description
server_host   Name of host where the Cloudera Manager Server is running.
server_port   Port on host where the Cloudera Manager Server is running.
For more information on Agent configuration options, see Agent Configuration File.
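
Setting the two properties can be scripted when there are many Agent hosts. The following is a minimal sketch under stated assumptions: the function name set_agent_server is hypothetical, the file is assumed to already contain server_host= and server_port= lines, and the host name and port in the usage comment are placeholders.

```shell
# Hypothetical helper: set server_host and server_port in an Agent config file.
# Usage: set_agent_server CONFIG_FILE SERVER_HOST SERVER_PORT
# Only lines that already start with the property names are rewritten; a file
# that lacks them is left unchanged.
set_agent_server() {
    tmp=$(mktemp) || return 1
    sed -e "s|^server_host[[:space:]]*=.*|server_host=$2|" \
        -e "s|^server_port[[:space:]]*=.*|server_port=$3|" "$1" > "$tmp" \
        || { rm -f "$tmp"; return 1; }
    # Write back into the original file so its ownership and mode are kept.
    cat "$tmp" > "$1" && rm -f "$tmp"
}

# Example usage (placeholder host and port; run with sufficient privileges):
#   set_agent_server /etc/cloudera-scm-agent/config.ini cm.example.com 7182
```

Trying the function on a copy of config.ini first makes it easy to compare the result with the original before touching the live file.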

(Optional) Install CDH and Managed Services Packages

Installing packages is not required if you are planning to choose parcels in Choose Software Installation Method, in which case you can go to Start the Cloudera Manager Server.

For more information about installing CDH, see CDH 4 Installation Guide or CDH 5 Installation Guide.

Choose a CDH and Managed Service Repository Strategy

  • Standard Cloudera repositories.
  • Internally hosted repositories. You might use internal repositories for environments where hosts do not have access to the Internet. In such a case, ensure your environment is properly prepared. For more information, see Understanding Custom Installation Solutions.

Install CDH and Managed Service

CDH Version Procedure
CDH 5
  • Red Hat
    1. Download and install the "1-click Install" package
      1. Download the CDH 5 "1-click Install" package.

        Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).

        For OS Version                      Click this Link
        Red Hat/CentOS/Oracle 5             Red Hat/CentOS/Oracle 5 link
        Red Hat/CentOS/Oracle 6 (64-bit)    Red Hat/CentOS/Oracle 6 link (64-bit)
      2. Install the RPM.

        For Red Hat/CentOS/Oracle 5:

        $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm 

        For Red Hat/CentOS/Oracle 6 (64-bit):

        $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
    2. (Optionally) add a repository key:
      • Red Hat/CentOS/Oracle 5
        $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
      • Red Hat/CentOS/Oracle 6
        $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
    3. Install the CDH packages:
      $ sudo yum clean all
      $ sudo yum install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
        Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
  • SLES
    1. Download and install the "1-click Install" package.
      1. Download the CDH 5 "1-click Install" package.

        Click this link, choose Save File, and save it to a directory to which you have write access (it can be your home directory).

      2. Install the RPM:
        $ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm
      3. Update your system package index by running:
        $ sudo zypper refresh
    2. (Optionally) add a repository key:
      $ sudo rpm --import http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
    3. Install the CDH packages:
      $ sudo zypper clean --all
      $ sudo zypper install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
        Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
  • Ubuntu and Debian
    1. Download and install the "1-click Install" package
      1. Download the CDH 5 "1-click Install" package:

        Click one of the following: this link for a Wheezy system, or this link for a Precise system.

      2. Install the package. Do one of the following:
        • Choose Open with in the download window to use the package manager.
        • Choose Save File, save the package to a directory to which you have write access (it can be your home directory), and install it from the command line, for example:
          $ sudo dpkg -i cdh5-repository_1.0_all.deb
    2. (Optionally) add a repository key:
      • Ubuntu Lucid
        $ curl -s http://archive.cloudera.com/cdh5/ubuntu/lucid/amd64/cdh/archive.key | sudo apt-key add -
      • Ubuntu Precise
        $ curl -s http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
      • Debian Squeeze
        $ curl -s http://archive.cloudera.com/cdh5/debian/squeeze/amd64/cdh/archive.key | sudo apt-key add -
    3. Install the CDH packages:
      $ sudo apt-get update
      $ sudo apt-get install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
        Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
CDH 4
  • Red Hat-compatible
    1. Click the entry in the table at CDH Download Information that matches your Red Hat or CentOS system, navigate to the repo file (cloudera-cdh4.repo) for your system and save it in the /etc/yum.repos.d/ directory.
    2. Optionally add a repository key:
      • Red Hat/CentOS/Oracle 5
        $ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
      • Red Hat/CentOS 6
        $ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
    3. Install packages on every host in your cluster:
      1. Install CDH 4 packages:
        $ sudo yum -y install bigtop-utils bigtop-jsvc bigtop-tomcat hadoop hadoop-hdfs hadoop-httpfs hadoop-mapreduce hadoop-yarn hadoop-client hadoop-0.20-mapreduce hue-plugins hbase hive oozie oozie-client pig zookeeper
      2. To install the hue-common package and all Hue applications on the Hue host, install the hue meta-package:
        $ sudo yum install hue 
    4. Click the entries in the tables at Cloudera Impala Version and Download Information and Cloudera Search Version and Download Information that match your Red Hat or CentOS system, navigate to the repo file for your system, and save it in the /etc/yum.repos.d/ directory.
    5. (Requires CDH 4.2 or later) Install Impala and the Impala Shell on Impala machines:
      $ sudo yum -y install impala impala-shell
    6. (Requires CDH 4.3 or later) Install the Solr Server on machines where you want Cloudera Search.
      $ sudo yum -y install solr-server
  • SLES
    1. Run the following command:
      $ sudo zypper addrepo -f http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/cloudera-cdh4.repo
    2. Update your system package index by running:
      $ sudo zypper refresh
    3. Optionally add a repository key:
      $ sudo rpm --import http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera  
    4. Install packages on every host in your cluster:
      1. Install CDH 4 packages:
        $ sudo zypper install bigtop-utils bigtop-jsvc bigtop-tomcat hadoop hadoop-hdfs hadoop-httpfs hadoop-mapreduce hadoop-yarn hadoop-client hadoop-0.20-mapreduce hue-plugins hbase hive oozie oozie-client pig zookeeper
      2. To install the hue-common package and all Hue applications on the Hue host, install the hue meta-package:
        $ sudo zypper install hue 
      3. (Requires CDH 4.2 or later) Run the following command:
        $ sudo zypper addrepo -f http://archive.cloudera.com/impala/sles/11/x86_64/impala/cloudera-impala.repo
      4. Install Impala and the Impala Shell on Impala machines:

        On 64-bit SUSE systems:

        $ sudo zypper install impala impala-shell
      5. (Requires CDH 4.3 or later) Run the following command:
        $ sudo zypper addrepo -f http://archive.cloudera.com/search/sles/11/x86_64/search/cloudera-search.repo
      6. Install the Solr Server on machines where you want Cloudera Search.
        $ sudo zypper install solr-server
  • Ubuntu or Debian
    1. Click the entry in the table at CDH Version and Packaging Information that matches your Ubuntu or Debian system.
    2. Navigate to the list file (cloudera.list) for your system and save it in the /etc/apt/sources.list.d/ directory.
      For example, to install CDH 4 for 64-bit Ubuntu Lucid, your cloudera.list file should look like:
      deb [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh lucid-cdh4 contrib
      deb-src http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh lucid-cdh4 contrib
    3. Optionally add a repository key:
      • Ubuntu Lucid
        $ curl -s http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh/archive.key | sudo apt-key add -
      • Ubuntu Precise
        $ curl -s http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
      • Debian Squeeze
        $ curl -s http://archive.cloudera.com/cdh4/debian/squeeze/amd64/cdh/archive.key | sudo apt-key add -
    4. Install packages on every host in your cluster:
      1. Install CDH 4 packages:
        $ sudo apt-get install bigtop-utils bigtop-jsvc bigtop-tomcat hadoop hadoop-hdfs hadoop-httpfs hadoop-mapreduce hadoop-yarn hadoop-client hadoop-0.20-mapreduce hue-plugins hbase hive oozie oozie-client pig zookeeper
      2. To install the hue-common package and all Hue applications on the Hue host, install the hue meta-package:
        $ sudo apt-get install hue 
      3. Click the entries in the tables at Cloudera Impala Version and Download Information and Cloudera Search Version and Download Information that match your Ubuntu or Debian system.
      4. Navigate to the list file for your system and save it in the /etc/apt/sources.list.d/ directory.
      5. (Requires CDH 4.2 or later) Install Impala and the Impala Shell on Impala machines:

        On 64-bit systems:

        $ sudo apt-get install impala impala-shell
      6. (Requires CDH 4.3 or later) Install the Solr Server on machines where you want Cloudera Search.
        $ sudo apt-get install solr-server
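
The CDH 4 steps above note minimum versions (CDH 4.2 or later for Impala, CDH 4.3 or later for the Solr Server). A comparison like that can be sketched as follows; the function name version_at_least is hypothetical, and it assumes both arguments contain at least MAJOR.MINOR.

```shell
# Hypothetical helper: succeed if version $1 (MAJOR.MINOR[.PATCH]) is at least
# the MAJOR.MINOR given in $2. Both arguments must contain a dot.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    need_major=${2%%.*}
    need_rest=${2#*.}
    need_minor=${need_rest%%.*}
    [ "$have_major" -gt "$need_major" ] && return 0
    [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]
}

# Example usage:
#   version_at_least 4.2.1 4.2 && echo "Impala packages can be installed"
#   version_at_least 4.2.1 4.3 || echo "Solr Server needs CDH 4.3 or later"
```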

Start the Cloudera Manager Server

  Important: When you start the Cloudera Manager Server and Agents, Cloudera Manager assumes you are not already running HDFS and MapReduce. If these services are running:
  1. Shut down HDFS and MapReduce. See Stopping Services (for CDH 4) or Stopping Services (for CDH 5) for the commands to stop these services.
  2. Configure the init scripts so that they do not start on boot. Use commands similar to those shown in Configuring init to Start Core Hadoop System Services (for CDH 4) or Configuring init to Start Core Hadoop System Services (for CDH 5), but disable the start on boot (for example, $ sudo chkconfig hadoop-hdfs-namenode off).
Contact Cloudera Support for help converting your existing Hadoop configurations for use with Cloudera Manager.
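
When a host has several Hadoop init scripts, the chkconfig commands from the note above can be generated rather than typed one by one. This is a sketch under assumptions: the function name print_disable_on_boot is hypothetical, it assumes the scripts follow the hadoop-* naming shown in the example, and it only prints commands so that you can review them before running anything.

```shell
# Hypothetical helper: print a chkconfig command for every hadoop-* init script
# found in a directory. Nothing is executed.
# Usage: print_disable_on_boot [INIT_DIR]   (INIT_DIR defaults to /etc/init.d)
print_disable_on_boot() {
    init_dir=${1:-/etc/init.d}
    for script in "$init_dir"/hadoop-*; do
        # Skip the unexpanded pattern when no script matches.
        [ -e "$script" ] || continue
        echo "sudo chkconfig $(basename "$script") off"
    done
}

# Example usage:
#   print_disable_on_boot            # review the list
#   print_disable_on_boot | sh       # run it once the list looks right
```

Printing first keeps the step reversible: any script that should keep starting on boot can simply be removed from the reviewed list.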
  1. To start the Cloudera Manager Server, type this command on the Cloudera Manager Server host:
    $ sudo service cloudera-scm-server start
    If you have problems starting the Cloudera Manager Server, such as database permission problems, you can use the Cloudera Manager Server log /var/log/cloudera-scm-server/cloudera-scm-server.log to troubleshoot the problem. See Troubleshooting Installation and Upgrade Problems.
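
A quick way to start with the server log mentioned above is to pull out its most recent error lines. The sketch below assumes that problem entries contain the literal text ERROR, which is an assumption about the log format; the function name recent_log_errors is hypothetical.

```shell
# Hypothetical helper: show the most recent lines containing ERROR in a log file.
# Usage: recent_log_errors [LOG_FILE] [COUNT]
recent_log_errors() {
    log=${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}
    count=${2:-20}
    grep 'ERROR' "$log" | tail -n "$count"
}

# Example usage (reading the log may require root privileges):
#   recent_log_errors
#   recent_log_errors /var/log/cloudera-scm-server/cloudera-scm-server.log 5
```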

Start the Cloudera Manager Agents

Run this command on each Agent host:
$ sudo service cloudera-scm-agent start

When the Agent starts up, it contacts the Cloudera Manager Server. When the Agent hosts reboot, cloudera-scm-agent starts automatically.

Start the Cloudera Manager Admin Console

The Cloudera Manager Server URL takes the following form: http://Server host:port, where Server host is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is installed and port is the port configured for the Cloudera Manager Server. The default port is 7180.
  1. In a web browser, enter the URL, including the port, for the Cloudera Manager Server. The login screen for Cloudera Manager Admin Console displays.
  2. Log in to the Cloudera Manager Admin Console. The default credentials are: Username: admin Password: admin. You can change the password using Cloudera Manager after you run the installation wizard. Cloudera Manager does not support changing the admin username for the installed account; however, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.
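
The URL form described above can be assembled with a one-line helper. This is an illustrative sketch only: the function name cm_url and the host name in the usage comment are hypothetical.

```shell
# Hypothetical helper: build the Admin Console URL from a host and an optional port.
# The port falls back to 7180, the default stated in this section.
cm_url() {
    echo "http://$1:${2:-7180}"
}

# Example usage (placeholder host name):
#   cm_url cm.example.com
#   curl -s -I "$(cm_url cm.example.com)"   # check that the server answers
```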

Choose Cloudera Manager Edition and Hosts

  1. When you start the Cloudera Manager Admin Console, the install wizard starts up. Click Continue to get started.
  2. Choose which edition to install:
    • Cloudera Express, which does not require a license, but provides a somewhat limited set of features
    • Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed
    • Cloudera Enterprise with one of the following license types:
      • Basic Edition
      • Flex Edition
      • Data Hub Edition
    If you choose Cloudera Express or Cloudera Enterprise Data Hub Edition Trial, you can elect to upgrade the license at a later time. See Managing Licenses.
  3. If you have elected Cloudera Enterprise, install a license:
    1. Click Upload License.
    2. Click the document icon to the left of the Select a License File text field.
    3. Navigate to the location of your license file, click the file, and click Open.
    4. Click Upload.
  4. Click Continue in the next screen. The Specify Hosts page displays.
  5. Click the Currently Managed Hosts tab (you have already installed Cloudera Manager Agents on your hosts), choose the hosts to add to the cluster, and click Continue. The Select Repository page displays.

Choose Software Installation Method

  1. Choose how to install CDH and managed services:
    • Use Packages if you have installed packages in (Optional) Install CDH and Managed Services Packages. When you select packages, you also must select the correct CDH version (CDH 4 or CDH 5) that matches the packages that were manually installed, and click Continue.
    • Use Parcels
      1. Choose the parcels you want to install. The choices you see depend on the repositories you have chosen – a repository may contain multiple parcels. Only the parcels for the latest supported service versions are configured by default.
        You can add additional parcels for previous versions by specifying custom repositories. For example, you can find the locations of the previous CDH 4 parcels at http://archive.cloudera.com/cdh4/parcels/. Or, if you are installing CDH 4.3 and want to use Sentry for Hive Authorization, you can add the Sentry parcel using this mechanism. To add a custom parcel repository:
        1. Enter the URL of the repository you want into the More Options field, and click the + Add button. The URL you specify here is also added to the list of repositories on the Configuring Server Parcel Settings page, and a parcel is added to the list of parcels on the Select Repository page. If you have multiple repositories configured, you see all the unique parcels contained in all your repositories.
      2. Click Continue. Cloudera Manager installs the CDH and managed service parcels. During the parcel installation, progress is indicated for the two phases of the parcel installation process (Download and Distribution) in separate progress bars. If you are installing multiple parcels, you see progress bars for each parcel. When the Continue button appears at the bottom of the screen, the installation process is complete. Click Continue.
  2. The Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Continue.

Add Services

The following instructions describe how to use the Cloudera Manager wizard to configure and start the CDH and managed services.

  1. In the first page of the Add Services wizard you choose the combination of services to install and whether to install Cloudera Navigator:
    • Click the radio button next to the combination of services to install:
      CDH 4
      • Core Hadoop - HDFS, MapReduce, ZooKeeper, Oozie, Hive, and Hue
      • Core with HBase
      • Core with Impala
      • All Services - HDFS, MapReduce, ZooKeeper, HBase, Impala, Oozie, Hive, Hue, and Sqoop
      • Custom Services - Any combination of services.
      CDH 5
      • Core Hadoop - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, and Sqoop
      • Core with HBase
      • Core with Impala
      • Core with Search
      • Core with Spark
      • All Services - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, Sqoop, HBase, Impala, Solr, Spark, and Keystore Indexer
      • Custom Services - Any combination of services.
      As you select the services, keep the following in mind:
      • Some services depend on other services; for example, HBase requires HDFS and ZooKeeper. Cloudera Manager tracks dependencies and installs the correct combination of services.
      • In a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. Choose Custom Services to install YARN or use the Add Service functionality to add YARN after installation completes.
          Important: You can create a YARN service in a CDH 4 cluster, but it is not considered production ready.
      • In a CDH 5 cluster, the YARN service is the default MapReduce computation framework. Choose Custom Services to install MapReduce or use the Add Service functionality to add MapReduce after installation completes.
          Important: In CDH 5 the MapReduce service has been deprecated. However, the MapReduce service is fully supported for backward compatibility through the CDH 5 life cycle.
      • The Flume service can be added only after your cluster has been set up.
    • If you have chosen Data Hub Edition Trial or Cloudera Enterprise, optionally check the Include Cloudera Navigator checkbox to enable Cloudera Navigator. See the Cloudera Navigator Documentation.
    Click Continue. The Customize Role Assignments page displays.
  2. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable, but you can reassign services to hosts of your choosing, if desired.

    Click a field below a role to display a dialog containing a pageable list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the pageable hosts dialog.

    The following shortcuts for specifying host names are supported:
    • Range of hostnames (without the domain portion)
      Range Definition          Matching Hosts
      10.1.1.[1-4]              10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
      host[1-3].company.com     host1.company.com, host2.company.com, host3.company.com
      host[07-10].company.com   host07.company.com, host08.company.com, host09.company.com, host10.company.com
    • IP addresses
    • Rack name

    Click the View By Host button for an overview of the role assignment by host ranges.

  3. When you are satisfied with the assignments, click Continue. The Database Setup page displays.
  4. On the Database Setup page, configure settings for required databases:
    1. Provide information for the Activity Monitor (only needed when using MapReduce), Reports Manager, and Hive Metastore databases.
        Important: The value you enter as the database hostname must match the value you entered for the hostname (if any) when you created the database.
    2. Click Test Connection to confirm that Cloudera Manager can communicate with the databases using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise check and correct the information you have provided for the databases and then try the test again. (For Hive, if you are using the embedded database, you will see a message saying the database will be created at a later point in the installation process.) The Review Changes page displays.
  5. Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. For example, you might confirm the NameNode Data Directory and the DataNode Data Directory for HDFS. Click Continue. The wizard starts the services.
  6. When all of the services are started, click Continue. You will see a success message indicating that your cluster has been successfully started.
  7. Click Continue to proceed to the Home Page.
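
The hostname range shortcuts listed in step 2 can be previewed outside the wizard. The following sketch reproduces the expansions shown in the range examples, including zero padding; it is not Cloudera Manager's own implementation, and the function name expand_host_range is hypothetical.

```shell
# Hypothetical helper: expand one PREFIX[START-END]SUFFIX pattern, one host per line.
# A pattern without a bracketed range is printed unchanged.
expand_host_range() {
    pattern=$1
    case "$pattern" in
        *\[*-*\]*) ;;
        *) echo "$pattern"; return 0 ;;
    esac
    prefix=${pattern%%\[*}      # text before the first "["
    rest=${pattern#*\[}         # text after the first "["
    range=${rest%%\]*}          # START-END
    suffix=${rest#*\]}          # text after the first "]"
    start=${range%%-*}
    end=${range#*-}
    width=${#start}             # leading zeros in START set the padding width
    # Strip leading zeros so the shell does not treat the numbers as octal.
    i=$(echo "$start" | sed 's/^0*\([0-9]\)/\1/')
    last=$(echo "$end" | sed 's/^0*\([0-9]\)/\1/')
    while [ "$i" -le "$last" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + 1))
    done
}

# Example usage:
#   expand_host_range 'host[07-10].company.com'
#   expand_host_range '10.1.1.[1-4]'
```

Quoting the pattern matters, because an unquoted [07-10] may be interpreted by the shell as a filename glob.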

Change the Default Administrator Password

As soon as possible after running the wizard and beginning to use Cloudera Manager, you should change the default administrator password:
  1. Right-click the logged-in username at the far right of the top navigation bar and select Change Password.
  2. Enter the current password, enter the new password twice, and then click Submit.

Test the Installation

You can test the installation following the instructions in Testing the Installation.