This is the documentation for Cloudera 5.2.x.
Documentation for other versions is available at Cloudera Documentation.

Installation Path A - Automated Installation by Cloudera Manager

Before You Begin

Depending on your environment, you may need to perform one or more of the following optional installation and configuration steps before you run the installer.

Install and Configure External Databases

If you intend to use an external database for services or Cloudera Management Service roles, install and configure it following the instructions in External Databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.

(CDH 5 only) On RHEL 5 and CentOS 5, Install Python 2.6 or 2.7

If you are using Hue or installing CDH 5 using packages, Python 2.6 or 2.7 must be available to the package manager or already installed. To make it available through yum, add the following to /etc/yum.repos.d/epel.repo:

[epel]
name=Local Mirror Extra Packages for Enterprise Linux 5 - x86_64
baseurl=http://mirror.infra.cloudera.com/epel/5/x86_64
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/cloudera-rpm-gpg/RPM-GPG-KEY-EPEL-5
failovermethod=priority
priority=20
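
Before copying the stanza to every host, it can help to check that it parses and has the expected keys. The following is a minimal sketch, not part of the Cloudera tooling: the helper name check_repo and the particular checks are assumptions made for illustration, and the script is meant to run on a workstation with Python 3 rather than on the RHEL 5 hosts themselves.

```python
import configparser

# The stanza from the step above, reproduced verbatim.
EPEL_REPO = """\
[epel]
name=Local Mirror Extra Packages for Enterprise Linux 5 - x86_64
baseurl=http://mirror.infra.cloudera.com/epel/5/x86_64
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/cloudera-rpm-gpg/RPM-GPG-KEY-EPEL-5
failovermethod=priority
priority=20
"""


def check_repo(text, section="epel"):
    """Return a list of problems found in a yum .repo stanza (empty if none).

    Hypothetical helper: the checks are illustrative, not exhaustive.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(section):
        return ["missing [%s] section" % section]
    problems = []
    for key in ("name", "baseurl", "gpgkey"):
        if not parser.get(section, key, fallback="").strip():
            problems.append("missing %s" % key)
    if parser.get(section, "enabled", fallback="0") != "1":
        problems.append("repository is not enabled")
    if parser.get(section, "gpgcheck", fallback="0") != "1":
        problems.append("gpgcheck is off")
    return problems


if __name__ == "__main__":
    for problem in check_repo(EPEL_REPO):
        print(problem)
```

An empty result only means the stanza is well formed; it does not confirm that the mirror is reachable from the host.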

Configure an HTTP Proxy

The Cloudera Manager installer accesses archive.cloudera.com by using yum on RHEL systems, zypper on SLES systems, or apt-get on Debian/Ubuntu systems. If your hosts access the Internet through an HTTP proxy, you can configure yum, zypper, or apt-get, system-wide, to access archive.cloudera.com through a proxy. To do so, modify the system configuration on the Cloudera Manager Server host and on every cluster host as follows:
OS                  File                 Property
RHEL-compatible     /etc/yum.conf        proxy=http://server:port/
SLES                /root/.curlrc        --proxy=http://server:port/
Ubuntu or Debian    /etc/apt/apt.conf    Acquire::http::Proxy "http://server:port";
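
When the same proxy has to be configured on many hosts, the file and property for each OS family can be generated rather than typed by hand. This is a small sketch under stated assumptions: the dictionary keys ("rhel", "sles", "debian"), the function name proxy_setting, and the example proxy host are made up for illustration; the file paths and property formats are the ones listed above.

```python
# File and property template per OS family, taken from the table above.
PROXY_SETTINGS = {
    "rhel": ("/etc/yum.conf", "proxy=http://{server}:{port}/"),
    "sles": ("/root/.curlrc", "--proxy=http://{server}:{port}/"),
    "debian": ("/etc/apt/apt.conf", 'Acquire::http::Proxy "http://{server}:{port}";'),
}


def proxy_setting(os_family, server, port):
    """Return (config file path, line to add) for the given OS family."""
    path, template = PROXY_SETTINGS[os_family]
    return path, template.format(server=server, port=port)


if __name__ == "__main__":
    # proxy.example.com:3128 is a placeholder, not a real proxy.
    for family in PROXY_SETTINGS:
        path, line = proxy_setting(family, "proxy.example.com", 3128)
        print("%s: %s" % (path, line))
```

The sketch only produces the text; appending it to the file on each host is left to whatever configuration management you already use.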

Download and Run the Cloudera Manager Server Installer

  1. Download the Cloudera Manager installer binary from Cloudera Manager 5.2.1 Downloads to the cluster host where you want to install the Cloudera Manager Server.
    1. Click Download Cloudera Express or Download Cloudera Enterprise. See Cloudera Express and Cloudera Enterprise Features.
    2. Optionally register and click Submit or click the Just take me to the download page link. The cloudera-manager-installer.bin file downloads.
  2. Change cloudera-manager-installer.bin to have executable permission.
    $ chmod u+x cloudera-manager-installer.bin
  3. Run the Cloudera Manager Server installer:
    • Install Cloudera Manager packages from the Internet - sudo ./cloudera-manager-installer.bin
    • Install Cloudera Manager packages from a local repository - sudo ./cloudera-manager-installer.bin --skip_repo_package=1
  4. Read the Cloudera Manager README and then press Return or Enter to choose Next.
  5. Read the Cloudera Express License and then press Return or Enter to choose Next. Use the arrow keys and press Return or Enter to choose Yes to confirm you accept the license.
  6. Read the Oracle Binary Code License Agreement and then press Return or Enter to choose Next.
  7. Use the arrow keys and press Return or Enter to choose Yes to confirm you accept the Oracle Binary Code License Agreement. The following occurs:
    1. The installer installs the Oracle JDK and the Cloudera Manager repository files.
    2. The installer installs the Cloudera Manager Server and embedded PostgreSQL packages.
    3. The installer starts the Cloudera Manager Server and embedded PostgreSQL database.
  8. When the installation completes, the installer displays the complete URL for the Cloudera Manager Admin Console, including the port number, which is 7180 by default. Press Return or Enter to choose OK to continue.
  9. Press Return or Enter to choose OK to exit the installer.
  Note: If the installation is interrupted for some reason, you may need to clean up before you can re-run it. See Uninstalling Cloudera Manager and Managed Software.
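
If you script the download step, steps 2 and 3 above can be expressed as follows. This is an illustrative sketch only: the function names make_executable and installer_command are hypothetical, and the installer itself is interactive, so the sketch builds the command line without running it.

```python
import os
import stat


def make_executable(path):
    """Equivalent of 'chmod u+x path': add the owner-execute bit."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


def installer_command(use_local_repo=False,
                      installer="./cloudera-manager-installer.bin"):
    """Build the installer command line described in step 3.

    use_local_repo=True adds --skip_repo_package=1, the option the
    procedure gives for installing from a local repository.
    """
    command = ["sudo", installer]
    if use_local_repo:
        command.append("--skip_repo_package=1")
    return command


if __name__ == "__main__":
    print(" ".join(installer_command()))
    print(" ".join(installer_command(use_local_repo=True)))
```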

Start and Log into the Cloudera Manager Admin Console

The Cloudera Manager Server URL takes the form http://Server host:port, where Server host is the fully-qualified domain name or IP address of the host where the Cloudera Manager Server is installed and port is the port configured for the Cloudera Manager Server. The default port is 7180.
  1. Wait several minutes for the Cloudera Manager Server to complete its startup. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
  2. In a web browser, enter http://Server host:7180, where Server host is the fully-qualified domain name or IP address of the host where the Cloudera Manager Server is running. The login screen for Cloudera Manager Admin Console displays.
  3. Log in to the Cloudera Manager Admin Console. The default credentials are Username: admin and Password: admin. Cloudera Manager does not support changing the admin username for the installed account, but you can change the password using Cloudera Manager after you run the installation wizard. You can also add a new user, assign administrative privileges to that user, and then delete the default admin account.
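
Instead of watching the log, you can poll the Admin Console port until it accepts connections. The sketch below is one possible approach, not a Cloudera-provided tool: the function name wait_for_port and its timeout and interval defaults are assumptions, and an open port only shows that the server is listening, not that every startup task has finished.

```python
import socket
import time


def wait_for_port(host, port=7180, timeout=300.0, interval=5.0):
    """Return True once host:port accepts a TCP connection.

    Retries every `interval` seconds and returns False if the port is
    still unreachable after roughly `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # cm-server.example.com is a placeholder for your Server host.
    if wait_for_port("cm-server.example.com"):
        print("Admin Console port is open")
    else:
        print("Timed out; check cloudera-scm-server.log")
```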

Use the Cloudera Manager Wizard for Software Installation and Configuration

The following instructions describe how to use the Cloudera Manager installation wizard to do an initial installation and configuration. The wizard lets you:

  • Select the version of Cloudera Manager you want to install
  • Find the cluster hosts you specify via hostname and IP address ranges
  • Connect to each host with SSH to install the Cloudera Manager Agent and other components
  • Optionally install the Oracle JDK on the cluster hosts. If you choose not to have the wizard install the JDK, you must install it on all cluster hosts before running the wizard.
  • Install CDH and managed service packages or parcels
  • Configure CDH and managed services automatically and start the services
  Important: All hosts in the cluster must have some way to access installation files via one of the following methods:
  • Internet access to allow the wizard to install software packages or parcels from archive.cloudera.com.
  • A custom internal repository that the host(s) can access. For example, for a Red Hat host, you could set up a Yum repository. See Creating and Using a Package Repository for more information about this option.

Choose Cloudera Manager Edition and Hosts

  1. Choose which edition to install:
    • Cloudera Express, which does not require a license, but provides a somewhat limited set of features.
    • Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed.
    • Cloudera Enterprise with one of the following license types:
      • Basic Edition
      • Flex Edition
      • Data Hub Edition
    If you choose Cloudera Express or Cloudera Enterprise Data Hub Edition Trial, you can elect to upgrade the license at a later time. See Managing Licenses.
  2. If you have elected Cloudera Enterprise, install a license:
    1. Click Upload License.
    2. Click the document icon to the left of the Select a License File text field.
    3. Navigate to the location of your license file, click the file, and click Open.
    4. Click Upload.
    Click Continue to proceed with the installation.
  3. Information is displayed indicating what the CDH installation includes. At this point, you can access online Help or the Support Portal if you wish. Click Continue to proceed with the installation.
  4. To enable Cloudera Manager to automatically discover hosts on which to install CDH and managed services, enter the cluster hostnames or IP addresses. You can also specify hostname and IP address ranges. For example:
    Range Definition           Matching Hosts
    10.1.1.[1-4]               10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
    host[1-3].company.com      host1.company.com, host2.company.com, host3.company.com
    host[07-10].company.com    host07.company.com, host08.company.com, host09.company.com, host10.company.com

    You can specify multiple addresses and address ranges by separating them with commas, semicolons, tabs, or blank spaces, or by placing them on separate lines. Use this technique to make more specific searches instead of searching overly wide ranges. The scan results include all addresses scanned, but only hosts that the scan reaches and that are running SSH are selected for inclusion in your cluster by default. If you don't know the IP addresses of all of the hosts, you can enter an address range that includes unused addresses and then deselect the hosts that do not exist (and are not discovered) later in this procedure. Keep in mind that wider ranges take longer to scan.

  5. Click Search. Cloudera Manager identifies the hosts on your cluster to allow you to configure them for services. If there are a large number of hosts on your cluster, wait a few moments to allow them to be discovered and shown in the wizard. If the search is taking too long, you can stop the scan by clicking Abort Scan. To find additional hosts, click New Search, add the host names or IP addresses and click Search again. Cloudera Manager scans hosts by checking for network connectivity. If there are some hosts where you want to install services that are not shown in the list, make sure you have network connectivity between the Cloudera Manager Server host and those hosts. Common causes of loss of connectivity are firewalls and interference from SELinux.
  6. Verify that the number of hosts shown matches the number of hosts where you want to install services. Deselect host entries that do not exist and deselect the hosts where you do not want to install services. Click Continue. The Select Repository page displays.
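
To preview which hosts a range definition from step 4 will cover before you click Search, the bracket syntax can be expanded offline. The following is a sketch of the behavior shown in the range table, not the wizard's actual implementation: the function names expand_range and parse_host_field are hypothetical, and only one bracketed range per entry is handled.

```python
import re

# Matches a bracketed numeric range such as [1-4] or [07-10].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range(pattern):
    """Expand one range definition into the list of matching hosts."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # plain hostname or IP address
    start, end = match.group(1), match.group(2)
    width = len(start)  # keeps leading zeros, as in host[07-10]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [prefix + str(n).zfill(width) + suffix
            for n in range(int(start), int(end) + 1)]


def parse_host_field(text):
    """Split on commas, semicolons, and whitespace, then expand each entry."""
    hosts = []
    for entry in re.split(r"[,;\s]+", text):
        if entry:
            hosts.extend(expand_range(entry))
    return hosts


if __name__ == "__main__":
    for host in parse_host_field("10.1.1.[1-4], host[07-10].company.com"):
        print(host)
```

Counting the expanded list gives a rough idea of how many addresses a scan will touch, which is useful given that wider ranges take longer to scan.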

Choose Software Installation Method and Install Software

  1. Select the repository type to use for the installation: parcels or packages.
    • Use Parcels:
      1. Choose the parcels to install. The choices you see depend on the repositories you have chosen – a repository may contain multiple parcels. Only the parcels for the latest supported service versions are configured by default.
        You can add additional parcels for previous versions by specifying custom repositories. For example, you can find the locations of the previous CDH 4 parcels at http://archive.cloudera.com/cdh4/parcels/. Or, if you are installing CDH 4.3 and want to use policy-file authorization, you can add the Sentry parcel using this mechanism.
        1. To specify the parcel directory or local parcel repository, add a parcel repository, or specify the properties of a proxy server through which parcels are downloaded, click the More Options button and do one or more of the following:
          • Parcel Directory and Local Parcel Repository Path - Specify the location of parcels on cluster hosts and the Cloudera Manager Server host.
          • Parcel Repository - In the Remote Parcel Repository URLs field, click the button and enter the URL of the repository. The URL you specify is added to the list of repositories listed in the Configuring Cloudera Manager Server Parcel Settings page and a parcel is added to the list of parcels on the Select Repository page. If you have multiple repositories configured, you will see all the unique parcels contained in all your repositories.
          • Proxy Server - Specify the properties of a proxy server.
        2. Click OK.
    • Use Packages:
      1. Select the major release of CDH to install.
      2. Select the specific release of CDH to install.
      3. Select the specific releases of Impala and Solr to install, assuming you have selected an appropriate CDH version. You can choose either the latest version or use a custom repository. Choose None if you do not want to install that service.
  2. Select the release of Cloudera Manager Agent to install. You can choose either the version that matches the Cloudera Manager Server you are currently using or specify a version in a custom repository.
  3. If you opted to use custom repositories for installation files, you can provide a GPG key URL that applies for all repositories.
  4. Click Continue. The JDK Installation Options page displays.
    • Leave Install Oracle Java SE Development Kit (JDK) checked to allow Cloudera Manager to install the JDK on each cluster host or uncheck if you plan to install it yourself.
    • If your local laws permit you to deploy unlimited strength encryption and you are running a secure cluster, check the Install Java Unlimited Strength Encryption Policy Files checkbox.
    Click Continue.
  5. Specify SSH login properties:
    • Select root or enter the user name for an account that has password-less sudo permission.
    • Select an authentication method:
      • If you choose to use password authentication, enter and confirm the password.
      • If you choose to use public-key authentication provide a passphrase and path to the required key files.
    • You can choose to specify an alternate SSH port. The default value is 22.
    • You can specify the maximum number of host installations to run at once. The default value is 10.
    Click Continue. Cloudera Manager performs the following:
    • Parcels - installs the Oracle JDK and the Cloudera Manager Agent packages and starts the Agent. Click Continue. During the parcel installation, progress is indicated for the two phases of the parcel installation process (Download and Distribution) in separate progress bars. If you are installing multiple parcels, you will see progress bars for each parcel. When the Continue button appears at the bottom of the screen, the installation process is completed.
    • Packages - configures package repositories, installs the Oracle JDK, CDH, managed service, and Cloudera Manager Agent packages, and starts the Agent. When the Continue button appears at the bottom of the screen, the installation process is completed. If the installation has completed successfully on some hosts but failed on others, you can click Continue if you want to skip installation on the failed hosts and continue to the next screen to start configuring services on the successful hosts.
    While packages are being installed, the status of installation on each host is displayed. You can click the Details link for individual hosts to view detailed information about the installation and error messages if installation fails on any hosts. If you click the Abort Installation button while installation is in progress, it will halt any pending or in-progress installations and roll back any in-progress installations to a clean state. The Abort Installation button does not affect host installations that have already completed successfully or already failed.
  6. Click Continue. The Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Finish. The Cluster Setup page displays.

Add Services

  1. In the first page of the Add Services wizard you choose the combination of services to install and whether to install Cloudera Navigator:
    • Click the radio button next to the combination of services to install:
      CDH 4
      • Core Hadoop - HDFS, MapReduce, ZooKeeper, Oozie, Hive, and Hue
      • Core with HBase
      • Core with Impala
      • All Services - HDFS, MapReduce, ZooKeeper, HBase, Impala, Oozie, Hive, Hue, and Sqoop
      • Custom Services - Any combination of services.
      CDH 5
      • Core Hadoop - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, and Sqoop
      • Core with HBase
      • Core with Impala
      • Core with Search
      • Core with Spark
      • All Services - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, Sqoop, HBase, Impala, Solr, Spark, and Key-Value Store Indexer
      • Custom Services - Any combination of services.
      As you select the services, keep the following in mind:
      • Some services depend on other services; for example, HBase requires HDFS and ZooKeeper. Cloudera Manager tracks dependencies and installs the correct combination of services.
      • In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. Choose Custom Services to install YARN or use the Add Service functionality to add YARN after installation completes.
          Note: You can create a YARN service in a CDH 4 cluster, but it is not considered production ready.
      • In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. Choose Custom Services to install MapReduce or use the Add Service functionality to add MapReduce after installation completes.
          Note: In CDH 5, the MapReduce service has been deprecated. However, the MapReduce service is fully supported for backward compatibility through the CDH 5 life cycle.
      • The Flume service can be added only after your cluster has been set up.
    • If you have chosen Data Hub Edition Trial or Cloudera Enterprise, optionally check the Include Cloudera Navigator checkbox to enable Cloudera Navigator. See the Cloudera Navigator Documentation.
    Click Continue. The Customize Role Assignments page displays.
  2. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable, but you can reassign them if necessary.

    Click a field below a role to display a dialog containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the pageable hosts dialog.

    The following shortcuts for specifying hostname patterns are supported:
    • Range of hostnames (without the domain portion)
      Range Definition           Matching Hosts
      10.1.1.[1-4]               10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
      host[1-3].company.com      host1.company.com, host2.company.com, host3.company.com
      host[07-10].company.com    host07.company.com, host08.company.com, host09.company.com, host10.company.com
    • IP addresses
    • Rack name

    Click the View By Host button for an overview of the role assignment by hostname ranges.

  3. When you are satisfied with the assignments, click Continue. The Database Setup page displays.
  4. Configure database settings:
    1. Choose the database type:
      • Leave the default setting of Use Embedded Database to have Cloudera Manager create and configure required databases. Make a note of the auto-generated passwords.
      • Select Use Custom Databases to specify external databases.
        1. Enter the database host, database type, database name, username, and password for the database that you created when you set up the database.
    2. Click Test Connection to confirm that Cloudera Manager can communicate with the database using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise check and correct the information you have provided for the database and then try the test again. (For some servers, if you are using the embedded database, you will see a message saying the database will be created at a later step in the installation process.) The Review Changes page displays.
  5. Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed.
      Warning: DataNode data directories should not be placed on NAS devices.
    Click Continue. The wizard starts the services.
  6. When all of the services are started, click Continue. You will see a success message indicating that your cluster has been successfully started.
  7. Click Finish to proceed to the Home Page.
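
The dependency tracking mentioned in step 1 (for example, HBase requires HDFS and ZooKeeper) amounts to computing a closure over a "requires" relation. The sketch below illustrates the idea only; it is not how Cloudera Manager is implemented. The function name with_dependencies is hypothetical, and the REQUIRES table contains just the one dependency stated in this document.

```python
# Only the dependency named in this guide; a real table would be larger.
REQUIRES = {
    "HBase": {"HDFS", "ZooKeeper"},
}


def with_dependencies(selected, requires):
    """Return the selected services plus everything they depend on."""
    result = set()
    pending = list(selected)
    while pending:
        service = pending.pop()
        if service in result:
            continue  # already handled, also guards against cycles
        result.add(service)
        pending.extend(requires.get(service, ()))
    return result


if __name__ == "__main__":
    print(sorted(with_dependencies({"HBase"}, REQUIRES)))
```

Selecting a service with no entry in the table returns just that service, which mirrors a Custom Services choice with no extra requirements.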

Configure Cluster CDH Version for Package Installs

If you have installed CDH as a package, after an install or upgrade, make sure that the cluster CDH version matches the package CDH version, using the procedure in Configuring the CDH Version of a Cluster. If the cluster CDH version does not match the package CDH version, Cloudera Manager incorrectly enables and disables service features based on the cluster's configured CDH version.

Change the Default Administrator Password

As soon as possible after running the wizard and beginning to use Cloudera Manager, change the default administrator password:
  1. Right-click the logged-in username at the far right of the top navigation bar and select Change Password.
  2. Enter the current password and a new password twice, and then click Update.

Test the Installation

You can test the installation following the instructions in Testing the Installation.