This is the documentation for Cloudera Manager 4.8.4.
Documentation for other versions is available at Cloudera Documentation.

Installation Path A - Automated Installation by Cloudera Manager

If your cluster meets the requirements for Installation Path A, follow the instructions in this section for automated installation by Cloudera Manager. The requirements for Path A are:

  • Uniform SSH access to cluster hosts on the same port from the Cloudera Manager Server host.
  • All hosts must have access to standard package repositories.
  • All hosts must have access either to archive.cloudera.com on the Internet or to a local repository with the necessary installation files.

The Cloudera Manager configuration, as well as other monitoring and management information, is stored in databases. As part of Installation Path A, Cloudera Manager installs embedded PostgreSQL databases. It is simplest to use these automatically installed and configured databases. During the installation, you have the option to select databases other than the automatically installed ones. If you intend to customize the installation to use other databases, install and configure them before you begin Installation Path A.

Using custom databases is a more advanced process, which is more often a part of an Installation Using Your Own Method. For more information on installing custom databases, see Installing and Configuring Databases. Otherwise, use the embedded PostgreSQL database, which the installer creates.

The general steps in this procedure for Installation Path A are:

Step 1: Download and Run the Cloudera Manager Installer

  Important:

For installation purposes, the Cloudera Manager Server must have SSH access to the cluster hosts and you must log in using a root account or an account that has password-less sudo permission. See Requirements for Cloudera Manager for more information.

Cloudera Manager accesses archive.cloudera.com by using yum on Red Hat systems, zypper on SUSE systems, or apt-get on Debian/Ubuntu systems. If your hosts access the Internet through an HTTP Proxy, you can configure yum, zypper, or apt-get, system-wide, to access archive.cloudera.com through a proxy. To do so, modify the system configuration on the Cloudera Manager Server host and on every cluster host where you want to install CDH. This is not required in all cases.

To configure your system to use a proxy:

On Red Hat systems, add the following property to /etc/yum.conf:

proxy=http://server:port/

On SUSE systems, add the following property to /root/.curlrc:

--proxy=http://server:port/

On Debian/Ubuntu systems, add the following property to /etc/apt/apt.conf:

Acquire::http::Proxy "http://server:port";
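
The proxy settings above can be added by hand or scripted. The following is a minimal sketch, not part of Cloudera Manager: the helper names proxy_line and ensure_line and the proxy address proxy.example.com:3128 are assumptions chosen for illustration. It prints the setting for each package manager and appends it to a file only if that exact line is not already present.

```shell
# Hypothetical helpers for illustration; not shipped with Cloudera Manager.

# Print the proxy setting for one package-manager family.
# Usage: proxy_line <redhat|suse|debian> <server> <port>
proxy_line() (
    family=$1 server=$2 port=$3
    case $family in
        redhat) printf '%s\n' "proxy=http://$server:$port/" ;;
        suse)   printf '%s\n' "--proxy=http://$server:$port/" ;;
        debian) printf '%s\n' "Acquire::http::Proxy \"http://$server:$port\";" ;;
        *)      echo "unknown family: $family" >&2; exit 1 ;;
    esac
)

# Append a line to a file unless that exact line already exists,
# so the helper can be re-run safely.
# Usage: ensure_line <file> <line>
ensure_line() (
    file=$1 line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
)
```

For example, on a Red Hat host you might run the following as root, and repeat it on every host where you want to install CDH:

    ensure_line /etc/yum.conf "$(proxy_line redhat proxy.example.com 3128)"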

To download and run the Cloudera Manager installer:

  1. Download cloudera-manager-installer.bin from the Cloudera Downloads page to the host where you want to install the Cloudera Manager Server. The host must be on your cluster or accessible to your cluster over your network. Install Cloudera Manager on a single host.
  2. After downloading cloudera-manager-installer.bin, make it executable.
    $ chmod u+x cloudera-manager-installer.bin
  3. Run cloudera-manager-installer.bin.
    $ sudo ./cloudera-manager-installer.bin
      Note: The installer's default behavior is to install the Cloudera Manager packages from the Internet. If you have created a local repository and configured your machine to recognize that repository, you can instruct the installer to use local repositories by running the cloudera-manager-installer.bin with the --skip_repo_package=1 option.
  4. Read the Cloudera Manager Readme and then press Enter to choose Next.
  5. Read the Cloudera Manager License and then press Enter to choose Next. Use the arrow keys and press Enter to choose Yes to confirm you accept the license.
  6. Read the Oracle Binary Code License Agreement and then press Enter to choose Next. Use the arrow keys and press Enter to choose Yes to confirm you accept the Oracle Binary Code License Agreement. The Cloudera Manager installer begins installing the Oracle JDK and the Cloudera Manager repo files and then installs the packages. The installer also installs the Cloudera Manager Server.
      Note:
    If the error message "Failed to start server" appears while running cloudera-manager-installer.bin, exit the installation program. If the Cloudera Manager Server log file /var/log/cloudera-scm-server/cloudera-scm-server.log contains the following message, SELinux is likely enabled:
    Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
            at java.net.URLClassLoader$1.run(Unknown Source)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(Unknown Source)
            at java.lang.ClassLoader.loadClass(Unknown Source)
            ...
    You can turn off SELinux enforcement until the next reboot by running the following command on the Cloudera Manager Server host:
    $ sudo setenforce 0

    To disable it permanently, edit /etc/selinux/config.

  7. Note the complete URL provided for the Cloudera Manager Admin Console, including the port number, which is 7180 by default. Press Enter to choose OK to continue.
  8. Press Enter to choose OK to exit the installer.
  Note:

If the installation is interrupted for some reason, you may need to clean up before you can re-run it. See Uninstalling Cloudera Manager.
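
The SELinux note in step 6 says to edit /etc/selinux/config to make the change permanent. As a rough sketch of that edit, the hypothetical helper below (set_selinux_mode is not a Cloudera or SELinux tool) rewrites only the SELINUX= line of a configuration file and leaves comments and the SELINUXTYPE= setting untouched. It assumes the file already contains a SELINUX= line, and the new mode takes effect at the next reboot.

```shell
# Hypothetical helper: set the SELINUX= line in an SELinux config file.
# Usage: set_selinux_mode <config-file> <enforcing|permissive|disabled>
set_selinux_mode() (
    file=$1 mode=$2
    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; exit 1 ;;
    esac
    tmp=$(mktemp) || exit 1
    # Match "SELINUX=" only at the start of a line, so comment lines and
    # SELINUXTYPE= are not changed. Write back with cat to keep the
    # original file's ownership and permissions.
    sed "s/^SELINUX=.*/SELINUX=$mode/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

Run as root, a call such as set_selinux_mode /etc/selinux/config disabled would be paired with sudo setenforce 0 for the running system. Trying the helper on a copy of the file first is a reasonable precaution.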

Step 2: Start the Cloudera Manager Admin Console

The Cloudera Manager Admin Console enables you to use Cloudera Manager to configure, manage, and monitor Hadoop on your cluster. Before using the Cloudera Manager Admin Console, gather information about the server's URL and port.

The server URL takes the following form:

http://<Server host>:<port>

<Server host> is the fully qualified domain name or IP address of the host machine where the Cloudera Manager Server is installed. <port> is the port configured for the Cloudera Manager Server. The default port is 7180. For example, use a URL such as the following:

http://myhost.example.com:7180/

Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the wizard in the next section. While you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.

To start the Cloudera Manager Admin Console:

In a web browser, enter the URL, including the port, for the Cloudera Manager Server. The login screen for Cloudera Manager appears.

Log in to Cloudera Manager. The default credentials are:

  • Username: admin
  • Password: admin
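
Before opening a browser, you may want to confirm that the Admin Console answers at the URL form described above. The following sketch is illustrative only: cm_console_url and wait_for_console are hypothetical helper names, and the second helper assumes that curl is installed on the machine you run it from. The same check can also be used to wait for the server to come back after a restart.

```shell
# Hypothetical helper: build the Admin Console URL from a host and an
# optional port.
# Usage: cm_console_url <server-host> [port]
cm_console_url() (
    host=$1 port=${2:-7180}   # 7180 is the documented default port
    printf 'http://%s:%s/\n' "$host" "$port"
)

# Hypothetical helper: succeed as soon as the URL answers an HTTP request.
# Retries every 5 seconds, up to <tries> times (default 60).
# Usage: wait_for_console <url> [tries]
wait_for_console() (
    url=$1 tries=${2:-60}
    while [ "$tries" -gt 0 ]; do
        curl -s -o /dev/null --max-time 5 "$url" && exit 0
        tries=$((tries - 1))
        sleep 5
    done
    exit 1
)
```

A typical call would be wait_for_console "$(cm_console_url myhost.example.com)", which returns a nonzero status if the console never responds.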

Step 3: Use Cloudera Manager for Automated CDH Installation and Configuration

The following instructions show you how to use the Cloudera Manager wizard to do an initial installation and configuration. The wizard helps you to install and set up Cloudera parcels or packages across your cluster and will:

  • Let you select the edition of Cloudera Manager you want to install:
    • Cloudera Standard, which does not require a license, but provides a somewhat limited set of features
    • Cloudera Enterprise Edition with a 60-day Trial license
    • Cloudera Enterprise Edition with a license
  • Find the cluster hosts you specify via hostname and IP-address ranges
  • Connect to each host with SSH to install the Cloudera Manager Agent and other components
  • Install the Oracle JDK on the cluster hosts (if not already installed)
  • Install CDH packages or parcels, optionally including the Cloudera Impala and Cloudera Search packages or parcels
  • Configure Hadoop automatically and start the Hadoop services
  Important:

All hosts in the cluster must have some way to access installation files. This can be done in one of two ways:

  • Internet access to allow the wizard to install software packages or parcels from archive.cloudera.com.
  • A custom internal repository that the host(s) can access. For example, for a Red Hat host, you could set up a Yum repository. See Creating and Using your own Repository for more information about this option.

To use Cloudera Manager:

  1. The first time you start the Cloudera Manager Admin Console, the install wizard starts up.
  2. Select whether you want to:
    • Install Cloudera Standard,
    • Try Cloudera Enterprise with a 60-day trial license, or
    • Install a license you have purchased for Cloudera Enterprise.
  3. If you have elected to install an existing license, browse to your Cloudera Manager License file and upload it.
  4. After you upload the Cloudera Manager license, or if you have elected to use a Trial license, restart the Cloudera Manager server:
    $ sudo service cloudera-scm-server restart
  5. As the Cloudera Manager server restarts, the user interface indicates its progress, and presents the login page when the restart has completed.
  6. Information is displayed indicating what the CDH installation includes. At this point, you can access online Help or the Support Portal if you wish. Click Continue to proceed with the installation.
  7. To enable Cloudera Manager to automatically discover your cluster hosts where you want to install CDH, enter the cluster hostnames or IP addresses. You can also specify hostname and IP address ranges. For example:

    Use this Expansion Range    To Specify these Hosts
    10.1.1.[1-4]                10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
    host[1-3].company.com       host1.company.com, host2.company.com, host3.company.com
    host[07-10].company.com     host07.company.com, host08.company.com, host09.company.com, host10.company.com

    You can specify multiple addresses and address ranges by separating them by commas, semicolons, tabs, or blank spaces, or by placing them on separate lines. Use this technique to make more specific searches instead of searching overly wide ranges. The scan results will include all addresses scanned, but only scans that reach hosts running SSH will be selected for inclusion in your cluster by default.

      Note:

    If you don't know the IP addresses of all of the hosts, you can enter an address range that spans unused addresses and then deselect the hosts that do not exist (and are not discovered) later in this procedure. However, keep in mind that wider ranges will require more time to scan.

  8. Click Search. Cloudera Manager identifies the hosts on your cluster to allow you to configure them for CDH. If there are a large number of hosts on your cluster, wait a few moments to allow them to be discovered and shown in the wizard. If the search is taking too long, you can stop the scan by clicking Abort Scan. To find additional hosts, click New Search, add the host names or IP addresses and click Search again.
      Note:

    Cloudera Manager scans hosts by checking for network connectivity. If there are some hosts where you want to install CDH that are not shown in the list, make sure you have network connectivity between the Cloudera Manager Server host and those hosts. Common causes of loss of connectivity are firewalls and interference from SELinux.

  9. Verify that the number of hosts shown matches the number of hosts where you want to install CDH. Deselect host entries that do not exist and deselect the hosts where you do not want to install CDH.

    Click Continue.

  10. Select the repository type you want to use for the installation.

    Installing from parcels is recommended, if they are available for the version you want to install.

      Note: Parcels are available for CDH4.1.3 or later, and for Impala and Solr. To install CDH3 or to install an earlier version of CDH4, select Packages.
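
To preview which hosts an expansion range such as host[07-10].company.com covers before you enter it in the host search step, a small script can expand the pattern locally. This is a sketch under stated assumptions, not Cloudera Manager's own parser: expand_host_range and expand_host_list are hypothetical names, each pattern may hold one numeric [low-high] range, and zero padding follows the width of the lower bound.

```shell
# Hypothetical helper: expand one pattern into one host per line.
# Patterns without a [low-high] range are printed unchanged.
expand_host_range() (
    pattern=$1
    case $pattern in
        *\[*-*\]*) ;;
        *) printf '%s\n' "$pattern"; exit 0 ;;
    esac
    prefix=${pattern%%\[*}     # text before the first '['
    rest=${pattern#*\[}        # text after the first '['
    range=${rest%%\]*}         # "low-high"
    suffix=${rest#*\]}         # text after the first ']'
    lo_raw=${range%%-*}
    hi_raw=${range#*-}
    width=${#lo_raw}           # pad to the width of the lower bound
    # Strip leading zeros so that 08 and 09 are not read as octal.
    lo=$(printf '%s' "$lo_raw" | sed 's/^0*//'); lo=${lo:-0}
    hi=$(printf '%s' "$hi_raw" | sed 's/^0*//'); hi=${hi:-0}
    i=$lo
    while [ "$i" -le "$hi" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + 1))
    done
)

# Hypothetical helper: split a list on commas, semicolons, tabs, and
# spaces, then expand each entry.
expand_host_list() (
    printf '%s\n' "$1" | tr ',;\t ' '\n\n\n\n' | while IFS= read -r item; do
        if [ -n "$item" ]; then expand_host_range "$item"; fi
    done
)
```

For instance, expand_host_list 'host[1-3].company.com, 10.1.1.[1-4]' prints seven lines, one per host or address, which makes overly wide ranges easy to spot before a scan.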

Installation using Parcels

  1. Choose the parcel you want to install. The choices you see depend on the repositories you have chosen – a repository may contain multiple parcels.
      Note: By default, only the parcels for the latest versions of CDH, Impala, and Solr are shown. However, you can add parcels for previous versions as a custom repository as described below. For example, you can find the locations of the previous CDH4 parcels at http://archive.cloudera.com/cdh4/parcels/.

    If you have parcels in a custom repository, or if you want to install a previous version of CDH, you can specify the repository and Cloudera Manager will add those parcels to the list shown on this page. If you are installing CDH4.3 and want to use Sentry, you can add the Sentry parcel using this mechanism (the Sentry parcel for CDH4.3 is at http://archive.cloudera.com/sentry/parcels/latest/). Sentry is included with CDH4.4 or later.

    1. Click More Options to show the custom repository field.
    2. Enter the URL of the repository you want into the field provided, and click the + Add button. The URL you specify here will also be added to the list of remote repositories referenced in the Remote Parcel Repository URLs property. If you have multiple repositories configured, you will see all the unique parcels contained in all your repositories.
  2. Select the specific releases of Impala and Solr to install on your hosts. You may choose either the latest version or use a custom repository. Choose None if you do not want to install that product.
  3. Select the specific release of the Cloudera Manager Agent to install on your hosts. You may choose either the version that matches the Cloudera Manager Server you are currently using, or you can specify an installation from a custom repository.
  4. If available, select the specific release of Impala to install on your hosts. You may choose either the latest version or use a custom repository. If you do not want to install Impala, select None.
  5. If you opted to use custom repositories for installation files, you may provide a GPG key URL that will apply for all repositories.
  6. Click Continue. You are now asked to provide your credentials, following the instructions at Provide credentials for authenticating with hosts.

Installation using Packages

  1. Select the major release of CDH to install. This is often CDH4.
  2. Select the specific release of CDH to install from within the major version you selected. You may choose a custom repository.
  3. Select the specific releases of Impala and Solr to install on your hosts, assuming you have selected an appropriate CDH4 version. You may choose either the latest version or use a custom repository. Choose None if you do not want to install that product.
  4. Select the specific release of Cloudera Manager to install on your hosts. You may choose either the version that matches the Cloudera Manager Server you are currently using, or you can specify an installation from a custom repository.
  5. If you opted to use custom repositories for installation files, you may provide a GPG key URL that will apply for all repositories.
  6. Click Continue.

Provide credentials for authenticating with hosts

  1. Select root or enter the user name for an account that has password-less sudo permissions.
  2. Select an authentication method.
    • If you choose to use password authentication, enter and confirm the password.
    • If you choose to use public-key authentication, provide a passphrase and the path to the required key files.
    • You can choose to specify an alternate SSH port. The default value is 22.
    • You can specify the maximum number of host installations to run at once. The default value is 10.
  3. Click Continue to begin installing the Cloudera Manager Agent and Daemons on the cluster hosts. If you are installing from packages, the process also installs CDH (and Impala, if you've selected it) on your hosts.
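
Because the wizard depends on SSH access with root or password-less sudo, it can save time to test those credentials on each host beforehand. The sketch below is a hypothetical pre-flight check, not something the wizard runs: check_host_access and the SSH_BIN override are names invented for illustration, and the check assumes key-based login, since BatchMode=yes makes ssh fail rather than prompt for a password.

```shell
# Hypothetical pre-flight check: confirm that an account can log in over
# SSH without prompting and, for non-root accounts, run sudo without a
# password. SSH_BIN may be overridden, for example to stub ssh in a dry run.
# Usage: check_host_access <user> <host> [port]
check_host_access() (
    user=$1 host=$2 port=${3:-22}   # 22 is the default SSH port
    if [ "$user" = root ]; then
        remote='true'               # root needs no sudo
    else
        remote='sudo -n true'       # -n: fail instead of asking for a password
    fi
    ${SSH_BIN:-ssh} -o BatchMode=yes -o ConnectTimeout=5 -p "$port" \
        "$user@$host" "$remote"
)
```

You could loop over your cluster hosts and report any host for which check_host_access returns a nonzero status, then fix access on those hosts before clicking Continue.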

Install Cloudera Manager and CDH components

  1. If you are installing from packages, the wizard configures package repositories, installs the Oracle JDK, CDH, and the Cloudera Manager Agent, and then starts the Cloudera Manager Agent. The status of installation on each host is displayed. You can also click the Details link for individual hosts to view detailed information about the installation and error messages if installation fails on any hosts.
    1. When the Continue button appears at the bottom of the screen, the installation process is completed. If the installation has completed successfully on some hosts but failed on others, you can click Continue if you want to skip installation on the failed hosts and continue to the next screen to start configuring CDH on the successful hosts.
  2. If you are installing from parcels, the wizard installs the Oracle JDK and the Cloudera Manager Agent using packages, as described above. The status of installation on each host is displayed.
    1. When the Cloudera Manager Agent, the JDK etc. have been installed, click Continue to proceed to the cluster installation section. During the parcel installation, progress is indicated for the three phases of the parcel installation process (Download, Distribution, and Activation) in a single progress bar. If you are installing multiple parcels you will see a progress bar for each parcel.
    2. When the Continue button appears at the bottom of the screen, the installation process is completed.
  3. When you continue, the Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Continue.

Choose the services you want to start on your cluster

  1. Choose the combination of services to install: Core Hadoop, Real-Time Delivery (previously known as HBase Services), Real-Time Query (which includes HDFS, Hive and Impala), All Services, or Custom Services.
      Note:
    • Some services depend on others; for example, HBase requires HDFS and ZooKeeper.
    • Most of the combinations install MapReduce v1. Choose the Custom Services option to install MapReduce v2 (YARN) or use the Add Service functionality to add YARN after installation completes.
  2. If you are installing the Enterprise Edition, choose whether to install Cloudera Navigator. Cloudera Navigator is independently licensed from the core Cloudera Enterprise offering.
  3. Click Inspect Role Assignments to see how the wizard will assign roles for the services you have chosen, and change them if you need to. The wizard evaluates the hardware configurations of the cluster hosts to determine the best machines for each role. For example, the wizard assigns the NameNode role to the machine that best meets the NameNode requirements. The wizard also configures other options, such as the number of map and reduce slots for TaskTracker, on the basis of the size of the cluster and the physical characteristics of each machine, such as the number of CPUs, amount of RAM, and disk space. These assignments are typically acceptable, but you can reassign services to nodes of your choosing, if desired.
  4. Click Continue when you are satisfied with the assignments.
  5. On the Database Setup page, configure settings for the Activity Monitor, Service Monitor, Report Manager, Host Monitor, and Hive metastore databases.
    • Leave the default settings of Use Embedded Database to have Cloudera Manager create and configure all required databases.
    • Select Custom to specify external databases, and enter the required information for the databases that you created when you set up your databases for Cloudera Manager. You must provide the database host, database type, database name, username, and password.
    • Click Test Connection to confirm that Cloudera Manager can communicate with the databases using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise check and correct the information you have provided for the databases and then try the test again. (Note that for Hive, if you are using the embedded database, you may see a message saying the connection will be created at a later point in the installation process.)
  6. Review the Configuration Changes to be applied.
    Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. For example, you might confirm the NameNode Data Directory and the DataNode Data Directory for HDFS or confirm the TaskTracker Local Data Directory List or JobTracker Local Data Directory for MapReduce.
      Warning: DataNode data directories should not be placed on NAS devices.
  7. Click Continue. The wizard starts the services on your cluster.
  8. When all of the services are started, click Continue. You will see a success message indicating that your cluster has been successfully started.
  9. Click Continue to proceed to the Cloudera Manager Services page.

Step 4: Change the Default Administrator Password

As soon as possible after running the wizard and beginning to use Cloudera Manager, you should change the default administrator password.

To change the administrator password:

  1. From the Administration tab, select Users.
  2. Click the Change Password button next to the admin account.
  3. Enter a new password twice and then click Submit.

Step 5: Test the Installation

Now that you have finished with the CDH and Cloudera Manager installation, you are ready to test the installation. For testing instructions, see Testing the Installation.

  Note:

If you change the hostname or port where the Cloudera Manager Server is running, or you enable TLS security, you must restart the Cloudera Management Services to update the URL to the Server. For instructions, see Restarting a Service.