Installation Path C - Manual Installation Using Cloudera Manager Tarballs

Before proceeding with this path for a new installation, review Cloudera Manager Deployment. If you are upgrading an existing Cloudera Manager installation, see Upgrading Cloudera Manager.

Before You Begin

Install the Oracle JDK

See Java Development Kit Installation.

Install and Configure Databases

Read Cloudera Manager and Managed Service Data Stores. If you are using an external database, install and configure a database as described in MySQL Database, Oracle Database, or External PostgreSQL Database.

(CDH 5 only) On RHEL 5 and CentOS 5, Install Python 2.6 or 2.7

CDH 5 Hue works only with the default Python version of the operating system on which it is installed. For example, on RHEL/CentOS 6, Hue needs Python 2.6 to start. RHEL 5 and CentOS 5 users must install Python 2.6 from the EPEL repository as described below.
To install packages from the EPEL repository, download the appropriate repository rpm packages to your machine and then install Python using yum. For example, use the following commands for RHEL 5 or CentOS 5:
$ su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
...
$ yum install python26
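To confirm that a host provides a suitable Python before starting Hue, a small check can help. The following is a minimal sketch: the function name python_version_ok is hypothetical, and it accepts only the 2.6 and 2.7 series named in this section.

```shell
# Hypothetical helper: succeeds only for a Python 2.6.x or 2.7.x version string.
python_version_ok() {
  case $1 in
    2.6|2.6.*|2.7|2.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use on a host (assumes a python executable is on the PATH):
#   python_version_ok "$(python -c 'import platform; print(platform.python_version())')" \
#     && echo "Python version is suitable for Hue"
```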

Install the Cloudera Manager Server and Agents

Tarballs contain both the Cloudera Manager Server and Cloudera Manager Agent in a single file. Download tarballs from the locations listed in Cloudera Manager Version and Download Information. Copy the tarballs and unpack them on all hosts on which you intend to install Cloudera Manager Server and Cloudera Manager Agents, in a directory of your choosing. If necessary, create a new directory to accommodate the files you extract from the tarball. For instance, if /opt/cloudera-manager does not exist, create it using a command similar to:
$ sudo mkdir /opt/cloudera-manager
Once the target directory exists, extract the contents of the tarball into it. For example, after copying the tar file to your home directory, extract its contents to the /opt/cloudera-manager directory with a command similar to the following:
$ sudo tar xzf cloudera-manager*.tar.gz -C /opt/cloudera-manager

The files are extracted to a subdirectory named according to the Cloudera Manager version being extracted. For example, files could extract to /opt/cloudera-manager/cm-5.0/. This full path is needed later and is referred to as the tarball_root directory.
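Because the version-specific subdirectory name changes between releases, it can be convenient to look it up rather than type it. The sketch below assumes the subdirectory name starts with cm-, as in the example above; the function name find_tarball_root is hypothetical.

```shell
# Hypothetical helper: print the first cm-* subdirectory found under the
# extraction directory, or fail if none exists.
find_tarball_root() {
  extract_dir=$1
  for candidate in "$extract_dir"/cm-*; do
    if [ -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example use:
#   tarball_root=$(find_tarball_root /opt/cloudera-manager) || echo "not extracted yet"
```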

Perform Configuration Required by Single User Mode

If you choose to create a Cloudera Manager deployment that employs single user mode, perform the configuration steps described in Single User Mode Requirements.

Create Users

The Cloudera Manager Server and managed services need a user account to complete tasks. When installing Cloudera Manager from tarballs, you must create this user account manually on all hosts. Because Cloudera Manager Server and managed services are configured to use the user account cloudera-scm by default, creating a user with this name is the simplest approach. Once this user exists, it is used automatically after installation completes.

To create a user cloudera-scm, use a command such as the following:
$ sudo useradd --system --home=/opt/cloudera-manager/cm-5.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
For the preceding useradd command, ensure the --home argument path matches your environment. This argument varies according to where you place the tarball and the version number varies among releases. For example, the --home location could be /opt/cm-5.0/run/cloudera-scm-server.
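Since the --home path depends on where the tarball was extracted, one way to avoid a mismatch is to build the command from the tarball_root value. This is a dry-run sketch that only prints the command; the function name cm_useradd_command is hypothetical.

```shell
# Hypothetical helper: print (do not run) the useradd command with --home
# derived from the given tarball_root directory.
cm_useradd_command() {
  tarball_root=$1
  printf 'useradd --system --home=%s/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm\n' "$tarball_root"
}

# Example use: review the printed command, then run it with sudo.
#   cm_useradd_command /opt/cloudera-manager/cm-5.0
```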

Create the Cloudera Manager Server Local Data Storage Directory

  1. Create the following directory: /var/lib/cloudera-scm-server.
  2. Change the owner of the directory so that the cloudera-scm user and group have ownership of the directory. For example:
    $ sudo mkdir /var/lib/cloudera-scm-server
    $ sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server
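After creating the directory, it may be worth confirming that the ownership change took effect. The following sketch reads the owner and group from ls output; the function names dir_owner and dir_is_owned_by are hypothetical.

```shell
# Hypothetical helper: print "user:group" for a directory, as reported by ls.
dir_owner() {
  ls -ld "$1" | awk '{print $3 ":" $4}'
}

# Hypothetical helper: succeed only if DIR exists and is owned by USER:GROUP.
# Usage: dir_is_owned_by DIR USER:GROUP
dir_is_owned_by() {
  [ -d "$1" ] && [ "$(dir_owner "$1")" = "$2" ]
}

# Example use (pass the path of the directory you created):
#   dir_is_owned_by "$data_dir" cloudera-scm:cloudera-scm || echo "ownership not set"
```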

Configure Cloudera Manager Agents

  • On every Cloudera Manager Agent host, configure the Cloudera Manager Agent to point to the Cloudera Manager Server by setting the following properties in the tarball_root/etc/cloudera-scm-agent/config.ini configuration file:
    Property      Description
    server_host   Name of the host where Cloudera Manager Server is running.
    server_port   Port on the host where Cloudera Manager Server is running.
  • By default, a tarball install stores state in a var subdirectory inside the tarball directory; a non-tarball install stores the same state in /var. Cloudera recommends that you reconfigure the tarball install to use an external directory as the /var equivalent (/var itself or any other directory outside the tarball) so that a new tarball installation can access this state when you upgrade Cloudera Manager. To do so, edit tarball_root/etc/default/cloudera-scm-agent and set the CMF_VAR variable to the location of the /var equivalent. If you do not reuse the state directory between tarball installations, duplicate Cloudera Manager Agent entries can appear in the Cloudera Manager database.
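When many Agent hosts need the same server_host and server_port values, editing config.ini by script can reduce typos. The sketch below assumes the file holds plain key=value lines with no spaces around the equals sign and that the new value contains no | or & characters; check your config.ini before relying on it. The function name set_config_value and the host name in the usage comment are hypothetical.

```shell
# Hypothetical helper: replace the value of an existing key=value line.
# Fails without changing the file if the key is not present.
set_config_value() {
  file=$1
  key=$2
  value=$3
  if ! grep -q "^$key=" "$file"; then
    echo "no $key= line found in $file" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Write through cat so the original file keeps its permissions.
  sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example use (cm.example.com is a placeholder host name):
#   set_config_value "$tarball_root/etc/cloudera-scm-agent/config.ini" server_host cm.example.com
```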

Custom Cloudera Manager Users and Directories

Cloudera Manager is built to use a default set of directories and user accounts. You can use the default locations and accounts or change these settings; in some cases, changing them is required. For most installations, you can skip ahead to Configure a Database for the Cloudera Manager Server. By default, Cloudera Manager services create directories in /var/log and /var/lib. The directories the Cloudera Manager installer attempts to create are:
  • /var/log/cloudera-scm-headlamp
  • /var/log/cloudera-scm-firehose
  • /var/log/cloudera-scm-alertpublisher
  • /var/log/cloudera-scm-eventserver
  • /var/lib/cloudera-scm-headlamp
  • /var/lib/cloudera-scm-firehose
  • /var/lib/cloudera-scm-alertpublisher
  • /var/lib/cloudera-scm-eventserver
  • /var/lib/cloudera-scm-server
If you are using a custom user and directory for Cloudera Manager, you must create these directories on the Cloudera Manager Server host and assign ownership of these directories to your user manually. Issues might arise if any of these directories already exist. The Cloudera Manager installer makes no changes to existing directories. In such a case, Cloudera Manager is unable to write to any existing directories for which it does not have proper permissions and services may not perform as expected. To resolve such situations, do one of the following:
  • Change ownership of existing directories:
    1. Change the directory owner to the Cloudera Manager user. If the Cloudera Manager user and group are cloudera-scm and you needed to take ownership of the headlamp log directory, you would issue a command similar to the following:
      $ sudo chown -R cloudera-scm:cloudera-scm /var/log/cloudera-scm-headlamp
    2. Repeat the process of using chown to change ownership for all existing directories to the Cloudera Manager user.
  • Use alternate directories for services:
    1. If the directories you plan to use do not exist, create them now. For example, to create /var/cm_logs/cloudera-scm-headlamp for use by the cloudera-scm user, you might use the following commands:
      $ sudo mkdir /var/cm_logs/cloudera-scm-headlamp
      $ sudo chown cloudera-scm /var/cm_logs/cloudera-scm-headlamp
    2. Connect to the Cloudera Manager Admin Console.
    3. Select Clusters > Cloudera Management Service.
    4. Select Scope > role name.
    5. Click the Configuration tab.
    6. Enter a term in the Search field to find the settings to be changed. For example, you might enter /var or directory.
    7. Update each value with the new locations for Cloudera Manager to use.
    8. Click Save Changes to commit the changes.
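To see which of the default directories already exist on a host, and who owns them, it can help to generate the list rather than check each path by hand. This sketch prints the nine default paths listed in this section; the function name cm_default_dirs is hypothetical.

```shell
# Hypothetical helper: print the default Cloudera Manager directories
# listed in this section, one per line.
cm_default_dirs() {
  for base in /var/log /var/lib; do
    for svc in headlamp firehose alertpublisher eventserver; do
      printf '%s/cloudera-scm-%s\n' "$base" "$svc"
    done
  done
  printf '%s\n' /var/lib/cloudera-scm-server
}

# Example use: show the owner of every default directory that already exists.
#   cm_default_dirs | while read -r dir; do
#     [ -d "$dir" ] && ls -ld "$dir"
#   done
```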

Configure a Database for the Cloudera Manager Server

Depending on whether you are using an external database, or the embedded PostgreSQL database, do one of the following:

Create Parcel Directories

  1. On the Cloudera Manager Server host, create a parcel repository directory:
    $ sudo mkdir -p /opt/cloudera/parcel-repo
  2. Change the directory ownership to be the username you are using to run Cloudera Manager:
    $ sudo chown username:groupname /opt/cloudera/parcel-repo
    where username and groupname are the user and group names (respectively) you are using to run Cloudera Manager. For example, if you use the default username cloudera-scm, you would give the command:
    $ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
  3. On each cluster host, create a parcels directory:
    $ sudo mkdir -p /opt/cloudera/parcels
  4. Change the directory ownership to be the username you are using to run Cloudera Manager:
    $ sudo chown username:groupname /opt/cloudera/parcels
    where username and groupname are the user and group names (respectively) you are using to run Cloudera Manager. For example, if you use the default username cloudera-scm, you would give the command:
    $ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcels

Start the Cloudera Manager Server

The way in which you start the Cloudera Manager Server varies according to what account you want the server to run under:
  • As root:
    $ sudo tarball_root/etc/init.d/cloudera-scm-server start 
  • As another user. If you run as another user, ensure the user you created for Cloudera Manager owns the location to which you extracted the tarball including the newly created database files. If you followed the earlier examples and created the directory /opt/cloudera-manager and the user cloudera-scm, you could use the following command to change ownership of the directory:
    $ sudo chown -R cloudera-scm:cloudera-scm /opt/cloudera-manager

    Once you have established ownership of directory locations, you can start Cloudera Manager Server using the user account you chose. For example, you might run the Cloudera Manager Server as cloudera-service. In that case, you have the following options:

    • Run the following command:
      $ sudo -u cloudera-service tarball_root/etc/init.d/cloudera-scm-server start 
    • Edit the configuration files so the script internally changes the user. Then run the script as root:
      1. Remove the following line from tarball_root/etc/default/cloudera-scm-server:
        export CMF_SUDO_CMD=" "
      2. Change the user and group in tarball_root/etc/init.d/cloudera-scm-server to the user you want the server to run as. For example, to run as cloudera-service, change the user and group as follows:
        USER=cloudera-service
        GROUP=cloudera-service
      3. Run the server script as root:
        $ sudo tarball_root/etc/init.d/cloudera-scm-server start 
  • To start the Cloudera Manager Server automatically after a reboot:
    1. Run the following commands on the Cloudera Manager Server host:
      • RHEL-compatible and SLES
        $ sudo cp tarball_root/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
        $ sudo chkconfig cloudera-scm-server on
      • Debian/Ubuntu
        $ sudo cp tarball_root/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
        $ sudo update-rc.d cloudera-scm-server defaults
    2. On the Cloudera Manager Server host, open the /etc/init.d/cloudera-scm-server file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to tarball_root/etc/default.
If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
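The CMF_DEFAULTS change in the copied init script can also be made with sed instead of a text editor. The sketch below assumes the script contains the literal text ${CMF_DEFAULTS:-/etc/default} and that the tarball_root path contains no | or & characters; the function name point_cmf_defaults is hypothetical.

```shell
# Hypothetical helper: replace ${CMF_DEFAULTS:-/etc/default} in an init
# script with TARBALL_ROOT/etc/default.
point_cmf_defaults() {
  init_script=$1
  tarball_root=$2
  tmp=$(mktemp) || return 1
  # Write through cat so the init script keeps its executable permission.
  sed 's|\${CMF_DEFAULTS:-/etc/default}|'"$tarball_root"'/etc/default|' \
    "$init_script" > "$tmp" && cat "$tmp" > "$init_script"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example use (run as root, because the file is under /etc/init.d):
#   point_cmf_defaults /etc/init.d/cloudera-scm-server /opt/cloudera-manager/cm-5.0
```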

Start the Cloudera Manager Agents

The way in which you start the Cloudera Manager Agent varies according to what account you want the Agent to run under:
  • To start the Cloudera Manager Agent, run this command on each Agent host:
    $ sudo tarball_root/etc/init.d/cloudera-scm-agent start
    When the Agent starts, it contacts the Cloudera Manager Server.
  • If you are running single user mode, start Cloudera Manager Agent using the user account you chose. For example, you might run the Cloudera Manager Agent as cloudera-scm. In that case, you have the following options:
    • Run the following command:
      $ sudo -u cloudera-scm tarball_root/etc/init.d/cloudera-scm-agent start 
    • Edit the configuration files so the script internally changes the user, then run the script as root:
      1. Remove the following line from tarball_root/etc/default/cloudera-scm-agent:
        export CMF_SUDO_CMD=" "
      2. Change the user and group in tarball_root/etc/init.d/cloudera-scm-agent to the user you want the Agent to run as. For example, to run as cloudera-scm, change the user and group as follows:
        USER=cloudera-scm
        GROUP=cloudera-scm
      3. Run the Agent script as root:
        $ sudo tarball_root/etc/init.d/cloudera-scm-agent start 
  • To start the Cloudera Manager Agents automatically after a reboot:
    1. Run the following commands on each Agent host:
      • RHEL-compatible and SLES
        $ sudo cp tarball_root/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
        $ sudo chkconfig cloudera-scm-agent on
      • Debian/Ubuntu
        $ sudo cp tarball_root/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
        $ sudo update-rc.d cloudera-scm-agent defaults
    2. On each Agent host, open the /etc/init.d/cloudera-scm-agent file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to tarball_root/etc/default.
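The autostart commands differ only in the service name and the operating system family, so they can be generated for review before running them on many hosts. This is a dry-run sketch that prints the commands given in this section; the function name autostart_commands and the family labels rhel, sles, and debian are hypothetical.

```shell
# Hypothetical helper: print (do not run) the autostart commands for a
# service (cloudera-scm-server or cloudera-scm-agent) on a given OS family.
# Usage: autostart_commands FAMILY SERVICE TARBALL_ROOT
autostart_commands() {
  family=$1
  service=$2
  tarball_root=$3
  case $family in
    rhel|sles) enable_cmd="chkconfig $service on" ;;
    debian)    enable_cmd="update-rc.d $service defaults" ;;
    *) echo "unknown OS family: $family" >&2; return 1 ;;
  esac
  printf 'cp %s/etc/init.d/%s /etc/init.d/%s\n' "$tarball_root" "$service" "$service"
  printf '%s\n' "$enable_cmd"
}

# Example use: review the output, then run each command as root.
#   autostart_commands rhel cloudera-scm-agent /opt/cloudera-manager/cm-5.0
```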

Install Dependencies

When installing with tarballs and parcels, some services may require additional dependencies that are not provided by Cloudera. On each host, install required packages:
  • RHEL-compatible
    • chkconfig
    • python (2.6 or 2.7 required for CDH 5)
    • bind-utils
    • psmisc
    • libxslt
    • zlib
    • sqlite
    • cyrus-sasl-plain
    • cyrus-sasl-gssapi
    • fuse
    • portmap
    • fuse-libs
    • redhat-lsb
  • SLES
    • chkconfig
    • python (2.6 or 2.7 required for CDH 5)
    • bind-utils
    • psmisc
    • libxslt
    • zlib
    • sqlite
    • cyrus-sasl-plain
    • cyrus-sasl-gssapi
    • fuse
    • portmap
    • python-xml
    • libfuse2
  • Debian/Ubuntu
    • lsb-base
    • psmisc
    • bash
    • libsasl2-modules
    • libsasl2-modules-gssapi-mit
    • zlib1g
    • libxslt1.1
    • libsqlite3-0
    • libfuse2
    • fuse-utils or fuse
    • rpcbind
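The package lists above can be kept in one place and passed to the system package manager. The sketch below copies the names from the lists; for Debian/Ubuntu it picks fuse where the list says "fuse-utils or fuse", which is an assumption you may need to change for your release. The function name cm_dependencies and the family labels are hypothetical.

```shell
# Hypothetical helper: print the dependency package names for an OS family.
cm_dependencies() {
  case $1 in
    rhel)
      echo "chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb" ;;
    sles)
      echo "chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap python-xml libfuse2" ;;
    debian)
      # Assumption: "fuse" chosen from the "fuse-utils or fuse" alternative.
      echo "lsb-base psmisc bash libsasl2-modules libsasl2-modules-gssapi-mit zlib1g libxslt1.1 libsqlite3-0 libfuse2 fuse rpcbind" ;;
    *)
      echo "unknown OS family: $1" >&2
      return 1 ;;
  esac
}

# Example use (the unquoted substitution is intentional, one argument per package):
#   sudo yum install $(cm_dependencies rhel)
#   sudo zypper install $(cm_dependencies sles)
#   sudo apt-get install $(cm_dependencies debian)
```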

Start and Log into the Cloudera Manager Admin Console

The Cloudera Manager Server URL takes the following form: http://Server host:port, where Server host is the fully-qualified domain name or IP address of the host where the Cloudera Manager Server is installed and port is the port configured for the Cloudera Manager Server. The default port is 7180.
  1. Wait several minutes for the Cloudera Manager Server to complete its startup. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
  2. In a web browser, enter http://Server host:7180, where Server host is the fully-qualified domain name or IP address of the host where the Cloudera Manager Server is running. The login screen for Cloudera Manager Admin Console displays.
  3. Log into Cloudera Manager Admin Console. The default credentials are: Username: admin Password: admin. Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the installation wizard. While you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.
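The URL form described above can be captured in a small helper, which is handy when scripting a readiness check. This is a minimal sketch; the function name cm_admin_url and the host name in the usage comment are hypothetical, and the default port of 7180 is the one stated in this section.

```shell
# Hypothetical helper: build the Admin Console URL from a host and an
# optional port (defaults to 7180).
cm_admin_url() {
  host=$1
  port=${2:-7180}
  printf 'http://%s:%s\n' "$host" "$port"
}

# Example use (cm.example.com is a placeholder): poll until the server answers.
#   until curl -s -o /dev/null "$(cm_admin_url cm.example.com)"; do sleep 10; done
```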

Choose Cloudera Manager Edition and Hosts

  1. When you start the Cloudera Manager Admin Console, the install wizard starts up. Click Continue to get started.
  2. Choose which edition to install:
    • Cloudera Express, which does not require a license, but provides a somewhat limited set of features.
    • Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed.
    • Cloudera Enterprise with one of the following license types:
      • Basic Edition
      • Flex Edition
      • Data Hub Edition
    If you choose Cloudera Express or Cloudera Enterprise Data Hub Edition Trial, you can elect to upgrade the license at a later time. See Managing Licenses.
  3. If you have elected Cloudera Enterprise, install a license:
    1. Click Upload License.
    2. Click the document icon to the left of the Select a License File text field.
    3. Navigate to the location of your license file, click the file, and click Open.
    4. Click Upload.
    Click Continue to proceed with the installation.
  4. Click Continue in the next screen. The Specify Hosts page displays.
  5. Click the Currently Managed Hosts tab.
  6. Choose the hosts to add to the cluster.
  7. Click Continue. The Select Repository page displays.

Choose Software Installation Method and Install Software

  1. Click Use Parcels to install CDH and managed services using parcels and then do the following:
    1. Use Parcels
      1. Choose the parcels to install. The choices you see depend on the repositories you have chosen – a repository may contain multiple parcels. Only the parcels for the latest supported service versions are configured by default.
        You can add additional parcels for previous versions by specifying custom repositories. For example, you can find the locations of the previous CDH 4 parcels at https://username:password@archive.cloudera.com/p/cdh4/parcels/. Or, if you are installing CDH 4.3 and want to use policy-file authorization, you can add the Sentry parcel using this mechanism.
        1. To specify the parcel directory, local parcel repository, add a parcel repository, or specify the properties of a proxy server through which parcels are downloaded, click the More Options button and do one or more of the following:
          • Parcel Directory and Local Parcel Repository Path - Specify the location of parcels on cluster hosts and the Cloudera Manager Server host. If you change the default value for Parcel Directory and have already installed and started Cloudera Manager Agents, restart the Agents:
            $ sudo service cloudera-scm-agent restart
          • Parcel Repository - In the Remote Parcel Repository URLs field, click the button and enter the URL of the repository. The URL you specify is added to the list of repositories listed in the Configuring Cloudera Manager Server Parcel Settings page and a parcel is added to the list of parcels on the Select Repository page. If you have multiple repositories configured, you will see all the unique parcels contained in all your repositories.
          • Proxy Server - Specify the properties of a proxy server.
        2. Click OK.
    2. Click Continue. Cloudera Manager installs the CDH and managed service parcels. During the parcel installation, progress is indicated for the phases of the parcel installation process in separate progress bars. If you are installing multiple parcels you will see progress bars for each parcel. When the Continue button at the bottom of the screen turns blue, the installation process is completed. Click Continue.
  2. Click Continue. The Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Finish. The Cluster Setup screen displays.

Add Services

The following instructions describe how to use the Cloudera Manager wizard to configure and start CDH and managed services.

  1. In the first page of the Add Services wizard you choose the combination of services to install and whether to install Cloudera Navigator:
    • Click the radio button next to the combination of services to install:
      CDH 4:
      • Core Hadoop - HDFS, MapReduce, ZooKeeper, Oozie, Hive, and Hue
      • Core with HBase
      • Core with Impala
      • All Services - HDFS, MapReduce, ZooKeeper, HBase, Impala, Oozie, Hive, Hue, and Sqoop
      • Custom Services - Any combination of services.
      CDH 5:
      • Core Hadoop - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, and Sqoop
      • Core with HBase
      • Core with Impala
      • Core with Search
      • Core with Spark
      • All Services - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, Sqoop, HBase, Impala, Solr, Spark, and Key-Value Store Indexer
      • Custom Services - Any combination of services.
      As you select the services, keep the following in mind:
      • Some services depend on other services; for example, HBase requires HDFS and ZooKeeper. Cloudera Manager tracks dependencies and installs the correct combination of services.
      • In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. Choose Custom Services to install YARN or use the Add Service functionality to add YARN after installation completes.
      • In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. Choose Custom Services to install MapReduce or use the Add Service functionality to add MapReduce after installation completes.
      • The Flume service can be added only after your cluster has been set up.
    • If you have chosen Data Hub Edition Trial or Cloudera Enterprise, optionally select the Include Cloudera Navigator checkbox to enable Cloudera Navigator. See the Cloudera Navigator Documentation.
    Click Continue. The Customize Role Assignments screen displays.
  2. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable, but you can reassign them if necessary.

    Click a field below a role to display a dialog containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the pageable hosts dialog.

    The following shortcuts for specifying hostname patterns are supported:
    • Range of hostnames (without the domain portion)
      Range Definition          Matching Hosts
      10.1.1.[1-4]              10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
      host[1-3].company.com     host1.company.com, host2.company.com, host3.company.com
      host[07-10].company.com   host07.company.com, host08.company.com, host09.company.com, host10.company.com
    • IP addresses
    • Rack name

    Click the View By Host button for an overview of the role assignment by hostname ranges.

  3. When you are satisfied with the assignments, click Continue. The Database Setup screen displays.
  4. On the Database Setup page, configure settings for required databases:
    1. Enter the database host, database type, database name, username, and password for the database that you created when you set up the database.
    2. Click Test Connection to confirm that Cloudera Manager can communicate with the database using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise check and correct the information you have provided for the database and then try the test again. (For some servers, if you are using the embedded database, you will see a message saying the database will be created at a later step in the installation process.) The Review Changes screen displays.
  5. Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. Click Continue. The wizard starts the services.
  6. When all of the services are started, click Continue. You will see a success message indicating that your cluster has been successfully started.
  7. Click Finish to proceed to the Cloudera Manager Admin Console Home Page.
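The hostname range shortcuts described in step 2 can be previewed outside the wizard to confirm which hosts a pattern covers. The sketch below mimics the expansion shown in the range examples, including zero padding; it is not Cloudera Manager's implementation, it handles a single [start-end] range per pattern, and the function name expand_host_range is hypothetical.

```shell
# Hypothetical helper: expand one [start-end] range in a host pattern,
# preserving the zero padding of the start value.
expand_host_range() {
  pattern=$1
  case $pattern in
    *\[*-*\]*) ;;
    *) printf '%s\n' "$pattern"; return 0 ;;   # no range: print unchanged
  esac
  prefix=${pattern%%\[*}      # text before the first [
  rest=${pattern#*\[}         # text after the first [
  range=${rest%%\]*}          # start-end
  suffix=${rest#*\]}          # text after the first ]
  start=${range%%-*}
  end=${range#*-}
  width=${#start}
  # Strip leading zeros so values such as 08 are not read as octal.
  i=$(printf '%s' "$start" | sed 's/^0*//'); i=${i:-0}
  last=$(printf '%s' "$end" | sed 's/^0*//'); last=${last:-0}
  while [ "$i" -le "$last" ]; do
    printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
    i=$((i + 1))
  done
}

# Example use:
#   expand_host_range 'host[07-10].company.com'
```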

(Optional) Change the Cloudera Manager User

After configuring your services, the installation wizard attempts to automatically start the Cloudera Management Service under the assumption that it will run using cloudera-scm. If you configured this service to run using a user other than cloudera-scm, then the Cloudera Management Service roles do not start automatically. In such a case, change the service configuration to use the user account that you selected:
  1. Connect to the Cloudera Manager Admin Console.
  2. Do one of the following:
    • Select Clusters > Cloudera Management Service > Cloudera Management Service.
    • On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management Service link.
  3. Click the Configuration tab.
  4. Use the search box to find the property to be changed. For example, you might enter "system" to find the System User and System Group properties.
  5. Make any changes required to the System User and System Group to ensure Cloudera Manager uses the proper user accounts.
  6. Click Save Changes.
After making this configuration change, manually start the Cloudera Management Service roles.

Change the Default Administrator Password

As soon as possible after running the wizard and beginning to use Cloudera Manager, change the default administrator password:
  1. Right-click the logged-in username at the far right of the top navigation bar and select Change Password.
  2. Enter the current password and a new password twice, and then click Update.

Test the Installation

You can test the installation by following the instructions in Testing the Installation.