Installing Cloudera Manager and CDH

This section introduces options for installing Cloudera Manager, CDH, and managed services. You can install CDH and managed services in a deployment managed by Cloudera Manager or in an unmanaged deployment.

Cloudera Manager Deployment

A Cloudera Manager deployment consists of the following software components:
  • Oracle JDK
  • Cloudera Manager Server and Agent packages
  • Supporting database software
  • CDH and managed service software
This section describes the three main installation paths for creating a new Cloudera Manager deployment and the criteria for choosing an installation path. If your cluster already has an installation of a previous version of Cloudera Manager, follow the instructions in Cloudera Upgrade.
The Cloudera Manager installation paths share some common phases, but each path differs in ways that support different user and cluster host requirements:
  • Demonstration and proof-of-concept deployments - There are two installation options:
    • Installation Path A - Automated Installation by Cloudera Manager (Non-Production Mode) - Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Server, embedded PostgreSQL database, Cloudera Manager Agent, CDH, and managed service software on cluster hosts. Cloudera Manager also configures databases for the Cloudera Manager Server and Hive Metastore and, optionally, for Cloudera Management Service roles. This path is recommended for demonstration and proof-of-concept deployments but not for production deployments, because it is not intended to scale and may require database migration as your cluster grows. To use this method, server and cluster hosts must satisfy the following requirements:
      • Provide the ability to log in to the Cloudera Manager Server host using a root account or an account that has password-less sudo permission.
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the required installation files.
    • Installation Path B - Installation Using Cloudera Manager Parcels or Packages - you install the Oracle JDK, Cloudera Manager Server, and embedded PostgreSQL database packages on the Cloudera Manager Server host. You have two options for installing the Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: install them manually yourself or use Cloudera Manager to automate the installation.
      In order for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the required installation files.
  • Production deployments - require you to first manually install and configure a production database for the Cloudera Manager Server and Hive Metastore. There are two installation options:
    • Installation Path B - Installation Using Cloudera Manager Parcels or Packages - you install the Oracle JDK and Cloudera Manager Server packages on the Cloudera Manager Server host. You have two options for installing the Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: install them manually yourself or use Cloudera Manager to automate the installation.
      In order for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the required installation files.
    • Installation Path C - Manual Installation Using Cloudera Manager Tarballs - you install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software using tarballs and use Cloudera Manager to automate installation of CDH and managed service software as parcels.
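The selection criteria above can be expressed as a small decision function. The following Python sketch is illustrative only: the HostPrereqs fields, the path labels such as "B-automated", and the helper names are hypothetical, and the TCP probe only shows that every host accepts connections on the same port. It does not prove that SSH logins, root access, or passwordless sudo actually work.

```python
import socket
from dataclasses import dataclass


@dataclass
class HostPrereqs:
    """Hypothetical summary of the prerequisites listed for Paths A and B."""
    root_or_passwordless_sudo: bool  # log in to the Server host as root or with passwordless sudo
    ssh_same_port: bool              # Server host reaches every host over SSH on the same port
    repo_access: bool                # standard repos plus archive.cloudera.com or a local mirror


def port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (no SSH login is attempted)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def uniform_ssh(hosts: list[str], port: int = 22) -> bool:
    """Return True if every host accepts TCP connections on the same port."""
    return all(port_reachable(h, port) for h in hosts)


def candidate_paths(production: bool, prereqs: HostPrereqs) -> list[str]:
    """Return the installation paths the criteria above allow (hypothetical labels)."""
    automation_ok = prereqs.ssh_same_port and prereqs.repo_access
    paths = []
    # Path A is for demonstration/proof-of-concept use only and needs every prerequisite.
    if not production and automation_ok and prereqs.root_or_passwordless_sudo:
        paths.append("A")
    # Path B: Cloudera Manager can automate host installs only with uniform SSH and
    # repository access; installing the host software manually is always possible.
    if automation_ok:
        paths.append("B-automated")
    paths.append("B-manual")
    # Path C (tarballs) is listed only as a production option.
    if production:
        paths.append("C")
    return paths
```

For example, candidate_paths(True, HostPrereqs(True, uniform_ssh(["host1.example.com", "host2.example.com"]), True)) reports which paths remain open for a production cluster; the host names are placeholders. Note that the sketch does not model the production requirement to install a production database first.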

Cloudera Manager Installation Phases

The phases of installing Cloudera Manager and a Cloudera Manager deployment of CDH and managed services are described below. Every phase is required, but you can accomplish each phase in multiple ways, depending on your organization's policies and requirements. How each of the six phases is carried out depends on which of the three installation paths you follow; the paths differ in how the Cloudera Manager Server and database software are installed on the Cloudera Manager Server and cluster hosts. The criteria for choosing an installation path are discussed in Cloudera Manager Deployment.

Cloudera Installation Phases

Phase 1: Install JDK

Install the JDK required by the Cloudera Manager Server, the Cloudera Management Service, and CDH.

There are two options:
  • Use the Cloudera Manager Installer to install a supported version of the Oracle JDK in /usr/java on all hosts in the cluster.
  • Use the command line to manually install a supported version of the Oracle JDK and set the JAVA_HOME environment variable to the install directory on all hosts.

Phase 2: Set Up Databases

Install, configure, and start the databases that are required by the Cloudera Manager Server and the Cloudera Management Service, and that are optional for some CDH services.

There are two options:
  • Use the Cloudera Manager Installer to install, configure, and start an embedded PostgreSQL database.
  • Use command-line package installation tools such as yum to install, configure, and start the database.

Phase 3: Install Cloudera Manager Server

Install and start Cloudera Manager Server on one host.
  • Path A - Use the Cloudera Manager Installer to install its packages and the server. Requires Internet access and sudo privileges on the host.
  • Path B - Use Linux package install commands (such as yum) to install Cloudera Manager Server, update the database properties, and use service commands to start Cloudera Manager Server.
  • Path C - Use Linux commands to unpack tarballs and service commands to start the server.

Phase 4: Install Cloudera Manager Agents

Install and start the Cloudera Manager Agent on all hosts.
  • Path A - Use the Cloudera Manager Installation wizard to install the Agents on all hosts.
  • Path B - There are two options:
    • Use Linux package install commands (such as yum) to install Cloudera Manager Agents on all hosts.
    • Use the Cloudera Manager Installation wizard to install the Agents on all hosts.
  • Path C - Use Linux commands to unpack tarballs and service commands to start the Agents on all hosts.

Phase 5: Install CDH and Managed Service Software

Install, configure, and start CDH and managed services on all hosts.
  • Path A - Use the Cloudera Manager Installation wizard to install CDH and other managed services.
  • Path B - There are two options:
    • Use the Cloudera Manager Installation wizard to install CDH and other managed services.
    • Use Linux package install commands (such as yum) to install CDH and other managed services on all hosts.
  • Path C - Use Linux commands to unpack tarballs and service commands to start CDH and managed services on all hosts.

Phase 6: Create, Configure, and Start CDH and Managed Services

Configure and start CDH and managed services.
  • All paths - Use the Cloudera Manager Installation wizard to install CDH and other managed services, assign roles to hosts, and configure the cluster. Many configurations are automated.
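When you install the JDK manually in Phase 1, JAVA_HOME must point to the install directory on every host. A minimal Python check you might run on each host is sketched below. The function name and the assumption that a usable JDK has an executable bin/java under JAVA_HOME are ours, and the sketch does not check whether the JDK version is one your Cloudera Manager and CDH release supports.

```python
import os
import stat
from pathlib import Path
from typing import Mapping, Optional


def check_java_home(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a description of the first problem found, or "" if JAVA_HOME looks usable."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    home = Path(java_home)
    if not home.is_dir():
        return f"JAVA_HOME={java_home} is not a directory"
    java = home / "bin" / "java"  # assumed location of the java launcher inside the JDK
    if not java.is_file():
        return f"{java} is missing"
    # Check the permission bits rather than os.access(), which also reflects mount options.
    if not java.stat().st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return f"{java} is not executable"
    return ""
```

Called with no arguments, check_java_home() inspects the current process environment; an empty string means no problem was found.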

You can also use the Cloudera Manager API to manage a cluster, which can be useful for scripting preconfigured deployments.
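As an illustration of scripting against the API, the following Python sketch lists cluster names using only the standard library. It assumes the Admin Console's default HTTP port (7180), HTTP basic authentication, an /api/version endpoint that returns the highest supported API version as plain text, and an /api/<version>/clusters endpoint that returns JSON with an "items" list. Verify these against the API documentation for your Cloudera Manager release; the helper names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Optional


def api_url(host: str, path: str, port: int = 7180,
            version: Optional[str] = None, scheme: str = "http") -> str:
    """Build a URL such as http://cm.example.com:7180/api/v10/clusters."""
    prefix = f"/api/{version}" if version else "/api"
    return f"{scheme}://{host}:{port}{prefix}/{path.lstrip('/')}"


def _get(url: str, user: str, password: str, timeout: float = 10.0) -> bytes:
    """GET a URL with HTTP basic authentication and return the raw response body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def list_cluster_names(host: str, user: str, password: str, port: int = 7180) -> list[str]:
    """Return cluster names, assuming the endpoints described above."""
    # Ask the server which API version to use, then query that version's clusters endpoint.
    version = _get(api_url(host, "version", port), user, password).decode("utf-8").strip()
    body = _get(api_url(host, "clusters", port, version), user, password)
    return [c["name"] for c in json.loads(body).get("items", [])]
```

A call such as list_cluster_names("cm.example.com", "admin", "secret") would return the names of the clusters the server manages; the host name and credentials are placeholders.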

Cloudera Manager Installation Software

Cloudera Manager provides the following software for the supported installation paths:
  • Installation path A (non-production) - A small self-executing Cloudera Manager installation program to install the Cloudera Manager Server and other packages. The Cloudera Manager installer, which you install on the host where you want the Cloudera Manager Server to run, performs the following:
    1. Installs the package repositories for Cloudera Manager and the Oracle Java Development Kit (JDK).
    2. Installs the Cloudera Manager packages.
    3. Installs and configures an embedded PostgreSQL database for use by the Cloudera Manager Server, some Cloudera Management Service roles, some managed services, and Cloudera Navigator roles.
  • Installation paths B and C - Cloudera Manager package repositories for manually installing the Cloudera Manager Server, Agent, and embedded database packages.
  • Installation path B - The Cloudera Manager Installation wizard for automating installation of Cloudera Manager Agent packages.
  • All installation paths - The Cloudera Manager Installation wizard for automating CDH and managed service installation and configuration on the cluster hosts. Cloudera Manager provides two methods for installing CDH and managed services: parcels and packages. Parcels simplify the installation process and allow you to download, distribute, and activate new versions of CDH and managed services from within Cloudera Manager. After you install Cloudera Manager and connect to the Cloudera Manager Admin Console for the first time, use the Cloudera Manager Installation wizard to:
    1. Discover cluster hosts.
    2. Optionally install the Oracle JDK.
    3. Optionally install CDH, managed service, and Cloudera Manager Agent software on cluster hosts.
    4. Select services.
    5. Map service roles to hosts.
    6. Edit service configurations.
    7. Start services.
If you abort the software installation process, the Installation wizard automatically rolls back the installation of any components that have not finished installing. (Installation that has completed successfully on a host is not rolled back on that host.)
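The abort behavior described above can be modeled per host. The following Python sketch is a hypothetical model of the documented behavior, not Cloudera Manager's implementation: hosts that finished installing keep their software, hosts with an installation in progress are rolled back, and hosts where installation never started are left as they are.

```python
from enum import Enum


class HostState(Enum):
    PENDING = "pending"          # installation not yet started on this host
    INSTALLING = "installing"    # installation in progress when the wizard is aborted
    DONE = "done"                # installation completed successfully
    ROLLED_BACK = "rolled back"  # partial installation reverted


def abort_install(states: dict[str, HostState]) -> dict[str, HostState]:
    """Return the host states after an abort, leaving the input mapping unchanged."""
    after = {}
    for host, state in states.items():
        if state is HostState.INSTALLING:
            after[host] = HostState.ROLLED_BACK  # incomplete work is reverted
        else:
            # Completed hosts are not rolled back; pending hosts have nothing to revert.
            after[host] = state
    return after
```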

Unmanaged Deployment

In a deployment not managed by Cloudera Manager, you are responsible for managing all phases of the lifecycle of CDH and managed service components on each host: installation, configuration, and service lifecycle operations such as start and stop. This section describes alternatives for installing CDH 5 software in an unmanaged deployment.

  • Command-line methods:
    • Download and install the CDH 5 "1-click Install" package
    • Add the CDH 5 repository
    • Build your own CDH 5 repository
    Of these command-line methods, the first (downloading and installing the "1-click Install" package) is recommended in most cases because it is simpler than building or adding a repository.
  • Tarball - You can download a tarball from CDH downloads. Keep the following points in mind:
    • Installing CDH 5 from a tarball installs YARN.
    • In CDH 5, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, and so on, are delivered in the Hadoop tarball. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory.
See Installing and Deploying CDH Using the Command Line for detailed instructions on each of these options.
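With the tarball, the MRv1 scripts and examples live in their own directories inside the Hadoop tarball. The Python sketch below locates them after extraction. It assumes bin-mapreduce1 and examples-mapreduce1 sit at the top level of the extracted directory, and the function name is hypothetical.

```python
from pathlib import Path


def mrv1_locations(hadoop_dir: str) -> tuple[Path, Path]:
    """Return (scripts_dir, examples_dir) for MRv1 in an extracted CDH 5 Hadoop tarball."""
    root = Path(hadoop_dir)
    scripts = root / "bin-mapreduce1"         # scripts for running MRv1
    examples = root / "examples-mapreduce1"   # MRv1 examples
    missing = [p.name for p in (scripts, examples) if not p.is_dir()]
    if missing:
        raise FileNotFoundError(f"{root} is missing MRv1 directories: {', '.join(missing)}")
    return scripts, examples
```

For example, mrv1_locations("/path/to/extracted/hadoop") returns both directories or raises FileNotFoundError; the path is a placeholder for wherever you unpacked the tarball.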