Cloudera Manager 5.2.0

A Single Interface to Manage Your CDH Cluster

Cloudera Manager is a unified management interface that makes it easy to install, configure, and manage a CDH cluster. It automatically ships with Cloudera Enterprise or Cloudera Express to help you get up and running with Hadoop faster.

Cloudera Manager 5.2.0 works with both CDH 4 and CDH 5 and is available with:
  • Cloudera Express - Easily deploy, manage, monitor, and perform diagnostics on your Hadoop cluster.
  • Cloudera Enterprise - Includes all the above capabilities, plus advanced management features and support, including zero-downtime upgrades and backup and disaster recovery.

Cloudera Manager is the recommended tool for installing Cloudera Enterprise or Cloudera Express, and it is downloaded automatically with either edition. Using Cloudera Manager with Cloudera Enterprise requires a license.

When installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.

Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.


What's New in Cloudera Manager 5.2.0

  • Services - the following new services have been added:
    • Isilon - supports the EMC Isilon distributed filesystem
    • KMS - the Java keystore-based key management server
    • Key Trustee - the enterprise-grade key management server using Navigator Key Trustee
    • Spark - runs Spark applications on YARN. The existing Spark service has been renamed Spark (Standalone).
  • Accumulo - Kerberos authentication is now supported. If you have been using advanced configuration snippets (safety valves) to configure Kerberos with Accumulo, you may now remove those settings and have Cloudera Manager generate the principal and keytab file for you.
  • HDFS Data at Rest Encryption
      Important: The HDFS Data at Rest Encryption feature included in CDH 5.2.0 has several known limitations. Therefore, Cloudera does not currently support this feature and it is not recommended for production use. If you're interested in trying the feature out in a test environment, contact your account team.
    HDFS now implements transparent, end-to-end encryption of data read from and written to HDFS by creating encryption zones. An encryption zone is a directory in HDFS with all of its contents, that is, every file and subdirectory in it, encrypted. You can use one of the following services to store, manage, and access encryption zone keys:
    • KMS - The Hadoop Key Management Service with a file-based Java keystore; maintains a single copy of keys, using simple password-based protection.
    • Key Trustee - The enterprise-grade key management service that replaces the file-based Java keystore by leveraging the advanced key-management capabilities of Navigator Key Trustee. Navigator Key Trustee is designed for secure, authenticated administration and cryptographically strong storage of keys on multiple redundant servers that can be located outside the cluster.
    For more information, see HDFS Data At Rest Encryption.
  • HBase - Support for configuring hedged reads has been added. Hedged reads are turned off by default. Cloudera Manager emits two properties to hbase-site.xml: dfs.client.hedged.read.threadpool.size (default: 0) and dfs.client.hedged.read.threshold.millis (default: 500 ms). For more information, see Hedged Reads.
  • ZooKeeper - The RMI port can now be configured using the JDK 7 flag -Dcom.sun.management.jmxremote.rmi.port. The default value is the same as the JMX Agent port. A special value of 0 or -1 disables the setting, in which case a random port is used. The configuration has no effect on Oracle JDK versions earlier than 7u4.
  • Cloudera Manager Agent configuration
    • The supervisord port can now be configured using the Agent configuration property supervisord_port. The change takes effect the next time supervisord is restarted (not simply when the Agent is restarted).
    • Added an Agent configuration local_filesystem_whitelist that allows configuring the list of local filesystems that should always be monitored.
  • Proxy user configuration
    • All services' proxy user configuration properties have been moved to the HDFS service. Other services running on the cluster inherit the configuration values provided in HDFS. If you have previously configured a service to have values different from those configured in HDFS, then the proxy user configuration properties will be moved to that service's Advanced Configuration Snippet (Safety Valve) for core-site.xml to retain existing behavior.

      Oozie and Solr are exceptions to this. Oozie proxy user configuration properties have been moved to Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml if they differ from HDFS. Solr proxy user configuration properties have been moved to Solr Service Environment Advanced Configuration Snippet (Safety Valve) if they differ from HDFS.

  • Resource management - Integrated resource management with YARN and Llama, and a Llama high availability wizard.
  • New and changed user roles - BDR Administrator, Cluster Administrator, Navigator Administrator, and User Administrator. The Administrator role has been renamed Full Administrator. See Cloudera Manager User Accounts.
  • Configuration UI
    • Cluster-wide configuration - you can view all modified settings and configure log directories, disk space thresholds, and port settings.
    • New configuration layout - the new layout provides an alternate way to view configuration pages. In the classic layout, pages are organized by role group and categories within the role groups. The new layout allows you to filter on configuration status, category, and scope. On each configuration page you can easily switch between the classic and new layout.
        Important: The classic layout is the default. All the configuration procedures described in the Cloudera Manager documentation assume the classic layout.
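
As an illustration of the HBase hedged reads item above, the following Python sketch renders the two properties in the Hadoop-style property layout used by hbase-site.xml. The property names and default values come from the text; the helper name render_properties is hypothetical, and the exact XML that Cloudera Manager writes is an assumption.

```python
import xml.etree.ElementTree as ET

# Defaults as stated in the release notes above (hedged reads turned off).
HEDGED_READ_DEFAULTS = {
    "dfs.client.hedged.read.threadpool.size": "0",
    "dfs.client.hedged.read.threshold.millis": "500",
}


def render_properties(props):
    """Return a <configuration> XML string with one <property> per entry.

    Hypothetical helper: it only illustrates the name/value layout and is
    not how Cloudera Manager itself generates hbase-site.xml.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling render_properties(HEDGED_READ_DEFAULTS) returns a single-line XML string; a thread pool size of 0 corresponds to the default of hedged reads being off.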

Cloudera Manager Requirements

Cloudera Manager interacts with numerous entities such as operating systems, databases, and browsers. This topic provides information about which major release version and minor release version of each entity is supported. After installing each entity, upgrade to the latest patch version and apply any other appropriate updates. An available update may be specific to the operating system on which it is installed. For example, if you are using CentOS in your environment, you could choose 6 as the major version and 4 as the minor version to indicate that you are using CentOS 6.4. After installing this operating system, apply all relevant CentOS 6.4 upgrades and patches.

The following sections describe various requirements for Cloudera Manager.

  Note: In some cases, such as some browsers, a minor version may not be listed.

Supported Operating Systems

Cloudera Manager supports the following operating systems:
  • RHEL-compatible systems
    • Red Hat Enterprise Linux and CentOS 5.7, 64-bit
    • Red Hat Enterprise Linux and CentOS 5.10, 64-bit
    • Red Hat Enterprise Linux and CentOS 6.4, 64-bit
    • Red Hat Enterprise Linux and CentOS 6.4 in SELinux mode
    • Red Hat Enterprise Linux and CentOS 6.5, 64-bit
    • Oracle Enterprise Linux 5.6 (UEK R2), 64-bit
    • Oracle Enterprise Linux 6.4 (UEK R2), 64-bit
    • Oracle Enterprise Linux 6.5 (UEK R2, UEK R3), 64-bit
  • SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required for CDH 5, and Service Pack 1 or later is required for CDH 4. To use the embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager, the Updates repository must be active. The SUSE Linux Enterprise Software Development Kit 11 SP1 is required on hosts running the Cloudera Manager Agents.
  • Debian - Wheezy (7.0 and 7.1), Squeeze (6.0) (deprecated), 64-bit
  • Ubuntu - Trusty (14.04), Precise (12.04), Lucid (10.04) (deprecated), 64-bit
  Note:
  • Debian Squeeze and Ubuntu Lucid are supported only for CDH 4.
  • Using the same version of the same operating system on all cluster hosts is strongly recommended.

Supported JDK Versions

Cloudera Manager supports Oracle JDK 1.7.0_67 and Oracle JDK 1.6.0_31, and can optionally install them during installation and upgrade. For further information, see Java Development Kit Installation.

Supported Browsers

The Cloudera Manager Admin Console, which you use to install, configure, manage, and monitor services, supports the following browsers:
  • Firefox 11 or later
  • Google Chrome
  • Internet Explorer 9 or later
  • Safari 5 or later

Supported Databases

Cloudera Manager requires several databases. The Cloudera Manager Server stores information about configured services, role assignments, configuration history, commands, users, and running processes in a database of its own. You must also specify a database for the Activity Monitor and Reports Manager management services.

The database you use must be configured to support UTF8 character set encoding. The embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager automatically provides UTF8 encoding. If you install a custom database, you may need to enable UTF8 encoding. The commands for enabling UTF8 encoding are described in each database topic under Cloudera Manager and Managed Service Databases.

After installing a database, upgrade to the latest patch version and apply any other appropriate updates. Available updates may be specific to the operating system on which it is installed.

Cloudera Manager and its supporting services can use the following databases:
Database | OS | CDH Version
MySQL 5.0 | RHEL 5.7 | CDH 4
MySQL 5.1 | RHEL 6.4, Ubuntu 10.04, Debian 6.0 | CDH 4
MySQL 5.5 | SLES 11, Ubuntu 12.04, Debian 7.0 | CDH 5
MySQL 5.6 | CentOS 6.4 | CDH 5.1
Oracle 11gR2 | Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 | CDH 4 and CDH 5
PostgreSQL 8.4 | RHEL 5.7, RHEL 6.4, Ubuntu 10.04, Ubuntu 12.04, Debian 6.0 | CDH 4
PostgreSQL 9.1 | SLES 11, Debian 7.0 | CDH 5
PostgreSQL 9.2 | CentOS 6.4 | CDH 5.1
PostgreSQL 9.3 | CentOS 6.4 | CDH 5.2

For information about the databases supported by CDH, see CDH 4 Supported Databases and CDH 5 Supported Databases.

Supported CDH and Managed Service Versions

The following versions of CDH and managed services are supported:
  Warning: Cloudera Manager 5 does not support CDH 3 and you cannot upgrade Cloudera Manager 4 to Cloudera Manager 5 if you have a cluster running CDH 3. Therefore, to upgrade CDH 3 clusters to CDH 4 using Cloudera Manager you must use Cloudera Manager 4.
  • CDH 4 and CDH 5. The latest released versions of CDH 4 and CDH 5 are strongly recommended. For information on CDH 4 requirements, see CDH 4 Requirements and Supported Versions. For information on CDH 5 requirements, see CDH 5 Requirements and Supported Versions.
  • Cloudera Impala - Cloudera Impala is included with CDH 5. Cloudera Impala 1.2.1 is supported with CDH 4.1.0 or later. For more information on Cloudera Impala requirements with CDH 4, see Cloudera Impala Requirements.
  • Cloudera Search - Cloudera Search is included with CDH 5. Cloudera Search 1.2.0 is supported with CDH 4.6.0. For more information on Cloudera Search requirements with CDH 4, see Cloudera Search Requirements.
  • Apache Spark - 0.9.0 or later with CDH 4.4.0 or later.
  • Apache Accumulo - 1.4.3 with CDH 4.3.0, 1.4.4 with CDH 4.5.0, and 1.6.0 with CDH 4.6.0.
For more information, see the Product Compatibility Matrix.

Resource Requirements

Cloudera Manager requires the following resources:
  • Disk Space
    • Cloudera Manager Server
      • 5 GB on the partition hosting /var.
      • 500 MB on the partition hosting /usr.
      • For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server—not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:
        • CDH 4.6 - 700 MB per parcel; CDH 5 - 1 GB per parcel
        • Impala - 200 MB per parcel
        • Solr - 400 MB per parcel
    • Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition. For more information, see Data Storage for Monitoring Data.
    • Agents - On Agent hosts each unpacked parcel requires about three times the space of the downloaded parcel on the Cloudera Manager Server. By default unpacked parcels are located in /opt/cloudera/parcels.
  • RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB may be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel may kill the Server for consuming too much RAM.
  • Python - Cloudera Manager uses Python. All supported operating systems include Python version 2.4 or later. Cloudera Manager and CDH 4 require Python 2.4 or later, but Hue in CDH 5 requires Python 2.6 or 2.7.
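
The parcel figures in the Disk Space item above lend themselves to a quick estimate. The following Python sketch adds up the approximate parcel sizes quoted in the list and applies the "about three times" factor for unpacked parcels on Agent hosts. The function name and the choice of 1024 MB for "1 GB" are assumptions, and the estimate ignores everything other than the parcels themselves.

```python
# Approximate size of one parcel in the local parcel repository on the
# Cloudera Manager Server, in MB, taken from the list above.
PARCEL_SIZE_MB = {
    "CDH 4.6": 700,
    "CDH 5": 1024,  # "1 GB per parcel"; 1024 MB is an assumption
    "Impala": 200,
    "Solr": 400,
}

# An unpacked parcel on an Agent host needs about three times the space
# of the downloaded parcel.
UNPACK_FACTOR = 3


def estimate_parcel_space_mb(parcels):
    """Return (server_mb, agent_mb) for a list of parcel names.

    server_mb: space for the downloaded parcels on the Server.
    agent_mb:  space for the same parcels unpacked on one Agent host.
    """
    server_mb = sum(PARCEL_SIZE_MB[name] for name in parcels)
    agent_mb = server_mb * UNPACK_FACTOR
    return server_mb, agent_mb
```

For example, estimate_parcel_space_mb(["CDH 4.6", "Impala", "Solr"]) gives 1300 MB on the Server and 3900 MB on each Agent host. Keeping several versions or builds of the same product means listing that product once per parcel.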

Networking and Security Requirements

The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:

  • Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
    • Contain consistent information about hostnames and IP addresses across all hosts
    • Not contain uppercase hostnames
    • Not contain duplicate IP addresses
    A properly formatted /etc/hosts file should be similar to the following example:
    127.0.0.1	localhost.localdomain	localhost
    192.168.1.1	cluster-01.example.com	cluster-01
    192.168.1.2	cluster-02.example.com	cluster-02
    192.168.1.3	cluster-03.example.com	cluster-03 
  • In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.

    Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete. For more information, see Permission Requirements.

  • The Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
  • No blocking is done by Security-Enhanced Linux (SELinux).
  • IPv6 must be disabled.
  • iptables and other firewalls must not block Cloudera Manager. Port 7180 must be open because it is used to access Cloudera Manager after installation, and the other ports that Cloudera Manager uses to communicate must also be open.
  • For Red Hat and CentOS, the /etc/sysconfig/network file on each host must contain the hostname you have just set (or verified) for that host.
  • Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:
    Account | Type | Product
    cloudera-scm | User and group | Cloudera Manager
    flume | User and group | CDH 4, CDH 5
    hadoop | Group | CDH 4, CDH 5
    hbase | User and group | CDH 4, CDH 5
    hdfs | User and group. Must also be a member of the hadoop group. | CDH 4, CDH 5
    hive | User and group | CDH 4, CDH 5
    httpfs | User and group | CDH 4, CDH 5
    hue | User and group | CDH 4, CDH 5
    impala | User and group. Must also be a member of the hdfs and hive groups. | CDH 4.1 or later, CDH 5
    llama | User and group | CDH 5
    mapred | User and group. Must also be a member of the hadoop group. | CDH 4, CDH 5
    oozie | User and group | CDH 4, CDH 5
    solr | User and group | CDH 4.3 and later, CDH 5
    spark | User and group | Spark, CDH 5
    sqoop | User and group | CDH 4, CDH 5
    sqoop2 | User. Must be a member of the sqoop group. | CDH 4.2 and later, CDH 5
    yarn | User and group | CDH 4, CDH 5
    zookeeper | User and group | CDH 4, CDH 5
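
Two of the /etc/hosts rules listed earlier in this section (no uppercase hostnames, no duplicate IP addresses) can be checked mechanically. The following Python sketch is one way to do that for a single file; the function names are hypothetical, and it does not check consistency across hosts or DNS resolution.

```python
def parse_hosts(text):
    """Parse /etc/hosts-style text into (ip, [names]) tuples.

    Comments (everything after '#') and blank lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        entries.append((fields[0], fields[1:]))
    return entries


def find_problems(entries):
    """Return a list of messages for rule violations found in entries."""
    problems = []
    seen_ips = set()
    for ip, names in entries:
        # The same IP address must not appear on more than one line.
        if ip in seen_ips:
            problems.append("duplicate IP address: " + ip)
        seen_ips.add(ip)
        # Hostnames and aliases must be entirely lowercase.
        for name in names:
            if name != name.lower():
                problems.append("uppercase hostname: " + name)
    return problems
```

On a cluster host, passing the contents of /etc/hosts through parse_hosts and then find_problems should return an empty list for a file like the example shown above.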

Cloudera Manager Quick Start Guide

This quick start guide describes how to quickly create a new installation of Cloudera Manager 5, CDH 5, and managed services on a cluster of four hosts. The resulting deployment can be used for demonstrations and proof of concept applications, but is not recommended for production.


Requirements

The four hosts in the cluster must satisfy the following requirements:
  • The hosts must have at least 10 GB RAM
  • You must have root or password-less sudo access to the hosts
  • If using root, the hosts must accept the same root password
  • The hosts must have Internet access to allow the wizard to install software from archive.cloudera.com
  • The hosts must run a supported OS:
    • RHEL-compatible systems
      • Red Hat Enterprise Linux and CentOS 5.7, 64-bit
      • Red Hat Enterprise Linux and CentOS 6.2, 64-bit
      • Red Hat Enterprise Linux and CentOS 6.4, 64-bit
      • Oracle Enterprise Linux 6.4, 64-bit
      • Oracle Enterprise Linux 5.6, 64-bit
    • SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required. The Updates repository must be active and SUSE Linux Enterprise Software Development Kit 11 SP1 is required.
    • Debian - Debian 7.0 and 7.1, 64-bit
    • Ubuntu - Ubuntu 12.04, 64-bit
If your environment does not satisfy these requirements, the procedure described in this guide may not be appropriate for you. For information about other Cloudera Manager installation options and requirements, see the Cloudera Manager Installation Guide.
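
The RAM requirement above can be checked before starting. The following Python sketch reads the MemTotal field from text in the format of the Linux /proc/meminfo file and compares it with the 10 GB minimum. The function names are hypothetical, and treating 1 GB as 1024 × 1024 kB is an assumption.

```python
MIN_RAM_GB = 10  # Quick Start requirement stated above


def mem_total_gb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in GB.

    /proc/meminfo reports MemTotal in kB; 1 GB is taken as 1024 * 1024 kB.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])
            return kb / (1024 * 1024)
    raise ValueError("MemTotal not found")


def meets_ram_requirement(meminfo_text, min_gb=MIN_RAM_GB):
    """Return True if the reported total memory is at least min_gb."""
    return mem_total_gb(meminfo_text) >= min_gb
```

Because the kernel reserves some memory, MemTotal is usually slightly lower than the installed RAM, so a host with exactly 10 GB installed may fail this strict comparison; lower min_gb a little if that matters in your environment.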

Download and Run the Cloudera Manager Server Installer

  1. Download the Cloudera Manager installer binary from Cloudera Manager Downloads to the cluster host where you want to install the Cloudera Manager Server.
    1. Under Cloudera Manager 5.x downloads, click Download Cloudera Manager Express Edition.
    2. Optionally register and click Submit or click the Just take me to the download page link.
    The cloudera-manager-installer.bin file downloads.
  2. Change cloudera-manager-installer.bin to have executable permission.
    $ chmod u+x cloudera-manager-installer.bin
  3. Run the Cloudera Manager Server installer.
    $ sudo ./cloudera-manager-installer.bin
  4. Read the Cloudera Manager README and then press Return or Enter to choose Next.
  5. Read the Cloudera Manager Standard License and then press Return or Enter to choose Next. Use the arrow keys and press Return or Enter to choose Yes to confirm you accept the license.
  6. Read the Oracle Binary Code License Agreement and then press Return or Enter to choose Next.
  7. When the installation completes, the complete URL for the Cloudera Manager Admin Console is displayed, including the port number, which is 7180 by default. Press Return or Enter to choose OK to continue.
  8. Press Return or Enter to choose OK to exit the installer.
  9. On RHEL 5 and CentOS 5, install Python 2.6 or 2.7. Download the appropriate repository rpm packages to the Cloudera Manager Server host and then install Python using yum. For example, use the following commands:
    $ su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
    ...
    $ yum install python26
  Note: If the installation is interrupted for some reason, you may need to clean up before you can re-run it. See Uninstalling Cloudera Manager and Managed Software in the Cloudera Manager Installation Guide.

Start the Cloudera Manager Admin Console

  1. Wait several minutes for the Cloudera Manager Server to complete its startup. To observe the startup process, you can run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
  2. In a web browser, enter http://Server host:7180, where Server host is the fully-qualified domain name or IP address of the host where you installed the Cloudera Manager Server. The login screen for Cloudera Manager Admin Console displays.
  3. Log in to the Cloudera Manager Admin Console with the credentials Username: admin and Password: admin.
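
Rather than guessing when the Server has finished starting, you can poll the Admin Console port. The following Python sketch checks whether a TCP connection to port 7180 succeeds, retrying at intervals. The function names, retry count, and delay are assumptions; an open port shows only that something is listening, not that the Admin Console is fully initialized.

```python
import socket
import time


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def wait_for_port(host, port=7180, attempts=30, delay=10.0):
    """Poll host:port until it accepts connections.

    Returns True as soon as a connection succeeds, or False after
    `attempts` failed tries spaced `delay` seconds apart.
    """
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, wait_for_port("cm-host.example.com") polls for up to about five minutes with the defaults shown; the hostname here is a placeholder for your own Cloudera Manager Server host.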

Install and Configure Software Using the Cloudera Manager Wizard

Installing and configuring Cloudera Manager, CDH, and managed service software on the cluster hosts involves the following three main steps.

Choose Cloudera Manager Edition and Specify Hosts

  1. Choose Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed. The trial allows you to create all CDH and managed services supported by Cloudera Manager. Click Continue.
  2. Information is displayed indicating what edition of Cloudera Manager will be installed and the services you can choose from. Click Continue.
  3. Specify the hosts on which to install CDH and managed services. You can specify hostnames and/or IP addresses and ranges, for example: 10.1.1.[1-4] or host[1-3].company.com. You can specify multiple addresses and address ranges by separating them by commas, semicolons, tabs, or blank spaces, or by placing them on separate lines.
  4. Click Search. Cloudera Manager identifies the hosts on your cluster. Verify that the number of hosts shown matches the number of hosts where you want to install services. Deselect host entries that do not exist and deselect the hosts where you do not want to install services. Click Continue. The Select Repository page displays.
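
To preview which hosts a pattern such as 10.1.1.[1-4] or host[1-3].company.com covers before entering it in the wizard, the range syntax described in step 3 can be expanded with a few lines of Python. This is a sketch of the syntax as described above, not Cloudera Manager's own parser; the function names are hypothetical, and details such as zero-padded ranges are not handled.

```python
import re

# Matches a numeric range such as [1-4].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_pattern(pattern):
    """Expand every [a-b] range in a single host pattern."""
    match = _RANGE.search(pattern)
    if not match:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    hosts = []
    for n in range(start, end + 1):
        candidate = pattern[:match.start()] + str(n) + pattern[match.end():]
        # Recurse in case the pattern contains further ranges.
        hosts.extend(expand_pattern(candidate))
    return hosts


def expand_host_list(text):
    """Split text on commas, semicolons, and whitespace, then expand each entry."""
    hosts = []
    for token in re.split(r"[,;\s]+", text.strip()):
        if token:
            hosts.extend(expand_pattern(token))
    return hosts
```

For example, expand_host_list("host[1-3].company.com") returns host1.company.com, host2.company.com, and host3.company.com, which you can compare against the number of hosts that the wizard reports after you click Search.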

Install CDH and Managed Service Software

  1. Keep the default distribution method Use Parcels and the default version of CDH 5.
  2. For the Cloudera Manager Agent, keep the default Matched release for this Cloudera Manager Server. Click Continue twice.
  3. Specify host SSH login properties:
    1. Keep the default login root or enter the user name for an account that has password-less sudo permission.
    2. If you choose to use password authentication, enter and confirm the password.
    Click Continue. Cloudera Manager installs the Oracle JDK and the Cloudera Manager Agent packages on each host and starts the Agent.
  4. Click Continue. Cloudera Manager installs CDH. During the parcel installation, progress is indicated for the two phases of the parcel installation process (Download and Distribution) in separate progress bars. When the Continue button appears at the bottom of the screen, the installation process is complete.
  5. Click Continue. The Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Finish. The Cluster Setup page displays.

Add and Configure Services

  1. Click the All Services radio button to create HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, Sqoop, HBase, Impala, Solr, Spark, and Key-Value Store Indexer services. Click Continue. The Customize Role Assignments page displays.
  2. Configure the following role assignments:
    • Click the text field under the HBase Thrift Server role. In the host selection dialog that displays, check the checkbox next to any host and click OK at the bottom right.
    • Click the text field under the Server role of the ZooKeeper service. In the host selection dialog that displays, uncheck the checkbox next to the host assigned by default (the master host) and check checkboxes next to the remaining three hosts. Click OK at the bottom right.
    Click Continue. The Database Setup page displays.
  3. Leave the default setting of Use Embedded Database to have Cloudera Manager create and configure all required databases in an embedded PostgreSQL database. Click Test Connection. When the test completes successfully, click Continue. The Review Changes page displays.
  4. Review the configuration changes to be applied. Click Continue. The Command Progress page displays.
  5. The wizard performs 32 steps to configure and start the services. When the startup completes, click Continue.
  6. A success message displays indicating that the cluster has been successfully started. Click Finish to proceed to the Home page.

Test the Installation

On the left side of the Home page is a list of services currently running with their status information. All the services should be running with Good Health; however, there may be a small number of configuration warnings, indicated by a wrench icon and a numbered badge, which you can ignore.

You can click each service to view more detailed information about the service. You can also test your installation by running a MapReduce job or interacting with the cluster with a Hue application.

Running a MapReduce Job

  1. Log into a cluster host.
  2. Run the Hadoop PiEstimator example:
    sudo -u hdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    pi 10 100
  3. View the result of running the job by selecting the following from the top navigation bar in the Cloudera Manager Admin Console: Clusters > Cluster 1 > YARN (MR2 Included) Applications. You will see an entry for the job you just ran.

Testing with Hue

A good way to test the cluster is by running a job. In addition, you can test the cluster by running one of the Hue web applications. Hue is a graphical user interface that allows you to interact with your clusters by running applications that let you browse HDFS, manage a Hive metastore, and run Hive, Impala, and Search queries, Pig scripts, and Oozie workflows.
  1. In the Cloudera Manager Admin Console Home page, click the Hue service.
  2. Click the Hue Web UI tab, which opens Hue in a new window.
  3. Log in with the credentials, username: hdfs, password: hdfs.
  4. Choose an application in the navigation bar at the top of the browser window.

For more information, see the Hue User Guide.