Cloudera Manager 5.4.3

Easily Manage Hadoop in Production

Cloudera Manager makes it easy to manage Hadoop deployments of any scale in production. Quickly deploy, configure, and monitor your cluster through an intuitive UI - complete with rolling upgrades, backup and disaster recovery, and customizable alerting.

Cloudera Manager is available as an integrated and supported part of Cloudera Enterprise.

Cloudera Manager is the recommended tool for installing Cloudera Enterprise or Cloudera Express. It automatically downloads with Cloudera Enterprise or Cloudera Express. Cloudera Manager with Enterprise requires a license.

When installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.

Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.


Issues Fixed in Cloudera Manager 5.4.3

Improve Impala queries coordinator node metrics handling

For Impala queries that returned very few rows, Cloudera Manager could fail to report information such as HDFS I/O metrics on the Impala Query Monitoring and Query Detail pages. The discrepancy was typically relatively small because those queries often did very little work.

 

Upgrade Wizard fails during CDH 5.4.1/Cloudera Manager 5.4.1 to CDH 5.4.3/Cloudera Manager 5.4.3 upgrade

The Upgrade Wizard no longer times out while waiting for Cloudera Manager Agents to detect the new CDH version.

 

Performance issues when changing configurations on HDFS

Fixed a performance issue where HDFS configuration pages responded slowly.

 

Typo in Cloudera Manager metrics reference

The word “Concerning” was misspelled in many metrics reference pages.

 

Issues with Navigator field audit_log_max_file_size

The log4j appender changed from RollingFileAppender to RollingFileWithoutDeleteAppender.

 

The Isilon client configuration core-site.xml file does not contain proxy users

The parameters are available in the Cloudera Manager Admin Console, but the configurations are not emitted in the core-site.xml file.

 

Solr gateway role should not have a log4j.properties advanced configuration snippet

The Solr gateway role does not have a log4j.properties file.

 

The Cloudera Manager Agent force_start's hard stop commands did not set all invariants

This resulted in a NullPointerException (NPE) being reported in the Cloudera Manager logs when accessing active and recent command operations.

 

Configuration staleness icons appear to be enabled for users in read-only role

When moused over, the icons change to a hand indicating that they are active. However, users in the read-only role cannot act on changed configurations.

 

Setting yarn.resourcemanager.am.max-retries throws error

Observed when setting the Maximum Number of Attempts for MapReduce Jobs and then setting ApplicationMaster Maximum Attempts, which also sets yarn.resourcemanager.am.max-retries.

 

Cloudera Manager reports the wrong value for Impala bytes read from cache

Instead of cached bytes it reported the value of short circuit bytes.

 

Fixed cross-site scripting vulnerabilities

A variety of possible cross-site scripting vulnerabilities have been fixed.

 

Location of Number of rows drop-down changed

On pages where multiple rows display, the drop-down menu where users select the number of rows to display on a page now appears at the bottom of all lists.

 

Minimum allowed value change for YARN property

The Max Shuffle Connections property now allows a value of 0, which indicates no limit on the number of connections.

 

Upgrade error

A bug was fixed that prevented upgrades from CDH 4.7.1 to CDH 5.4.3.

 

Change to Parcels page

On the Parcels page, the first cluster in the list is now automatically selected by default.

 

All Password Input Fields do not allow auto complete

None of the password input fields in Cloudera Manager allow autocomplete.

 

TLS Keystore Configuration Error

It is no longer possible to delete the values of the Path to TLS Keystore File and Keystore Password properties and save them while the Use TLS Encryption for Admin Console property is enabled.

 

Host configuration properties and Agent restart messages

Some host configuration properties no longer incorrectly state that an Agent restart is required.

 

More detailed error messages for failed role migration

If there is a failure validating the NameNode or JournalNode data directories while migrating roles, Cloudera Manager now displays detailed error information, including error codes.

 

New property to configure Oozie shared library upload timeout

To prevent timeouts due to slow disks or networks, a new Oozie property, Oozie Upload ShareLib Command Timeout, has been added to set the timeout.

 

New Cluster-Wide Configuration Pages

The following new Cluster-Wide configuration pages have been added:
  • Databases
  • Local Data Directories
  • Local Data Files
  • Navigator Settings
  • Service Dependencies

To access these pages in Cloudera Manager, select Cluster > Cluster Name > Configuration.

 

Naming of Health Tests

The names of some Health Tests have changed to use consistent capitalization.

 

Impala Monitoring Queries for Per-node peak memory

Impala queries that report per-node peak memory returned incorrect results when the value was zero.

 

Enable Hive on Spark Property

The description of the Enable Hive on Spark property has been updated to remind the user that the Enable Spark on YARN property must also be selected.

 

Role Trigger property in Flume

Setting a value for the Flume Role Triggers property no longer causes validation warnings.

 

Restart of Service Monitor leaves files that can fill the disk

Restarts of the Service Monitor no longer leave extraneous copies of files that unnecessarily take up disk space.

 

HiveServer 2 properties omit Java options

Setting any of the following properties no longer causes Java options to be omitted:
  • Allow URIs in Database Policy File
  • HiveServer2 TLS/SSL Certificate Trust Store File
  • HiveServer2 TLS/SSL Certificate Trust Store Password

 

CDH Parcel distribution reports HTTP 503 errors

Cloudera Manager no longer displays HTTP 503 errors during distribution of the CDH parcel to a large cluster.

 

Diagnostic bundle reports incorrect status for SELinux

Diagnostic bundles sometimes reported SELinux as disabled when it was actually enabled. The bundle now reports the correct status.

 

Hue configuration warnings do not link to correct page

On the Cloudera Manager page that displays Hue configuration issues, the links now take the user to the correct page where the user can correct the configuration.

 

Date display in Cloudera Manager log viewer

The month and date have been added before the time value in logs displayed in Cloudera Manager.

 

Disabling Hive Metastore Canary Test

When you disable the Hive Metastore health test by deselecting the Hive Metastore Canary Health property, the Hive Canary is now also disabled.

 

Agent failure when TLS 1.0 is disabled

If TLS 1.0 is disabled, the Agent now tries to negotiate the connection using TLS 1.1 or TLS 1.2.

 

Slowness when displaying details of a stale configuration

The details page now displays more quickly when a user clicks on the Stale Configuration icon.

 

Slowness observed when accessing replication page in Cloudera Manager

The replication page in Cloudera Manager responded slowly when there was a large number of replication history records. The number of displayed historical records has been reduced from 100 to 20.

 

Log Searches for Cloudera Manager Server

Searching the Cloudera Manager Server logs now works as expected.

 

Failed TLS Configuration and Cloudera Manager Restart

If the TLS configuration has errors, Cloudera Manager now falls back to non-TLS operation when restarting.

 

New headers added

New headers have been added to Cloudera Manager HTTP response headers to protect against vulnerabilities.

 

Hive Logging property restored

The Enable Explain Logging (hive.log.explain.output) property was removed in an earlier release and is now included in the configurations.

 

Hive Metastore Update NameNodes Command

A 150 second timeout was removed from the Update Hive Metastore NameNodes command to prevent timeouts on deployments that use Hive extensively.

 

Kafka Parcel Installation

Cloudera Manager now correctly detects the Kafka version for parcel installation.

 

Agent restart failure

In a condition where an Agent restart was required due to a Hive configuration change and a subsequent disk failure, the Agent now restarts as expected.

 

Error message wording

Some Cloudera Manager error messages referred to Cloudera Manager as “CM”. These messages now use the full name “Cloudera Manager”.

 

Oozie metrics failures

Retrieval of Oozie metrics sometimes failed due to timeout issues, which are now resolved.

 

NameNode Role Migration Failures

When a NameNode role migration fails due to the destination role data directories being non-empty or having incorrect permissions, you no longer need to complete the migration manually. An error message displays and you can now correct the problem and re-run the command.


AWS S3 HBase configuration property renamed to Amazon S3

Several configuration properties for HBase have been renamed from AWS S3 to Amazon S3, in order to use the correct product name.

 

NodeManager Host Resources page display for the NodeManager Recovery Directory

The NodeManager Recovery Directory now displays on the NodeManager host resources page.

 

Host Inspector page now includes link to Show Inspector Results

The Host Inspector page now displays a link to a page that displays detailed results.

 

Initialization Script Improvements

The Cloudera Manager Agent initialization script now checks correctly for running processes.

 

Default Value for Hue parameter changed

The default value for the Hue cherrypy_server_threads property has been changed from 10 to 50.

 

Express Installation Wizard Package Installation Page CDH Version

The Express Installation Wizard Package installation page no longer allows the user to proceed without selecting a CDH version.

 

Host Component page display

The Host Component page now displays the package version for the KMS Trustee Key Provider.

 

Installation Wizard hangs during package installation

The Installation Wizard hung during a CDH package installation, with the status displayed as “Acquiring Installation Lock”. This was caused by a bug, now fixed, in which the Agent failed to release a lock until the Agent was restarted.

 

Minimum allocation violation not caught by Cloudera Manager

NodeManager did not start because Cloudera Manager did not correctly validate memory and CPU settings against their minimum values.

 

Impala core dump directories are now configurable

Three new properties that specify the location of core dump directories have been added to the Impala configurations:
  • Catalog Server Core Dump Directory
  • Impala Daemon Core Dump Directory
  • StateStore Core Dump Directory

 

Typo in Sqoop DB path suffix (SqoopParams.DERBY_SUFFIX)

Sqoop 2 appears to lose data when upgrading to CDH 5.4. This is due to Cloudera Manager erroneously configuring the Derby path with "repositoy" instead of "repository". The correct path name is now used.

 

Agent fails when retrieving log files with very long messages

When searching or retrieving large log files using the Agent, the Agent no longer consumes near 100% CPU until it is restarted. This can also happen when the collect host statistics command is issued.

 

Automated Solr SSL configuration may fail silently

Cloudera Manager 5.4.1 offers simplified SSL configuration for Solr. This process uses a solrctl command to configure the urlScheme Solr cluster property. The solrctl command produces the same results as the Solr REST API call /solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https. For example, the call might appear as: https://example.com:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https

Cloudera Manager automatically executes this command during Solr service startup. If this command fails, the Solr service startup now reports an error.
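
As a rough illustration of the REST call described above, the following Python sketch assembles the CLUSTERPROP URL from its parts. The helper name clusterprop_url is hypothetical and is not part of Cloudera Manager or solrctl; the host and port are the placeholder values from the example.

```python
from urllib.parse import urlencode


def clusterprop_url(base_url, name, value):
    """Build a Solr Collections API CLUSTERPROP URL (hypothetical helper)."""
    query = urlencode([("action", "CLUSTERPROP"), ("name", name), ("val", value)])
    return base_url.rstrip("/") + "/solr/admin/collections?" + query


# Placeholder host and port taken from the example above.
url = clusterprop_url("https://example.com:8983", "urlScheme", "https")
print(url)
```

The sketch only constructs the URL. Sending the request, and any authentication a secured cluster may require, is left out.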

 

Removing the default value of a property fails

For example, when you access the Automatically Downloaded Parcels property on the following page: Home > Administration > Settings and remove the default CDH value, the following error message displays: "Could not find config to delete with template name: parcel_autodownload_products". This error has been fixed.

Supported Operating Systems

Cloudera Manager supports the following operating systems:
  • RHEL-compatible
    • Red Hat Enterprise Linux and CentOS
      • 5.7, 64-bit
      • 6.4, 64-bit
      • 6.5 in SELinux mode
      • 6.5, 64-bit
      • 6.6, 64-bit
    • Oracle Enterprise Linux with default kernel and Unbreakable Enterprise Kernel, 64-bit
      • 5.6 (UEK R2)
      • 6.4 (UEK R2)
      • 6.5 (UEK R2, UEK R3)
      • 6.6 (UEK R3)
  • SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required for CDH 5, and Service Pack 1 or later is required for CDH 4. To use the embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager, the Updates repository must be active. The SUSE Linux Enterprise Software Development Kit 11 SP1 is required on hosts running the Cloudera Manager Agents.
  • Debian - Wheezy (7.0 and 7.1), Squeeze (6.0) (deprecated), 64-bit
  • Ubuntu - Trusty (14.04), Precise (12.04), Lucid (10.04) (deprecated), 64-bit
  Note:
  • Debian Squeeze and Ubuntu Lucid are supported only for CDH 4.
  • Using the same version of the same operating system on all cluster hosts is strongly recommended.
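
A quick way to compare a host against the RHEL and CentOS versions listed above is to parse its release string. The sketch below assumes the string looks like the contents of /etc/redhat-release (for example "CentOS release 6.5 (Final)"); the function names are hypothetical, and the version set is copied from the list above.

```python
import re

# Versions copied from the Red Hat Enterprise Linux and CentOS list above.
SUPPORTED_RHEL_CENTOS = {"5.7", "6.4", "6.5", "6.6"}


def release_version(release_line):
    """Extract 'major.minor' from a redhat-release style line, or None."""
    match = re.search(r"release (\d+\.\d+)", release_line)
    return match.group(1) if match else None


def is_supported(release_line):
    """True if the parsed version appears in the supported set."""
    return release_version(release_line) in SUPPORTED_RHEL_CENTOS


print(is_supported("CentOS release 6.5 (Final)"))
print(is_supported("CentOS release 6.3 (Final)"))
```

Running the same check on every cluster host also helps confirm that all hosts use the same operating system version, as the note recommends.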

Supported JDK Versions

Cloudera Manager supports Oracle JDK 1.7.0_75 and 1.8.0_40 when it's managing CDH 5.x, and Oracle JDK 1.6.0_31 and 1.7.0_75 when it's managing CDH 4.x. Cloudera Manager supports Oracle JDK 1.7.0_75 and 1.8.0_40 when it's managing both CDH 4.x and CDH 5.x clusters. Oracle JDK 1.6.0_31 and 1.7.0_75 can be installed during the installation and upgrade. For further information, see Java Development Kit Installation.

Supported Browsers

The Cloudera Manager Admin Console, which you use to install, configure, manage, and monitor services, supports the following browsers:
  • Mozilla Firefox 11 and higher
  • Google Chrome
  • Internet Explorer 9 and higher. Internet Explorer 11 Native Mode.
  • Safari 5 and higher

Supported Databases

Cloudera Manager requires several databases. The Cloudera Manager Server stores information about configured services, role assignments, configuration history, commands, users, and running processes in a database of its own. You must also specify a database for the Activity Monitor and Reports Manager management services.

  Important: When processes restart, the configuration for each of the services is redeployed using information that is saved in the Cloudera Manager database. If this information is not available, your cluster will not start or function correctly. You must therefore schedule and maintain regular backups of the Cloudera Manager database in order to recover the cluster in the event of the loss of this database.
See Backing Up Databases.

The database you use must be configured to support UTF8 character set encoding. The embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager automatically provides UTF8 encoding. If you install a custom database, you may need to enable UTF8 encoding. The commands for enabling UTF8 encoding are described in each database topic under Cloudera Manager and Managed Service Data Stores.

After installing a database, upgrade to the latest patch version and apply any other appropriate updates. Available updates may be specific to the operating system on which it is installed.

Cloudera Manager and its supporting services can use the following databases:
  • MySQL - 5.5 and 5.6
  • Oracle 11gR2
  • PostgreSQL - 8.4, 9.2, and 9.3
Cloudera supports the shipped version of MySQL and PostgreSQL for each supported Linux distribution. Each database is supported for all components in Cloudera Manager and CDH subject to the notes in CDH 4 Supported Databases and CDH 5 Supported Databases.

Supported CDH and Managed Service Versions

The following versions of CDH and managed services are supported:
  Warning: Cloudera Manager 5 does not support CDH 3 and you cannot upgrade Cloudera Manager 4 to Cloudera Manager 5 if you have a cluster running CDH 3. Therefore, to upgrade CDH 3 clusters to CDH 4 using Cloudera Manager, you must use Cloudera Manager 4.
  • CDH 4 and CDH 5. The latest released versions of CDH 4 and CDH 5 are strongly recommended. For information on CDH 4 requirements, see CDH 4 Requirements and Supported Versions. For information on CDH 5 requirements, see CDH 5 Requirements and Supported Versions.
  • Cloudera Impala - Cloudera Impala is included with CDH 5. Cloudera Impala 1.2.1 with CDH 4.1.0 or later. For more information on Cloudera Impala requirements with CDH 4, see Cloudera Impala Requirements.
  • Cloudera Search - Cloudera Search is included with CDH 5. Cloudera Search 1.2.0 with CDH 4.6.0. For more information on Cloudera Search requirements with CDH 4, see Cloudera Search Requirements.
  • Apache Spark - 0.9.0 or later with CDH 4.4.0 or later.
  • Apache Accumulo - 1.4.3 with CDH 4.3.0, 1.4.4 with CDH 4.5.0, and 1.6.0 with CDH 4.6.0.
For more information, see the Product Compatibility Matrix.

Resource Requirements

Cloudera Manager requires the following resources:
  • Disk Space
    • Cloudera Manager Server
      • 5 GB on the partition hosting /var.
      • 500 MB on the partition hosting /usr.
      • For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server—not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:
        • CDH 4.6 - 700 MB per parcel; CDH 5 (which includes Impala and Search) - 1.5 GB per parcel (packed), 2 GB per parcel (unpacked)
        • Cloudera Impala - 200 MB per parcel
        • Cloudera Search - 400 MB per parcel
    • Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition. For more information, see Data Storage for Monitoring Data.
    • Agents - On Agent hosts each unpacked parcel requires about three times the space of the downloaded parcel on the Cloudera Manager Server. By default unpacked parcels are located in /opt/cloudera/parcels.
  • RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB may be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel may kill the Server for consuming too much RAM.
  • Python - Cloudera Manager and CDH 4 require Python 2.4 or later, but Hue in CDH 5 and package installs of CDH 5 require Python 2.6 or 2.7. All supported operating systems include Python version 2.4 or later.
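
To make the parcel sizing above concrete, here is a minimal sketch that adds up the approximate packed parcel sizes and applies the "about three times" rule for Agent hosts. The sizes are the approximate figures from the list above (1.5 GB is rounded to 1500 MB); the function and constant names are hypothetical.

```python
# Approximate packed parcel sizes in MB, taken from the list above.
PARCEL_SIZE_MB = {
    "CDH 4.6": 700,
    "CDH 5": 1500,
    "Cloudera Impala": 200,
    "Cloudera Search": 400,
}

# "about three times the space of the downloaded parcel"
AGENT_MULTIPLIER = 3


def server_repo_mb(parcels):
    """Approximate space used in the Server's local parcel repository."""
    return sum(PARCEL_SIZE_MB[name] for name in parcels)


def agent_mb(parcels):
    """Rough space needed on each Agent host for the same parcels."""
    return AGENT_MULTIPLIER * server_repo_mb(parcels)


parcels = ["CDH 4.6", "Cloudera Impala", "Cloudera Search"]
print(server_repo_mb(parcels), agent_mb(parcels))
```

These are estimates only. Actual parcel sizes vary by version and build, so treat the result as a lower bound when planning partitions.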

Networking and Security Requirements

The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:

  • Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
    • Contain consistent information about hostnames and IP addresses across all hosts
    • Not contain uppercase hostnames
    • Not contain duplicate IP addresses

    Also, do not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:

    127.0.0.1	localhost.localdomain	localhost
    192.168.1.1	cluster-01.example.com	cluster-01
    192.168.1.2	cluster-02.example.com	cluster-02
    192.168.1.3	cluster-03.example.com	cluster-03 
    
  • In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.

    Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete. For more information, see Permission Requirements.

  • If single user mode is not enabled, the Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
  • No blocking is done by Security-Enhanced Linux (SELinux).
  • IPv6 must be disabled.
  • No blocking by iptables or firewalls; port 7180 must be open because it is used to access Cloudera Manager after installation. Cloudera Manager communicates using specific ports, which must be open.
  • For Red Hat and CentOS, the /etc/sysconfig/network file on each host must contain the hostname you have just set (or verified) for that host.
  • Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:
Table 1. Users and Groups

Component (Version) | Unix User ID | Groups | Notes
Cloudera Manager (all versions) | cloudera-scm | cloudera-scm | Cloudera Manager processes such as the Cloudera Manager Server and the monitoring roles run as this user. The Cloudera Manager keytab file must be named cmf.keytab since that name is hard-coded in Cloudera Manager. Note: Applicable to clusters managed by Cloudera Manager only.
Apache Accumulo (Accumulo 1.4.3 and higher) | accumulo | accumulo | Accumulo processes run as this user.
Apache Avro |  |  | No special users.
Apache Flume (CDH 4, CDH 5) | flume | flume | The sink that writes to HDFS as this user must have write privileges.
Apache HBase (CDH 4, CDH 5) | hbase | hbase | The Master and the RegionServer processes run as this user.
HDFS (CDH 4, CDH 5) | hdfs | hdfs, hadoop | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.
Apache Hive (CDH 4, CDH 5) | hive | hive | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (e.g. MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml.
Apache HCatalog (CDH 4.2 and higher, CDH 5) | hive | hive | The WebHCat service (for REST access to Hive functionality) runs as the hive user.
HttpFS (CDH 4, CDH 5) | httpfs | httpfs | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.
Hue (CDH 4, CDH 5) | hue | hue | Hue services run as this user.
Cloudera Impala (CDH 4.1 and higher, CDH 5) | impala | impala, hadoop, hdfs, hive | Impala services run as this user.
Apache Kafka (Cloudera Distribution of Kafka 1.2.0) | kafka | kafka | Kafka services run as this user.
Java KeyStore KMS (CDH 5.2.1 and higher) | kms | kms | The Java KeyStore KMS service runs as this user.
Key Trustee KMS (CDH 5.3 and higher) | kms | kms | The Key Trustee KMS service runs as this user.
Key Trustee Server (CDH 5.4 and higher) | keytrustee | keytrustee | The Key Trustee Server service runs as this user.
Llama (CDH 5) | llama | llama | Llama runs as this user.
Apache Mahout |  |  | No special users.
MapReduce (CDH 4, CDH 5) | mapred | mapred, hadoop | Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos.
Apache Oozie (CDH 4, CDH 5) | oozie | oozie | The Oozie service runs as this user.
Parquet |  |  | No special users.
Apache Pig |  |  | No special users.
Cloudera Search (CDH 4.3 and higher, CDH 5) | solr | solr | The Solr processes run as this user.
Apache Spark (CDH 5) | spark | spark | The Spark History Server process runs as this user.
Apache Sentry (incubating) (CDH 5.1 and higher) | sentry | sentry | The Sentry service runs as this user.
Apache Sqoop (CDH 4, CDH 5) | sqoop | sqoop | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.
Apache Sqoop2 (CDH 4.2 and higher, CDH 5) | sqoop2 | sqoop, sqoop2 | The Sqoop2 service runs as this user.
Apache Whirr |  |  | No special users.
YARN (CDH 4, CDH 5) | yarn | yarn, hadoop | Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos.
Apache ZooKeeper (CDH 4, CDH 5) | zookeeper | zookeeper | The ZooKeeper processes run as this user. It is not configurable.
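
Two of the /etc/hosts rules listed under Networking and Security Requirements (no uppercase hostnames, no duplicate IP addresses) lend themselves to a quick scripted check. The following is a minimal sketch, not a Cloudera tool: the function name check_hosts is hypothetical, and it does not verify consistency across hosts or DNS resolution.

```python
def check_hosts(text):
    """Return a list of problems found in /etc/hosts-style text."""
    problems = []
    seen_ips = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *names = line.split()
        if ip in seen_ips:
            problems.append(f"line {lineno}: duplicate IP address {ip}")
        seen_ips.add(ip)
        for name in names:
            if name != name.lower():
                problems.append(f"line {lineno}: uppercase hostname {name}")
    return problems


# Sample based on the /etc/hosts example shown in the requirements.
sample = """\
127.0.0.1 localhost.localdomain localhost
192.168.1.1 cluster-01.example.com cluster-01
192.168.1.2 cluster-02.example.com cluster-02
192.168.1.3 cluster-03.example.com cluster-03
"""
print(check_hosts(sample))
```

To check a real host, pass the contents of its /etc/hosts file, for example check_hosts(open("/etc/hosts").read()). An empty list means neither of the two rules was violated.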

Cloudera Manager and CDH QuickStart Guide

This quick start guide describes how to quickly create a new installation of Cloudera Manager 5, CDH 5, and managed services on a cluster of four hosts. The resulting deployment can be used for demonstrations and proof of concept applications, but is not recommended for production.

 

Requirements

The four hosts in the cluster must satisfy the following requirements:
  • The hosts must have at least 10 GB RAM
  • You must have root or password-less sudo access to the hosts
  • If using root, the hosts must accept the same root password
  • The hosts must have Internet access to allow the wizard to install software from archive.cloudera.com
  • Run a supported OS:
    • RHEL-compatible
      • Red Hat Enterprise Linux and CentOS
        • 5.7, 64-bit
        • 6.4, 64-bit
        • 6.5 in SELinux mode
        • 6.5, 64-bit
        • 6.6, 64-bit
      • Oracle Enterprise Linux with default kernel and Unbreakable Enterprise Kernel, 64-bit
        • 5.6 (UEK R2)
        • 6.4 (UEK R2)
        • 6.5 (UEK R2, UEK R3)
        • 6.6 (UEK R3)
    • SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required. The Updates repository must be active and SUSE Linux Enterprise Software Development Kit 11 SP1 is required.
    • Debian - Wheezy (7.0 and 7.1), 64-bit
    • Ubuntu - Trusty (14.04) and Precise (12.04), 64-bit
If your environment does not satisfy these requirements, the procedure described in this guide may not be appropriate for you. For information about other Cloudera Manager installation options and requirements, see Installing Cloudera Manager, CDH, and Managed Services.
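
The RAM requirement above can be checked on a Linux host by reading the MemTotal line of /proc/meminfo, which is reported in kB. This is a minimal sketch under that assumption; the function names are hypothetical.

```python
import re

MIN_RAM_GB = 10  # QuickStart requirement listed above


def mem_total_gb(meminfo_text):
    """Parse the MemTotal line (reported in kB) from /proc/meminfo text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) / (1024 * 1024)


def has_enough_ram(meminfo_text, minimum_gb=MIN_RAM_GB):
    """True if the reported total memory meets the minimum."""
    return mem_total_gb(meminfo_text) >= minimum_gb


# Illustrative input; on a real host use open("/proc/meminfo").read().
sample = "MemTotal:       16384000 kB\nMemFree:         1234567 kB\n"
print(has_enough_ram(sample))
```

The kernel reserves some memory, so a host with nominally 10 GB installed can report slightly less. Allow a small tolerance when applying the threshold.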

Download and Run the Cloudera Manager Server Installer

  1. Download the Cloudera Manager installer binary from Cloudera Manager 5.4.0 Downloads to the cluster host where you want to install the Cloudera Manager Server.
    1. Click Download Cloudera Express or Download Cloudera Enterprise. See Cloudera Express and Cloudera Enterprise Features.
    2. Register and click Submit.
    3. Download the installer:
      $ wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
      
  2. Change cloudera-manager-installer.bin to have executable permission.
    $ chmod u+x cloudera-manager-installer.bin
    
  3. Run the Cloudera Manager Server installer.
    $ sudo ./cloudera-manager-installer.bin
    
  4. Read the Cloudera Manager README and then press Return or Enter to choose Next.
  5. Read the Cloudera Express License and then press Return or Enter to choose Next. Use the arrow keys and press Return or Enter to choose Yes to confirm you accept the license.
  6. Read the Oracle Binary Code License Agreement and then press Return or Enter to choose Next.
  7. When the installation completes, the complete URL for the Cloudera Manager Admin Console displays, including the port number, which is 7180 by default. Press Return or Enter to choose OK to continue.
  8. Press Return or Enter to choose OK to exit the installer.
  9. On RHEL 5 and CentOS 5, install Python 2.6 or 2.7. Download the appropriate repository rpm packages to the Cloudera Manager Server host and then install Python using yum. For example, use the following commands:
    $ su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
    ...
    $ yum install python26
    
  Note: If the installation is interrupted for some reason, you may need to clean up before you can re-run it. See Uninstalling Cloudera Manager and Managed Software.

Start the Cloudera Manager Admin Console

  1. Wait several minutes for the Cloudera Manager Server to complete its startup. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
  2. In a web browser, enter http://Server host:7180, where Server host is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is running. The login screen for Cloudera Manager Admin Console displays.
  3. Log in to the Cloudera Manager Admin Console with the credentials Username: admin and Password: admin.
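
Instead of watching the log, you can test whether the Admin Console port is accepting connections yet. The sketch below uses a plain TCP connection attempt, assuming the default port 7180. The function names and the host cm-host.example.com are placeholders.

```python
import socket


def port_is_open(host, port=7180, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def admin_console_url(host, port=7180):
    """Build the Admin Console URL for a given Server host."""
    return f"http://{host}:{port}"


# "cm-host.example.com" is a placeholder for your Server host.
print(admin_console_url("cm-host.example.com"))
```

Calling port_is_open in a loop with a short sleep between attempts is one way to wait for startup to finish. An open port shows only that the Server is listening, not that the login page is ready.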

Install and Configure Software Using the Cloudera Manager Wizard

Installing and configuring Cloudera Manager, CDH, and managed service software on the cluster hosts involves the following three main steps.

Choose Cloudera Manager Edition and Specify Hosts

  1. Choose Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed. The trial allows you to create all CDH and managed services supported by Cloudera Manager. Click Continue.
  2. Information is displayed indicating what edition of Cloudera Manager will be installed and the services you can choose from. Click Continue. The Specify hosts for your CDH cluster installation screen displays.
  3. Specify the four hosts on which to install CDH and managed services. You can specify hostnames and/or IP addresses and ranges, for example: 10.1.1.[1-4] or host[1-3].company.com. You can specify multiple addresses and address ranges by separating them by commas, semicolons, tabs, or blank spaces, or by placing them on separate lines.
  4. Click Search. Cloudera Manager identifies the hosts on your cluster. Verify that the number of hosts shown matches the number of hosts where you want to install services. Deselect host entries that do not exist and deselect the hosts where you do not want to install services. Click Continue. The Select Repository screen displays.
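
The range syntax in step 3 can be previewed before you paste it into the wizard. The following sketch expands [a-b] ranges and splits entries on commas, semicolons, and whitespace. It is an approximation written for illustration, with hypothetical function names, and the wizard's own parser may treat edge cases such as zero-padded numbers differently.

```python
import re

RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand(pattern):
    """Expand the first [a-b] range in a pattern, recursing for any others."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    hosts = []
    for n in range(start, end + 1):
        expanded = pattern[:match.start()] + str(n) + pattern[match.end():]
        hosts.extend(expand(expanded))
    return hosts


def parse_hosts(text):
    """Split on commas, semicolons, and whitespace, then expand each entry."""
    entries = [e for e in re.split(r"[,;\s]+", text) if e]
    return [host for entry in entries for host in expand(entry)]


print(parse_hosts("10.1.1.[1-4], host[1-3].company.com"))
```

Counting the entries in the result gives the number of hosts to expect on the search results screen in step 4.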

Install CDH and Managed Service Software

  1. Keep the default distribution method Use Parcels and the default version of CDH 5. Leave the Additional Parcels selections at None.
  2. For the Cloudera Manager Agent, keep the default Matched release for this Cloudera Manager Server. Click Continue. The JDK Installation Options screen displays.
  3. Select the Install Oracle Java SE Development Kit (JDK) checkbox to allow Cloudera Manager to install the JDK on each cluster host, or deselect it if you plan to install the JDK yourself. Leave the Install Java Unlimited Strength Encryption Policy Files checkbox deselected. Click Continue. The Enable Single User Mode screen displays.
  4. Leave the Single User Mode checkbox deselected and click Continue. The Provide SSH login credentials page displays.
  5. Specify host SSH login properties:
    1. Keep the default login root or enter the user name for an account that has password-less sudo permission.
    2. If you choose to use password authentication, enter and confirm the password.
  6. Click Continue. Cloudera Manager installs the Oracle JDK and the Cloudera Manager Agent packages on each host and starts the Agent.
  7. Click Continue. The Installing Selected Parcels screen displays. Cloudera Manager installs CDH. During the parcel installation, progress is indicated for the phases of the parcel installation process in separate progress bars. When the Continue button at the bottom of the screen turns blue, the installation process is completed.
  8. Click Continue. The Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. Click Finish. The Cluster Setup screen displays.
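
Step 5 above expects either root or an account with password-less sudo permission. A preflight check along the following lines can confirm that before the wizard tries to log in. It is a sketch under stated assumptions: key-based SSH access is already set up (BatchMode=yes disables password prompts), and the names check_sudo, check_hosts, and SSH_CMD are hypothetical helpers, not part of Cloudera Manager.

```shell
# SSH client to use; can be overridden, for example with a stub in tests.
SSH_CMD=${SSH_CMD:-ssh}

# Return 0 if <user> can run sudo on <host> without a password prompt.
check_sudo() {
  # BatchMode=yes stops ssh from prompting; sudo -n fails instead of asking.
  $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$1@$2" sudo -n true
}

# Check every host listed after the user name and report each result.
# Usage: check_hosts <user> <host>...
check_hosts() {
  user=$1
  shift
  failed=0
  for host in "$@"; do
    if check_sudo "$user" "$host" </dev/null; then
      echo "ok      $host"
    else
      echo "FAILED  $host"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Example (hypothetical account and hosts):
#   check_hosts clouderaadmin host1.company.com host2.company.com
```

A host reported as FAILED either refused the SSH connection or asked for a sudo password; both cases would also stop the wizard at the agent installation in step 6.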

Add and Configure Services

  1. Click the All Services radio button to create HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, Sqoop, HBase, Impala, Solr, Spark, and Key-Value Store Indexer services. Click Continue. The Customize Role Assignments screen displays.
  2. Configure the following role assignments:
    • Click the text field under the HBase Thrift Server role. In the host selection dialog that displays, select the checkbox next to any host and click OK at the bottom right.
    • Click the text field under the Server role of the ZooKeeper service. In the host selection dialog that displays, uncheck the checkbox next to the host assigned by default (the master host) and select checkboxes next to the remaining three hosts. Click OK at the bottom right.
    Click Continue. The Database Setup screen displays.
  3. Leave the default setting of Use Embedded Database to have Cloudera Manager create and configure all required databases in an embedded PostgreSQL database. Click Test Connection. When the test completes, click Continue. The Review Changes screen displays.
  4. Review the configuration changes to be applied. Click Continue. The Command Progress page displays.
  5. The wizard performs 32 steps to configure and start the services. When the startup completes, click Continue.
  6. A success message displays indicating that the cluster has been successfully started. Click Finish to proceed to the Home page.

Test the Installation

The Home page displays.

On the left side of the screen is a list of the services currently running, with their status information. All services should be running with Good Health; however, there may be a small number of configuration warnings, indicated by a wrench icon and a number, which you can ignore.

You can click each service to view more detailed information about the service. You can also test your installation by running a MapReduce job or interacting with the cluster with a Hue application.
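
The same status information can be read from a shell through the Cloudera Manager REST API. The sketch below rests on several assumptions that should be checked against the API documentation for your release: v10 is taken to be the API version for Cloudera Manager 5.4 (the /api/version endpoint reports the highest supported version), the cluster keeps the default name Cluster 1, the admin/admin credentials are unchanged, and each service in the JSON response carries a healthSummary field. The function names are hypothetical.

```shell
# Build the REST URL that lists the services of a cluster.
# Usage: services_url <server-host> <cluster-name> [api-version]
services_url() {
  # Percent-encode spaces so "Cluster 1" becomes "Cluster%201".
  cluster=$(printf '%s' "$2" | sed 's/ /%20/g')
  printf 'http://%s:7180/api/%s/clusters/%s/services\n' "$1" "${3:-v10}" "$cluster"
}

# Print the JSON service list for a cluster.
# Usage: list_services <server-host> <cluster-name>
list_services() {
  curl --silent --fail --user admin:admin "$(services_url "$1" "$2")"
}

# Example (hypothetical host name):
#   list_services cm-host.example.com 'Cluster 1' | grep healthSummary
```

Only the space character is encoded here, which is enough for the default cluster name; a cluster name with other reserved characters would need full percent-encoding.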

Running a MapReduce Job

  1. Log into a cluster host.
  2. Run the Hadoop PiEstimator example:
    # pi <number of maps> <samples per map>
    sudo -u hdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    pi 10 100
    
  3. View the result of running the job by selecting the following from the top navigation bar in the Cloudera Manager Admin Console: Clusters > Cluster 1 > Activities > YARN Applications. An entry for the job appears in the list of applications.
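
Separately from the console view, the job from step 2 can be wrapped in a small helper so that the number of maps and the samples per map are easy to vary. This is a sketch, not part of the documented procedure: the names run_pi, EXAMPLES_JAR, and DRY_RUN are hypothetical, and the JAR path is the parcel path used in step 2.

```shell
# Path to the examples JAR in the active CDH parcel (as used in step 2).
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

# Run the pi example with <maps> map tasks and <samples> samples per map.
# When DRY_RUN is set to 1, the command is printed instead of executed.
# Usage: run_pi [maps] [samples]
run_pi() {
  maps=${1:-10}
  samples=${2:-100}
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo -u hdfs hadoop jar $EXAMPLES_JAR pi $maps $samples"
  else
    sudo -u hdfs hadoop jar "$EXAMPLES_JAR" pi "$maps" "$samples"
  fi
}

# Examples:
#   run_pi            # same job as step 2 (10 maps, 100 samples per map)
#   run_pi 20 1000    # a larger run
#   yarn application -list -appStates FINISHED   # list finished applications
```

The yarn application command in the last comment is a command-line alternative to the YARN Applications page; which applications it shows depends on the user running it.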

Testing with Hue

In addition to running a job, you can test the cluster by running one of the Hue web applications. Hue is a graphical user interface for interacting with your clusters. Its applications let you browse HDFS, manage a Hive metastore, and run Hive, Impala, and Search queries, Pig scripts, and Oozie workflows.
  1. In the Cloudera Manager Admin Console Home page, click the Hue service.
  2. Click the Hue Web UI link, which opens Hue in a new window.
  3. Log in with the credentials, username: hdfs, password: hdfs.
  4. Choose an application in the navigation bar at the top of the browser window.

For more information, see the Hue User Guide.