Cloudera Manager 5.1.0

A Single Interface to Manage Your CDH Cluster

Cloudera Manager is a unified management interface that makes it easy to install, configure, and manage a CDH cluster. It automatically ships with Cloudera Enterprise or Cloudera Express to help you get up and running with Hadoop faster.

Cloudera Manager 5.1 works with both CDH 4 and CDH 5 and is available with:
  • Cloudera Express - Easily deploy, manage, monitor, and perform diagnostics on your Hadoop cluster.
  • Cloudera Enterprise - Includes all the above capabilities, plus advanced management features and support, including zero-downtime upgrades and backup and disaster recovery.

Cloudera Manager is the recommended tool for installing Cloudera Enterprise or Cloudera Express, and it is downloaded automatically with either edition. Using Cloudera Manager with Cloudera Enterprise requires a license.

When installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.

Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.


What's New in Cloudera Manager 5.1.0

  • SSL Encryption:
    • Supports several new SSL-related configuration parameters for HDFS, MapReduce, YARN and HBase, which allow you to configure and enable encrypted shuffle and encrypted web UIs for these services. See Configuring SSL Encryption in Cloudera Manager.
    • Cloudera Manager now also supports the monitoring of HDFS, MapReduce, YARN, and HBase when SSL is enabled for these services. New configuration parameters allow you to specify the location and password of the truststore used to verify certificates in HTTPS communication with CDH services and the Cloudera Manager Server.
  • Sentry Service:
    • A new Sentry service that stores the authorization metadata in an underlying relational database and allows you to use Grant/Revoke statements to modify privileges. See The Sentry Service.
    • You can also configure the Sentry service to allow Pig, MapReduce, and WebHCat queries access to Sentry-secured data stored in Hive. See Configuring Pig and HCatalog for the Sentry Service.
  • Kerberos Authentication:
    • Now supports a Kerberos cluster using an Active Directory KDC.
    • New wizard to enable Kerberos on an existing cluster. The wizard works with both MIT KDC and Active Directory KDC.
    • Ability to configure and deploy Kerberos client configuration (krb5.conf) on a cluster.
  • Spark Service - added the History Server role
  • Impala - added support for Llama ApplicationMaster High Availability
  • User Roles - two new roles, Operator and Configurator, support fine-grained access to Cloudera Manager features. See Cloudera Manager User Accounts.
  • Monitoring
    • Updates to Oozie monitoring
    • New Hive Metastore Canary
  • UI - The UI has been updated to improve scalability. The Home page Status tab can be configured to display clusters in a full or summary format. There is a new Cluster page for each cluster. The Hosts and Instances pages have added faceted filters.

What's New in Cloudera Manager 5.0.2

A number of issues have been fixed. See Fixed Issues in Cloudera Manager 5.0.2 for details.

What's New in Cloudera Manager 5.0.1

A number of issues have been fixed. See Fixed Issues in Cloudera Manager 5.0.1 for details.

  • Monitoring
    • The Java Garbage Collection Duration health test for the Service Monitor, Host Monitor, and Activity Monitor has been replaced with the new Java Pause Duration health test.

What's New in Cloudera Manager 5.0.0

  • Service and Configuration Management
    • HDFS - cache management
  • Resource Management - Impala admission control
  • Monitoring
    • Host disks overview
    • Impala best practices
    • HBase table statistics
    • HDFS cache statistics

What's New in Cloudera Manager 5.0.0 Beta 2

  • Service and Configuration Management
    • HDFS
      • HDFS NFS Gateway role
      • Supports restoration of HDFS data from a snapshot
    • YARN
      • YARN Resource Manager High Availability
      • Resource pool scheduler
    • Support for Spark service
    • Support for Accumulo service
    • Support for service extensibility
    • Support to set up Oozie server High Availability
    • Granular configuration staleness UI
    • Support for setting maximum file descriptors
  • Monitoring
    • Support for monitoring the Cloudera Search/Solr service
    • New "failed" and "killed" badges displayed for unsuccessful YARN applications
    • More attributes available for filtering displays of YARN applications and Impala queries
    • New operational reports added for HBase tables and namespaces, Impala queries, and YARN applications
    • Support for creating user-defined triggers for metrics accessible via charts/tsquery
        Important: Because triggers are a new and evolving feature, backward compatibility between releases is not guaranteed at this time.
    • Charting improvements
      • New table chart type
      • New options for displaying data and metadata from charts
      • Support for exporting data from charts to CSV or JSON files
  • Administrative Settings
    • Added a new role type with limited administrator capabilities
    • Cloudera Manager Server and all JVMs will create a heap dump if they run out of memory
    • Configure the location of the parcel directory and specify whether and when to remove old parcels from cluster hosts

What's New in Cloudera Manager 5.0.0 Beta 1

  • CDH Version
    • Supports both CDH 4 and CDH 5
    • CDH 4 to CDH 5 upgrade wizard
    • Support for YARN as a production execution environment
      • MapReduce (MRv1) to YARN (MRv2) configuration import
      • YARN-based resource management for Impala 1.2
  • JDK Version - Cloudera Manager 5 supports and installs both JDK 6 and JDK 7.
  • Resource Management
    • Static and dynamic partitioning of resources: provides a wizard for configuring static partitioning of resources (cgroups) across core services (HBase, HDFS, MapReduce, Solr, YARN) and dynamic allocation of resources for YARN and Impala.
    • Pool, resource group, and queue administration for YARN and Impala.
    • Usage monitoring and trending
  • Monitoring
    • YARN service monitoring
    • YARN (MRv2) job monitoring
    • Configurable histograms of Impala query and YARN job attributes that can be used to quickly filter query and application lists
    • Scalable back-end database for monitoring metrics
    • Charting improvements
      • New chart types: histogram and heatmap
      • New scale types: logarithmic and power
      • Updates to tsquery language: new attribute values to support YARN and new functions to support new chart types
  • Extensibility
    • Ability to manage both ISV applications and non-CDH services (for example, Accumulo, Spark, and so on)
    • Working with select ISVs as part of Beta 1
  • Single Sign-On - Support for SAML to enable single sign-on
  • Parcels
    • Dependency enforcement to ensure incompatible parcels are not used together
    • Option to not cache downloaded parcels, to save disk space
    • Improved error reporting for management operations
  • Backup and Disaster Recovery (BDR)
    • HBase and HDFS snapshots: Supports scheduling snapshots on a recurring basis.
    • Support for YARN (MRv2): Replication jobs can now run using YARN (MRv2) instead of MRv1.
    • Global replication page: All scheduled snapshots (HDFS and HBase) and replication jobs for either HDFS or Hive are shown on a single Replications page
  • Other
    • Global Search box
    • Several usability improvements
    • Comprehensive detection of configuration changes that require service restarts, refresh and redeployment of client configurations.

Cloudera Manager 5.1.x System Requirements:

Supported Operating Systems

Supported JDK Versions

Supported Databases

Supported CDH and Other Managed Service Versions

Resource Requirements

Networking and Security Requirements

Cloudera Manager Quick Start Guide

This quick start guide describes how to quickly create a new installation of Cloudera Manager 5, CDH 5, and managed services on a cluster of four hosts. The resulting deployment can be used for demonstrations and proof of concept applications, but is not recommended for production.


Requirements

The four hosts in the cluster must satisfy the following requirements:
  • The hosts must have at least 10 GB RAM
  • You must have root or password-less sudo access to the hosts
  • If using root, the hosts must accept the same root password
  • The hosts must have Internet access to allow the wizard to install software from archive.cloudera.com
  • Run a supported OS:
    • RHEL-compatible systems
      • Red Hat Enterprise Linux and CentOS 5.7, 64-bit
      • Red Hat Enterprise Linux and CentOS 6.2, 64-bit
      • Red Hat Enterprise Linux and CentOS 6.4, 64-bit
      • Oracle Enterprise Linux 6.4, 64-bit
      • Oracle Enterprise Linux 5.6, 64-bit
    • SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required. The Updates repository must be active and SUSE Linux Enterprise Software Development Kit 11 SP1 is required.
    • Debian - Debian 7.0 and 7.1, 64-bit
    • Ubuntu - Ubuntu 12.04, 64-bit
If your environment does not satisfy these requirements, the procedure described in this guide may not be appropriate for you. For information about other Cloudera Manager installation options and requirements, see the Cloudera Manager Installation Guide.
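
The requirements above can be spot-checked on each host before starting. The following is a minimal sketch, not part of the Cloudera Manager tooling: the function names are hypothetical, the 64-bit check assumes an x86_64 system, and the RAM check is approximate because MemTotal in /proc/meminfo excludes memory reserved by the kernel. The last check only confirms that archive.cloudera.com answers at all.

```shell
# Hypothetical preflight helpers; thresholds mirror the requirements above.

# Succeeds if the given memory size (in kB, as reported by /proc/meminfo)
# is at least 10 GB. Approximate: MemTotal is slightly below physical RAM.
has_enough_ram() {
  mem_kb=$1
  required_kb=$((10 * 1024 * 1024))
  [ "$mem_kb" -ge "$required_kb" ]
}

# Succeeds if the given machine string (from `uname -m`) is 64-bit x86.
is_64bit() {
  [ "$1" = "x86_64" ]
}

# Prints MemTotal in kB from a meminfo-style file (default: /proc/meminfo).
mem_total_kb() {
  awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Runs all checks on the current host and prints a warning for each failure.
check_host() {
  has_enough_ram "$(mem_total_kb)" || echo "WARN: less than 10 GB RAM"
  is_64bit "$(uname -m)" || echo "WARN: not a 64-bit x86 system"
  [ "$(id -u)" -eq 0 ] || sudo -n true 2>/dev/null \
    || echo "WARN: no root or password-less sudo access"
  curl -s -o /dev/null --max-time 10 http://archive.cloudera.com/ \
    || echo "WARN: cannot reach archive.cloudera.com"
}
```

Running check_host on each of the four hosts prints nothing when every check passes. The supported OS versions are not checked here and still need to be confirmed by hand.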

Download and Run the Cloudera Manager Server Installer

  1. Download the Cloudera Manager installer binary from Cloudera Manager Downloads to the cluster host where you want to install the Cloudera Manager Server.
    1. Under Cloudera Manager 5.x downloads, click Download Cloudera Manager Express Edition.
    2. Optionally register and click Submit or click the Just take me to the download page link.
    The cloudera-manager-installer.bin file downloads.
  2. Change cloudera-manager-installer.bin to have executable permission.
    $ chmod u+x cloudera-manager-installer.bin
  3. Run the Cloudera Manager Server installer.
    $ sudo ./cloudera-manager-installer.bin
  4. Read the Cloudera Manager README and then press Return or Enter to choose Next.
  5. Read the Cloudera Manager Standard License and then press Return or Enter to choose Next. Use the arrow keys and press Return or Enter to choose Yes to confirm you accept the license.
  6. Read the Oracle Binary Code License Agreement and then press Return or Enter to choose Next.
  7. When the installation completes, the complete URL for the Cloudera Manager Admin Console is displayed, including the port number, which is 7180 by default. Press Return or Enter to choose OK to continue.
  8. Press Return or Enter to choose OK to exit the installer.
  9. On RHEL 5 and CentOS 5, install Python 2.6 or 2.7. Download the appropriate repository rpm packages to the Cloudera Manager Server host and then install Python using yum. For example, use the following commands:
    $ su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
    ...
    $ yum install python26
  Note: If the installation is interrupted for some reason, you may need to clean up before you can re-run it. See Uninstalling Cloudera Manager and Managed Software in the Cloudera Manager Installation Guide.

Start the Cloudera Manager Admin Console

  1. Wait several minutes for the Cloudera Manager Server to complete its startup. To observe the startup process, you can run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
  2. In a web browser, enter http://Server host:7180, where Server host is the fully-qualified domain name or IP address of the host where you installed the Cloudera Manager Server. The login screen for Cloudera Manager Admin Console displays.
  3. Log into Cloudera Manager Admin Console with the credentials: Username: admin Password: admin.
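
Instead of watching the log, the wait in step 1 can be scripted by polling the Admin Console port until it answers. This is a rough sketch under stated assumptions: the function names are hypothetical, and the attempt count and 10-second delay are arbitrary illustrative defaults rather than values from Cloudera.

```shell
# Builds the Admin Console URL for a host; the port defaults to 7180.
admin_console_url() {
  printf 'http://%s:%s\n' "$1" "${2:-7180}"
}

# Polls the URL until the server answers or the attempts run out.
# Arguments: host, optional port, optional number of attempts (default 30).
wait_for_console() {
  url=$(admin_console_url "$1" "${2:-7180}")
  attempts=${3:-30}
  while [ "$attempts" -gt 0 ]; do
    # curl exits 0 as soon as any HTTP response is received.
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "Admin Console is answering at $url"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 10
  done
  echo "Gave up waiting for $url" >&2
  return 1
}
```

For example, wait_for_console cm.example.com (a placeholder hostname) returns once the login page is reachable, at which point the URL can be opened in a browser as described in step 2.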

Install and Configure Software Using the Cloudera Manager Wizard

Installing and configuring Cloudera Manager, CDH, and managed service software on the cluster hosts involves the following three main steps.

Choose Cloudera Manager Edition and Specify Hosts

  1. Choose Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed. The trial allows you to create all CDH and managed services supported by Cloudera Manager. Click Continue.
  2. Information is displayed indicating what edition of Cloudera Manager will be installed and the services you can choose from. Click Continue.
  3. Specify the hosts on which to install CDH and managed services. You can specify hostnames and/or IP addresses and ranges, for example: 10.1.1.[1-4] or host[1-3].company.com. You can specify multiple addresses and address ranges by separating them by commas, semicolons, tabs, or blank spaces, or by placing them on separate lines.
  4. Click Search. Cloudera Manager identifies the hosts on your cluster. Verify that the number of hosts shown matches the number of hosts where you want to install services. Deselect host entries that do not exist and deselect the hosts where you do not want to install services. Click Continue. The Select Repository page displays.
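
To preview which hosts a range pattern such as 10.1.1.[1-4] or host[1-3].company.com covers before entering it in the wizard, the pattern can be expanded locally. The sketch below is only an illustration of the bracketed-range notation shown in step 3; expand_hosts is a hypothetical helper, it handles a single numeric range per pattern, and it does not handle zero-padded numbers. The wizard's own parser may accept more than this.

```shell
# Expands one pattern containing a single [start-end] numeric range into
# one hostname or address per line. Patterns without a range are echoed as-is.
expand_hosts() {
  pattern=$1
  case $pattern in
    *\[*-*\]*)
      prefix=${pattern%%\[*}   # text before the opening bracket
      rest=${pattern#*\[}      # text after the opening bracket
      range=${rest%%\]*}       # "start-end"
      suffix=${rest#*\]}       # text after the closing bracket
      start=${range%%-*}
      end=${range#*-}
      i=$start
      while [ "$i" -le "$end" ]; do
        printf '%s%s%s\n' "$prefix" "$i" "$suffix"
        i=$((i + 1))
      done
      ;;
    *)
      printf '%s\n' "$pattern"
      ;;
  esac
}
```

Counting the lines of output, for example with expand_hosts 'host[1-3].company.com' | wc -l, gives the number of hosts to compare against the search results in step 4.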

Install CDH and Managed Service Software

  1. Keep the default distribution method Use Parcels and the default version of CDH 5.
  2. For the Cloudera Manager Agent, keep the default Matched release for this Cloudera Manager Server. Click Continue twice.
  3. Specify host SSH login properties:
    1. Keep the default login root or enter the user name for an account that has password-less sudo permission.
    2. If you choose to use password authentication, enter and confirm the password.
    Click Continue. Cloudera Manager installs the Oracle JDK and the Cloudera Manager Agent packages on each host and starts the Agent.
  4. Click Continue. Cloudera Manager installs CDH. During the parcel installation, progress is indicated for the two phases of the parcel installation process (Download and Distribution) in separate progress bars. When the Continue button appears at the bottom of the screen, the installation process is complete.
  5. Click Continue. The Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Finish. The Cluster Setup page displays.

Add and Configure Services

  1. Click the All Services radio button to create HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, Sqoop, HBase, Impala, Solr, Spark, and Key-Value Store Indexer services. Click Continue. The Customize Role Assignments page displays.
  2. Configure the following role assignments:
    • Click the text field under the HBase Thrift Server role. In the host selection dialog that displays, check the checkbox next to any host and click OK at the bottom right.
    • Click the text field under the Server role of the ZooKeeper service. In the host selection dialog that displays, uncheck the checkbox next to the host assigned by default (the master host) and check checkboxes next to the remaining three hosts. Click OK at the bottom right.
    Click Continue. The Database Setup page displays.
  3. Leave the default setting of Use Embedded Database to have Cloudera Manager create and configure all required databases in an embedded PostgreSQL database. Click Test Connection. When the test completes successfully, click Continue. The Review Changes page displays.
  4. Review the configuration changes to be applied. Click Continue. The Command Progress page displays.
  5. The wizard performs 32 steps to configure and start the services. When the startup completes, click Continue.
  6. A success message displays indicating that the cluster has been successfully started. Click Finish to proceed to the Home page.

Test the Installation

On the Home page, the left side of the screen lists the services currently running, with their status information. All the services should be running with Good Health. There may be a small number of configuration warnings, indicated by a wrench icon and a numbered badge, which you can ignore.

You can click each service to view more detailed information about the service. You can also test your installation by running a MapReduce job or interacting with the cluster with a Hue application.

Running a MapReduce Job

  1. Log into a cluster host.
  2. Run the Hadoop PiEstimator example:
    sudo -u hdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    pi 10 100
  3. View the result of running the job by selecting the following from the top navigation bar in the Cloudera Manager Admin Console: Clusters > Cluster 1 > YARN (MR2 Included) Applications. An entry for the job appears in the list of applications.
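
Separately from the Admin Console view, the example job also prints its estimate to the terminal, in a line beginning with "Estimated value of Pi is". As a small sketch, assuming that output format, the hypothetical helper below pulls the number out of the job output so a script can confirm the job finished.

```shell
# Reads PiEstimator output on stdin and prints only the numeric estimate.
# Prints nothing if the expected result line is absent.
extract_pi_estimate() {
  sed -n 's/^Estimated value of Pi is \([0-9.]*\).*$/\1/p'
}

# Example use on a cluster host (same command as in step 2):
#   sudo -u hdfs hadoop jar \
#     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
#     pi 10 100 | extract_pi_estimate
```

An empty result suggests that the job did not complete, in which case the YARN Applications page is the place to look for the failure.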

Testing with Hue

A good way to test the cluster is by running a job. In addition, you can test the cluster by running one of the Hue web applications. Hue is a graphical user interface that allows you to interact with your clusters by running applications that let you browse HDFS, manage a Hive metastore, and run Hive, Impala, and Search queries, Pig scripts, and Oozie workflows.
  1. In the Cloudera Manager Admin Console Home page, click the Hue service.
  2. Click the Hue Web UI tab, which opens Hue in a new window.
  3. Log in with the credentials, username: hdfs, password: hdfs.
  4. Choose an application in the navigation bar at the top of the browser window.

For more information, see the Hue User Guide.