
Easily Manage Hadoop in Production

Cloudera Manager makes it easy to manage Hadoop deployments of any scale in production. Quickly deploy, configure, and monitor your cluster through an intuitive UI, complete with rolling upgrades, backup and disaster recovery, and customizable alerting.

 

Cloudera Manager is available as an integrated and supported part of Cloudera Enterprise.

Cloudera Manager 5.5.2



The recommended tool for installing Cloudera Enterprise

This download installs Cloudera Enterprise or Cloudera Express.

 

Cloudera Enterprise requires a license; however, when installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.

 

Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.

Thank you for choosing Cloudera Manager. Your download instructions are below:

Automated Installation

Ideal for trying out Cloudera's enterprise data hub, the installer downloads Cloudera Manager from Cloudera's website and guides you through the setup process.

 

Prerequisites: multiple Internet-connected Linux machines with SSH access and significant free space in /var and /opt.

$ wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

$ chmod u+x cloudera-manager-installer.bin

$ sudo ./cloudera-manager-installer.bin
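
If the installation completes successfully, the Cloudera Manager Admin Console comes up on port 7180 of the Server host (see the networking requirements below). A minimal check, assuming an illustrative hostname of cm-host.example.com:

$ curl -sI http://cm-host.example.com:7180/   # expect an HTTP response once the Server is up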

 

Production Installation

Users setting up Cloudera's enterprise data hub for production use are encouraged to follow the installation instructions in our documentation. These instructions cover explicitly provisioning the databases used by Cloudera Manager and walk through exactly which packages need to be installed.
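
As a rough sketch of the package-based route on a RHEL-compatible host (this assumes the Cloudera Manager 5 yum repository is already configured on the machine; the documentation linked above is authoritative):

$ sudo yum install cloudera-manager-daemons cloudera-manager-server
$ sudo service cloudera-scm-server start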


Supported Operating Systems

Cloudera Manager supports the following operating systems:

  • RHEL-compatible
    • Red Hat Enterprise Linux and CentOS, 64-bit
      • 5.7
      • 6.4
      • 6.5
      • 6.5 in SELinux mode
      • 6.6
      • 6.6 in SELinux mode
      • 6.7
      • 7.1
    • Oracle Enterprise Linux with default kernel and Unbreakable Enterprise Kernel, 64-bit
      • 5.6 (UEK R2)
      • 6.4 (UEK R2)
      • 6.5 (UEK R2, UEK R3)
      • 6.6 (UEK R3)
      • 6.7
      • 7.1

    Important: Cloudera supports RHEL 7 with certain limitations; see the Cloudera documentation for the current list.

     

  • SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required if Cloudera Manager is used to manage CDH 5, and Service Pack 1 or later is required if Cloudera Manager is used to manage CDH 4. If you follow Installation Path A - Automated Installation by Cloudera Manager, the Updates repository must be active to use the embedded PostgreSQL database. The SUSE Linux Enterprise Software Development Kit 11 SP1 is required on hosts running the Cloudera Manager Agents.
  • Debian - Wheezy (7.0 and 7.1), Squeeze (6.0) (deprecated), 64-bit
  • Ubuntu - Trusty (14.04), Precise (12.04), Lucid (10.04) (deprecated), 64-bit

Note:

  • Debian Squeeze and Ubuntu Lucid are supported only for CDH 4.
  • Using the same version of the same operating system on all cluster hosts is strongly recommended.


Supported JDK Versions

The version of Oracle JDK supported by Cloudera Manager depends on the version of CDH that is being managed. The following table lists the JDK versions supported on a Cloudera Manager 5.5 cluster running the latest CDH 4 and CDH 5. For further information on supported JDK versions for previous versions of Cloudera Manager and CDH, see JDK Compatibility.

 

Important: There is one exception to the minimum supported and recommended JDK versions in the following table. If Oracle releases a security patch that affects server-side Java before the next minor release of Cloudera products, the Cloudera support policy covers customers using the patch.

CDH Version Managed (Latest)   Minimum Supported JDK Version   Recommended JDK Version
CDH 5                          1.7.0_55; 1.8.0_31              1.7.0_80; 1.8.0_60
CDH 4 and CDH 5                1.7.0_55; 1.8.0_31              1.7.0_80; 1.8.0_60
CDH 4                          1.7.0_55                        1.7.0_80

Note: Cloudera recommends that you not use JDK 1.8.0_40.

 

Cloudera Manager can install Oracle JDK 1.7.0_67 during installation and upgrade. If you prefer to install the JDK yourself, follow the instructions in Java Development Kit Installation.
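
A quick way to confirm which JDK a host actually resolves to (standard JDK and GNU coreutils tooling; paths vary by installation method):

$ java -version
$ readlink -f "$(which java)"   # surfaces symlinked installs such as /usr/java/default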


Supported Browsers

The Cloudera Manager Admin Console, which you use to install, configure, manage, and monitor services, supports the following browsers:

  • Mozilla Firefox 24 and 31.
  • Google Chrome.
  • Internet Explorer 9 and higher (Internet Explorer 11 in Native Mode).
  • Safari 5 and higher.

Supported Databases

Cloudera Manager requires several databases. The Cloudera Manager Server stores information about configured services, role assignments, configuration history, commands, users, and running processes in a database of its own. You must also specify a database for the Activity Monitor and Reports Manager roles.

 

Important: When processes restart, the configuration for each of the services is redeployed using information that is saved in the Cloudera Manager database. If this information is not available, your cluster will not start or function correctly. You must therefore schedule and maintain regular backups of the Cloudera Manager database in order to recover the cluster in the event of the loss of this database. See Backing Up Databases.


The database you use must be configured to support UTF8 character set encoding. The embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager automatically provides UTF8 encoding. If you install a custom database, you may need to enable UTF8 encoding. The commands for enabling UTF8 encoding are described in each database topic under Cloudera Manager and Managed Service Data Stores.
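
As an illustration, a minimal sketch of creating a UTF8-encoded MySQL database for one of the Cloudera Manager roles (the database name, user, and password here are placeholders; see the data-store documentation for the exact commands for your database):

$ mysql -u root -p
mysql> CREATE DATABASE amon DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY 'amon_password';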

 

After installing a database, upgrade to the latest patch version and apply any other appropriate updates. Available updates may be specific to the operating system on which the database is installed.

 

Cloudera Manager and its supporting services can use the following databases:

  • MariaDB 5.5
  • MySQL - 5.5 and 5.6
  • Oracle 11gR2 and 12c
  • PostgreSQL - 9.2, 9.3, and 9.4

Cloudera supports the shipped version of MariaDB, MySQL and PostgreSQL for each supported Linux distribution. Each database is supported for all components in Cloudera Manager and CDH subject to the notes in CDH 4 Supported Databases and CDH 5 Supported Databases.


Supported CDH and Managed Service Versions

The following versions of CDH and managed services are supported:

Warning: Cloudera Manager 5 does not support CDH 3 and you cannot upgrade Cloudera Manager 4 to Cloudera Manager 5 if you have a cluster running CDH 3. Therefore, to upgrade CDH 3 clusters to CDH 4 using Cloudera Manager, you must use Cloudera Manager 4.

  • CDH 4 and CDH 5. The latest released versions of CDH 4 and CDH 5 are strongly recommended. For information on CDH 4 requirements, see CDH 4 Requirements and Supported Versions. For information on CDH 5 requirements, see CDH 5 Requirements and Supported Versions.
  • Cloudera Impala - Cloudera Impala is included with CDH 5. Cloudera Impala 1.2.1 is supported with CDH 4.1.0 or later. For more information on Cloudera Impala requirements with CDH 4, see Cloudera Impala Requirements.
  • Cloudera Search - Cloudera Search is included with CDH 5. Cloudera Search 1.2.0 is supported with CDH 4.6.0. For more information on Cloudera Search requirements with CDH 4, see Cloudera Search Requirements.
  • Apache Spark - 0.9.0 or later with CDH 4.4.0 or later.
  • Apache Accumulo - 1.4.3 with CDH 4.3.0, 1.4.4 with CDH 4.5.0, and 1.6.0 with CDH 4.6.0.

For more information, see the Product Compatibility Matrix.


Resource Requirements

Cloudera Manager requires the following resources:

  • Disk Space
    • Cloudera Manager Server
      • 5 GB on the partition hosting /var.
      • 500 MB on the partition hosting /usr.
      • For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server—not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:
        • CDH 4.6 - 700 MB per parcel; CDH 5 (which includes Impala and Search) - 1.5 GB per parcel (packed), 2 GB per parcel (unpacked)
        • Cloudera Impala - 200 MB per parcel
        • Cloudera Search - 400 MB per parcel
    • Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition. For more information, see Data Storage for Monitoring Data.
    • Agents - On Agent hosts each unpacked parcel requires about three times the space of the downloaded parcel on the Cloudera Manager Server. By default unpacked parcels are located in /opt/cloudera/parcels.
  • RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB may be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server; a sketch follows this list). Otherwise the kernel may kill the Server for consuming too much RAM.
  • Python - Cloudera Manager and CDH 4 require Python 2.4 or later, but Hue in CDH 5 and package installs of CDH 5 require Python 2.6 or 2.7. All supported operating systems include Python version 2.4 or later.
  • Perl - Cloudera Manager requires Perl.
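
A minimal sketch of the heap tuning mentioned in the RAM item above, assuming the Server reads its JVM options from the CMF_JAVA_OPTS variable in /etc/default/cloudera-scm-server (verify the variable name against your installation):

# In /etc/default/cloudera-scm-server, lower the maximum heap, for example:
export CMF_JAVA_OPTS="-Xmx1G"

$ sudo service cloudera-scm-server restart   # restart the Server to apply the new heap size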

Networking and Security Requirements

The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:

  • Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
    • Contain consistent information about hostnames and IP addresses across all hosts
    • Not contain uppercase hostnames
    • Not contain duplicate IP addresses

    Also, do not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:

    127.0.0.1 localhost.localdomain localhost
    192.168.1.1 cluster-01.example.com cluster-01
    192.168.1.2 cluster-02.example.com cluster-02
    192.168.1.3 cluster-03.example.com cluster-03
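
    A hedged way to spot-check resolution on each host, using standard Linux tooling and the example addresses above:

    $ getent hosts cluster-01.example.com   # forward lookup; should print the host's IP address
    $ getent hosts 192.168.1.1              # reverse-style lookup; should print the fully qualified name
    $ hostname -f                           # should print this host's fully qualified hostname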

  • In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.

    Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete. For more information, see Permission Requirements for Package-based Installations and Upgrades of CDH.

  • If single user mode is not enabled, the Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
  • Security-Enhanced Linux (SELinux) must not block Cloudera Manager or CDH operations.

    Important: Cloudera Enterprise is supported on platforms with Security-Enhanced Linux (SELinux) enabled. However, policies must be provided by other parties or created by the administrator of the cluster deployment. Cloudera is not responsible for policy support or enforcement, nor for any issues arising from them. If you experience issues with SELinux, contact your OS support provider.

  • IPv6 must be disabled.
  • iptables and other firewalls must not block required ports. In particular, port 7180 must be open because it is used to access Cloudera Manager after installation. Cloudera Manager communicates over specific ports, all of which must be open.
  • For RHEL and CentOS, the /etc/sysconfig/network file on each host must contain the hostname you have just set (or verified) for that host.
  • Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:

Table 2. Users and Groups

  • Cloudera Manager (all versions) - user: cloudera-scm; groups: cloudera-scm. Cloudera Manager processes, such as the Cloudera Manager Server and the monitoring roles, run as this user. The Cloudera Manager keytab file must be named cmf.keytab, since that name is hard-coded in Cloudera Manager. (Applicable to clusters managed by Cloudera Manager only.)
  • Apache Accumulo (Accumulo 1.4.3 and higher) - user: accumulo; groups: accumulo. Accumulo processes run as this user.
  • Apache Avro - no special users.
  • Apache Flume (CDH 4, CDH 5) - user: flume; groups: flume. The sink that writes to HDFS as this user must have write privileges.
  • Apache HBase (CDH 4, CDH 5) - user: hbase; groups: hbase. The Master and RegionServer processes run as this user.
  • HDFS (CDH 4, CDH 5) - user: hdfs; groups: hdfs, hadoop. The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.
  • Apache Hive (CDH 4, CDH 5) - user: hive; groups: hive. The HiveServer2 and Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore database (for example, MySQL or PostgreSQL), but it can be any identifier and does not correspond to a Unix uid; this is javax.jdo.option.ConnectionUserName in hive-site.xml.
  • Apache HCatalog (CDH 4.2 and higher, CDH 5) - user: hive; groups: hive. The WebHCat service (for REST access to Hive functionality) runs as the hive user.
  • HttpFS (CDH 4, CDH 5) - user: httpfs; groups: httpfs. The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.
  • Hue (CDH 4, CDH 5) - user: hue; groups: hue. Hue services run as this user.
  • Cloudera Impala (CDH 4.1 and higher, CDH 5) - user: impala; groups: impala, hadoop, hive. Impala services run as this user.
  • Apache Kafka (Cloudera Distribution of Kafka 1.2.0) - user: kafka; groups: kafka. Kafka services run as this user.
  • Java KeyStore KMS (CDH 5.2.1 and higher) - user: kms; groups: kms. The Java KeyStore KMS service runs as this user.
  • Key Trustee KMS (CDH 5.3 and higher) - user: kms; groups: kms. The Key Trustee KMS service runs as this user.
  • Key Trustee Server (CDH 5.4 and higher) - user: keytrustee; groups: keytrustee. The Key Trustee Server service runs as this user.
  • Kudu - user: kudu; groups: kudu. Kudu services run as this user.
  • Llama (CDH 5) - user: llama; groups: llama. Llama runs as this user.
  • Apache Mahout - no special users.
  • MapReduce (CDH 4, CDH 5) - user: mapred; groups: mapred, hadoop. Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos.
  • Apache Oozie (CDH 4, CDH 5) - user: oozie; groups: oozie. The Oozie service runs as this user.
  • Parquet - no special users.
  • Apache Pig - no special users.
  • Cloudera Search (CDH 4.3 and higher, CDH 5) - user: solr; groups: solr. The Solr processes run as this user.
  • Apache Spark (CDH 5) - user: spark; groups: spark. The Spark History Server process runs as this user.
  • Apache Sentry (incubating) (CDH 5.1 and higher) - user: sentry; groups: sentry. The Sentry service runs as this user.
  • Apache Sqoop (CDH 4, CDH 5) - user: sqoop; groups: sqoop. This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.
  • Apache Sqoop2 (CDH 4.2 and higher, CDH 5) - user: sqoop2; groups: sqoop, sqoop2. The Sqoop2 service runs as this user.
  • Apache Whirr - no special users.
  • YARN (CDH 4, CDH 5) - user: yarn; groups: yarn, hadoop. Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos.
  • Apache ZooKeeper (CDH 4, CDH 5) - user: zookeeper; groups: zookeeper. The ZooKeeper processes run as this user. It is not configurable.


 

Issues Fixed in Cloudera Manager 5.5.2

 

Cross-site scripting vulnerability using malformed strings in the Parcel Remote URLs list

An attacker could set a malformed string in the Parcel Remote URLs list in the database and trigger the attack when a user accesses the Administration Settings page. This attack is now prevented.

 

Starting/stopping roles for Flume instance succeeds but displays nothing in popup

In Cloudera Manager 5.5.0, running a Flume start/stop service command would succeed, but display an empty popup. This is now fixed.

 

Role process commands missing stderr and stdout in command details

In Cloudera Manager 5.5, certain commands did not show links to stderr or stdout in the Cloudera Manager UI even if they were executed recently. These could still be found in /var/run/cloudera-scm-agent/process/ on that host. In Cloudera Manager 5.5.2, stderr and stdout should appear as they did before.

Note that links to stderr and stdout for commands may disappear if another command is run on that role. This is expected. The logs can still be found in /var/run/cloudera-scm-agent/process/ on that host.

 

Updating the Hive NameNode location multiple times could lead to data corruption

Multiple updates to the Hive NameNode location could cause Hive Metastore database corruption. Issuing the same command multiple times no longer corrupts the database.

 

Cloudera Manager skips NameNode logs in the diagnostic bundle

Scheduled diagnostic bundles did not include recent role logs. Diagnostic bundles collected manually (not scheduled) worked as expected. Now, scheduled diagnostic bundles include the latest NameNode logs.

 

Kafka 2.0 fails to deploy on large clusters as reserved.broker.max.id defaults to 1000

Large Kafka clusters would not start when Cloudera Manager-generated broker IDs exceeded the value set by reserved.broker.max.id. The default value of broker.id.generation.enable has now been set to false to disable the reserved.broker.max.id configuration property and avoid collisions.

 

Cloudera Manager fails to propagate HBase coprocessors to the gateway nodes

Cloudera Manager does not propagate HBase coprocessors to the gateway nodes. As a result, tools that depend on the HBase security subsystem, such as the loadIncrementalHFiles tool, do not use security features, even in secure environments.

Workaround: Add the following property to the HBase Client Advanced Configuration Snippet (Safety Valve) for hbase-site.xml and restart all HBase clients:

<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
</property>

 

HDFS nameservices call returns incorrect High Availability roles

This fixes an issue where HDFS nameservices API calls returned incorrect active/standby status for HA roles. For example, the output would show "active" for a standby role and vice versa.

 

Hive requires the hive principal for HiveServer2 host as well as load balancer

An issue with HiveServer2 missing its principal and keytab when Hive is load-balanced has been fixed.

 

More descriptive error message for service trying to start on a decommissioned host

Cloudera Manager now displays a more descriptive error message when it skips the "Start" command because all roles are already started, are decommissioned, or are on a decommissioned host.

 

UI shows repeated errors when loading replications page

Fixed an issue where the replications page failed to load when a previous replication command had failed without launching a MapReduce job.

 

Allow option for external users to be assigned roles in the local database

In Cloudera Manager 5.5, role assignment for external users was disabled, which caused upgrade issues. The fix rolls back that change but, instead of using a union approach, implements the following precedence rules:

  • If a user is assigned a role in Cloudera Manager, this local role is used.
  • Otherwise, a user's LDAP group association determines the user role.
 

Clean up usercache directories on migration from non-secure to secure mode

Fixes an issue that caused YARN jobs to fail after migration from non-secure to secure mode.

 

Spark REST API does not work when parcels are used

The REST API for retrieving data from a live Spark UI or from the Spark History Server has been fixed.

 

In secure clusters, DataNode fails to start when dfs.data.transfer.protection is set and DataNode ports are changed to unprivileged ports

Before this fix, the only way to run the DataNode on unprivileged ports (port number 1024 or higher) in a Kerberized cluster with DataNode Data Transfer Protection enabled was to use single-user mode. Now this configuration works for both regular and single-user mode installs.

Both Hadoop SSL and DataNode Data Transfer Protection are still required for unprivileged DataNode ports to work in a Kerberized cluster. This configuration is supported only in CDH 5.2 and higher.

 

New validation warning for non-recommended secure DataNode configurations in CDH 5.2 and higher

On a Kerberos-enabled cluster running CDH 5.2 and higher, there are two recommended DataNode configurations. Use SASL/TLS with DataNode Data Transfer Protection enabled to encrypt the connection, or use only privileged ports to communicate. The supported combinations of HDFS configuration properties follow:

  • Security through SASL/TLS (preferred):
    • DataNode Data Transfer Protection - Enabled
    • Hadoop TLS/SSL - Enabled
    • DataNode Transceiver Port - Non-privileged (that is, port number >= 1024)
    • Secure DataNode Web UI Port (TLS/SSL) - Non-privileged (that is, port number >= 1024)
  • Security through privileged ports:
    • DataNode Data Transfer Protection - Disabled
    • Hadoop TLS/SSL - Disabled
    • DataNode Transceiver Port - Privileged (that is, port number < 1024)
    • DataNode HTTP Web UI Port - Privileged (that is, port number < 1024)

    Any configuration other than these results in a validation warning or error from Cloudera Manager. In particular, the following configuration, which is allowed by HDFS but is not recommended, results in a (dismissible) validation warning:

  • (Not Recommended) Security without enabling DataNode Data Transfer Protection:
    • DataNode Data Transfer Protection - Disabled
    • Hadoop TLS/SSL - Enabled
    • DataNode Transceiver Port - Privileged (that is, port number < 1024)
    • Secure DataNode Web UI Port (TLS/SSL) - Non-privileged (that is, port number >= 1024)

 

All configurations other than the three listed result in Cloudera Manager displaying a validation error.

 

Sensitive environment parameters not redacted for CSDs

Passwords in environment variables for CSDs are now redacted.

 

Update Apache Commons Collections library in Cloudera Manager due to major security vulnerability

The Apache Commons Collections library has been upgraded to 3.2.2 to fix a critical security vulnerability.

 

Remove plaintext keystore password from /api/v6/cm/config

With the addition of the JVM parameter -Dcom.cloudera.api.redaction=true, sensitive configuration values are redacted from the API.

 

JVM parameter to redact passwords now redacts the password salt and hash

When API redaction is turned on using the JVM argument -Dcom.cloudera.api.redaction=true, it also redacts the user's pwHash and pwSalt values. Passwords for Cloudera Manager Peers are also redacted.
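
A minimal sketch of enabling this redaction flag, assuming (as with the heap-size note in the resource requirements above) that the Server reads its JVM options from the CMF_JAVA_OPTS variable in /etc/default/cloudera-scm-server:

# In /etc/default/cloudera-scm-server, append the flag to the existing options:
export CMF_JAVA_OPTS="-Xmx2G -Dcom.cloudera.api.redaction=true"

$ sudo service cloudera-scm-server restart   # restart the Server so the flag takes effect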

 

Oozie keystore and truststore passwords now redacted

Oozie's Java keystore and truststore passwords are no longer sent in clear text on the command line.

 

Several Replication UI fixes

  • The Replication UI Last Run column now sorts correctly based on dates.
  • Collecting diagnostic data for failed Hive Replication commands no longer fails.
  • Finished schedules are no longer shown as running when the user is watching the page.
  • Scheduled time now accurately translates between browser time and server time.
  • Actions menu is now enabled for replication schedules with commands running. Previously, it would be disabled while a replication schedule was running, which blocked changing the configuration for future runs.
 

Fix Java detection

Java detection in the "Components" view for a host was fixed to account for Java versions installed using symlinks in /usr/java (such as /usr/java/default). A similar fix was made to the host inspector's Java detection logic.

 

When Host Monitor is stopped, cluster/services status in Cloudera Manager API returns Good instead of N/A or Unknown

If the Host Monitor is down, the service details page will still be able to present non-host related health status.

 


Cloudera Enterprise Extensions

ODBC and JDBC Drivers

The Cloudera ODBC and JDBC Drivers for Hive and Impala enable your enterprise users to access Hadoop data through Business Intelligence (BI) applications with ODBC/JDBC support.

Data Transfer Connectors

Sqoop Connectors are used to transfer data between Apache Hadoop systems and external databases or Enterprise Data Warehouses. These connectors allow Hadoop and platforms like CDH to complement existing architecture with seamless data transfer.

Want to Get Involved or Learn More?

Check out our other resources

Cloudera Community

Collaborate with your peers, industry experts, and Clouderans to make the most of your investment in Hadoop.

Cloudera University

Receive expert Hadoop training through Cloudera University, the industry's only truly dynamic Hadoop training curriculum that’s updated regularly to reflect the state of the art in big data.