Cloudera Manager 5.4.0

A unified interface to manage your enterprise data hub

Cloudera Manager is a unified management interface that makes it easy to install, configure, and manage a CDH cluster. It automatically ships with Cloudera Enterprise or Cloudera Express to help you get up and running with Hadoop faster.

Cloudera Manager 5.4.0 works with both CDH 4 and CDH 5 and is available with:
  • Cloudera Express - Easily deploy, manage, monitor, and perform diagnostics on your CDH cluster.
  • Cloudera Enterprise - Includes all the above capabilities, plus advanced management features and support, including zero-downtime upgrades and backup and disaster recovery.

Cloudera Manager is the recommended tool for installing Cloudera Enterprise or Cloudera Express, and it is downloaded automatically with either product. Using Cloudera Manager with Cloudera Enterprise requires a license.

When installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.

Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.


What's New in Cloudera Manager 5

The following sections describe what's new in each Cloudera Manager 5 release.

What's New in Cloudera Manager 5.4.0

  • OS - Added support for RHEL 6.6 and CentOS 6.6.
  • Cloudera Manager prevents installing or upgrading to a CDH version that is too new for the Cloudera Manager version. When using parcels, it prevents parcel installation. When using packages, it prevents creating services.
  • Installation and add service wizards now support the Oozie database.
  • New wizard for NameNode, Failover Controller, and JournalNode role migration.
  • Parcel page redesigned for better layout, performance, and ease of use. A new per-host parcel detail view has been added.
  • Configuration
    • Configuration pages use the new layout by default. The new layout offers better organization, performance, and ease of use. The existing layout is accessible via the Switch to the classic layout link.
    • New configuration actions:
      • Configuration can now be applied to all clusters as well as for a specific cluster.
      • Several new configuration views have been added to show all non-default values across all clusters and the Cloudera Management Service, as well as differences across all clusters and multiple services of the same type.
      • One-click differences in configuration settings for a specific service across multiple clusters.
  • Support
    • Include a Cloudera support ticket with YARN application support bundles.
    • Reduce the size of support bundles by specifying log data of interest to include in the bundle.
  • HDFS
    • Support for HDFS DataNode hot swap.
    • Option to include replication of extended attributes during HDFS replication. HDFS ACLs will now be replicated along with permissions.
  • Added support for Hive on Spark. For more information, see Hive on Spark.
      Important: Hive on Spark is included in CDH 5.4.0 but is currently neither supported nor recommended for production use. If you are interested in this feature, try it out in a test environment until we address the issues and limitations needed for production-readiness.
  • Security
    • Secure impersonation support for the Hue HBase app.
    • Redaction of sensitive data in log files and in SQL query history.
    • Support for custom Kerberos principals.
    • Added commands for regenerating Kerberos keytabs at service and host levels. These commands will clear existing keytabs from affected role instances and then trigger the Generate Credentials command to create new keytabs.
    • Kerberos support for Sqoop 2.
    • Kerberos and SSL/TLS support for Flume Thrift Source and Sink.
    • Solr SSL/TLS support.
    • Navigator Key Trustee Server can be installed and monitored by Cloudera Manager.
    • HBase Indexer integration with Sentry (File-based) for authorization.

What's New in Cloudera Manager 5.3.3

A number of issues have been fixed. See Issues Fixed in Cloudera Manager 5.3.3.

What's New in Cloudera Manager 5.3.2

A number of issues have been fixed. See Issues Fixed in Cloudera Manager 5.3.2.

What's New in Cloudera Manager 5.3.1

A number of issues have been fixed. See Issues Fixed in Cloudera Manager 5.3.1.

What's New in Cloudera Manager 5.3.0

  • JDK 1.8 - Cloudera Manager adds support for Oracle JDK 1.8.
  • Single user mode - The Cloudera Manager Agent and all service processes can now be run as a single configured user in environments where running as root is not permitted. See Single User Mode Requirements.
  • CDH upgrade wizard enhanced - The CDH upgrade wizard now supports minor and maintenance version upgrade as well as major version upgrade.
  • Oozie Sharelib - The Oozie Sharelib can be updated without restarting the Oozie service.
  • Read-only users prevented from viewing process logs or environment - Read-only users can no longer view the environment or logs of a process. This is to prevent read-only users from seeing potentially sensitive information.
  • New icons for the KMS and Key Trustee services.
  • Data-at-rest encryption
      Important: Cloudera provides two solutions:
    • Navigator Encrypt is production ready and available to Cloudera customers licensed for Cloudera Navigator. Navigator Encrypt operates at the Linux volume level, so it can encrypt cluster data inside and outside HDFS. Consult your Cloudera account team for more information.
    • HDFS Encryption is production ready and operates at the HDFS directory level, enabling encryption to be applied only to HDFS folders where needed.
    HDFS encryption implements transparent, end-to-end encryption of data read from and written to HDFS by creating encryption zones. An encryption zone is a directory in HDFS with every file and subdirectory in it encrypted. Use one of the following services to store, manage, and access encryption zone keys:
    • KMS (File) - The Hadoop Key Management Server with a file-based Java keystore; maintains a single copy of keys, using simple password-based protection.
    • KMS (Navigator Key Trustee) - An enterprise-grade key management service that replaces the file-based Java keystore and leverages the advanced key-management capabilities of Cloudera Navigator Key Trustee. Navigator Key Trustee is designed for secure, authenticated administration and cryptographically strong storage of keys on multiple redundant servers that can be located outside the cluster.
    For more information, see HDFS Data At Rest Encryption.
  • The Cloudera Manager Server now reports the correct number of physical cores and hyper-threading cores if hyper-threading is enabled.
  • Client configurations - Client configurations are now managed so that they are redeployed when a machine is re-imaged.
      Important: The changes to client configurations affect some API calls, as follows:
    • When a host ceases to have a client configuration assigned to it, Cloudera Manager will remove it, rather than leaving it behind. If a host has a client configuration assigned and the client configuration is missing, Cloudera Manager will recreate it.
    • If you currently use the API command deployClientConfig to deploy the client configurations for a particular service, and you pass a specific set of role names to this call to narrow the set of hosts that receive the new client configuration, then you should be aware that:
      • The API command will continue to generate and deploy the client configuration only to the hosts that correspond to the specified role names.
      • Any other hosts that previously had deployed client configurations, but do not have gateway roles assigned to them, will have those client configurations removed from them. This is the new behavior.
    • The behavior of the cluster level deployClientConfig command, and calling the service level command with no arguments, is unchanged. The command still deploys a new client configuration to all hosts with roles corresponding to the specified service or cluster.
    • As this change is due to internal functional changes inside CM, it is not restricted to any new API level. The deployClientConfig command in all API levels is affected.
  • Configuration
    • NameNode configuration - The decommissioning parameters dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit are now available.
    • Hue debug options - Two service-level configuration parameters have been added to the Hue service to enable Django debug mode and debugging of internal server error responses.
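The deployClientConfig behavior described under Client configurations above can be illustrated with a small sketch that only builds the HTTP request and does not send it. The command name comes from the notes; the host name, the API version segment, the URL layout, and the `{"items": [...]}` body used to pass role names are assumptions based on a general understanding of the Cloudera Manager REST API and should be checked against the API reference for your release. The function name `build_deploy_client_config_request` is hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote


def build_deploy_client_config_request(base_url, cluster, service=None, role_names=None):
    """Build (but do not send) a POST request for the deployClientConfig command.

    base_url   -- API root including the version segment (an assumed value).
    cluster    -- cluster name as known to Cloudera Manager.
    service    -- optional service name; omit for the cluster-level command.
    role_names -- optional role names that narrow the service-level call.
    """
    url = f"{base_url.rstrip('/')}/clusters/{quote(cluster, safe='')}"
    if service is not None:
        url += f"/services/{quote(service, safe='')}"
    url += "/commands/deployClientConfig"

    body = None
    if service is not None and role_names:
        # Assumed body shape: a list of role names under "items".
        body = json.dumps({"items": list(role_names)}).encode("utf-8")

    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server, cluster, service, and role names, for illustration only.
narrowed = build_deploy_client_config_request(
    "http://cm.example.com:7180/api/v9", "Cluster 1", "hdfs1", ["hdfs1-GATEWAY-1"]
)
print(narrowed.get_method(), narrowed.full_url)
```

According to the notes above, a narrowed call like this one now also removes previously deployed client configurations from hosts that no longer have gateway roles, whereas the cluster-level call and the service-level call without role names behave as before.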

What's New in Cloudera Manager 5.2.5

A number of issues have been fixed. See Issues Fixed in Cloudera Manager 5.2.5.

What's New in Cloudera Manager 5.2.4

There are no changes for Cloudera Manager 5.2.4. It was released to provide the Cloudera Navigator fix described in What's New in Cloudera Navigator 2.1.4.

 

  Note: Although there is a CDH 5.2.3 release, there is no corresponding Cloudera Manager 5.2.3 release.

What's New in Cloudera Manager 5.2.2

  • HDFS Decommissioning - The following decommissioning properties have been exposed in Cloudera Manager 5.2.2.
    • Maximum number of replication threads on a Datanode (dfs.namenode.replication.max-streams)
    • Hard limit on the number of replication threads on a Datanode (dfs.namenode.replication.max-streams-hard-limit)
  • New icons for the KMS and Key Trustee services.

What's New in Cloudera Manager 5.2.1

This release fixes the “POODLE” vulnerability and a number of other issues. See Issues Fixed in Cloudera Manager 5.2.1.
  • The YARN yarn.nodemanager.recovery.dir property can be configured.
  • A health check indicates whether the HDFS metadata upgrade has not been finalized.

What's New in Cloudera Manager 5.2.0

  • OS and database support - Adds support for Ubuntu Trusty (version 14.04) and PostgreSQL 9.3.
  • Services - the following new services have been added:
    • Isilon - supports the EMC Isilon distributed filesystem.
    • KMS - the Java keystore-based key management server.
    • Key Trustee - the enterprise-grade key management server using Cloudera Navigator Key Trustee.
    • Spark - running Spark applications on YARN. The existing Spark service has been renamed Spark (Standalone).
  • Accumulo - Kerberos authentication is now supported. If you have been using advanced configuration snippets (safety valves) to configure Kerberos with Accumulo, you may now remove those settings and have Cloudera Manager generate the principal and keytab file for you.
  • HDFS Data at Rest Encryption -
      Note: Cloudera provides the following two solutions for data at rest encryption:
    • Navigator Encrypt - is production ready and available for Cloudera customers licensed for Cloudera Navigator. Navigator Encrypt operates at the Linux volume level, so it can encrypt cluster data inside and outside HDFS. Talk to your Cloudera account team for more information about this capability.
    • HDFS Encryption - included in CDH 5.2.0, operates at the HDFS folder level, enabling encryption to be applied only to HDFS folders where needed. This feature has several known limitations. Therefore, Cloudera does not currently support this feature in CDH 5.2, and it is not recommended for production use. If you're interested in trying the feature out, upgrade to the latest version of CDH 5.

      HDFS now implements transparent, end-to-end encryption of data read from and written to HDFS by creating encryption zones. An encryption zone is a directory in HDFS with all of its contents, that is, every file and subdirectory in it, encrypted. You can use either the KMS or the Key Trustee service to store, manage, and access encryption zone keys. For more information, see HDFS Data At Rest Encryption.

  • HBase - Support for configuring hedged reads has been added for HBase. Hedged reads are turned off by default. Cloudera Manager will emit two properties, dfs.client.hedged.read.threadpool.size (default: 0) and dfs.client.hedged.read.threshold.millis (default: 500ms), to hbase-site.xml. For more information, see Hedged Reads.
  • ZooKeeper - the RMI port can be configured. The port is configured using the JDK7 flag -Dcom.sun.management.jmxremote.rmi.port. The default value is the same as the JMX Agent port. A special value of 0 or -1 disables the setting, and a random port is used. The configuration has no effect on Oracle JDK versions earlier than 7u4.
  • Cloudera Manager Agent configuration
    • The supervisord port can now be configured in the Agent configuration supervisord_port. The change takes effect the next time supervisord is restarted (not simply when the Agent is restarted).
    • Added an Agent configuration local_filesystem_whitelist that allows configuring the list of local filesystems that should always be monitored.
  • Proxy user configuration
    • All services' proxy user configuration properties have been moved to the HDFS service. Other services running on the cluster inherit the configuration values provided in HDFS. If you have previously configured a service to have values different from those configured in HDFS, then the proxy user configuration properties will be moved to that service's Advanced Configuration Snippet (Safety Valve) for core-site.xml to retain existing behavior.

      Oozie and Solr are exceptions to this. Oozie proxy user configuration properties have been moved to Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml if they differ from HDFS. Solr proxy user configuration properties have been moved to Solr Service Environment Advanced Configuration Snippet (Safety Valve) if they differ from HDFS.

  • Resource management - YARN and Llama integrated resource management and Llama high availability wizard.
  • New and changed user roles - BDR Administrator, Cluster Administrator, Navigator Administrator, and User Administrator. The Administrator role has been renamed Full Administrator. See Cloudera Manager User Accounts.
  • Configuration UI
    • Cluster-wide configuration - you can view all modified settings and configure log directories, disk space thresholds, and port settings.
    • New configuration layout - the new layout provides an alternate way to view configuration pages. In the classic layout, pages are organized by role group and categories within the role groups. The new layout allows you to filter on configuration status, category, and scope. On each configuration page you can easily switch between the classic and new layout.
        Important: The classic layout is the default. All the configuration procedures described in the Cloudera Manager documentation assume the classic layout.
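As an illustration of the HBase hedged-read properties listed above, the following sketch renders the two properties as they might appear in hbase-site.xml. The property names and default values come from the notes; the helper names `render_properties` and `hedged_reads_enabled`, the XML layout, and the reading that a thread pool size of 0 means "off" are assumptions made for this example, not Cloudera Manager's actual generator.

```python
import xml.etree.ElementTree as ET

# Defaults as stated in the release notes (hedged reads off by default).
HEDGED_READ_DEFAULTS = {
    "dfs.client.hedged.read.threadpool.size": "0",
    "dfs.client.hedged.read.threshold.millis": "500",
}


def render_properties(props):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def hedged_reads_enabled(props):
    """Assumption: hedged reads are active only when the thread pool size is > 0."""
    return int(props.get("dfs.client.hedged.read.threadpool.size", "0")) > 0


print(render_properties(HEDGED_READ_DEFAULTS))
print(hedged_reads_enabled(HEDGED_READ_DEFAULTS))
```

Raising the thread pool size in the same dictionary is how this sketch would model turning the feature on; in practice the values are set through the HBase service configuration in Cloudera Manager.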

What's New in Cloudera Manager 5.1.5

A number of issues have been fixed. See Fixed Issues in Cloudera Manager 5.1.5.

What's New in Cloudera Manager 5.1.4

A number of issues have been fixed. See Fixed Issues in Cloudera Manager 5.1.4.

What's New in Cloudera Manager 5.1.3

A number of issues have been fixed. See Fixed Issues in Cloudera Manager 5.1.3.

  • JDK Installation
    • Users who are adding or upgrading hosts can now choose not to install the JDK that ships with Cloudera Manager.

What's New in Cloudera Manager 5.1.2

A number of issues have been fixed. See Fixed Issues in Cloudera Manager 5.1.2.

  • New SAML configuration option
    • You can now specify the binding protocol to be used for AuthNResponses sent from the IDP to Cloudera Manager. Previously, Cloudera Manager would only use HTTP-Artifact, but it is now possible to choose HTTP-Post. HTTP-Artifact remains the default binding.

What's New in Cloudera Manager 5.1.1

An issue has been fixed. See Issues Fixed in Cloudera Manager 5.1.1.

What's New in Cloudera Manager 5.1.0

  Important: Cloudera Manager 5.1.0 is no longer available for download from the Cloudera website or from archive.cloudera.com due to the JCE policy file issue described in the Fixed Issues in Cloudera 5.1.1 section of the Release Notes. The download URL at archive.cloudera.com for Cloudera Manager 5.1.0 now forwards to Cloudera Manager 5.1.1 for the RPM-based distributions for Linux RHEL and SLES.
  • SSL Encryption
    • Supports several new SSL-related configuration parameters for HDFS, MapReduce, YARN and HBase, which allow you to configure and enable encrypted shuffle and encrypted web UIs for these services. See Configuring Encryption for Hadoop Services.
    • Cloudera Manager now also supports the monitoring of HDFS, MapReduce, YARN, and HBase when SSL is enabled for these services. New configuration parameters allow you to specify the location and password of the truststore used to verify certificates in HTTPS communication with CDH services and the Cloudera Manager Server.
  • Sentry Service
    • A new Sentry service that stores the authorization metadata in an underlying relational database and allows you to use Grant/Revoke statements to modify privileges. See The Sentry Service.
    • You can also configure the Sentry service to allow Pig, MapReduce, and WebHCat queries access to Sentry-secured data stored in Hive. See Configuring Pig and HCatalog for the Sentry Service.
  • Kerberos Authentication
    • Now supports a Kerberos cluster using an Active Directory KDC.
    • New wizard to enable Kerberos on an existing cluster. The wizard works with both MIT KDC and Active Directory KDC.
    • Ability to configure and deploy Kerberos client configuration (krb5.conf) on a cluster.
  • Spark Service - added the History Server role
  • Impala - added support for Llama ApplicationMaster High Availability
  • User Roles - two new roles, Operator and Configurator, support fine-grained access to Cloudera Manager features. See Cloudera Manager User Accounts.
  • Monitoring
    • Updates to Oozie monitoring
    • New Hive metastore canary
  • UI - The UI has been updated to improve scalability. The Home page Status tab can be configured to display clusters in a full or summary format. There is a new Cluster page for each cluster. The Hosts and Instances pages have added faceted filters.

What's New in Cloudera Manager 5.0.6

A number of issues have been fixed. See Fixed Issues in Cloudera Manager 5.0.6.

What's New in Cloudera Manager 5.0.5

A number of issues have been fixed. See Fixed Issues in Cloudera Manager 5.0.5.

What's New in Cloudera Manager 5.0.2

A number of issues have been fixed. See Issues Fixed in Cloudera Manager 5.0.2.

What's New in Cloudera Manager 5.0.1

A number of issues have been fixed. See Issues Fixed in Cloudera Manager 5.0.1.

  • Monitoring
    • The Java Garbage Collection Duration health test for the Service Monitor, Host Monitor, and Activity Monitor has been replaced with the new Java Pause Duration health test.

What's New in Cloudera Manager 5.0.0

  • Service and Configuration Management
    • HDFS - cache management
  • Resource Management - Impala admission control
  • Monitoring
    • Host disks overview
    • Impala best practices
    • HBase table statistics
    • HDFS cache statistics

What's New in Cloudera Manager 5.0.0 Beta 2

  • Service and Configuration Management
    • HDFS
      • HDFS NFS Gateway role
      • Supports restoration of HDFS data from a snapshot
    • YARN
      • YARN Resource Manager High Availability
      • Resource pool scheduler
    • Support for Spark service
    • Support for Accumulo service
    • Support for service extensibility
    • Support to set up Oozie server High Availability
    • Granular configuration staleness UI
    • Support for setting maximum file descriptors
  • Monitoring
    • Support for monitoring the Cloudera Search/Solr service
    • New "failed" and "killed" badges displayed for unsuccessful YARN applications
    • More attributes available for filtering displays of YARN applications and Impala queries
    • New operational reports added for HBase tables and namespaces, Impala queries, and YARN applications
    • Support for creating user-defined triggers for metrics accessible via charts/tsquery
        Important: Because triggers are a new and evolving feature, backward compatibility between releases is not guaranteed at this time.
    • Charting improvements
      • New table chart type
      • New options for displaying data and metadata from charts
      • Support for exporting data from charts to CSV or JSON files
  • Administrative Settings
    • Added a new role type with limited administrator capabilities.
    • Cloudera Manager Server and all JVMs will create a heap dump if they run out of memory.
    • Configure the location of the parcel directory and specify whether and when to remove old parcels from cluster hosts.

What's New in Cloudera Manager 5.0.0 Beta 1

  • CDH Version
    • Supports both CDH 4 and CDH 5
    • CDH 4 to CDH 5 upgrade wizard
    • Support for YARN as a production execution environment
      • MapReduce (MRv1) to YARN (MRv2) configuration import
      • YARN-based resource management for Impala 1.2
  • JDK Version - Cloudera Manager 5 supports and installs both JDK 6 and JDK 7.
  • Resource Management
    • Static and dynamic partitioning of resources: provides a wizard for configuring static partitioning of resources (cgroups) across core services (HBase, HDFS, MapReduce, Solr, YARN) and dynamic allocation of resources for YARN and Impala.
    • Pool, resource group, and queue administration for YARN and Impala.
    • Usage monitoring and trending.
  • Monitoring
    • YARN service monitoring
    • YARN (MRv2) job monitoring
    • Configurable histograms of Impala query and YARN job attributes that can be used to quickly filter query and application lists
    • Scalable back-end database for monitoring metrics
    • Charting improvements
      • New chart types: histogram and heatmap
      • New scale types: logarithmic and power
      • Updates to tsquery language: new attribute values to support YARN and new functions to support new chart types
  • Extensibility
    • Ability to manage both ISV applications and non-CDH services (for example, Accumulo, Spark, and so on)
    • Working with select ISVs as part of Beta 1
  • Single Sign-On - Support for SAML to enable single sign-on
  • Parcels
    • Dependency enforcement to ensure incompatible parcels are not used together
    • Option to not cache downloaded parcels, to save disk space
    • Improved error reporting for management operations
  • Backup and Disaster Recovery (BDR)
    • HBase and HDFS snapshots: Supports scheduling snapshots on a recurring basis.
    • Support for YARN (MRv2): Replication jobs can now run using YARN (MRv2) instead of MRv1.
    • Global replication page: All scheduled snapshots (HDFS and HBase) and replication jobs for either HDFS or Hive are shown on a single Replications page.
  • Other
    • Global Search box
    • Several usability improvements
    • Comprehensive detection of configuration changes that require service restarts, refresh and redeployment of client configurations

Incompatible Changes in Cloudera Manager 5

The following sections describe incompatible changes in each Cloudera Manager 5 release.

Incompatible Changes Introduced in Cloudera Manager 5.4.0

  • The Blacklisted Products property has been removed from the Hosts > Parcels configuration.

Incompatible Changes Introduced in Cloudera Manager 5.3.0

  • Oozie metrics - The Oozie metrics framework is now controlled by the Enable The Metrics Instrumentation Service flag, which is enabled by default. When enabled, the old 'instrumentation' REST end-point is disabled and metrics are available on the new 'metrics' REST end-point (hostname:port/v2/admin/metrics).
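A client that polls Oozie for metrics may need to choose its endpoint based on this flag. The sketch below shows one way to do that. The `v2/admin/metrics` path is taken from the note above; the `v1/admin/instrumentation` path for the old endpoint, the example base URL, and the function name `oozie_admin_url` are assumptions for illustration.

```python
def oozie_admin_url(base_url, metrics_service_enabled=True):
    """Return the admin endpoint to poll, depending on the metrics flag.

    base_url is the Oozie server root (host, port, and any context path),
    supplied by the caller; the value used below is a placeholder.
    """
    if metrics_service_enabled:
        endpoint = "v2/admin/metrics"           # new endpoint named in the notes
    else:
        endpoint = "v1/admin/instrumentation"   # assumed path of the old endpoint
    return f"{base_url.rstrip('/')}/{endpoint}"


print(oozie_admin_url("http://oozie.example.com:11000/oozie"))
```

Monitoring scripts written against the old endpoint would need to switch URLs, and possibly their response parsing, once the flag is enabled.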

Incompatible Changes Introduced in Cloudera Manager 5.2.0

  • Due to various internal changes to configuration generation, all service and client configurations will be stale after upgrade. To propagate the updates, restart the cluster and redeploy client configurations.

Incompatible Changes Introduced in Cloudera Manager 5.1.0

  • The Limited Administrator role has been renamed Limited Operator. The Limited Operator role is no longer available in Cloudera Manager Express. If you upgrade a Cloudera Manager Express installation, users in the Limited Operator role will not be able to log in. A user in the Administrator role must assign the Read-Only or Administrator role to those users.

Incompatible Changes Introduced in Cloudera Manager 5.0.0

  • Cloudera Manager API
    • New upgradeCdh command, which upgrades CDH cluster versions. Use this command to upgrade clusters from CDH 4 to CDH 5. The upgradeServices command previously used to upgrade CDH cluster versions is no longer supported.
    • The hostId field now contains a unique UUID and no longer matches the hostName field. When referring to a host, both hostId and hostName are accepted. However, any API clients that were previously cross-referencing host records with external information by hostName, but were using the hostId field in the API, must be updated to use the hostName field. Clients updated in this manner will function correctly with older versions of Cloudera Manager because the hostName field has always been present.
    • The clusterName field displayed when viewing service and role references is now an internal name and may not match the external displayName field of the cluster.
  • CDH 5 Hue requires Python 2.6 and above, effectively dropping support for Python 2.4 and 2.5. Hue will install without Python 2.6, but will not start.
  • Cloudera Manager 5.0 includes a change to the value of the snmpTrapOID. Earlier releases set the value of snmpTrapOID (OID: .1.3.6.1.6.3.1.1.4.1.0) wrongly to clouderaManagerMIBNotifications (OID .1.3.6.1.4.1.38374.1.1.1). This is fixed in Cloudera Manager 5.0 with the correct value, which is clouderaManagerAlert (OID .1.3.6.1.4.1.38374.1.1.1.1). This change will break SNMP server setups that are configured to expect clouderaManagerMIBNotifications. Cloudera Manager administrators should configure their SNMP receivers to accept the corrected OID.
  • The default values for the following configurations have changed to include the JVM option -Djava.net.preferIPv4Stack=true, which sets the preferred protocol stack to IPv4 on dual-stack machines. Any values set to the old defaults will automatically be changed to the new default when upgrading to Cloudera Manager 5.
    • MapReduce client configuration:
      • hadoop-env.sh: added to HADOOP_CLIENT_OPTS
      • mapred-site.xml: added to mapred.child.java.opts
    • YARN client configuration:
      • hadoop-env.sh: added to YARN_OPTS
      • mapred-site.xml: added to yarn.app.mapreduce.am.command-opts, mapreduce.map.java.opts, and mapreduce.reduce.java.opts
    • HDFS client configuration: hadoop-env.sh: added to HADOOP_CLIENT_OPTS
    • Hive client configuration: hive-env.sh: added to HADOOP_CLIENT_OPTS
  • MapReduce health tests have been removed:
    • Job failure
    • Map backlog
    • Reduce backlog
    • Map locality
    If needed, the test can be replaced with a trigger. For example:
    • Look at all the jobs that completed in the last hour; if more than 10% of them failed, change the health of the service to concerning:
      IF (select (jobs_failed_rate * 3600) as jobs_failed, ((jobs_failed_rate + jobs_completed_rate + jobs_killed_rate) * 3600) as all_jobs where roleType=JOBTRACKER AND serviceName=$SERVICENAME and last(jobs_failed_rate / (jobs_failed_rate + jobs_completed_rate + jobs_killed_rate)) >= 10 ending at $END_TIME duration "PT3600S") DO health:concerning
      
    • If there are over 50% more maps waiting than total slots available, the health changes to concerning:
      IF (select waiting_maps / map_slots where roleType=JOBTRACKER and serviceName=$SERVICENAME and last(waiting_maps / map_slots) > 50) DO health:concerning
      
    • If there are over 50% more reduces waiting than total slots available, the health changes to concerning:
      IF (select waiting_reduces / reduce_slots where roleType=JOBTRACKER and serviceName=$SERVICENAME and last(waiting_reduces / reduce_slots) > 50) DO health:concerning
      
  • HDFS checkpointing metrics have been removed:
    • end_checkpoint_num_ops
    • end_checkpoint_avg_time
    • start_checkpoint_num_ops
    • start_checkpoint_avg_time
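For the hostId change described under Cloudera Manager API above, a client that joins API host records with external data can key the join on the host name rather than on hostId. The following is a minimal sketch with made-up records; the field spelling `hostName` follows these notes and may differ in real API payloads, so it is a parameter here. The function names and sample data are hypothetical.

```python
def index_hosts_by_name(hosts, name_key="hostName"):
    """Map host name -> host record, ignoring the (now opaque) hostId."""
    return {host[name_key]: host for host in hosts}


def join_with_inventory(hosts, inventory, name_key="hostName"):
    """Attach external inventory details to hosts, matching on host name."""
    by_name = index_hosts_by_name(hosts, name_key)
    joined = {}
    for name, details in inventory.items():
        host = by_name.get(name)
        if host is not None:
            joined[name] = {"hostId": host["hostId"], **details}
    return joined


# Made-up records: hostId is a UUID and no longer equals the host name.
hosts = [
    {"hostId": "2c2e951c-aaf2-4780-a69f-0382181f1821", "hostName": "node1.example.com"},
    {"hostId": "7d1a3b52-4f0e-4c3a-9a61-5b2f6c1e8d44", "hostName": "node2.example.com"},
]
inventory = {
    "node1.example.com": {"rack": "r1"},
    "node9.example.com": {"rack": "r9"},  # not known to Cloudera Manager
}
print(join_with_inventory(hosts, inventory))
```

Because the host name field has always been present, a client written this way should also work against older Cloudera Manager versions, as the note above points out.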

Incompatible Changes Introduced in Cloudera Manager 5.0.0 Beta 2

  • Impala releases earlier than 1.2.1 are no longer supported.
  • Some of the constants identifying health tests have changed. The following existed in Cloudera Manager 4:
    • FAILOVERCONTROLLER_FILE_DESCRIPTOR
    • FAILOVERCONTROLLER_HOST_HEALTH
    • FAILOVERCONTROLLER_LOG_DIRECTORY_FREE_SPACE
    • FAILOVERCONTROLLER_SCM_HEALTH
    • FAILOVERCONTROLLER_UNEXPECTED_EXITS

    They are now:

    • MAPREDUCE_FAILOVERCONTROLLER_FILE_DESCRIPTOR
    • MAPREDUCE_FAILOVERCONTROLLER_HOST_HEALTH
    • MAPREDUCE_FAILOVERCONTROLLER_LOG_DIRECTORY_FREE_SPACE
    • MAPREDUCE_FAILOVERCONTROLLER_SCM_HEALTH
    • MAPREDUCE_FAILOVERCONTROLLER_UNEXPECTED_EXITS

    and

    • HDFS_FAILOVERCONTROLLER_FILE_DESCRIPTOR
    • HDFS_FAILOVERCONTROLLER_HOST_HEALTH
    • HDFS_FAILOVERCONTROLLER_LOG_DIRECTORY_FREE_SPACE
    • HDFS_FAILOVERCONTROLLER_SCM_HEALTH
    • HDFS_FAILOVERCONTROLLER_UNEXPECTED_EXITS

    The reason for the change is to better distinguish between MapReduce and HDFS failover controller monitoring in the health system.
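Scripts or alert rules that reference the old constants can be updated mechanically. The sketch below maps each Cloudera Manager 4 name to its two replacements; the constant names come from the lists above, while the function name `renamed_health_tests` and the choice to return unknown names unchanged are assumptions of this example.

```python
OLD_PREFIX = "FAILOVERCONTROLLER_"
SUFFIXES = (
    "FILE_DESCRIPTOR",
    "HOST_HEALTH",
    "LOG_DIRECTORY_FREE_SPACE",
    "SCM_HEALTH",
    "UNEXPECTED_EXITS",
)
SERVICES = ("MAPREDUCE", "HDFS")


def renamed_health_tests(old_name):
    """Return the new constant names for an old failover controller constant.

    Names that are not in the renamed set are returned unchanged.
    """
    if not old_name.startswith(OLD_PREFIX):
        return [old_name]
    if old_name[len(OLD_PREFIX):] not in SUFFIXES:
        return [old_name]
    return [f"{service}_{old_name}" for service in SERVICES]


print(renamed_health_tests("FAILOVERCONTROLLER_HOST_HEALTH"))
```

Each old constant maps to two new ones, so a consumer has to decide whether it cares about the MapReduce failover controller, the HDFS failover controller, or both.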

Incompatible Changes Introduced in Cloudera Manager 5.0.0 Beta 1

  • Services
    • Impala - With Cloudera Manager 4.8 (released in late November 2013), only Impala 1.2.1 is supported, due to the introduction of the Impala Catalog Server. However, CDH 5.0.0 Beta 1 was released with Impala 1.2.0 (Beta). Therefore, if you upgrade from Cloudera Manager 4.8 (with Impala 1.2.1) to Cloudera Manager 5.0.0 Beta 1, and then upgrade your CDH to CDH 5.0.0 Beta 1, your version of Impala will be downgraded to Impala 1.2.0 from 1.2.1. This will result in some loss of functionality. See New Features in Impala for a list of the new features in Impala 1.2.1 that are not in Impala 1.2.0 (Beta).
    • Hive - HiveServer2 is a mandatory role for Hive in CDH 5.
    • Hue - In CDH 5, Hue no longer has a Beeswax Server role. Hue now submits queries to HiveServer2.
    • HDFS - Cloudera Manager 5 does not support NFS-mounted shared edits directories for HDFS High Availability. It only supports the Quorum Journal method for shared edits. If you upgrade from Cloudera Manager 4 with a working CDH 4 High Availability configuration that uses NFS-mounted directories, your installation will continue to work until you disable High Availability. You will not be able to re-enable High Availability with NFS-mounted directories. Furthermore, you will not be able to upgrade to CDH 5 unless you disable High Availability, and you will need to use Quorum-based storage in order to re-enable High Availability after the upgrade.
    • YARN
      • The YARN (MRv2) configuration mapreduce.job.userlog.retain.hours has been replaced by yarn.log-aggregation.retain-seconds. Any existing value in mapreduce.job.userlog.retain.hours will be lost. However, this configuration never had any effect, so no functionality is affected.
      • The following configuration parameters were removed from YARN. These never had any effect, so no functionality is affected.
        • mapreduce.jobtracker.maxtasks.perjob
        • mapreduce.jobtracker.handler.count (non-functional duplicate of yarn.resourcemanager.resource-tracker.client.thread-count)
        • mapreduce.jobtracker.persist.jobstatus.active
        • mapreduce.jobtracker.persist.jobstatus.hours
        • mapreduce.job.jvm.numtasks
      • The following YARN configuration parameters were replaced. Only the YARN parameters were replaced. Old configurations will be lost, but they never had any effect so this does not affect functionality.
        • mapreduce.jobtracker.restart.recover replaced by yarn.resourcemanager.recovery.enabled (changed from Gateway to ResourceManager)
        • mapreduce.tasktracker.http.threads replaced by mapreduce.shuffle.max.connections
        • mapreduce.jobtracker.staging.root.dir replaced by yarn.app.mapreduce.am.staging-dir
      • Cloudera Manager 5 sets the default YARN Resource Scheduler to FairScheduler. If a cluster was previously running YARN with the FIFO scheduler, it will be changed to FairScheduler the next time YARN restarts. The FairScheduler is only supported with CDH 4.2.1 and later, and older clusters may hit failures and need to manually change the scheduler to FIFO or CapacityScheduler. See the Known Issues section of this Release Note for information on how to change the scheduler back to FIFO or CapacityScheduler.

Changed Features and Behaviors in Cloudera Manager 5

The following sections describe what’s changed in each Cloudera Manager 5 release.

  Note: Rolling upgrade is not supported between CDH 4 and CDH 5. Rolling upgrade will also not be supported from CDH 5.0.0 Beta 2 to any later releases, and may not be supported between any future beta versions of CDH 5 and the General Availability release of CDH 5.

What's Changed in Cloudera Manager 5.4.0

  • Cloudera Manager checks the specified version of CDH before an installation or upgrade to ensure that it is compatible with Cloudera Manager. Specifically, Cloudera Manager 5.4 supports no version of CDH newer than 5.4.x; Cloudera Manager must be upgraded before upgrading to such a version of CDH. Cloudera Manager no longer shows these "too-new" versions of CDH. The 'latest' parcel repository URL is replaced by the 'latest_supported' repository in the parcel configuration.
  • The minimum Java heap size for the Activity Monitor, Host Monitor, and Service Monitor has been changed from 50 MB to 256 MB.
  • Regenerating Kerberos principals will be denied if any roles that are using those principals are running. Stop those roles and then attempt to regenerate the principals.
  • In previous versions of Cloudera Manager, the 'version' attribute in tsquery had integer values, for example, 4 for CDH 4, 5 for CDH 5, and -1 for Cloudera Manager. Starting in Cloudera Manager 5.4, the values of the 'version' attribute are in release string format, for example "cdh5.0.0".
  • Hive
    • hive.exec.reducers.max default value changed from 999 to 1099
    • hive.exec.reducers.bytes.per.reducer default value changed from 1 GB to 64 MB
    • The default heap size for the Hive CLI is increased to 1 GB.
    • The property hive.log.explain.output is known to make Cloudera Manager Agents unstable in some circumstances, especially when Hive queries generate extremely large EXPLAIN outputs. Therefore, the property has been hidden from the Cloudera Manager configuration UI. It can still be configured using advanced configuration snippets.
  • Impala - The Impala Daemon now supports the Impala Maximum Log Files property which specifies the total number of log files per severity level that should be retained before they are deleted. By default, after upgrading to CDH 5.4 this property is set to 10, which means that Impala Daemons will only retain up to 10 log files for each severity level. Any additional files will be deleted.
  • HBase - Moved three settings for HBase coprocessors from Main to Advanced category:
    • 'Service Wide > HBase Coprocessor Abort on Error' moved to 'Service Wide > Advanced > HBase Coprocessor Abort on Error'
    • 'Master Default Group > HBase Coprocessor Master Classes' moved to 'Master Default Group > Advanced > HBase Coprocessor Master Classes'
    • 'RegionServer Default Group > HBase Coprocessor Region Classes' moved to 'RegionServer Default Group > Advanced > HBase Coprocessor Region Classes'

What's Changed in Cloudera Manager 5.3.2

  • Turning on the internal HBase canary (not to be confused with Cloudera Manager monitoring canary) is optional. On new clusters, it will not be enabled by default. Existing clusters will continue to run the canary until it is disabled from the HBase configuration page.

What's Changed in Cloudera Manager 5.3.0

  • Cloudera Manager upgrade - If any commands are active before the upgrade, the server will fail to start after the upgrade. This includes commands that a user has run as well as commands that Cloudera Manager triggers automatically, either in response to a state change or on a schedule.

What's Changed in Cloudera Manager 5.2.1

  • The default value of the YARN yarn.nodemanager.recovery.dir property has changed from {hadoop.tmp.dir}/yarn-nm-recovery to /var/lib/hadoop-yarn/yarn-nm-recovery.

What's Changed in Cloudera Manager 5.2.0

  • Rolling upgrade - As a result of a recent change in the way DataNodes handle block deletions during a rolling upgrade (HDFS-5907), the Trash directory may grow unexpectedly while the upgrade is in progress. Deleted blocks are kept during upgrade in case you want to roll back. The blocks are cleaned up after you finalize the upgrade.
  • Agent
    • The hard_stop, hard_restart, and clean_restart commands now show a warning message about the impact of using these commands instead of performing the actions. To perform the actions, use the hard_stop_confirmed, hard_restart_confirmed, and clean_restart_confirmed commands.
    • The default supervisord port has changed from 9001 to 19001.
  • YARN application attributes renamed: slot_millis to slots_millis and fallow_slot_millis to fallow_slots_millis

What's Changed in Cloudera Manager 5.1.0

  • UI refresh for scalability
  • Revised authorization privilege model in Sentry. See Privilege Model.

What's Changed in Cloudera Manager 5.0.0

  • MapReduce now inherits topology from HDFS NameNode. Topology configuration for MapReduce JobTracker was removed. The configuration was redundant and the two parameters should always have been set to the same value.
  • UI
    • The Clusters tab no longer has Activities, Other, and Manage Resources sections.

What's Changed in Cloudera Manager 5.0.0 Beta 2

  • Product
    • Cloudera Backup and Disaster Recovery (BDR) is now included with Cloudera Enterprise.
    • Cloudera Standard has been renamed to Cloudera Express.
  • OS and packaging
    • The name of the Cloudera Manager embedded database package has changed from cloudera-manager-server-db to cloudera-manager-server-db-2. For details, read the upgrade and install topics for your OS.
    • Support for Ubuntu 10.04 and Debian 6.0 is deprecated.
  • HDFS - Enabling High Availability automatically enables auto-failover, unlike in Cloudera Manager 4, where enabling auto-failover was a separate command.
  • HBase
    • In CDH 5 there is no HBase canary because HBase is now monitored by a watchdog process. In CDH 4, the HBase canary is still used.
    • The RegionServer default heap size has been increased to 4GB.
  • Monitoring
    • Chart "Views" and actions related to views have been renamed to "Dashboard".
    • Changes to how attribute filters are displayed in the Impala queries and YARN applications screens
    • The outdated configuration indicator on the Home, service, and role pages has a new graphic and now has a tooltip that displays whether a cluster refresh or restart is required. There is a new indicator for changes that require redeploying client configurations. You can click an indicator to go to the new Stale Configurations page to view and resolve the conditions that gave rise to the indicator.
    • To match the naming convention of tsquery metrics, multiword Impala query and YARN application attribute names have changed from camel case to using an underscore separator. For example queryType has changed to query_type. For backward compatibility, camel case names are still supported.
  • UI
    • The main navigation bar in the Cloudera Manager Admin Console has been reorganized. The Services tab has been replaced by a Clusters tab, which contains links to the individual services (previously under the Services tab), the Activities and Reports sections (removed from the main bar), and a new Manage Resources section with links to the new resource pools and service pools features. The All Services page has been removed.
    • The "Safety Valve" properties have been renamed "Advanced Configuration Snippet".
    • The screen for specifying assignment of roles to hosts has been redesigned for improved scalability and usability.
  • Misc
    • The io.compression.codecs property has moved from MapReduce to HDFS.
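
The renaming of Impala query and YARN application attributes described under Monitoring above (camel case to underscore-separated names, for example queryType to query_type) can be sketched as a small shell helper for updating saved filters. The function name to_snake_case is hypothetical and is not part of Cloudera Manager; the conversion rule is inferred from the single example given in the text.

```shell
# Hypothetical helper: convert a camel-case attribute name to the
# underscore-separated form (queryType -> query_type).
to_snake_case() {
    printf '%s\n' "$1" |
        sed 's/\([a-z0-9]\)\([A-Z]\)/\1_\2/g' |   # insert "_" before each inner capital
        tr '[:upper:]' '[:lower:]'                # lowercase the result
}

to_snake_case queryType
```

Because camel case names are still accepted for backward compatibility, such a conversion is optional.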

What's Changed in Cloudera Manager 5.0.0 Beta 1

  • When CDH 5 is installed, YARN is installed by default, rather than MapReduce, and is the default execution environment. MapReduce is deprecated in CDH 5 but is fully supported for backward compatibility through CDH 5. In CDH 4, MapReduce is still the default.
  • The setting for yarn.scheduler.maximum-allocation-mb has been increased to a default of 64GB.
  • The minimum heap size for the Solr service has been increased to 200MB (from 50MB previously) to enable it to better handle collection creation.

Cloudera Manager 5 Requirements and Supported Versions

The following sections describe the requirements and supported operating systems, databases, and browsers, including information about which major and minor release version of each entity is supported for Cloudera Manager. After installing each entity, upgrade to the latest patch version and apply any other appropriate updates. An available update may be specific to the operating system on which it is installed. For example, if you are using CentOS in your environment, you could choose 6 as the major version and 4 as the minor version to indicate that you are using CentOS 6.4. After installing this operating system, apply all relevant CentOS 6.4 upgrades and patches. In some cases, such as some browsers, a minor version may not be listed.

For the latest information on compatibility across all Cloudera products, see the Product Compatibility Matrix.

 

Supported Operating Systems

Cloudera Manager supports the following operating systems:
  • RHEL-compatible
    • Red Hat Enterprise Linux and CentOS
      • 5.7, 64-bit
      • 6.4, 64-bit
      • 6.5 in SELinux mode
      • 6.5, 64-bit
      • 6.6, 64-bit
    • Oracle Enterprise Linux with default kernel and Unbreakable Enterprise Kernel, 64-bit
      • 5.6 (UEK R2)
      • 6.4 (UEK R2)
      • 6.5 (UEK R2, UEK R3)
      • 6.6 (UEK R3)
  • SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required for CDH 5, and Service Pack 1 or later is required for CDH 4. To use the embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager, the Updates repository must be active. The SUSE Linux Enterprise Software Development Kit 11 SP1 is required on hosts running the Cloudera Manager Agents.
  • Debian - Wheezy (7.0 and 7.1), Squeeze (6.0) (deprecated), 64-bit
  • Ubuntu - Trusty (14.04), Precise (12.04), Lucid (10.04) (deprecated), 64-bit
  Note:
  • Debian Squeeze and Ubuntu Lucid are supported only for CDH 4.
  • Using the same version of the same operating system on all cluster hosts is strongly recommended.

Supported JDK Versions

Cloudera Manager supports Oracle JDK 1.7.0_75 and 1.8.0_40 when it's managing CDH 5.x, and Oracle JDK 1.6.0_31 and 1.7.0_75 when it's managing CDH 4.x. Cloudera Manager supports Oracle JDK 1.7.0_75 and 1.8.0_40 when it's managing both CDH 4.x and CDH 5.x clusters. Oracle JDK 1.6.0_31 and 1.7.0_75 can be installed during the installation and upgrade. For further information, see Java Development Kit Installation.

Supported Browsers

The Cloudera Manager Admin Console, which you use to install, configure, manage, and monitor services, supports the following browsers:
  • Mozilla Firefox 11 and higher
  • Google Chrome
  • Internet Explorer 9 and higher; Internet Explorer 11 is supported in Native Mode.
  • Safari 5 and higher

Supported Databases

Cloudera Manager requires several databases. The Cloudera Manager Server stores information about configured services, role assignments, configuration history, commands, users, and running processes in a database of its own. You must also specify a database for the Activity Monitor and Reports Manager management services.

  Important: When processes restart, the configuration for each of the services is redeployed using information that is saved in the Cloudera Manager database. If this information is not available, your cluster will not start or function correctly. You must therefore schedule and maintain regular backups of the Cloudera Manager database in order to recover the cluster in the event of the loss of this database.
See Backing Up Databases.

The database you use must be configured to support UTF8 character set encoding. The embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager automatically provides UTF8 encoding. If you install a custom database, you may need to enable UTF8 encoding. The commands for enabling UTF8 encoding are described in each database topic under Cloudera Manager and Managed Service Data Stores.

After installing a database, upgrade to the latest patch version and apply any other appropriate updates. Available updates may be specific to the operating system on which it is installed.

Cloudera Manager and its supporting services can use the following databases:
  • MySQL - 5.5 and 5.6
  • Oracle 11gR2
  • PostgreSQL - 8.4, 9.2, and 9.3
Cloudera supports the shipped version of MySQL and PostgreSQL for each supported Linux distribution. Each database is supported for all components in Cloudera Manager and CDH subject to the notes in CDH 4 Supported Databases and CDH 5 Supported Databases.

Supported CDH and Managed Service Versions

The following versions of CDH and managed services are supported:
  Warning: Cloudera Manager 5 does not support CDH 3, and you cannot upgrade Cloudera Manager 4 to Cloudera Manager 5 if you have a cluster running CDH 3. Therefore, to upgrade CDH 3 clusters to CDH 4 using Cloudera Manager, you must use Cloudera Manager 4.
  • CDH 4 and CDH 5. The latest released versions of CDH 4 and CDH 5 are strongly recommended. For information on CDH 4 requirements, see CDH 4 Requirements and Supported Versions. For information on CDH 5 requirements, see CDH 5 Requirements and Supported Versions.
  • Cloudera Impala - Cloudera Impala is included with CDH 5. Cloudera Impala 1.2.1 with CDH 4.1.0 or later. For more information on Cloudera Impala requirements with CDH 4, see Cloudera Impala Requirements.
  • Cloudera Search - Cloudera Search is included with CDH 5. Cloudera Search 1.2.0 with CDH 4.6.0. For more information on Cloudera Search requirements with CDH 4, see Cloudera Search Requirements.
  • Apache Spark - 0.9.0 or later with CDH 4.4.0 or later.
  • Apache Accumulo - 1.4.3 with CDH 4.3.0, 1.4.4 with CDH 4.5.0, and 1.6.0 with CDH 4.6.0.
For more information, see the Product Compatibility Matrix.

Resource Requirements

Cloudera Manager requires the following resources:
  • Disk Space
    • Cloudera Manager Server
      • 5 GB on the partition hosting /var.
      • 500 MB on the partition hosting /usr.
      • For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server—not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:
        • CDH 4.6 - 700 MB per parcel; CDH 5 (which includes Impala and Search) - 1.5 GB per parcel (packed), 2 GB per parcel (unpacked)
        • Cloudera Impala - 200 MB per parcel
        • Cloudera Search - 400 MB per parcel
    • Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition. For more information, see Data Storage for Monitoring Data.
    • Agents - On Agent hosts each unpacked parcel requires about three times the space of the downloaded parcel on the Cloudera Manager Server. By default unpacked parcels are located in /opt/cloudera/parcels.
  • RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB may be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel may kill the Server for consuming too much RAM.
  • Python - Cloudera Manager and CDH 4 require Python 2.4 or later, but Hue in CDH 5 and package installs of CDH 5 require Python 2.6 or 2.7. All supported operating systems include Python version 2.4 or later.
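
For the 2 GB RAM case described above, lowering the Server's maximum heap means editing the -Xmx value in /etc/default/cloudera-scm-server. The following is a minimal sketch, not an official procedure: the function name set_max_heap and the 1G value in the usage comment are illustrative assumptions, and the sketch assumes each line of the file contains at most one -Xmx option. Back up the file before changing it.

```shell
# Hypothetical helper: replace the first -Xmx<size> option on each line of a file.
# Usage: set_max_heap <file> <new-size>   (for example 1G; illustrative value only)
set_max_heap() {
    file=$1
    heap=$2
    tmp=$(mktemp) || return 1
    # Write the edited text to a temporary file, then copy it back so that
    # the original file keeps its ownership and permissions.
    if sed "s/-Xmx[0-9][0-9]*[kKmMgG]\{0,1\}/-Xmx${heap}/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (run as root, then restart the Cloudera Manager Server):
#   set_max_heap /etc/default/cloudera-scm-server 1G
```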

Networking and Security Requirements

The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:

  • Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
    • Contain consistent information about hostnames and IP addresses across all hosts
    • Not contain uppercase hostnames
    • Not contain duplicate IP addresses

    Also, do not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:

    127.0.0.1	localhost.localdomain	localhost
    192.168.1.1	cluster-01.example.com	cluster-01
    192.168.1.2	cluster-02.example.com	cluster-02
    192.168.1.3	cluster-03.example.com	cluster-03 
    
  • In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.

    Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete. For more information, see Permission Requirements.

  • If single user mode is not enabled, the Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
  • No blocking is done by Security-Enhanced Linux (SELinux).
  • IPv6 must be disabled.
  • No blocking by iptables or firewalls; port 7180 must be open because it is used to access Cloudera Manager after installation. Cloudera Manager communicates using specific ports, which must be open.
  • For Red Hat and CentOS, the /etc/sysconfig/network file on each host must contain the hostname you have just set (or verified) for that host.
  • Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:
Table 1. Users and Groups

Component (Version) | Unix User ID | Groups | Notes
Cloudera Manager (all versions) | cloudera-scm | cloudera-scm | Cloudera Manager processes such as the Cloudera Manager Server and the monitoring roles run as this user. The Cloudera Manager keytab file must be named cmf.keytab since that name is hard-coded in Cloudera Manager. Note: Applicable to clusters managed by Cloudera Manager only.
Apache Accumulo (Accumulo 1.4.3 and higher) | accumulo | accumulo | Accumulo processes run as this user.
Apache Avro |  |  | No special users.
Apache Flume (CDH 4, CDH 5) | flume | flume | The sink that writes to HDFS as this user must have write privileges.
Apache HBase (CDH 4, CDH 5) | hbase | hbase | The Master and the RegionServer processes run as this user.
HDFS (CDH 4, CDH 5) | hdfs | hdfs, hadoop | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.
Apache Hive (CDH 4, CDH 5) | hive | hive | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (e.g. MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml.
Apache HCatalog (CDH 4.2 and higher, CDH 5) | hive | hive | The WebHCat service (for REST access to Hive functionality) runs as the hive user.
HttpFS (CDH 4, CDH 5) | httpfs | httpfs | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.
Hue (CDH 4, CDH 5) | hue | hue | Hue services run as this user.
Cloudera Impala (CDH 4.1 and higher, CDH 5) | impala | impala, hadoop, hdfs, hive | Impala services run as this user.
Apache Kafka (Cloudera Distribution of Kafka 1.2.0) | kafka | kafka | Kafka services run as this user.
Java KeyStore KMS (CDH 5.2.1 and higher) | kms | kms | The Java KeyStore KMS service runs as this user.
Key Trustee KMS (CDH 5.3 and higher) | kms | kms | The Key Trustee KMS service runs as this user.
Key Trustee Server (CDH 5.4 and higher) | keytrustee | keytrustee | The Key Trustee Server service runs as this user.
Llama (CDH 5) | llama | llama | Llama runs as this user.
Apache Mahout |  |  | No special users.
MapReduce (CDH 4, CDH 5) | mapred | mapred, hadoop | Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos.
Apache Oozie (CDH 4, CDH 5) | oozie | oozie | The Oozie service runs as this user.
Parquet |  |  | No special users.
Apache Pig |  |  | No special users.
Cloudera Search (CDH 4.3 and higher, CDH 5) | solr | solr | The Solr processes run as this user.
Apache Spark (CDH 5) | spark | spark | The Spark History Server process runs as this user.
Apache Sentry (incubating) (CDH 5.1 and higher) | sentry | sentry | The Sentry service runs as this user.
Apache Sqoop (CDH 4, CDH 5) | sqoop | sqoop | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.
Apache Sqoop2 (CDH 4.2 and higher, CDH 5) | sqoop2 | sqoop, sqoop2 | The Sqoop2 service runs as this user.
Apache Whirr |  |  | No special users.
YARN (CDH 4, CDH 5) | yarn | yarn, hadoop | Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos.
Apache ZooKeeper (CDH 4, CDH 5) | zookeeper | zookeeper | The ZooKeeper processes run as this user. It is not configurable.
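
The /etc/hosts rules listed earlier in this section (no uppercase hostnames, no duplicate IP addresses) lend themselves to an automated check. The following is a minimal sketch using awk; the function name check_hosts_file is hypothetical, and the check covers a single file only, so it does not verify consistency across hosts or DNS resolution.

```shell
# Hypothetical helper: report uppercase hostnames and duplicate IP addresses
# in a hosts file. Returns non-zero if any problem is found.
check_hosts_file() {
    LC_ALL=C awk '
        BEGIN { bad = 0 }
        NF == 0 || $1 ~ /^#/ { next }        # skip blank lines and comments
        {
            if (seen[$1]++) {                # first field is the IP address
                printf "duplicate IP address: %s\n", $1
                bad = 1
            }
            for (i = 2; i <= NF; i++) {      # remaining fields are hostnames
                if ($i ~ /^#/) break         # ignore a trailing comment
                if ($i ~ /[A-Z]/) {
                    printf "uppercase hostname: %s\n", $i
                    bad = 1
                }
            }
        }
        END { exit bad }
    ' "$1"
}

# Example: check_hosts_file /etc/hosts && echo "hosts file looks OK"
```

Running the same check on every cluster host and comparing the files is one way to approach the consistency requirement.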

Cloudera Manager and CDH QuickStart Guide

This quick start guide describes how to quickly create a new installation of Cloudera Manager 5, CDH 5, and managed services on a cluster of four hosts. The resulting deployment can be used for demonstrations and proof of concept applications, but is not recommended for production.

 

Requirements

The four hosts in the cluster must satisfy the following requirements:
  • The hosts must have at least 10 GB RAM
  • You must have root or password-less sudo access to the hosts
  • If using root, the hosts must accept the same root password
  • The hosts must have Internet access to allow the wizard to install software from archive.cloudera.com
  • Run a supported OS:
    • RHEL-compatible
      • Red Hat Enterprise Linux and CentOS
        • 5.7, 64-bit
        • 6.4, 64-bit
        • 6.5 in SELinux mode
        • 6.5, 64-bit
        • 6.6, 64-bit
      • Oracle Enterprise Linux with default kernel and Unbreakable Enterprise Kernel, 64-bit
        • 5.6 (UEK R2)
        • 6.4 (UEK R2)
        • 6.5 (UEK R2, UEK R3)
        • 6.6 (UEK R3)
    • SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required. The Updates repository must be active and SUSE Linux Enterprise Software Development Kit 11 SP1 is required.
    • Debian - Wheezy (7.0 and 7.1), 64-bit
    • Ubuntu - Trusty (14.04) and Precise (12.04), 64-bit
If your environment does not satisfy these requirements, the procedure described in this guide may not be appropriate for you. For information about other Cloudera Manager installation options and requirements, see Installing Cloudera Manager, CDH, and Managed Services.

Download and Run the Cloudera Manager Server Installer

  1. Download the Cloudera Manager installer binary from Cloudera Manager 5.4.0 Downloads to the cluster host where you want to install the Cloudera Manager Server.
    1. Click Download Cloudera Express or Download Cloudera Enterprise. See Cloudera Express and Cloudera Enterprise Features.
    2. Register and click Submit.
    3. Download the installer:
      $ wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
      
  2. Change cloudera-manager-installer.bin to have executable permission.
    $ chmod u+x cloudera-manager-installer.bin
    
  3. Run the Cloudera Manager Server installer.
    $ sudo ./cloudera-manager-installer.bin
    
  4. Read the Cloudera Manager README and then press Return or Enter to choose Next.
  5. Read the Cloudera Express License and then press Return or Enter to choose Next. Use the arrow keys and press Return or Enter to choose Yes to confirm you accept the license.
  6. Read the Oracle Binary Code License Agreement and then press Return or Enter to choose Next.
  7. When the installation completes, the installer displays the complete URL for the Cloudera Manager Admin Console, including the port number, which is 7180 by default. Press Return or Enter to choose OK to continue.
  8. Press Return or Enter to choose OK to exit the installer.
  9. On RHEL 5 and CentOS 5, install Python 2.6 or 2.7. Download the appropriate repository rpm packages to the Cloudera Manager Server host and then install Python using yum. For example, use the following commands:
    $ su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
    ...
    $ yum install python26
    
  Note: If the installation is interrupted for some reason, you may need to clean up before you can re-run it. See Uninstalling Cloudera Manager and Managed Software.

Start the Cloudera Manager Admin Console

  1. Wait several minutes for the Cloudera Manager Server to complete its startup. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
  2. In a web browser, enter http://Server host:7180, where Server host is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is running. The login screen for Cloudera Manager Admin Console displays.
  3. Log into Cloudera Manager Admin Console with the credentials: Username: admin Password: admin.
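
Rather than watching the log by hand, the wait in step 1 can be scripted. This sketch polls the Server log until a startup message appears. The function name wait_for_cm_server is hypothetical, and the message text 'Started Jetty server' is an assumption about the log format that should be confirmed against your own cloudera-scm-server.log.

```shell
# Hypothetical helper: wait until the Server log reports that the web server is up.
# Usage: wait_for_cm_server <logfile> [timeout-seconds]   (default timeout: 300)
wait_for_cm_server() {
    log=$1
    timeout=${2:-300}
    waited=0
    # Assumed startup marker; adjust if your log uses different wording.
    while ! grep -q 'Started Jetty server' "$log" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1   # give up after the timeout
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}

# Example:
#   wait_for_cm_server /var/log/cloudera-scm-server/cloudera-scm-server.log 600 \
#       && echo "Open http://<Server host>:7180 in a browser"
```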

Install and Configure Software Using the Cloudera Manager Wizard

Installing and configuring Cloudera Manager, CDH, and managed service software on the cluster hosts involves the following three main steps.

Choose Cloudera Manager Edition and Specify Hosts

  1. Choose Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed. The trial allows you to create all CDH and managed services supported by Cloudera Manager. Click Continue.
  2. Information is displayed indicating what edition of Cloudera Manager will be installed and the services you can choose from. Click Continue. The Specify hosts for your CDH cluster installation screen displays.
  3. Specify the four hosts on which to install CDH and managed services. You can specify hostnames and/or IP addresses and ranges, for example: 10.1.1.[1-4] or host[1-3].company.com. You can specify multiple addresses and address ranges by separating them by commas, semicolons, tabs, or blank spaces, or by placing them on separate lines.
  4. Click Search. Cloudera Manager identifies the hosts on your cluster. Verify that the number of hosts shown matches the number of hosts where you want to install services. Deselect host entries that do not exist and deselect the hosts where you do not want to install services. Click Continue. The Select Repository screen displays.
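
To preview which hosts a range pattern such as 10.1.1.[1-4] or host[1-3].company.com covers before entering it in the wizard, the expansion can be sketched in shell. The function name expand_hosts is hypothetical; it handles one numeric [start-end] range per pattern and is not Cloudera Manager's own parser, which may accept other forms.

```shell
# Hypothetical helper: expand a single numeric [start-end] range in a host pattern.
# Patterns without a range are printed unchanged.
expand_hosts() {
    pattern=$1
    case $pattern in
        *\[*-*\]*) ;;                              # contains a [start-end] range
        *) printf '%s\n' "$pattern"; return 0 ;;
    esac
    prefix=${pattern%%\[*}      # text before "["
    rest=${pattern#*\[}         # text after "["
    range=${rest%%\]*}          # "start-end"
    suffix=${rest#*\]}          # text after "]"
    first=${range%%-*}
    last=${range#*-}
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s%s%s\n' "$prefix" "$i" "$suffix"
        i=$((i + 1))
    done
}

expand_hosts '10.1.1.[1-4]'
expand_hosts 'host[1-3].company.com'
```

Comparing the number of lines printed with the number of hosts found by Search is a quick way to spot a mistyped range.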

Install CDH and Managed Service Software

  1. Keep the default distribution method Use Parcels and the default version of CDH 5. Leave the Additional Parcels selections at None.
  2. For the Cloudera Manager Agent, keep the default Matched release for this Cloudera Manager Server. Click Continue. The JDK Installation Options screen displays.
  3. Select the Install Oracle Java SE Development Kit (JDK) checkbox to allow Cloudera Manager to install the JDK on each cluster host or uncheck if you plan to install it yourself. Leave the Install Java Unlimited Strength Encryption Policy Files checkbox deselected. Click Continue. The Enable Single User Mode screen displays.
  4. Leave the Single User Mode checkbox deselected and click Continue. The Provide SSH login credentials page displays.
  5. Specify host SSH login properties:
    1. Keep the default login root or enter the user name for an account that has password-less sudo permission.
    2. If you choose to use password authentication, enter and confirm the password.
  6. Click Continue. Cloudera Manager installs the Oracle JDK and the Cloudera Manager Agent packages on each host and starts the Agent.
  7. Click Continue. The Installing Selected Parcels screen displays. Cloudera Manager installs CDH. During the parcel installation, progress is indicated for the phases of the parcel installation process in separate progress bars. When the Continue button at the bottom of the screen turns blue, the installation process is completed.
  8. Click Continue. The Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. Click Finish. The Cluster Setup screen displays.

Add and Configure Services

  1. Click the All Services radio button to create HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, Sqoop, HBase, Impala, Solr, Spark, and Key-Value Store Indexer services. Click Continue. The Customize Role Assignments screen displays.
  2. Configure the following role assignments:
    • Click the text field under the HBase Thrift Server role. In the host selection dialog that displays, select the checkbox next to any host and click OK at the bottom right.
    • Click the text field under the Server role of the ZooKeeper service. In the host selection dialog that displays, uncheck the checkbox next to the host assigned by default (the master host) and select checkboxes next to the remaining three hosts. Click OK at the bottom right.
    Click Continue. The Database Setup screen displays.
  3. Leave the default setting of Use Embedded Database to have Cloudera Manager create and configure all required databases in an embedded PostgreSQL database. Click Test Connection. When the test completes, click Continue. The Review Changes screen displays.
  4. Review the configuration changes to be applied. Click Continue. The Command Progress page displays.
  5. The wizard performs 32 steps to configure and start the services. When the startup completes, click Continue.
  6. A success message displays indicating that the cluster has been successfully started. Click Finish to proceed to the Home page.

Test the Installation

The Home page displays.

On the left side of the screen is a list of the services currently running, with their status information. All the services should be running with Good Health; however, there may be a small number of configuration warnings, indicated by a wrench icon and a number, which you can ignore.
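
The same status information can also be read from the command line through the Cloudera Manager REST API. The sketch below only builds the request URL. It assumes the default Admin Console port 7180, API version v10 (the /api/version endpoint reports the version your server supports), and the cluster name Cluster 1. The host name, the credentials in the commented example, and the function name services_url are placeholders.

```shell
#!/bin/sh
# Assumptions: default port 7180, API version v10, placeholder host name.
CM_HOST="${CM_HOST:-cm-host.example.com}"
CM_PORT="${CM_PORT:-7180}"
CM_API="${CM_API:-v10}"

# services_url CLUSTER: print the REST URL that lists the services of CLUSTER.
services_url() {
  # Encode spaces so that a name such as "Cluster 1" is valid in a URL.
  cluster=$(printf '%s' "$1" | sed 's/ /%20/g')
  printf 'http://%s:%s/api/%s/clusters/%s/services\n' \
    "$CM_HOST" "$CM_PORT" "$CM_API" "$cluster"
}

# Example (credentials are placeholders). Each returned item carries
# serviceState and healthSummary fields:
# curl -s -u admin:admin "$(services_url 'Cluster 1')"
```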

You can click each service to view more detailed information about the service. You can also test your installation by running a MapReduce job or interacting with the cluster with a Hue application.

Running a MapReduce Job

  1. Log in to a cluster host.
  2. Run the Hadoop PiEstimator example:
    sudo -u hdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    pi 10 100
    
  3. View the result of running the job by selecting the following from the top navigation bar in the Cloudera Manager Admin Console: Clusters > Cluster 1 > Activities > YARN Applications. An entry for the job appears in the list.
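
The two arguments to pi are the number of map tasks (10) and the number of samples per map (100). The job estimates pi with a quasi-Monte Carlo method: it counts how many sample points in a unit square fall inside the inscribed circle. The following single-process sketch illustrates the idea only. It is a simplification, not Hadoop's implementation, and the function name estimate_pi is hypothetical.

```shell
#!/bin/sh
# Illustrative only: a single-process approximation of what `pi 10 100` computes.
# estimate_pi MAPS SAMPLES prints 4 * (points inside the circle) / (total points).
estimate_pi() {
  awk -v maps="$1" -v samples="$2" '
    # Radical-inverse (Halton) value of index i in the given base.
    function halton(i, base,    f, r) {
      f = 1; r = 0
      while (i > 0) {
        f = f / base
        r = r + f * (i % base)
        i = int(i / base)
      }
      return r
    }
    BEGIN {
      total = maps * samples
      inside = 0
      for (i = 1; i <= total; i++) {
        # Shift the point so the circle of radius 0.5 is centered at the origin.
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if (x * x + y * y <= 0.25) inside++
      }
      printf "%.4f\n", 4 * inside / total
    }'
}

estimate_pi 10 100
```

Raising either argument increases the total number of samples and therefore tightens the estimate. On the cluster, the first argument also controls how many map tasks share the work.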

Testing with Hue

Besides running a job, you can test the cluster with one of the Hue web applications. Hue is a graphical user interface for interacting with your clusters. Its applications let you browse HDFS, manage a Hive metastore, and run Hive, Impala, and Search queries, Pig scripts, and Oozie workflows.
  1. In the Cloudera Manager Admin Console Home page, click the Hue service.
  2. Click the Hue Web UI link, which opens Hue in a new window.
  3. Log in with the username hdfs and the password hdfs.
  4. Choose an application in the navigation bar at the top of the browser window.

For more information, see the Hue User Guide.