The recommended tool for installing Cloudera Enterprise
This download installs Cloudera Enterprise or Cloudera Express.
Cloudera Enterprise requires a license; however, when installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.
Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.
- System Requirements
- What's New
- Supported Operating Systems
- Supported JDK Versions
- Supported Browsers
- Supported Databases
- Supported CDH and Managed Service Versions
- Resource Requirements
- Networking and Security Requirements
Supported Operating Systems
Supported JDK Versions
The version of Oracle JDK supported by Cloudera Manager depends on the version of CDH that is being managed. The following table lists the JDK versions supported on a Cloudera Manager 5.5 cluster running the latest CDH 4 and CDH 5. For further information on supported JDK versions for previous versions of Cloudera Manager and CDH, see JDK Compatibility.
Important: There is one exception to the minimum supported and recommended JDK versions in the following table. If Oracle releases a security patch that affects server-side Java before the next minor release of Cloudera products, the Cloudera support policy covers customers using the patch.
|CDH Version Managed (Latest)||Minimum Supported JDK Version||Recommended JDK Version|
|CDH 4 and CDH 5||1.7.0_55||1.7.0_80|
Cloudera recommends that you not use JDK 1.8.0_40.
Cloudera Manager can install Oracle JDK 1.7.0_67 during installation and upgrade. If you prefer to install the JDK yourself, follow the instructions in Java Development Kit Installation.
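The version floor in the table above can be checked mechanically. The following is a minimal sketch, not part of Cloudera Manager: the function names `parse_jdk_version` and `check_jdk` are hypothetical, and the check covers only the minimum version and the discouraged 1.8.0_40 build named above. It does not encode which Java major versions a particular CDH release supports.

```python
import re

# Values taken from the table above (cluster managing CDH 4 and CDH 5).
MINIMUM_JDK = "1.7.0_55"
# Build that the text above recommends against.
DISCOURAGED_JDKS = {"1.8.0_40"}


def parse_jdk_version(version):
    """Turn a string such as '1.7.0_67' into a comparable tuple (1, 7, 0, 67)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)_(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognized JDK version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def check_jdk(version, minimum=MINIMUM_JDK):
    """Return 'ok', 'discouraged', or 'too old' for a JDK version string."""
    version = version.strip()
    if version in DISCOURAGED_JDKS:
        return "discouraged"
    if parse_jdk_version(version) < parse_jdk_version(minimum):
        return "too old"
    return "ok"
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating update 9 as newer than update 55.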
Supported Browsers
The Cloudera Manager Admin Console, which you use to install, configure, manage, and monitor services, supports the following browsers:
- Mozilla Firefox 24 and 31.
- Google Chrome.
- Internet Explorer 9 and higher. Internet Explorer 11 Native Mode.
- Safari 5 and higher.
Supported Databases
Cloudera Manager requires several databases. The Cloudera Manager Server stores information about configured services, role assignments, configuration history, commands, users, and running processes in a database of its own. You must also specify a database for the Activity Monitor and Reports Manager roles.
Important: When processes restart, the configuration for each of the services is redeployed using information that is saved in the Cloudera Manager database. If this information is not available, your cluster will not start or function correctly. You must therefore schedule and maintain regular backups of the Cloudera Manager database in order to recover the cluster in the event of the loss of this database. See Backing Up Databases.
The database you use must be configured to support UTF8 character set encoding. The embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager automatically provides UTF8 encoding. If you install a custom database, you may need to enable UTF8 encoding. The commands for enabling UTF8 encoding are described in each database topic under Cloudera Manager and Managed Service Data Stores.
After installing a database, upgrade to the latest patch version and apply any other appropriate updates. Available updates may be specific to the operating system on which it is installed.
Cloudera Manager and its supporting services can use the following databases:
- MariaDB 5.5
- MySQL - 5.5 and 5.6
- Oracle 11gR2 and 12c
- PostgreSQL - 9.2, 9.3, and 9.4
Cloudera supports the shipped version of MariaDB, MySQL and PostgreSQL for each supported Linux distribution. Each database is supported for all components in Cloudera Manager and CDH subject to the notes in CDH 4 Supported Databases and CDH 5 Supported Databases.
Supported CDH and Managed Service Versions
The following versions of CDH and managed services are supported:
Warning: Cloudera Manager 5 does not support CDH 3 and you cannot upgrade Cloudera Manager 4 to Cloudera Manager 5 if you have a cluster running CDH 3. Therefore, to upgrade CDH 3 clusters to CDH 4 using Cloudera Manager, you must use Cloudera Manager 4.
- CDH 4 and CDH 5. The latest released versions of CDH 4 and CDH 5 are strongly recommended. For information on CDH 4 requirements, see CDH 4 Requirements and Supported Versions. For information on CDH 5 requirements, see CDH 5 Requirements and Supported Versions.
- Cloudera Impala - Cloudera Impala is included with CDH 5. Cloudera Impala 1.2.1 with CDH 4.1.0 or later. For more information on Cloudera Impala requirements with CDH 4, see Cloudera Impala Requirements.
- Cloudera Search - Cloudera Search is included with CDH 5. Cloudera Search 1.2.0 with CDH 4.6.0. For more information on Cloudera Search requirements with CDH 4, see Cloudera Search Requirements.
- Apache Spark - 0.9.0 or later with CDH 4.4.0 or later.
- Apache Accumulo - 1.4.3 with CDH 4.3.0, 1.4.4 with CDH 4.5.0, and 1.6.0 with CDH 4.6.0.
For more information, see the Product Compatibility Matrix.
Resource Requirements
Cloudera Manager requires the following resources:
- Disk Space
- Cloudera Manager Server
- 5 GB on the partition hosting /var.
- 500 MB on the partition hosting /usr.
- For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server—not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:
- CDH 4.6 - 700 MB per parcel; CDH 5 (which includes Impala and Search) - 1.5 GB per parcel (packed), 2 GB per parcel (unpacked)
- Cloudera Impala - 200 MB per parcel
- Cloudera Search - 400 MB per parcel
- Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition. For more information, see Data Storage for Monitoring Data.
- Agents - On Agent hosts each unpacked parcel requires about three times the space of the downloaded parcel on the Cloudera Manager Server. By default unpacked parcels are located in /opt/cloudera/parcels.
- RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB may be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel may kill the Server for consuming too much RAM.
- Python - Cloudera Manager and CDH 4 require Python 2.4 or later, but Hue in CDH 5 and package installs of CDH 5 require Python 2.6 or 2.7. All supported operating systems include Python version 2.4 or later.
- Perl - Cloudera Manager requires perl.
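The parcel sizing guidance above lends itself to a quick estimate. The sketch below is illustrative only: the function names are hypothetical, the sizes are the approximate packed figures quoted in the list (with 1.5 GB treated as 1500 MB), and the factor of three is the document's rule of thumb for Agent hosts rather than a measured value.

```python
# Approximate packed parcel sizes in MB, taken from the list above.
PARCEL_SIZE_MB = {
    "CDH 4.6": 700,
    "CDH 5": 1500,  # 1.5 GB packed; already includes Impala and Search
    "Cloudera Impala": 200,
    "Cloudera Search": 400,
}

# "About three times the space of the downloaded parcel" on Agent hosts.
AGENT_UNPACK_FACTOR = 3


def server_repo_mb(parcels):
    """Estimate space used in the local parcel repository on the Server."""
    return sum(PARCEL_SIZE_MB[name] for name in parcels)


def agent_parcel_mb(parcels):
    """Estimate space used per Agent host (default /opt/cloudera/parcels)."""
    return AGENT_UNPACK_FACTOR * server_repo_mb(parcels)


if __name__ == "__main__":
    cdh4_stack = ["CDH 4.6", "Cloudera Impala", "Cloudera Search"]
    print("Server repository:", server_repo_mb(cdh4_stack), "MB")
    print("Each Agent host:", agent_parcel_mb(cdh4_stack), "MB")
```

Because the Server keeps one copy of each product/version/build/distribution regardless of how many clusters use it, the Server estimate does not need to be multiplied by the number of clusters.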
Networking and Security Requirements
The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:
- Cluster hosts must have a working network name resolution system and correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
- Contain consistent information about hostnames and IP addresses across all hosts
- Not contain uppercase hostnames
- Not contain duplicate IP addresses
Also, do not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:
127.0.0.1 localhost.localdomain localhost
192.168.1.1 cluster-01.example.com cluster-01
192.168.1.2 cluster-02.example.com cluster-02
192.168.1.3 cluster-03.example.com cluster-03
- In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.
Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete. For more information, see Permission Requirements for Package-based Installations and Upgrades of CDH.
- If single user mode is not enabled, the Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
- No blocking is done by Security-Enhanced Linux (SELinux).
Important: Cloudera Enterprise is supported on platforms with Security-Enhanced Linux (SELinux) enabled. However, policies need to be provided by other parties or created by the administrator of the cluster deployment. Cloudera is not responsible for policy support nor policy enforcement, nor for any issues with such. If you experience issues with SELinux, contact your OS support provider.
- IPv6 must be disabled.
- No blocking by iptables or firewalls; port 7180 must be open because it is used to access Cloudera Manager after installation. Cloudera Manager communicates using specific ports, which must be open.
- For RHEL and CentOS, the /etc/sysconfig/network file on each host must contain the hostname you have just set (or verified) for that host.
- Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:
Table 2. Users and Groups
|Component (Version)||Unix User ID||Groups||Notes|
|Cloudera Manager (all versions)||cloudera-scm||cloudera-scm||Cloudera Manager processes such as the Cloudera Manager Server and the monitoring roles run as this user. The Cloudera Manager keytab file must be named cmf.keytab since that name is hard-coded in Cloudera Manager. Note: Applicable to clusters managed by Cloudera Manager only.|
|Apache Accumulo (Accumulo 1.4.3 and higher)||accumulo||accumulo||Accumulo processes run as this user.|
|Apache Avro||No special users.|
|Apache Flume (CDH 4, CDH 5)||flume||flume||The sink that writes to HDFS as this user must have write privileges.|
|Apache HBase (CDH 4, CDH 5)||hbase||hbase||The Master and the RegionServer processes run as this user.|
|HDFS (CDH 4, CDH 5)||hdfs||hdfs, hadoop||The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.|
|Apache Hive (CDH 4, CDH 5)||hive||hive||The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (e.g. MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml.|
|Apache HCatalog (CDH 4.2 and higher, CDH 5)||hive||hive||The WebHCat service (for REST access to Hive functionality) runs as the hive user.|
|HttpFS (CDH 4, CDH 5)||httpfs||httpfs||The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.|
|Hue (CDH 4, CDH 5)||hue||hue||Hue services run as this user.|
|Cloudera Impala (CDH 4.1 and higher, CDH 5)||impala||impala, hadoop, hive||Impala services run as this user.|
|Apache Kafka (Cloudera Distribution of Kafka 1.2.0)||kafka||kafka||Kafka services run as this user.|
|Java KeyStore KMS (CDH 5.2.1 and higher)||kms||kms||The Java KeyStore KMS service runs as this user.|
|Key Trustee KMS (CDH 5.3 and higher)||kms||kms||The Key Trustee KMS service runs as this user.|
|Key Trustee Server (CDH 5.4 and higher)||keytrustee||keytrustee||The Key Trustee Server service runs as this user.|
|Kudu||kudu||kudu||Kudu services run as this user.|
|Llama (CDH 5)||llama||llama||Llama runs as this user.|
|Apache Mahout||No special users.|
|MapReduce (CDH 4, CDH 5)||mapred||mapred, hadoop||Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos.|
|Apache Oozie (CDH 4, CDH 5)||oozie||oozie||The Oozie service runs as this user.|
|Parquet||No special users.|
|Apache Pig||No special users.|
|Cloudera Search (CDH 4.3 and higher, CDH 5)||solr||solr||The Solr processes run as this user.|
|Apache Spark (CDH 5)||spark||spark||The Spark History Server process runs as this user.|
|Apache Sentry (incubating) (CDH 5.1 and higher)||sentry||sentry||The Sentry service runs as this user.|
|Apache Sqoop (CDH 4, CDH 5)||sqoop||sqoop||This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.|
|Apache Sqoop2 (CDH 4.2 and higher, CDH 5)||sqoop2||sqoop, sqoop2||The Sqoop2 service runs as this user.|
|Apache Whirr||No special users.|
|YARN (CDH 4, CDH 5)||yarn||yarn, hadoop||Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos.|
|Apache ZooKeeper (CDH 4, CDH 5)||zookeeper||zookeeper||The ZooKeeper processes run as this user. It is not configurable.|
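Two of the /etc/hosts rules given earlier in this section (no uppercase hostnames, no duplicate IP addresses) are easy to check with a short script. The following is a minimal sketch; `check_hosts` is a hypothetical helper, not a Cloudera tool. It does not check consistency across hosts, which would require comparing the files from every host, and it does not check DNS resolution.

```python
def check_hosts(text):
    """Return a list of problems found in the contents of an /etc/hosts file."""
    problems = []
    seen_ips = set()
    for line_number, raw in enumerate(text.splitlines(), start=1):
        # Drop trailing comments, then skip lines that are empty.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        if ip in seen_ips:
            problems.append("line %d: duplicate IP address %s" % (line_number, ip))
        seen_ips.add(ip)
        for name in names:
            if name != name.lower():
                problems.append("line %d: uppercase hostname %s" % (line_number, name))
    return problems


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        for problem in check_hosts(handle.read()):
            print(problem)
```

Run against the example file shown earlier, the function returns an empty list; a line such as `192.168.1.1 Cluster-01.example.com` would be reported for its uppercase hostname.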
What's New in Cloudera Manager 5.5.0
- Operating Systems - Support for RHEL/CentOS 6.6 (in SELinux mode), 6.7, and 7.1, and Oracle Enterprise Linux 7.1.
Important: Cloudera supports RHEL 7 with the following limitations:
- Only RHEL 7.1 is supported. RHEL 7.0 is not supported.
- Only a new installation of RHEL 7.1 is supported. Upgrades from RHEL 6 to RHEL 7.1 are not supported. For more information, see Does Red Hat support upgrades between major versions of Red Hat Enterprise Linux?
- Navigator Encrypt is not supported on RHEL 7.1.
- Databases - Supports MariaDB 5.5, Oracle 12c, and PostgreSQL 9.4.
- Selective service restart after activating parcels is supported.
- Retrying upgrade actions is supported. If a cluster upgrade command fails while in progress, you can retry a command after fixing the cause of failure. On retry, the command restarts from the command step where it failed.
- The command details page for running and recent commands has been redesigned for usability and scalability.
- Instead of serially starting all services for the first time, services that are not dependent are started in parallel. This decreases the time required to start services for the first time after creating a cluster.
- Performance has improved for service startup, client configuration deployment, and calculation of stale configurations.
- Suppression of notifications
- You can suppress the warnings that Cloudera Manager issues when a configuration value is outside the recommended range or is invalid. See Suppressing Configuration and Parameter Validation Warnings.
- You can suppress health test warnings. See Suppressing Health Test Results.
Suppression can be useful if a warning does not apply to your deployment and you no longer want to see the notification. Suppressed warnings are still retained by Cloudera Manager, and you can unsuppress the warnings at any time.
- Multi Cloudera Manager Dashboard - A special mode of Cloudera Manager that enables you to view monitoring data aggregated from multiple Cloudera Manager instances that manage one or more CDH clusters. See Monitoring Multiple CDH Deployments Using the Multi Cloudera Manager Dashboard.
- You can decommission roles when services are completely stopped. This allows you to decommission hosts during cluster downtime.
- You can disable collection of certain domain metrics—for example, for HBase RegionServers, Kafka Brokers, and others—through new settings in the host advanced configuration snippet. This is useful in certain support situations and should only be done under the direction of Cloudera Support.
- You can configure which aggregate metrics are automatically generated. This advanced feature can be useful in certain situations to impact the monitoring workload, allowing unused or less-important aggregate metrics to be skipped. This may result in improved performance and the ability to handle larger monitoring workloads, or to retain data for a larger workload for longer. Cloudera recommends using this only under the direction of Cloudera Support.
- Alert Publisher can be configured to pass alert events to a user-defined script. Use this for integrating with other alerting systems or for custom logic (for example, to send some alerts to some people and others to other people).
- Agent minor version mismatches (5.4 to 5.5) now cause bad host health. Maintenance version mismatches (for example, 5.4.x to 5.4.y) still cause concerning host health.
- Cloudera Manager indicates if the Java version in use is too old.
- Cloudera Manager indicates if the supervisor component of the Agent needs to be restarted after an upgrade.
- Full and User Administrators can view active user sessions. See Viewing User Sessions.
- Full Administrators and Auditors can audit failed and successful logins.
- Multiple user session logins can be disallowed.
- You can configure external authentication so that local administrator emergency access is disabled. This means that no local accounts can log in under any circumstances, including when the external system is not functioning.
- You can turn on authentication for the URLs for downloading client configuration zip files. Previously, authentication was never required.
- Passwords are no longer accessible in cleartext through the Cloudera Manager UI or in the configuration files stored on disk. See Password Redaction. There are some exceptions; see Known Issues and Workarounds in Cloudera Manager 5.
- Use a configuration option in HBase to skip region reload during rolling restart and rolling upgrade, to increase the speed of the operations.
- HBase rolling restart performance can be improved by increasing the number of Region Mover Threads. If the value of this property is 1, it can lower rolling restart speed. The Admin Console now displays this information and, if the value is 1, advises increasing it.
- HBase Thrift Server and Rest Server support TLS/SSL.
- HDFS encryption can be enabled using a wizard. See Enabling HDFS Encryption Using the Wizard.
- AES is an encryption option for HDFS RPC encryption.
- Hive can use TLS/SSL and Kerberos at the same time.
- When Hive is configured to use TLS/SSL, Hue is automatically configured to use that protocol when communicating with Hive. Similarly, when Impala is configured to use TLS/SSL, Hue is automatically configured to use that protocol when communicating with Impala.
- HiveServer2 supports a timeout value for idle sessions and operations. By default, it times out client sessions after a week and idle operations after three days. This helps alleviate problems with long-running sessions when using Hue.
- Cloudera Manager collects and displays various operational metrics for Hive.
- Hue supports a Load Balancer role using HTTPD as a load balancer.
- You can configure certificates trusted by Hue using the TLS/SSL Truststore configuration. This replaces the REQUESTS_CA_BUNDLE advanced configuration snippet entry.
- You can specify a password that protects the Hue private key file.
- Cloudera Manager collects and displays various operational metrics for Hue. New health tests have been added for Hue as well.
- Impala supports TLS/SSL internally between the StateStore and the Catalog Server roles as well as Impala Daemon.
- Kafka
- Enabled rolling restart.
- Extended broker metric coverage.
- Exposed more commonly configured parameters.
- Updated existing parameters; reviewing defaults, descriptions, and validations.
- The broker instance list now shows which broker is the active controller.
- Key Trustee
- The Key Trustee Server CSD is included in Cloudera Manager. Manual installation of the Key Trustee Server CSD is not required.
- A Key Administrator role in Cloudera Manager is used for configuring HDFS Data at Rest Encryption. Only a Key Administrator and a Full Administrator can make configuration changes to Java Keystore KMS, Key Trustee KMS, and Key Trustee Server. Configuring HDFS to use Data at Rest Encryption is also limited to the Key Administrator and Full Administrator roles. This allows organizations to keep Key Administrators and Cluster Administrators separate, which is a security best practice.
- When running Key Trustee KMS in a highly available configuration, Cloudera Manager can automatically generate the load-balancer URL.
- Sentry introduces column-level access control for tables in Hive and Impala. Previously, Sentry supported privilege granularity only at the table level. You can now assign the SELECT privilege on a subset of columns in a table. See Hive SQL Syntax for Use with Sentry.
- Sentry supports Kerberos authentication for the Sentry web server. See Using the Sentry Web Server.
- Solr can be configured with a load balancer in a secure environment.
- There is a new Solr Max Connector Threads property for Solr Server in CDH 5.1.0 and higher.
- Solr supports LDAP/AD authentication.
- Backup and Disaster Recovery
- The user interface for scheduling and reviewing replications and snapshots has been improved. You can now view the history of replication jobs and subtasks more easily. See Viewing Replication History.
- When specifying an HDFS replication job, you can apply exclusion filters to exclude specific files or directories. See Configuring Replication of HDFS Data.
- You can download or send to Cloudera Support a diagnostic bundle to troubleshoot replication jobs. Bundles include logs of the replication run. See Viewing Replication Schedules.
- The performance of the file-listing phase of a replication job has been improved.
- The performance of the initialization and running phase has been improved.
- The following advanced configuration snippets for configuring replications have been added:
- HDFS Replication Advanced Configuration Snippet (Safety Valve) for hadoop-env.sh
- Hive Replication Advanced Configuration Snippet (Safety Valve) for hive-site.xml
- HDFS Replication Advanced Configuration Snippet (Safety Valve) for yarn-site.xml
- HDFS Replication Advanced Configuration Snippet (Safety Valve) for mapred-site.xml
- Snapshot properties for HBase, such as thread pool size, can be configured in the HBase Client Advanced Configuration Snippet (Safety Valve) for hbase-site.xml property.
- Hive partitions are chunked during export and import to avoid message size limitations.
- Hive replications validate metadata on the destination Hive Metastore before copying HDFS data from the source to avoid copying errors during replication.
- The use of snapshots to improve replications is documented. See Using Snapshots with Replication.
- The effect of network latency on replications is documented. See Network Latency and Replication.
- Scheduled snapshots can be disabled and re-enabled.
- API improvements:
- Explicit support for pausing snapshot policies
- Failed file listing
- Collection of diagnostic bundles for replication schedules and history
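The Agent version mismatch rule described in the list above (minor version mismatches cause bad host health, maintenance version mismatches cause concerning host health) can be illustrated with a small comparison. This is a sketch of the rule as stated, not Cloudera Manager's implementation: the function name is hypothetical, and treating a major version mismatch as bad is an assumption, since the text above does not cover that case.

```python
def agent_mismatch_health(server_version, agent_version):
    """Classify a Server/Agent version pair given strings such as '5.5.0'."""
    server = tuple(int(part) for part in server_version.split("."))
    agent = tuple(int(part) for part in agent_version.split("."))
    if server == agent:
        return "no mismatch"
    # Major or minor version differs (for example, 5.4 versus 5.5).
    # Treating a major-version difference as "bad" is an assumption.
    if server[:2] != agent[:2]:
        return "bad"
    # Only the maintenance version differs (for example, 5.4.1 versus 5.4.3).
    return "concerning"
```

For example, a 5.5.0 Server with a 5.4.3 Agent is classified as bad, while a 5.4.1 Server with a 5.4.3 Agent is classified as concerning.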