The recommended tool for installing Cloudera Enterprise
This download installs Cloudera Enterprise or Cloudera Express.
Cloudera Enterprise requires a license; however, when installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.
Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.
- System Requirements
- What's New
- Supported Operating Systems
- Supported JDK Versions
- Supported Browsers
- Supported Databases
- Supported CDH and Managed Service Versions
- Supported Transport Layer Security Versions
- Resource Requirements
- Networking and Security Requirements
Supported Operating Systems
Supported JDK Versions
A supported minor JDK release will remain supported throughout a Cloudera major release lifecycle, from the time of its addition forward, unless specifically excluded.
Warning: JDK 1.8u40 and JDK 1.8u60 are excluded from support. Also, the Oozie Web Console returns a 500 error when the Oozie server runs on JDK 8u75 or higher.
Running CDH nodes within the same cluster on different JDK releases is not supported; the JDK release, including the patch level, must match across the cluster.
- All nodes in your cluster must run the same Oracle JDK version.
- All services must be deployed on the same Oracle JDK version.
The Cloudera Manager repository is packaged with an Oracle JDK (for example, JDK 1.7.0_67), which can be installed automatically during a new installation or an upgrade.
For a full list of supported JDK versions, see CDH and Cloudera Manager Supported JDK Versions.
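The same-release requirement can be checked mechanically. A minimal sketch, assuming SSH access to each node; `check_same_jdk` and the example hostnames are our illustration, not a Cloudera tool:

```shell
# check_same_jdk reads one `java -version` banner line per host from stdin
# and succeeds only when every host reports the identical release.
check_same_jdk() {
    [ "$(sort -u | wc -l)" -eq 1 ]
}

# On a live cluster, feed it over ssh (the hostnames are placeholders):
#   for h in cluster-01 cluster-02 cluster-03; do
#       ssh "$h" 'java -version 2>&1 | head -1'
#   done | check_same_jdk && echo "JDK releases match"
```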
The Cloudera Manager Admin Console, which you use to install, configure, manage, and monitor services, supports the following browsers:
- Mozilla Firefox 24 and 31.
- Google Chrome 36 and higher.
- Internet Explorer 9 and higher (Internet Explorer 11 in Native Mode).
- Safari 5 and higher.
Cloudera Manager requires several databases. The Cloudera Manager Server stores information about configured services, role assignments, configuration history, commands, users, and running processes in a database of its own. You must also specify a database for the Activity Monitor and Reports Manager roles.
Important: When you restart processes, the configuration for each of the services is redeployed using information saved in the Cloudera Manager database. If this information is not available, your cluster does not start or function correctly. You must schedule and maintain regular backups of the Cloudera Manager database to recover the cluster in the event of the loss of this database.
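The backup requirement above can be met with the standard dump tools. A minimal sketch for an external database; the database name `scm` and the accounts are placeholders, and the embedded PostgreSQL database has its own backup procedure:

```shell
# MySQL: dump the Cloudera Manager database to a dated file
mysqldump -u root -p scm > scm-backup-$(date +%Y%m%d).sql

# PostgreSQL: the equivalent dump
sudo -u postgres pg_dump scm > scm-backup-$(date +%Y%m%d).sql
```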
The database you use must be configured to support UTF8 character set encoding. The embedded PostgreSQL database installed when you follow Installation Path A - Automated Installation by Cloudera Manager (Non-Production Mode) automatically provides UTF8 encoding. If you install a custom database, you might need to enable UTF8 encoding. The commands for enabling UTF8 encoding are described in each database topic under Cloudera Manager and Managed Service Datastores.
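For a custom database, the encoding can be set at creation time. These are generic commands, not the exact steps from the datastore topics, and the database name `scm` is a placeholder:

```shell
# MySQL / MariaDB: create the Cloudera Manager database with UTF8 encoding
mysql -u root -p -e "CREATE DATABASE scm DEFAULT CHARACTER SET utf8;"

# PostgreSQL: create the database with UTF8 encoding
sudo -u postgres createdb --encoding=UTF8 scm
```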
After installing a database, upgrade to the latest patch version and apply any other appropriate updates. Available updates may be specific to the operating system on which the database is installed.
Cloudera supports the shipped version of MariaDB, MySQL, and PostgreSQL for each supported Linux distribution.
| Component | MariaDB | MySQL | SQLite | PostgreSQL | Oracle | Derby (see Note 5) |
|---|---|---|---|---|---|---|
| Cloudera Manager | 5.5, 10 | 5.6, 5.5, 5.1 | – | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 | 12c, 11gR2 | |
| Oozie | 5.5, 10 | 5.6, 5.5, 5.1 | – | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3) | | |
| Flume | – | – | – | – | – | Default (for the JDBC Channel only) |
| Hue | 5.5, 10 | 5.6, 5.5, 5.1 (see Note 6) | | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3) | | |
| Hive/Impala | 5.5, 10 | 5.6, 5.5, 5.1 (see Note 1) | | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3) | | |
| Sentry | 5.5, 10 | 5.6, 5.5, 5.1 (see Note 1) | | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3) | | |
| Sqoop 1 | 5.5, 10 | See Note 4 | – | See Note 4 | See Note 4 | – |
| Sqoop 2 | 5.5, 10 | See Note 9 | – | – | – | Default |
- Cloudera supports the databases listed above provided they are supported by the underlying operating system on which they run.
- MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and higher. The InnoDB storage engine must be enabled in the MySQL server.
- Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
- PostgreSQL 9.2 is supported on CDH 5.1 and higher. PostgreSQL 9.3 is supported on CDH 5.2 and higher. PostgreSQL 9.4 is supported on CDH 5.5 and higher.
- For purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
- Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation guide for recommendations.
- CDH 5 Hue requires the default MySQL version of the operating system on which it is being installed, which is usually MySQL 5.1, 5.5, or 5.6.
- When installing a JDBC driver, only the ojdbc6.jar file is supported for both Oracle 11g R2 and Oracle 12c; the ojdbc7.jar file is not supported.
- Sqoop 2 lacks some of the features of Sqoop 1. Cloudera recommends you use Sqoop 1. Use Sqoop 2 only if it contains all the features required for your use case.
- MariaDB 10 is supported only on CDH 5.9 and higher.
Supported CDH and Managed Service Versions
The following versions of CDH and managed services are supported:
Warning: Cloudera Manager 5 does not support CDH 3 and you cannot upgrade Cloudera Manager 4 to Cloudera Manager 5 if you have a cluster running CDH 3. Therefore, to upgrade CDH 3 clusters to CDH 4 using Cloudera Manager, you must use Cloudera Manager 4.
- CDH 4 and CDH 5. The latest released versions of CDH 4 and CDH 5 are strongly recommended. For information on CDH 4 requirements, see CDH 4 Requirements and Supported Versions. For information on CDH 5 requirements, see CDH 5 Requirements and Supported Versions.
- Cloudera Impala - Cloudera Impala is included with CDH 5. Cloudera Impala 1.2.1 with CDH 4.1.0 or higher. For more information on Impala requirements with CDH 4, see Impala Requirements.
- Cloudera Search - Cloudera Search is included with CDH 5. Cloudera Search 1.2.0 with CDH 4.6.0. For more information on Cloudera Search requirements with CDH 4, see Cloudera Search Requirements.
- Apache Spark - 0.9.0 or higher with CDH 4.4.0 or higher.
- Apache Accumulo - 1.4.3 with CDH 4.3.0, 1.4.4 with CDH 4.5.0, and 1.6.0 with CDH 4.6.0.
For more information, see the Product Compatibility Matrix.
Supported Transport Layer Security Versions
Resource Requirements
Cloudera Manager requires the following resources:
- Disk Space
- Cloudera Manager Server
- 5 GB on the partition hosting /var.
- 500 MB on the partition hosting /usr.
- For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and different builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server—not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:
- CDH 5 (which includes Impala and Search) - 1.5 GB per parcel (packed), 2 GB per parcel (unpacked)
- Impala - 200 MB per parcel
- Cloudera Search - 400 MB per parcel
- Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition.
- Agents - On Agent hosts, each unpacked parcel requires about three times the space of the downloaded parcel on the Cloudera Manager Server. By default, unpacked parcels are located in /opt/cloudera/parcels.
- Cloudera Manager Server
- RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB might be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel might kill the Server for consuming too much RAM.
- Python - Cloudera Manager requires Python 2.4 through the latest version of Python 2.x; Python 3.0 and higher are not supported. Hue in CDH 5 and package installs of CDH 5 require Python 2.6 or 2.7. All supported operating systems include Python 2.4 or higher.
- Perl - Cloudera Manager requires Perl.
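For the 2 GB Server case described above, the heap cap goes in /etc/default/cloudera-scm-server. A sketch only; the exact option set varies by release, so preserve whatever flags the line already carries:

```shell
# /etc/default/cloudera-scm-server (fragment)
# Cap the Server heap so the kernel does not kill the process on a 2 GB host.
export CMF_JAVA_OPTS="-Xmx1G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
```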
Networking and Security Requirements
The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:
- CDH requires IPv4. IPv6 is not supported and must be disabled.
- Multihoming CDH or Cloudera Manager is not supported outside specifically certified Cloudera partner appliances. Cloudera finds that current Hadoop architectures combined with modern network infrastructures and security practices remove the need for multihoming. Multihoming, however, is beneficial internally in appliance form factors to take advantage of high-bandwidth InfiniBand interconnects.
- Although some subareas of the product might work with unsupported custom multihoming configurations, there are known issues with multihoming. In addition, unknown issues can arise because multihoming is not covered by the test matrix outside the Cloudera-certified partner appliances.
- Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
- Contain consistent information about hostnames and IP addresses across all hosts
- Not contain uppercase hostnames
- Not contain duplicate IP addresses
Cluster hosts must not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:
127.0.0.1 localhost.localdomain localhost
192.168.1.1 cluster-01.example.com cluster-01
192.168.1.2 cluster-02.example.com cluster-02
192.168.1.3 cluster-03.example.com cluster-03
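The formatting rules above can be checked mechanically. A minimal sketch; `check_hosts_file` is our illustration, not a Cloudera tool:

```shell
# check_hosts_file reads /etc/hosts-style lines from stdin and reports
# duplicate IP addresses and uppercase hostnames; it exits non-zero if
# either problem is found.
check_hosts_file() {
    awk '
        /^[[:space:]]*(#|$)/ { next }                       # skip comments and blanks
        seen[$1]++           { print "duplicate IP: " $1; bad = 1 }
        /[A-Z]/              { print "uppercase hostname: " $0; bad = 1 }
        END                  { exit bad }
    '
}

# Example: check_hosts_file < /etc/hosts
```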
- In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.
Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete.
- If single user mode is not enabled, the Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
- No blocking is done by Security-Enhanced Linux (SELinux). Note: Cloudera Enterprise is supported on platforms with SELinux enabled. However, Cloudera does not support use of SELinux with Cloudera Navigator. Cloudera is not responsible for policy support or policy enforcement. If you experience issues with SELinux, contact your OS provider.
- No blocking by iptables or firewalls; port 7180 must be open because it is used to access Cloudera Manager after installation. Cloudera Manager communicates using specific ports, which must be open.
- For RHEL and CentOS, the /etc/sysconfig/network file on each host must contain the hostname you have just set (or verified) for that host.
- Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:
Users and Groups
| Component | Unix User ID | Groups | Notes |
|---|---|---|---|
| Cloudera Manager (all versions) | cloudera-scm | cloudera-scm | Cloudera Manager processes such as the Cloudera Manager Server and the monitoring roles run as this user. The Cloudera Manager keytab file must be named cmf.keytab since that name is hard-coded in Cloudera Manager. Note: Applicable to clusters managed by Cloudera Manager only. |
| Apache Accumulo (Accumulo 1.4.3 and higher) | accumulo | accumulo | Accumulo processes run as this user. |
| Apache Avro | – | – | No special users. |
| Apache Flume (CDH 4, CDH 5) | flume | flume | The sink that writes to HDFS as this user must have write privileges. |
| Apache HBase (CDH 4, CDH 5) | hbase | hbase | The Master and the RegionServer processes run as this user. |
| HDFS (CDH 4, CDH 5) | hdfs | hdfs, hadoop | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. |
| Apache Hive (CDH 4, CDH 5) | hive | hive | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres), but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml. |
| Apache HCatalog (CDH 4.2 and higher, CDH 5) | hive | hive | The WebHCat service (for REST access to Hive functionality) runs as the hive user. |
| HttpFS (CDH 4, CDH 5) | httpfs | httpfs | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file. |
| Hue (CDH 4, CDH 5) | hue | hue | Hue services run as this user. |
| Hue Load Balancer (Cloudera Manager 5.5 and higher) | apache | apache | The Hue Load Balancer has a dependency on the apache2 package that uses the apache user name. Cloudera Manager does not run processes using this user ID. |
| Cloudera Impala (CDH 4.1 and higher, CDH 5) | impala | impala, hive | Impala services run as this user. |
| Apache Kafka (Cloudera Distribution of Kafka 1.2.0) | kafka | kafka | Kafka services run as this user. |
| Java KeyStore KMS (CDH 5.2.1 and higher) | kms | kms | The Java KeyStore KMS service runs as this user. |
| Key Trustee KMS (CDH 5.3 and higher) | kms | kms | The Key Trustee KMS service runs as this user. |
| Key Trustee Server (CDH 5.4 and higher) | keytrustee | keytrustee | The Key Trustee Server service runs as this user. |
| Kudu | kudu | kudu | Kudu services run as this user. |
| Llama (CDH 5) | llama | llama | Llama runs as this user. |
| Apache Mahout | – | – | No special users. |
| MapReduce (CDH 4, CDH 5) | mapred | mapred, hadoop | Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos. |
| Apache Oozie (CDH 4, CDH 5) | oozie | oozie | The Oozie service runs as this user. |
| Parquet | – | – | No special users. |
| Apache Pig | – | – | No special users. |
| Cloudera Search (CDH 4.3 and higher, CDH 5) | solr | solr | The Solr processes run as this user. |
| Apache Spark (CDH 5) | spark | spark | The Spark History Server process runs as this user. |
| Apache Sentry (CDH 5.1 and higher) | sentry | sentry | The Sentry service runs as this user. |
| Apache Sqoop (CDH 4, CDH 5) | sqoop | sqoop | This user is only for the Sqoop 1 Metastore, a configuration option that is not recommended. |
| Apache Sqoop2 (CDH 4.2 and higher, CDH 5) | sqoop2 | sqoop, sqoop2 | The Sqoop2 service runs as this user. |
| Apache Whirr | – | – | No special users. |
| YARN (CDH 4, CDH 5) | yarn | yarn, hadoop | Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos. |
| Apache ZooKeeper (CDH 4, CDH 5) | zookeeper | zookeeper | The ZooKeeper processes run as this user. It is not configurable. |
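A whitelist or audit script can confirm these accounts with standard tools. For example (the account exists only once the corresponding service is installed):

```shell
# List the groups for a managed-service account; substitute any Unix User ID
# from the table above.
id -Gn hdfs 2>/dev/null || echo "hdfs account not present on this host"
```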
Fixed an issue where an ArrayIndexOutOfBoundsException could be thrown by Reports Manager.
Fixed in: Cloudera Manager 5.12, 5.9.3
Fixed an issue where the Oozie Load Balancer field in the Oozie configuration was not set while Oozie high availability was in use (two or more Oozie Server roles deployed). This condition now creates a validation error, and Cloudera Manager does not let you start or restart Oozie until the problem is fixed.
Fixed in: Cloudera Manager 5.12, 5.10.2, 5.9.3
Sometimes on a large or busy cluster, collection of diagnostic data using the By Date Range option can fail due to a timeout during the estimation step of diagnostic bundle collection. You can now configure both the host-level estimation timeout and the overall estimation timeout using Java options. Set these options on the Cloudera Manager Server host in the /etc/default/cloudera-scm-server file by adding the following to the line that begins with export CMF_JAVA_OPTS:
- -Dcom.cloudera.RoleLogEstimator.maxEstimateTimeoutSeconds=number of seconds (If not specified, the default is 90 seconds)
- -Dcom.cloudera.RoleLogEstimator.estimateTimeoutPerHostSeconds=number of seconds (If not specified, default is 60 seconds)
The value of -Dcom.cloudera.RoleLogEstimator.maxEstimateTimeoutSeconds must be greater than or equal to the value of -Dcom.cloudera.RoleLogEstimator.estimateTimeoutPerHostSeconds.
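Putting it together, the resulting line in /etc/default/cloudera-scm-server might look like this (the timeout values are illustrative, not recommendations):

```shell
# /etc/default/cloudera-scm-server (fragment) -- keep the overall timeout
# >= the per-host timeout, and keep any options the line already carries.
export CMF_JAVA_OPTS="$CMF_JAVA_OPTS -Dcom.cloudera.RoleLogEstimator.maxEstimateTimeoutSeconds=180 -Dcom.cloudera.RoleLogEstimator.estimateTimeoutPerHostSeconds=120"
```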
Restart the Cloudera Manager server for the updated Java flags to take effect:
sudo service cloudera-scm-server restart
Fixed in: Cloudera Manager 5.12, 5.10.2, 5.9.3
For CDH versions 5.8 and higher, the Low Watermark for Memstore Flush configuration parameter is associated with the HBase parameter hbase.regionserver.global.memstore.lowerLimit.
This value represents the fullness threshold of the memstore as a percentage of memstore capacity. The default value for this parameter was incorrectly set too low, at .38, which can cause severe underutilization of the memstore.
The default has been corrected to be .95. When upgrading to a version of Cloudera Manager with this fix, if the value was previously set to the old default of .38, it will automatically be increased to the new default, which may cause Cloudera Manager to mark your HBase service as having a stale configuration, requiring a restart.
Additionally, if an existing Low Watermark for Memstore Flush configuration parameter has a value <= .9, it will be flagged as a configuration warning.
Fixed in: Cloudera Manager 5.12, 5.11.1, 5.10.2, 5.9.3, 5.8.5
Fixed how null values for the maintenanceOwners parameter are handled when creating clusters with the Cloudera Manager API.
Fixed an issue in the stop command execution for CSD services authored to use a service-level graceful shutdown. The stop command could be shown with the second step (forced kill) marked as failed when all roles were already stopped. This issue also affects shutdown of Kafka using Cloudera Manager 5.11.0 and can impact Kafka upgrades.
Fixed in: Cloudera Manager 5.11.1, 5.10.2, 5.9.3
Fixed an issue where if Sentry crashed with an OutOfMemory error, the generated dump file name was not unique, and therefore the file could be overwritten on subsequent occurrences of the error. This has been fixed to include the PID in the dump file name, which is the standard practice for other CDH services.
The BDR Replication Host Selection Policy has been updated. The process that launches and coordinates an HDFS/Hive replication job now runs only on the following hosts:
- Hosts that run any role of the HDFS/Hive Service (for HDFS or Hive replication)
- Hosts that have a Non-Gateway role
- Hosts where the health status is in the GOOD or CONCERNING state with preference given to GOOD
- Hosts that are whitelisted, if configured
Fixed an issue where secret data, including passwords, could be exposed in the /var/run/cloudera-scm-agent/process-name/proc.json or /var/run/cloudera-scm-agent/process-name/config.zip files because these files were world readable. See TSB-235 for more information.
Fixed in: Cloudera Manager 5.12, 5.11, 5.10.2, 5.9.3
Fixed an issue where the Impala Peak Memory Usage page in the Cluster Utilization Report did not display any data in Cloudera Manager versions 5.9 to 5.11.
Fixed in: Cloudera Manager 5.12, 5.9.3
The recommended value for the Max Message Size for Hive MetaStore configuration parameter should be at least 10% of the value of the Java Heap Size of Hive Metastore Server in Bytes parameter, but should never exceed 2147483647 bytes. Previously, the validator for Max Message Size for Hive MetaStore showed an incorrect value in its validation message. The validator now shows the correct recommended value.
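The rule works out as follows for, say, an 8 GB Metastore heap (the heap size here is only an example):

```shell
# 10% of the Metastore heap, capped at 2147483647 bytes
heap_bytes=$(( 8 * 1024 * 1024 * 1024 ))   # 8 GB example heap
recommended=$(( heap_bytes / 10 ))
cap=2147483647
if [ "$recommended" -gt "$cap" ]; then
    recommended=$cap
fi
echo "$recommended bytes"   # prints "858993459 bytes"
```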
Fixed a bug that broke backwards compatibility with Cloudera Manager API version 11 (introduced with the Cloudera Manager 5.6 release) for the following endpoint: