The recommended tool for installing Cloudera Enterprise
This download installs Cloudera Enterprise or Cloudera Express.
Cloudera Enterprise requires a license; however, when installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.
Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.
- System Requirements
- What's New
- Supported Operating Systems
- Supported JDK Versions
- Supported Browsers
- Supported Databases
- Supported CDH and Managed Service Versions
- Resource Requirements
- Networking and Security Requirements
Supported Operating Systems
Supported JDK Versions
Cloudera Manager supports Oracle JDK 1.7.0_75 and 1.8.0_40 when it's managing CDH 5.x, and Oracle JDK 1.6.0_31 and 1.7.0_75 when it's managing CDH 4.x. Cloudera Manager supports Oracle JDK 1.7.0_75 and 1.8.0_40 when it's managing both CDH 4.x and CDH 5.x clusters. Oracle JDK 1.6.0_31 and 1.7.0_75 can be installed during the installation and upgrade. For further information, see Java Development Kit Installation.
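The version matrix above can be encoded as a small lookup. The following Python sketch is illustrative only: `SUPPORTED_JDKS`, `parse_java_version`, and `is_supported_jdk` are hypothetical names, not part of Cloudera Manager. It also assumes that only the exact versions listed count as supported, which the text does not state either way.

```python
import re

# Hypothetical table built from the versions listed above.
SUPPORTED_JDKS = {
    "CDH 5": {"1.7.0_75", "1.8.0_40"},
    "CDH 4": {"1.6.0_31", "1.7.0_75"},
    "CDH 4 and CDH 5": {"1.7.0_75", "1.8.0_40"},
}


def parse_java_version(version_output):
    """Extract a version such as 1.7.0_75 from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def is_supported_jdk(managed, jdk_version):
    """Return True if jdk_version is listed for the managed CDH release(s).

    Assumption: only the exact versions in the table are treated as supported.
    """
    return jdk_version in SUPPORTED_JDKS.get(managed, set())
```

For example, feeding the first line printed by `java -version` to `parse_java_version` and passing the result to `is_supported_jdk("CDH 5", ...)` gives a quick pre-installation check on a host.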
Supported Browsers
The Cloudera Manager Admin Console, which you use to install, configure, manage, and monitor services, supports the following browsers:
- Mozilla Firefox 11 and higher
- Google Chrome
- Internet Explorer 9 and higher. Internet Explorer 11 Native Mode.
- Safari 5 and higher
Supported Databases
Cloudera Manager requires several databases. The Cloudera Manager Server stores information about configured services, role assignments, configuration history, commands, users, and running processes in a database of its own. You must also specify a database for the Activity Monitor and Reports Manager management services.
Important: When processes restart, the configuration for each of the services is redeployed using information that is saved in the Cloudera Manager database. If this information is not available, your cluster will not start or function correctly. You must therefore schedule and maintain regular backups of the Cloudera Manager database in order to recover the cluster in the event of the loss of this database.
See Backing Up Databases.
The database you use must be configured to support UTF8 character set encoding. The embedded PostgreSQL database that is installed when you follow Installation Path A - Automated Installation by Cloudera Manager automatically provides UTF8 encoding. If you install a custom database, you may need to enable UTF8 encoding. The commands for enabling UTF8 encoding are described in each database topic under Cloudera Manager and Managed Service Data Stores.
After installing a database, upgrade to the latest patch version and apply any other appropriate updates. Available updates may be specific to the operating system on which it is installed.
Cloudera Manager and its supporting services can use the following databases:
- MySQL - 5.5 and 5.6
- Oracle 11gR2
- PostgreSQL - 8.4, 9.2, and 9.3
Cloudera supports the shipped version of MySQL and PostgreSQL for each supported Linux distribution. Each database is supported for all components in Cloudera Manager and CDH subject to the notes in CDH 4 Supported Databases and CDH 5 Supported Databases.
Supported CDH and Managed Service Versions
The following versions of CDH and managed services are supported:
Warning: Cloudera Manager 5 does not support CDH 3, and you cannot upgrade Cloudera Manager 4 to Cloudera Manager 5 if you have a cluster running CDH 3. Therefore, to upgrade CDH 3 clusters to CDH 4 using Cloudera Manager, you must use Cloudera Manager 4.
- CDH 4 and CDH 5. The latest released versions of CDH 4 and CDH 5 are strongly recommended. For information on CDH 4 requirements, see CDH 4 Requirements and Supported Versions. For information on CDH 5 requirements, see CDH 5 Requirements and Supported Versions.
- Cloudera Impala - Cloudera Impala is included with CDH 5. For CDH 4, Cloudera Impala 1.2.1 is supported with CDH 4.1.0 or later. For more information on Cloudera Impala requirements with CDH 4, see Cloudera Impala Requirements.
- Cloudera Search - Cloudera Search is included with CDH 5. For CDH 4, Cloudera Search 1.2.0 is supported with CDH 4.6.0. For more information on Cloudera Search requirements with CDH 4, see Cloudera Search Requirements.
- Apache Spark - 0.9.0 or later with CDH 4.4.0 or later.
- Apache Accumulo - 1.4.3 with CDH 4.3.0, 1.4.4 with CDH 4.5.0, and 1.6.0 with CDH 4.6.0.
For more information, see the Product Compatibility Matrix.
Resource Requirements
Cloudera Manager requires the following resources:
- Disk Space
  - Cloudera Manager Server
    - 5 GB on the partition hosting /var.
    - 500 MB on the partition hosting /usr.
    - For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server, not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:
      - CDH 4.6 - 700 MB per parcel; CDH 5 (which includes Impala and Search) - 1.5 GB per parcel (packed), 2 GB per parcel (unpacked)
      - Cloudera Impala - 200 MB per parcel
      - Cloudera Search - 400 MB per parcel
  - Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition. For more information, see Data Storage for Monitoring Data.
  - Agents - On Agent hosts each unpacked parcel requires about three times the space of the downloaded parcel on the Cloudera Manager Server. By default unpacked parcels are located in /opt/cloudera/parcels.
- RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB may be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel may kill the Server for consuming too much RAM.
- Python - Cloudera Manager and CDH 4 require Python 2.4 or later, but Hue in CDH 5 and package installs of CDH 5 require Python 2.6 or 2.7. All supported operating systems include Python version 2.4 or later.
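The disk space figures above lend themselves to a quick pre-installation check. The sketch below is a rough aid, not a Cloudera tool: `free_gb`, `agent_parcel_space_gb`, and `check_partitions` are hypothetical helper names, the factor of three is the approximate Agent-side multiplier quoted above, and 500 MB is rounded to 0.5 GB.

```python
import shutil

GIB = 1024 ** 3


def free_gb(path):
    """Free space on the partition that holds path, in GB (2**30 bytes)."""
    return shutil.disk_usage(path).free / GIB


def agent_parcel_space_gb(downloaded_sizes_gb, factor=3):
    """Rough space needed on an Agent host for the given downloaded parcels.

    The default factor of 3 is the approximate multiplier quoted in the text.
    """
    return factor * sum(downloaded_sizes_gb)


def check_partitions(required_gb, free=free_gb):
    """Map each path to a tuple (free GB, required GB, enough space?)."""
    report = {}
    for path, need in required_gb.items():
        available = free(path)
        report[path] = (available, need, available >= need)
    return report
```

On a prospective Cloudera Manager Server host, `check_partitions({"/var": 5, "/usr": 0.5})` reports whether the two server minimums are met, and `agent_parcel_space_gb([1.5])` estimates about 4.5 GB on each Agent host for a single CDH 5 parcel. The `free` parameter exists only so the check can be exercised without touching the real file system.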
Networking and Security Requirements
The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:
- Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
- Contain consistent information about hostnames and IP addresses across all hosts
- Not contain uppercase hostnames
- Not contain duplicate IP addresses
Also, do not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:
127.0.0.1 localhost.localdomain localhost
192.168.1.1 cluster-01.example.com cluster-01
192.168.1.2 cluster-02.example.com cluster-02
192.168.1.3 cluster-03.example.com cluster-03
- In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.
Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete. For more information, see Permission Requirements.
- If single user mode is not enabled, the Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
- No blocking is done by Security-Enhanced Linux (SELinux).
- IPv6 must be disabled.
- No blocking by iptables or firewalls; port 7180 must be open because it is used to access Cloudera Manager after installation. Cloudera Manager communicates using specific ports, which must be open.
- For RedHat and CentOS, the /etc/sysconfig/network file on each host must contain the hostname you have just set (or verified) for that host.
- Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:
Table 1. Users and Groups
|Component (Version)||Unix User ID||Groups||Notes|
|Cloudera Manager (all versions)||cloudera-scm||cloudera-scm||Cloudera Manager processes such as the Cloudera Manager Server and the monitoring roles run as this user. The Cloudera Manager keytab file must be named cmf.keytab since that name is hard-coded in Cloudera Manager. Note: Applicable to clusters managed by Cloudera Manager only.|
|Apache Accumulo (Accumulo 1.4.3 and higher)||accumulo||accumulo||Accumulo processes run as this user.|
|Apache Avro||No special users.|
|Apache Flume (CDH 4, CDH 5)||flume||flume||The sink that writes to HDFS as this user must have write privileges.|
|Apache HBase (CDH 4, CDH 5)||hbase||hbase||The Master and the RegionServer processes run as this user.|
|HDFS (CDH 4, CDH 5)||hdfs||hdfs, hadoop||The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.|
|Apache Hive (CDH 4, CDH 5)||hive||hive||The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres), but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml.|
|Apache HCatalog (CDH 4.2 and higher, CDH 5)||hive||hive||The WebHCat service (for REST access to Hive functionality) runs as the hive user.|
|HttpFS (CDH 4, CDH 5)||httpfs||httpfs||The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.|
|Hue (CDH 4, CDH 5)||hue||hue||Hue services run as this user.|
|Cloudera Impala (CDH 4.1 and higher, CDH 5)||impala||impala, hadoop, hdfs, hive||Impala services run as this user.|
|Apache Kafka (Cloudera Distribution of Kafka 1.2.0)||kafka||kafka||Kafka services run as this user.|
|Java KeyStore KMS (CDH 5.2.1 and higher)||kms||kms||The Java KeyStore KMS service runs as this user.|
|Key Trustee KMS (CDH 5.3 and higher)||kms||kms||The Key Trustee KMS service runs as this user.|
|Key Trustee Server (CDH 5.4 and higher)||keytrustee||keytrustee||The Key Trustee Server service runs as this user.|
|Llama (CDH 5)||llama||llama||Llama runs as this user.|
|Apache Mahout||No special users.|
|MapReduce (CDH 4, CDH 5)||mapred||mapred, hadoop||Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos.|
|Apache Oozie (CDH 4, CDH 5)||oozie||oozie||The Oozie service runs as this user.|
|Parquet||No special users.|
|Apache Pig||No special users.|
|Cloudera Search (CDH 4.3 and higher, CDH 5)||solr||solr||The Solr processes run as this user.|
|Apache Spark (CDH 5)||spark||spark||The Spark History Server process runs as this user.|
|Apache Sentry (incubating) (CDH 5.1 and higher)||sentry||sentry||The Sentry service runs as this user.|
|Apache Sqoop (CDH 4, CDH 5)||sqoop||sqoop||This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.|
|Apache Sqoop2 (CDH 4.2 and higher, CDH 5)||sqoop2||sqoop, sqoop2||The Sqoop2 service runs as this user.|
|Apache Whirr||No special users.|
|YARN (CDH 4, CDH 5)||yarn||yarn, hadoop||Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos.|
|Apache ZooKeeper (CDH 4, CDH 5)||zookeeper||zookeeper||The ZooKeeper processes run as this user. It is not configurable.|
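Two of the /etc/hosts rules listed under Networking and Security Requirements (no uppercase hostnames, no duplicate IP addresses) can be checked mechanically before installation. The following Python sketch shows one way to do that. `check_hosts_file` is a hypothetical helper, not something shipped with Cloudera Manager, and it does not cover the cross-host consistency rule, which requires comparing files from every host.

```python
def check_hosts_file(text):
    """Return a list of problems found in the contents of an /etc/hosts file.

    Checks only two rules from the text: no uppercase hostnames and no
    duplicate IP addresses. Comments and blank lines are ignored.
    """
    problems = []
    seen_ips = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        ip, *names = line.split()
        if ip in seen_ips:
            problems.append(f"line {lineno}: duplicate IP address {ip}")
        seen_ips.add(ip)
        for name in names:
            if name != name.lower():
                problems.append(f"line {lineno}: uppercase hostname {name}")
    return problems
```

Running it on the sample file shown earlier returns an empty list. A file that maps 192.168.1.1 twice, or that names a host Cluster-01.example.com, produces one message per violation.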
Issues Fixed in Cloudera Manager 5.4.5
Cancel Impala Query attempts to connect via SSL despite SSL being disabled
In Impala queries, if you select Cancel for any query, you see a small "internal error" at the top of the query list. This occurs because an attempt to connect via SSL is performed even though Impala does not have SSL enabled.
Cloudera Manager displays a spurious validation warning about the Cloudera Management Service truststore
Cloudera Manager incorrectly warns that Cloudera Management Service daemons will use HTTPS for communication with either Cloudera Manager or CDH services, even if no Cloudera Management Service truststore is in use.
Aggregation of Work attributes
Cloudera Manager now correctly aggregates Work attributes such as Yarn applications or Impala query duration.
Hue Solr Indexer
Cloudera Manager now creates the correct configuration required to create a collection.
Cloudera Manager incorrectly reports “Not finalized” status for rolling upgrade
When performing a rolling upgrade from a version of CDH lower than 5.4 to CDH 5.4, and the HDFS rolling upgrade is finalized, Cloudera Manager incorrectly reports the status as not finalized. This is an error in reporting only and does not affect HDFS functionality.
Validation errors not visible from service-level configuration pages
Validation errors and warnings were only visible when accessing the individual instance-level configuration pages. This has been fixed.
Add Role Instances wizard does not work when initialized using the Cloudera Manager search box
You can now start the Add Role Instances wizard by searching for "<service name> Add Role" in the Cloudera Manager Admin Console search box.
Cloudera Manager now allows you to use '/' in cluster names
In previous versions, this resulted in problems during replication because '/' was treated as a URL path separator.
Expose HBase multi-WAL configuration properties in Cloudera Manager
The properties were added and are now written to the hbase-site.xml file.
Custom Kerberos principals now handled correctly during Solr startup
The Solr custom Kerberos principal is now initialized properly during Solr server startup.
Added a check in the Upgrade and Kerberos wizards to make sure Spark-standalone is not enabled
Spark Standalone does not work in clusters with Kerberos authentication. Spark on YARN supports Kerberos and is recommended over Spark Standalone. Either disable Kerberos or remove Spark Standalone before upgrading.
Fixed link for Reports when YARN high availability is enabled
The Reports link in HA mode would result in a 404 error.
Removed bogus failure when deploying client configuration
Deploying client configuration would sometimes fail because Cloudera Manager could not locate JAVA_HOME. This is not a valid failure because deploying client configuration does not require Java.
Added core-site.xml to Sentry's classpath
Previously, core-site.xml was only added to Sentry's configuration folder, but not the classpath.
Improved memory usage in serializing objects and writing them to support bundles
Performance improvements that require less memory were made for the creation of bundles.
New property to enable suppressing INFO-level log messages from NameNode
You can now use the NameNode Block State Change Logging Threshold property to suppress INFO-level block state change log messages from the NameNode.
Improved advice for clock offset health test
The way the health of the host's NTP daemon is determined was changed recently, which caused some cases where the related health test (host clock offset) failed without a warning. Information on this change was added to Cloudera Manager.
Cloudera Manager displays warning about using RHEL 6 with Transparent Huge Pages (THP)
The THP algorithm was broken in certain variants of RHEL 6.2 and above. Cloudera Manager now displays a warning if THP is enabled on hosts running RHEL 6 and above.
Agent gets no logs if the last log4j event is larger than the max-size specified
If the byte_limit (max-size) specified by Cloudera Manager during log retrieval was smaller than the last log4j event to be collected, the Agent skipped the complete event and returned nothing. The Agent now picks the first N bytes (N = max-size) of the log4j event and returns a partial log4j event.
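The difference between the two behaviors can be illustrated with a short Python sketch. This is not the Agent's code: `skip_if_too_large` and `clip_to_limit` are hypothetical functions that model the description above, operating on a single event held as bytes.

```python
def skip_if_too_large(event: bytes, byte_limit: int) -> bytes:
    """Earlier behavior as described: an oversized event yields nothing."""
    return event if len(event) <= byte_limit else b""


def clip_to_limit(event: bytes, byte_limit: int) -> bytes:
    """Modified behavior as described: keep the first byte_limit bytes."""
    if byte_limit < 0:
        raise ValueError("byte_limit must not be negative")
    return event[:byte_limit]
```

Because the cut is made on bytes, a partial event may end in the middle of a multi-byte character. A caller that needs text can decode the result with `errors="ignore"`.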
Agent log retrieval does not always honor timeouts
Cloudera Manager Agents no longer enter an infinite loop during log retrieval.
Cloudera Manager Agent missing log messages
The default timeout for displaying log entries (../logs/search and ../logs/context) has been increased to 60 seconds.
Fixed cross-site scripting vulnerability
A cross-site scripting vulnerability was discovered and fixed in Cloudera Manager.
New property added for ResourceManager high availability failover
The ZooKeeper session timeout property yarn.resourcemanager.zk-timeout-ms was added, and its default value is 1 minute.
Set maximum value for YARN mapreduce.jobhistory.max-age-ms to 10 years
Cloudera Manager would previously display a validation error when the value was greater than 60 days.
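Both this property and yarn.resourcemanager.zk-timeout-ms take values in milliseconds, so the durations mentioned here are easy to misjudge by eye. The sketch below does the conversion with Python's standard library. `to_ms` is a hypothetical helper, and the 10-year figure assumes 365-day years; the exact bound Cloudera Manager enforces is not stated in the text.

```python
from datetime import timedelta


def to_ms(delta):
    """Convert a timedelta to whole milliseconds."""
    return delta // timedelta(milliseconds=1)


ONE_MINUTE_MS = to_ms(timedelta(minutes=1))      # 60000
SIXTY_DAYS_MS = to_ms(timedelta(days=60))        # 5184000000
TEN_YEARS_MS = to_ms(timedelta(days=365 * 10))   # 315360000000 (365-day years assumed)
```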
Added warning in upgrade wizard regarding dropped support for symlinks in CDH 5
This fix added a warning about removing HDFS symlinks when upgrading from CDH 4 to CDH 5.
Refresh Data Directories command no longer fails on secure clusters
The DataNodeRefreshCommand now sets SCM_KERBEROS_PRINCIPAL in the environment of the command process, which causes hdfs.sh to do a kinit. Before this change, a manual kinit was required.
New Sentry Synchronization Path Prefixes added in NameNode configuration are not enforced correctly
Any new path prefixes added in the NameNode configuration are not correctly enforced by Sentry. The ACLs are initially set correctly; however, they are reset to the old default after some time.
Workaround: Set the following property in Sentry Service Advanced Configuration Snippet (Safety Valve) and Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
Fixed NullPointerException on health tests' Details page
The health tests Details page threw a NullPointerException because it was referring to a deprecated metric name.
Improved Service Monitor Canary check to see if HTable is disabled
Without this check, Service Monitor would fail due to too many ZooKeeper connection messages leaking into the Service Monitor log. This resulted in resource and allocation pressures on the Service Monitor.
Cloudera Manager no longer retains unnecessary references to HTables
Retaining too many unnecessary references to HTable was using up too much memory, especially when working with a large number of tables.
Sqoop 2 failure in Kerberized clusters fixed
Cloudera Manager was using the wrong authentication package and picking up the wrong configuration properties for Sqoop 2 authentication with Kerberos.
Fixed Solr server startup error
The Solr server would not start due to insufficient space for the shared memory file.
Added HBase Canary security configuration properties
Enabling the HBase canary on a secure cluster would fail. The new properties now let Cloudera Manager specify the canary's Kerberos principal and keytab in the hbase-site.xml deployed at the RegionServers.