The recommended tool for installing Cloudera Enterprise
This download installs Cloudera Enterprise or Cloudera Express.
Cloudera Enterprise requires a license; however, when installing Cloudera Express you will have the option to unlock Cloudera Enterprise features for a free 60-day trial.
Once the trial has concluded, the Cloudera Enterprise features will be disabled until you obtain and upload a license.
- System Requirements
- What's New
- Supported Operating Systems
- Supported JDK Versions
- Supported Browsers
- Supported Databases
- Supported CDH and Managed Service Versions
- Supported Transport Layer Security Versions
- Resource Requirements
- Networking and Security Requirements
Supported Operating Systems
Supported JDK Versions
Unless specifically excluded, support for a minor JDK release begins from the Cloudera major release in which support for the major JDK release was added. For example, 8u102 was released in time for C5.9 but is actually supported from C5.3 because that is when support for JDK 1.8 was added. Cloudera excludes or removes support for select Java updates when security is jeopardized.
Running CDH nodes within the same cluster on different JDK releases is not supported. The JDK release must match across the cluster, down to the patch level.
- All nodes in your cluster must run the same Oracle JDK version.
- All services must be deployed on the same Oracle JDK version.
All JDK 7 updates, from the minimum required version, are supported in CM/CDH 5.0 and higher unless specifically excluded. Updates above the minimum that are not listed are supported but not tested.
The Cloudera Manager repository is packaged with Oracle JDK 1.7.0_67 (for example) and can be automatically installed during a new installation or an upgrade.
JDK 7 updates that are supported and tested
|JDK 7||Supported in all C5.x|
|1.7u80||Recommended / Latest version supported|
All JDK 8 updates, from the minimum required version, are supported in CM/CDH 5.3 and higher unless specifically excluded. Updates above the minimum that are not listed are supported but not tested.
Warning: JDK 8u40, 8u45, and 8u60 are excluded from support due to a security risk: HTTP authentication can fail for web-based UI components such as HDFS, YARN, Solr, and Oozie.
Important: JDK 8u75 is supported but has a Known Issue: Oozie Web Console returns 500 error when Oozie server runs on JDK 8u75 or higher.
JDK 8 updates that are supported and tested
|JDK 8||Supported in C5.3 and Higher|
|1.8u121||Recommended / Latest version supported|
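The JDK rules above (one JDK version on every node, and no excluded JDK 8 updates) can be sketched as a small check. This is a minimal Python 3 sketch, not a Cloudera tool: the function names, the host-to-version mapping, and the `1.8.0_NNN` version-string format are assumptions made for illustration. Only the excluded updates (8u40, 8u45, 8u60) come from the warning above. The minimum required update is not checked because it is not stated here.

```python
import re

# JDK 8 updates excluded from support, per the warning above.
EXCLUDED_JDK8_UPDATES = {40, 45, 60}


def parse_jdk_version(version):
    """Split a string such as '1.8.0_121' into (major, update)."""
    match = re.fullmatch(r"1\.(\d+)\.0_(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognized JDK version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def check_cluster_jdk(host_versions):
    """Return a list of problems found in a {hostname: jdk_version} mapping."""
    problems = []
    distinct = sorted(set(host_versions.values()))
    # All nodes in the cluster must run the same JDK version.
    if len(distinct) > 1:
        problems.append("hosts run different JDK versions: " + ", ".join(distinct))
    # Flag JDK 8 updates that are explicitly excluded from support.
    for version in distinct:
        major, update = parse_jdk_version(version)
        if major == 8 and update in EXCLUDED_JDK8_UPDATES:
            problems.append("JDK 8u%d is excluded from support" % update)
    return problems
```

An empty list means that none of the checked rules were violated. It does not confirm that the update is one of the tested versions.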
Supported Browsers
- Chrome: Version history
- Firefox: Version history
- Internet Explorer: Version history
- Safari (Mac only): Version history
Hue can display in older, and other, browsers, but you might not have access to all of its features.
Important: To see all icons in the Hue Web UI, users with IE and HTTPS must add a Load Balancer.
The Cloudera Manager Admin Console, which you use to install, configure, manage, and monitor services, supports the latest version of the following browsers:
- Mozilla Firefox
- Google Chrome
- Internet Explorer
Supported Databases
Please see Cloudera Manager Supported Databases for a full list of supported databases for each version of Cloudera Manager.
Cloudera Manager and CDH come packaged with an embedded PostgreSQL database, but it is recommended that you configure your cluster with custom external databases, especially in production.
In most cases (but not all), Cloudera supports versions of MariaDB, MySQL and PostgreSQL that are native to each supported Linux distribution.
After installing a database, upgrade to the latest patch and apply appropriate updates. Available updates may be specific to the operating system on which it is installed.
- Use UTF8 encoding for all custom databases.
- Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
- Hue requires the default MySQL/MariaDB version (if used) of the operating system on which it is installed. See Hue Databases.
- Both the Community and Enterprise versions of MySQL are supported, as well as MySQL configured by the AWS RDS service.
Important: When you restart processes, the configuration for each of the services is redeployed using information saved in the Cloudera Manager database. If this information is not available, your cluster does not start or function correctly. You must schedule and maintain regular backups of the Cloudera Manager database to recover the cluster in the event of the loss of this database.
Supported CDH and Managed Service Versions
The Cloudera Manager minor version must always be equal to or greater than the CDH minor version. Older versions of Cloudera Manager might not support features in newer versions of CDH. For example, to upgrade to CDH 5.7.1 you must first upgrade to Cloudera Manager 5.7.0.
Warning: Cloudera Manager 5 does not support CDH 3 and you cannot upgrade Cloudera Manager 4 to Cloudera Manager 5 if you have a cluster running CDH 3. Therefore, to upgrade CDH 3 clusters to CDH 4 using Cloudera Manager, you must use Cloudera Manager 4.
For more information on supported managed service versions, see the Product Compatibility Matrix.
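The minor-version rule above can be expressed as a short comparison. This is a minimal Python 3 sketch; the function names are hypothetical, and comparing the major and minor numbers together as a pair is an assumption about how to apply the rule. It does not encode the CDH 3 restriction from the warning.

```python
def major_minor(version):
    """Return (major, minor) from a dotted version such as '5.7.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def cm_can_manage(cm_version, cdh_version):
    """True if the Cloudera Manager major.minor is >= the CDH major.minor."""
    return major_minor(cm_version) >= major_minor(cdh_version)


print(cm_can_manage("5.7.0", "5.7.1"))  # True: the patch level is ignored
print(cm_can_manage("5.6.2", "5.7.1"))  # False: upgrade Cloudera Manager first
```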
Supported Transport Layer Security Versions
Resource Requirements
Cloudera Manager requires the following resources:
- Disk Space
- Cloudera Manager Server
- 5 GB on the partition hosting /var.
- 500 MB on the partition hosting /usr.
- For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and different builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server—not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:
- CDH 5 (which includes Impala and Search) - 1.5 GB per parcel (packed), 2 GB per parcel (unpacked)
- Impala - 200 MB per parcel
- Cloudera Search - 400 MB per parcel
- Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition.
- Agents - On Agent hosts, each unpacked parcel requires about three times the space of the downloaded parcel on the Cloudera Manager Server. By default, unpacked parcels are located in /opt/cloudera/parcels.
- RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB might be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel might kill the Server for consuming too much RAM.
- Python - Cloudera Manager requires Python 2.4 or higher and is compatible with every Python 2.x release from 2.4 onward; it does not support Python 3.0 or higher. Hue in CDH 5 and package installs of CDH 5 require Python 2.6 or 2.7. All supported operating systems include Python version 2.4 or higher.
- Perl - Cloudera Manager requires Perl.
- python-psycopg2 package - Cloudera Manager 5.8 and higher has a new dependency on the package python-psycopg2. This package is not available in standard SLES 11 and SLES 12 repositories. You need to add this repository or install it manually to any machine that runs the Cloudera Manager Agent before you install or upgrade Cloudera Manager.
If the Cloudera Manager Server and Agent run on the same host, install the Cloudera Manager Server first and then add the python-psycopg2 repository or package. After adding the repository or package, install the Cloudera Manager Agent.
Download the python-psycopg2 repository or package from the following URL by selecting the correct SLES version: http://software.opensuse.org/download.html?project=server%3Adatabase%3Apostgresql&package=python-psycopg2
You can add and install the repository manually or grab the package directly.
To add the repository and manually install it, you need the URL to the repository. You can find the URL for your operating system version on the download page.
For example, to add and manually install the python-psycopg2 repository for SLES 11 SP4, run the following commands:
zypper addrepo http://download.opensuse.org/repositories/server:database:postgresql/SLE_11_SP4/server:database:postgresql.repo
zypper install python-psycopg2
Alternatively, you can grab the package directly from the download page. Select the python-psycopg2-2.6.2-<version>.x86_64.rpm file for your operating system version.
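Returning to the Disk Space figures earlier in this section, the parcel sizes can be turned into a rough capacity estimate. This is a minimal Python 3 sketch with hypothetical function names. It uses the approximate packed sizes listed above and assumes that the "about three times" figure for Agent hosts applies to the packed (downloaded) size. Treat the results as planning estimates only.

```python
# Approximate packed parcel sizes in GB, from the Disk Space list above.
PACKED_PARCEL_GB = {"CDH 5": 1.5, "Impala": 0.2, "Cloudera Search": 0.4}

# On Agent hosts, an unpacked parcel needs about three times the downloaded size.
AGENT_MULTIPLIER = 3


def server_repo_gb(parcel_counts):
    """Space for downloaded parcels in the server's local parcel repository."""
    return sum(PACKED_PARCEL_GB[name] * count for name, count in parcel_counts.items())


def agent_gb(parcel_counts):
    """Space for the same parcels once distributed to one Agent host."""
    return AGENT_MULTIPLIER * server_repo_gb(parcel_counts)
```

For example, keeping two CDH 5 parcels (say, the current and the previous build) gives `server_repo_gb({"CDH 5": 2})` of 3.0 GB in the repository and `agent_gb({"CDH 5": 2})` of 9.0 GB under /opt/cloudera/parcels on each Agent host.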
Networking and Security Requirements
The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:
- Networking Protocols Support
CDH requires IPv4. IPv6 is not supported and must be disabled.
Note: Contact your OS vendor for help disabling IPv6.
See also Configuring Network Names.
- Multihoming Support
Multihoming CDH or Cloudera Manager is not supported outside specifically certified Cloudera partner appliances. Cloudera finds that current Hadoop architectures combined with modern network infrastructures and security practices remove the need for multihoming. Multihoming, however, is beneficial internally in appliance form factors to take advantage of high-bandwidth InfiniBand interconnects.
Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming. In addition, unknown issues may arise because multihoming is not covered by our test matrix outside the Cloudera-certified partner appliances.
Data at rest encryption requires sufficient entropy to ensure randomness.
See Entropy Requirements.
- Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
- Contain consistent information about hostnames and IP addresses across all hosts
- Not contain uppercase hostnames
- Not contain duplicate IP addresses
Cluster hosts must not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:
127.0.0.1 localhost.localdomain localhost
192.168.1.1 cluster-01.example.com cluster-01
192.168.1.2 cluster-02.example.com cluster-02
192.168.1.3 cluster-03.example.com cluster-03
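Two of the /etc/hosts rules above (no uppercase hostnames, no duplicate IP addresses) are easy to check mechanically. This is a minimal Python 3 sketch with a hypothetical function name. It inspects the text of a single file, so it does not verify consistency across hosts, aliases, or DNS resolution.

```python
def check_hosts_file(text):
    """Return a list of problems found in the contents of an /etc/hosts file."""
    problems = []
    seen_ips = set()
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue  # skip blank and comment-only lines
        ip, *names = line.split()
        if ip in seen_ips:
            problems.append("line %d: duplicate IP address %s" % (number, ip))
        seen_ips.add(ip)
        for name in names:
            if name != name.lower():
                problems.append("line %d: uppercase hostname %s" % (number, name))
    return problems
```

To check a real file, pass its contents, for example `check_hosts_file(open("/etc/hosts").read())`. The sample file shown above produces an empty list.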
- In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.
Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete.
- If single user mode is not enabled, the Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
- No blocking is done by Security-Enhanced Linux (SELinux).
Note: Cloudera Enterprise is supported on platforms with Security-Enhanced Linux (SELinux) enabled. However, Cloudera does not support use of SELinux with Cloudera Navigator. Cloudera is not responsible for policy support nor policy enforcement. If you experience issues with SELinux, contact your OS provider.
- No blocking by iptables or firewalls; port 7180 must be open because it is used to access Cloudera Manager after installation. Cloudera Manager communicates using specific ports, which must be open.
- For RHEL and CentOS, the /etc/sysconfig/network file on each host must contain the hostname you have just set (or verified) for that host.
- Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:
Users and Groups
|Component (Version)||Unix User ID||Groups||Notes|
|Cloudera Manager (all versions)||cloudera-scm||cloudera-scm||Cloudera Manager processes such as the Cloudera Manager Server and the monitoring roles run as this user. The Cloudera Manager keytab file must be named cmf.keytab since that name is hard-coded in Cloudera Manager. Note: Applicable to clusters managed by Cloudera Manager only.|
|Apache Accumulo (Accumulo 1.4.3 and higher)||accumulo||accumulo||Accumulo processes run as this user.|
|Apache Avro||No special users.|
|Apache Flume (CDH 4, CDH 5)||flume||flume||The sink that writes to HDFS as this user must have write privileges.|
|Apache HBase (CDH 4, CDH 5)||hbase||hbase||The Master and the RegionServer processes run as this user.|
|HDFS (CDH 4, CDH 5)||hdfs||hdfs, hadoop||The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.|
|Apache Hive (CDH 4, CDH 5)||hive||hive||The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml.|
|Apache HCatalog (CDH 4.2 and higher, CDH 5)||hive||hive||The WebHCat service (for REST access to Hive functionality) runs as the hive user.|
|HttpFS (CDH 4, CDH 5)||httpfs||httpfs||The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.|
|Hue (CDH 4, CDH 5)||hue||hue||Hue services run as this user.|
|Hue Load Balancer (Cloudera Manager 5.5 and higher)||apache||apache||The Hue Load balancer has a dependency on the apache2 package that uses the apache user name. Cloudera Manager does not run processes using this user ID.|
|Cloudera Impala (CDH 4.1 and higher, CDH 5)||impala||impala, hive||Impala services run as this user.|
|Apache Kafka (Cloudera Distribution of Kafka 1.2.0)||kafka||kafka||Kafka services run as this user.|
|Java KeyStore KMS (CDH 5.2.1 and higher)||kms||kms||The Java KeyStore KMS service runs as this user.|
|Key Trustee KMS (CDH 5.3 and higher)||kms||kms||The Key Trustee KMS service runs as this user.|
|Key Trustee Server (CDH 5.4 and higher)||keytrustee||keytrustee||The Key Trustee Server service runs as this user.|
|Kudu||kudu||kudu||Kudu services run as this user.|
|Llama (CDH 5)||llama||llama||Llama runs as this user.|
|Apache Mahout||No special users.|
|MapReduce (CDH 4, CDH 5)||mapred||mapred, hadoop||Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos.|
|Apache Oozie (CDH 4, CDH 5)||oozie||oozie||The Oozie service runs as this user.|
|Parquet||No special users.|
|Apache Pig||No special users.|
|Cloudera Search (CDH 4.3 and higher, CDH 5)||solr||solr||The Solr processes run as this user.|
|Apache Spark (CDH 5)||spark||spark||The Spark History Server process runs as this user.|
|Apache Sentry (CDH 5.1 and higher)||sentry||sentry||The Sentry service runs as this user.|
|Apache Sqoop (CDH 4, CDH 5)||sqoop||sqoop||This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.|
|Apache Sqoop2 (CDH 4.2 and higher, CDH 5)||sqoop2||sqoop, sqoop2||The Sqoop2 service runs as this user.|
|Apache Whirr||No special users.|
|YARN (CDH 4, CDH 5)||yarn||yarn, hadoop||Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos.|
|Apache ZooKeeper (CDH 4, CDH 5)||zookeeper||zookeeper||The ZooKeeper processes run as this user. It is not configurable.|
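Because these accounts and groups must not be deleted, it can be useful to audit a host for them. This is a minimal Python 3 sketch that uses the standard `pwd` and `grp` modules (Unix only). The helper names and the `REQUIRED` mapping are illustrative assumptions; the mapping copies only three rows from the table above and should be extended to match the components you install.

```python
import grp
import pwd


def _user_exists(name):
    """True if a local or directory-backed user with this name resolves."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def _group_exists(name):
    """True if a group with this name resolves."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


def missing_accounts(required, user_exists=_user_exists, group_exists=_group_exists):
    """Return (missing_users, missing_groups) for a {user: [groups]} mapping."""
    missing_users = sorted(u for u in required if not user_exists(u))
    groups = {g for gs in required.values() for g in gs}
    missing_groups = sorted(g for g in groups if not group_exists(g))
    return missing_users, missing_groups


# A few rows from the table above; extend to match the components you install.
REQUIRED = {
    "cloudera-scm": ["cloudera-scm"],
    "hdfs": ["hdfs", "hadoop"],
    "yarn": ["yarn", "hadoop"],
}
```

Calling `missing_accounts(REQUIRED)` on a cluster host returns two sorted lists; both are empty when every listed user and group resolves. The lookup functions are parameters so that the logic can be exercised without touching the real account database.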
What's New
- Backup and Disaster Recovery
- Refreshing Impala metadata during replication
You can now use an option in the Cloudera Manager Admin Console to configure BDR to automatically refresh Impala’s metadata cache in the destination cluster during replication. Previously, this feature required an Advanced Configuration Snippet (Safety Valve). See Invalidating Impala Metadata.
- Automatic renewal of Kerberos tickets and Delegation Tokens
Previously, BDR replication jobs would fail on a Kerberized cluster if the job duration was longer than the renewal interval for the HDFS delegation token. With this fix, both the delegation token and Kerberos ticket are renewed until the max lifetime of token/ticket (default value is 7 days). This enables longer replications without needing to bring down the source cluster to change the ticket timeout.
- Streamlined Kerberos Configuration
- As part of Test Connectivity for peers, Cloudera Manager now tests for properly configured Kerberos authentication on the source and destination clusters. Test Connectivity runs automatically when you add a peer for replication, or you can manually initiate Test Connectivity from the Actions menu. This feature is available when the source and destination clusters run Cloudera Manager 5.12 or higher. See Enabling Replication Between Clusters with Kerberos Authentication.
- If Cloudera Manager is managing the Kerberos configuration (krb5.conf) for your clusters, BDR can automatically make some required changes to your Kerberos configuration based on issues found during the Test Connectivity action.
- The configuration process for adding peers when using Kerberized clusters is simplified if both the source and target clusters use Cloudera Manager 5.12 or later. Now, you only need to set up trust on the target cluster and not the source, reducing the complexity of enabling Hive Replication. See Enabling Replication Between Clusters with Kerberos Authentication.
- Add a name and description to replication schedules
When you create or edit a replication schedule, you can add a name on the General tab and add a description on the Advanced tab.
- Hive Metastore Schema Integrity Checker
Cloudera Manager now uses the Hive Metastore schemaTool for validating the integrity of Hive metadata. When you upgrade a cluster that contains a Hive Service to CDH 5.12 or higher using the Cloudera Manager Upgrade Wizard or command line, before upgrading the Hive metastore schema, Cloudera Manager first runs a validation check to detect any corruption. If the validation check fails, Cloudera Manager displays the error and stops the upgrade. Corruption issues should be resolved before proceeding with the upgrade.
- Support for HSM Key Provider
The HDFS Encryption Wizard in Cloudera Manager now supports configuration of the Hardware Security Module (HSM) Key Providers supported by CDH 5.12 for encryption key management.
- Sending Diagnostic Bundles
The user interface in the Cloudera Manager Admin Console for collecting and sending diagnostic bundles has been improved. Regardless of how diagnostic data collection is configured before you start, each time you create a bundle, you can now select one of the following options: Collect and Upload Diagnostic Data to Cloudera Support or Collect Diagnostics Data only. Additionally, the Cloudera Manager Admin Console better indicates the status of the bundle, for example by showing whether the bundle was successfully sent to Cloudera.
- Delete Kerberos Service Principals
Using the delete_credentials API, you can now delete MIT Kerberos or Active Directory service principals that Cloudera Manager previously created while Kerberizing a cluster.
- HBase Region in Transition Health Check
Cloudera Manager now performs a health check to detect whether HBase regions have become stuck in transition during splitting and merging operations.
- Replication factor for MapReduce job submission files
New auto-configuration logic for the MR1 and MR2 Submit Replication Factor property attempts to choose a value that is at least the value of the HDFS Replication Factor for clusters with three or more DataNodes. Additionally, for clusters with at least three DataNodes, a new configuration validator raises a configuration warning if the existing Submit Replication Factor is lower than the HDFS Replication Factor.
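The validator condition described here can be illustrated in a few lines. This is a minimal Python 3 sketch of the rule as stated, not Cloudera Manager's actual implementation; the function name, parameters, and message text are hypothetical.

```python
def submit_replication_warning(submit_rf, hdfs_rf, datanode_count):
    """Return a warning string, or None when the configuration passes."""
    # The rule applies only to clusters with at least three DataNodes.
    if datanode_count >= 3 and submit_rf < hdfs_rf:
        return (
            "Submit Replication Factor (%d) is lower than "
            "HDFS Replication Factor (%d)" % (submit_rf, hdfs_rf)
        )
    return None
```

With this sketch, `submit_replication_warning(2, 3, 5)` returns a warning string, while the same values on a two-DataNode cluster return `None`.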
- Custom Header Color
You can customize the header color that Cloudera Manager displays in the web UI. Select Administration > Settings. Select Other for the Category and use the drop-down menu for Custom Header Color.
- Dynamic Resource Pools UI
The Dynamic Resource Pools user interface now displays Access Control information about resource pools, showing whether they are freely usable, restricted to a custom set of users/groups, or inherit ACLs from their parent pool.
- Example Impala Shell Command
The Impala Service Status Page now includes an example Impala Shell Command.
- Configurable S3 Endpoint
The S3 Connector service now allows you to configure the default S3 endpoint used by HDFS clients (including Hive and Impala), ensuring all S3 data created/accessed by your cluster is (by default) stored in the AWS region of your choice. Additionally, Hue is configured to automatically use the default endpoint as the S3 Connector.
- Request Rate and Index Size Charts
The graphs on the Solr status page now include the request rates against the service and the aggregate size of the indices.
- New Tags in Logs
The logging for Solr has been improved. Logs now include the following IDs: thread, shard, replica, and collection.