Long-term component architecture
As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components are standards, you can build long-term architectures on them with confidence.
PLEASE NOTE:
With the exception of DSSD support, Cloudera Enterprise 5.6.0 is identical to CDH 5.5.2/Cloudera Manager 5.5.3. If you do not need DSSD support and are already using the latest 5.5.x release, you do not need to upgrade.
- System Requirements
- What's New
- Documentation
System Requirements
- Supported Operating Systems
- Supported Databases
- Supported JDK Versions
- Supported Browsers
- Supported Internet Protocol
- Supported Transport Layer Security Versions
Supported Operating Systems
Supported Databases
Component | MariaDB | MySQL | SQLite | PostgreSQL | Oracle | Derby (see Note 5)
---|---|---|---|---|---|---
Cloudera Manager | 5.5, 10 | 5.6, 5.5, 5.1 | – | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 | 12c, 11gR2 | –
Oozie | 5.5, 10 | 5.6, 5.5, 5.1 | – | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3) | 12c, 11gR2 | Default
Flume | – | – | – | – | – | Default (for the JDBC Channel only)
Hue | 5.5, 10 | 5.6, 5.5, 5.1 (see Note 6) | Default | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3) | 12c, 11gR2 | –
Hive/Impala | 5.5, 10 | 5.6, 5.5, 5.1 (see Note 1) | – | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3) | 12c, 11gR2 | Default
Sentry | 5.5, 10 | 5.6, 5.5, 5.1 (see Note 1) | – | 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3) | 12c, 11gR2 | –
Sqoop 1 | 5.5, 10 | See Note 4 | – | See Note 4 | See Note 4 | –
Sqoop 2 | 5.5, 10 | See Note 9 | – | – | – | Default
Note: Cloudera supports the databases listed above, provided they are supported by the underlying operating system on which they run.
1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and higher. The InnoDB storage engine must be enabled in the MySQL server.
2. Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
3. PostgreSQL 9.2 is supported on CDH 5.1 and higher. PostgreSQL 9.3 is supported on CDH 5.2 and higher. PostgreSQL 9.4 is supported on CDH 5.5 and higher.
4. For purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
5. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation guide for recommendations.
6. CDH 5 Hue requires the default MySQL version of the operating system on which it is being installed, which is usually MySQL 5.1, 5.5, or 5.6.
7. When installing a JDBC driver, only the ojdbc6.jar file is supported for both Oracle 11g R2 and Oracle 12c; the ojdbc7.jar file is not supported.
8. Sqoop 2 lacks some of the features of Sqoop 1. Cloudera recommends that you use Sqoop 1; use Sqoop 2 only if it contains all the features required for your use case.
9. MariaDB 10 is supported only on CDH 5.9 and higher.
Supported JDK Versions
CDH and Cloudera Manager Supported JDK Versions
Only 64-bit JDKs from Oracle are supported. Oracle JDK 7 is supported across all versions of Cloudera Manager 5 and CDH 5. Oracle JDK 8 is supported in C5.3.x and higher.
A supported minor JDK release will remain supported throughout a Cloudera major release lifecycle, from the time of its addition forward, unless specifically excluded.
Warning: JDK 1.8u40 and JDK 1.8u60 are excluded from support. Also, the Oozie Web Console returns a 500 error when the Oozie server runs on JDK 8u75 or higher.
Running CDH nodes within the same cluster on different JDK releases is not supported. The JDK release must match across the cluster, down to the patch level:
- All nodes in your cluster must run the same Oracle JDK version.
- All services must be deployed on the same Oracle JDK version.
The Cloudera Manager repository is packaged with an Oracle JDK (for example, 1.7.0_67), which can be installed automatically during a new installation or an upgrade.
For a full list of supported JDK versions, see CDH and Cloudera Manager Supported JDK Versions.
Supported Browsers
Hue
Hue works with the two most recent versions of the following browsers. Cookies and JavaScript must be enabled.
- Chrome
- Firefox
- Safari (not supported on Windows)
- Internet Explorer
Hue might display in older versions of these browsers, or in other browsers, but you might not have access to all of its features.
Supported Internet Protocol
CDH requires IPv4. IPv6 is not supported.
See also Configuring Network Names.
Multihoming CDH or Cloudera Manager is not supported outside specifically certified Cloudera partner appliances. Cloudera finds that current Hadoop architectures combined with modern network infrastructures and security practices remove the need for multihoming. Multihoming, however, is beneficial internally in appliance form factors to take advantage of high-bandwidth InfiniBand interconnects.
Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming. In addition, unknown issues may arise because multihoming is not covered by our test matrix outside the Cloudera-certified partner appliances.
Supported Transport Layer Security Versions
The following components are supported by the indicated versions of Transport Layer Security (TLS):
Component | Role | Name | Port | Version
---|---|---|---|---
Cloudera Manager | Cloudera Manager Server | – | 7182 | TLS 1.2
Cloudera Manager | Cloudera Manager Server | – | 7183 | TLS 1.2
Flume | – | – | 9099 | TLS 1.2
Flume | – | Avro Source/Sink | – | TLS 1.2
Flume | – | Flume HTTP Source/Sink | – | TLS 1.2
HBase | Master | HBase Master Web UI Port | 60010 | TLS 1.2
HDFS | NameNode | Secure NameNode Web UI Port | 50470 | TLS 1.2
HDFS | Secondary NameNode | Secure Secondary NameNode Web UI Port | 50495 | TLS 1.2
HDFS | HttpFS | REST Port | 14000 | TLS 1.1, TLS 1.2
Hive | HiveServer2 | HiveServer2 Port | 10000 | TLS 1.2
Hue | Hue Server | Hue HTTP Port | 8888 | TLS 1.2
Impala | Impala Daemon | Impala Daemon Beeswax Port | 21000 | TLS 1.2
Impala | Impala Daemon | Impala Daemon HiveServer2 Port | 21050 | TLS 1.2
Impala | Impala Daemon | Impala Daemon Backend Port | 22000 | TLS 1.2
Impala | Impala StateStore | StateStore Service Port | 24000 | TLS 1.2
Impala | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | TLS 1.2
Impala | Impala StateStore | StateStore HTTP Server Port | 25010 | TLS 1.2
Impala | Impala Catalog Server | Catalog Server HTTP Server Port | 25020 | TLS 1.2
Impala | Impala Catalog Server | Catalog Server Service Port | 26000 | TLS 1.2
Oozie | Oozie Server | Oozie HTTPS Port | 11443 | TLS 1.1, TLS 1.2
Solr | Solr Server | Solr HTTP Port | 8983 | TLS 1.1, TLS 1.2
Solr | Solr Server | Solr HTTPS Port | 8985 | TLS 1.1, TLS 1.2
Spark | History Server | – | 18080 | TLS 1.2
YARN | ResourceManager | ResourceManager Web Application HTTP Port | 8090 | TLS 1.2
YARN | JobHistory Server | MRv1 JobHistory Web Application HTTP Port | 19890 | TLS 1.2
What's New
What's New In CDH 5.9.x
Apache Hadoop
- CDH 5.9 allows you to use temporary credentials to log in to Amazon S3. You can obtain temporary credentials from Amazon's Security Token Service (STS).
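As a sketch, temporary STS credentials can be supplied through the S3A connector's Hadoop configuration. The property names below are the standard S3A settings; the key, secret, and token values are placeholders:

```xml
<!-- core-site.xml: use temporary STS credentials with the S3A connector.
     The access key, secret key, and session token values are placeholders. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_TEMPORARY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_TEMPORARY_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.session.token</name>
  <value>YOUR_SESSION_TOKEN</value>
</property>
```

Because the credentials are temporary, they must be rotated before the STS session expires.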
Apache HBase
- A new tool, org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues, has been added to dump existing replication peers, configurations, and queues when using HBase replication. The tool includes two flags:
- --distributed: Polls each replication server for information about the replication queues being processed on that server. By default, this flag is not enabled, and information about the replication queues and configuration is obtained from ZooKeeper.
- --hdfs: When --distributed is used, this flag attempts to calculate the total size of the WAL files used by the replication queues. Because multiple peers can be configured, this value can be overestimated.
For more information, see Class DumpReplicationQueues.
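For example, the tool can be invoked through the hbase launcher script (a sketch, assuming a configured HBase client on the host):

```shell
# Read replication peer, configuration, and queue information from
# ZooKeeper (the default mode)
hbase org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues

# Poll each server directly and also estimate the total WAL size used by
# the replication queues (may be overestimated when multiple peers exist)
hbase org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues \
    --distributed --hdfs
```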
- Metrics have been added that expose the amount of replayed work occurring in the HBase replication system. For more information on these metrics, see Replication Metrics in the Apache HBase Reference Guide.
Apache Hive
- HIVE-14270: Added parameters to optimize write performance for Hive tables and partitions that are stored on Amazon S3. See Optimizing Hive Write Performance on Amazon S3.
Hue
- HUE-2915: Integrates Hue with Amazon S3. You can now access both S3 and HDFS in the File Browser, create tables from files in S3, and save query results in S3. See how to Enable S3 Cloud Storage.
- HUE-4039: Improves the SQL Autocompleter. The new Autocompleter deeply understands Hive and Impala SQL dialects and provides smart suggestions based on your statement structure and cursor position. See how to manually Enable and Disable Autocompleter.
- HUE-3877: Adds support for Amazon RDS. You can now deploy Hue against an Amazon RDS database instance with MySQL, PostgreSQL, and Oracle engines.
- Rebases Hue on upstream Hue 3.11.
Apache Impala
Performance improvements:
- [IMPALA-3206] Speedup for queries against DECIMAL columns in Avro tables. The code that parses DECIMAL values from Avro now uses native code generation.
- [IMPALA-3674] Improved efficiency in LLVM code generation can reduce codegen time, especially for short queries.
- [IMPALA-2979] Improvements to scheduling on worker nodes, enabled by the REPLICA_PREFERENCE query option. See REPLICA_PREFERENCE Query Option (CDH 5.9 or higher only) for details.
- [IMPALA-1683] The REFRESH statement can be applied to a single partition, rather than the entire table. See REFRESH Statement and Refreshing a Single Partition for details.
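The single-partition REFRESH can be issued from impala-shell, for example (the table and partition names here are hypothetical):

```shell
# Refresh metadata for just one partition instead of the whole table,
# e.g. after new files land in that partition's directory on S3 or HDFS
impala-shell -q "REFRESH sales_data PARTITION (year=2016, month=11)"
```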
Improvements to the Impala web user interface:
- [IMPALA-2767] You can now force a session to expire by clicking a link in the web UI, on the /sessions tab.
- [IMPALA-3715] The /memz tab includes more information about Impala memory usage.
- [IMPALA-3716] The Details page for a query now includes a Memory tab.
[IMPALA-3499] Scalability improvements to the catalog server. Impala handles internal communication more efficiently for tables with large numbers of columns and partitions, where the size of the metadata exceeds 2 GiB.
[IMPALA-3677] You can send a SIGUSR1 signal to any Impala-related daemon to write a Breakpad minidump. For advanced troubleshooting, you can now produce a minidump without triggering a crash. See Breakpad Minidumps for Impala (CDH 5.8 or higher only) for details about the Breakpad minidump feature.
[IMPALA-3687] The schema reconciliation rules for Avro tables have changed slightly for CHAR and VARCHAR columns. Now, if the definition of such a column is changed in the Avro schema file, the column retains its CHAR or VARCHAR type as specified in the SQL definition, but the column name and comment from the Avro schema file take precedence. See Creating Avro Tables for details about column definitions in Avro tables.
[IMPALA-3575] Some network operations now have additional timeout and retry settings. The extra configuration helps avoid failed queries for transient network problems, to avoid hangs when a sender or receiver fails in the middle of a network transmission, and to make cancellation requests more reliable despite network issues.
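The SIGUSR1 minidump trigger described above can be exercised from a shell on the daemon's host, for example (a sketch, assuming a single impalad process on the host):

```shell
# Ask a running impalad to write a Breakpad minidump without crashing it
kill -s SIGUSR1 $(pgrep impalad)
```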
Apache Sentry
- Sentry adds support for securing data on Amazon RDS. As a result, Sentry will now be able to secure URIs with an RDS schema.
- SENTRY-1233 - Logging improvements for SentryConfigToolSolr.
- SENTRY-1119 - Allow data engines to obtain the ActionFactory directly from the configuration, instead of having hardcoded component-specific classes. This will allow external data engines to integrate with Sentry easily.
- SENTRY-1229 - Added a basic configurable cache to SentryGenericProviderBackend.
Apache Spark
- You can now set up AWS credentials for Spark with the Hadoop credential provider, to avoid exposing the AWS secret key in configuration files.
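As a sketch, the AWS keys can be stored with the hadoop credential CLI and then referenced through the standard Hadoop configuration; the JCEKS path used here is an example:

```shell
# Store the AWS access and secret keys in a JCEKS credential store on HDFS
# (each command prompts for the value, so nothing lands in config files)
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/spark/aws.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/spark/aws.jceks

# Point a Spark job at the credential store when submitting it
spark-submit \
  --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/user/spark/aws.jceks \
  my_job.py
```

Here my_job.py is a placeholder for your application.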
Apache Sqoop
- The mainframe import module extension has been added to support data sets on tape.
Cloudera Search
- The Solr watchdog is now configured to use the fully qualified domain name (FQDN) of the host on which the Solr process is running (instead of 127.0.0.1). You can override this configuration by setting the SOLR_HOSTNAME environment variable to an appropriate value before starting the Solr server.
- Cloudera Search adds support for index snapshots. For more information on how to back up, migrate, or restore your indexed data, see Backing Up and Restoring Cloudera Search.
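The SOLR_HOSTNAME override mentioned above can be set in the shell environment before the Solr server starts; the hostname here is a placeholder:

```shell
# Override the hostname used by the Solr watchdog (set before starting Solr)
export SOLR_HOSTNAME=solr01.example.com
```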
Documentation
Want to Get Involved or Learn More?
Check out our other resources

Cloudera Community
Collaborate with your peers, industry experts, and Clouderans to make the most of your investment in Hadoop.

Cloudera University
Receive expert Hadoop training through Cloudera University, the industry's only truly dynamic Hadoop training curriculum that’s updated regularly to reflect the state of the art in big data.