Long-term component architecture
As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components are standards, you can build a long-term architecture on them with confidence.
With the exception of DSSD support, Cloudera Enterprise 5.6.0 is identical to CDH 5.5.2/Cloudera Manager 5.5.3. If you do not need DSSD support and are already using the latest 5.5.x release, you do not need to upgrade.
- System Requirements
- What's New
- Supported Operating Systems
- Supported Databases
- Supported JDK Versions
- Supported Browsers
- Supported Internet Protocol
- Supported Transport Layer Security Versions
Supported Databases
Please see Cloudera Manager Supported Databases for a full list of supported databases for each version of Cloudera Manager.
Cloudera Manager and CDH come packaged with an embedded PostgreSQL database, but Cloudera recommends that you configure your cluster with custom external databases, especially in production.
In most cases (but not all), Cloudera supports versions of MariaDB, MySQL and PostgreSQL that are native to each supported Linux distribution.
After installing a database, upgrade to the latest patch and apply appropriate updates. Available updates may be specific to the operating system on which it is installed.
- Use UTF8 encoding for all custom databases.
- Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
- If Hue uses MySQL or MariaDB, it requires the default version of that database for the operating system on which Hue is installed. See Hue Databases.
- Both the Community and Enterprise versions of MySQL are supported, as well as MySQL configured by the AWS RDS service.
Important: When you restart processes, the configuration for each of the services is redeployed using information saved in the Cloudera Manager database. If this information is not available, your cluster does not start or function correctly. You must schedule and maintain regular backups of the Cloudera Manager database to recover the cluster in the event of the loss of this database.
Supported JDK Versions
A supported minor JDK release will remain supported throughout a Cloudera major release lifecycle, from the time of its addition forward, unless specifically excluded.
Warning: JDK 1.8u40 and JDK 1.8u60 are excluded from support. In addition, the Oozie Web Console returns a 500 error when the Oozie server runs on JDK 8u75 or higher.
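The exclusions and the Oozie caveat above can be encoded as a small pre-flight check. This is a hypothetical helper, not a Cloudera tool; it assumes the JDK 7/8 version format printed by `java -version` (for example `1.8.0_60`) and covers only the versions named in the warning.

```python
import re
from typing import List, Tuple

# Assumed format: "1.<major>.0_<update>", e.g. "1.8.0_60" or "1.7.0_67".
_VERSION_RE = re.compile(r"^1\.(\d+)\.0_(\d+)$")

# Taken from the warning above.
EXCLUDED = {(8, 40), (8, 60)}
OOZIE_CONSOLE_BROKEN_FROM = (8, 75)


def parse_jdk_version(version: str) -> Tuple[int, int]:
    """Return (major, update) for a JDK 7/8 style version string."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized JDK version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def jdk_warnings(version: str, runs_oozie: bool = False) -> List[str]:
    """List the problems from the warning that apply to this JDK."""
    major, update = parse_jdk_version(version)
    warnings = []
    if (major, update) in EXCLUDED:
        warnings.append(f"JDK {major}u{update} is excluded from support")
    # Tuple comparison: (8, 75) and later updates of JDK 8 trigger the Oozie issue.
    if runs_oozie and (major, update) >= OOZIE_CONSOLE_BROKEN_FROM:
        warnings.append("Oozie Web Console returns a 500 error on JDK 8u75 or higher")
    return warnings
```

For example, `jdk_warnings("1.7.0_67")` returns an empty list, while `jdk_warnings("1.8.0_60")` flags the exclusion.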
Running CDH nodes within the same cluster on different JDK releases is not supported. All nodes in a cluster must run the same JDK release, down to the patch level.
- All nodes in your cluster must run the same Oracle JDK version.
- All services must be deployed on the same Oracle JDK version.
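Because the JDK must match down to the patch level, it is worth checking what each host reports before a deployment or upgrade. The sketch below assumes you have already collected a hypothetical host-to-version mapping (for example, from `java -version` on each node); it only groups hosts by version and does not query hosts itself.

```python
from collections import defaultdict
from typing import Dict, List


def find_jdk_mismatches(host_versions: Dict[str, str]) -> Dict[str, List[str]]:
    """Group hosts by the exact JDK version string they report.

    Returns {} when every host reports the same release (patch level included);
    otherwise returns {version: sorted hosts} so the odd ones out are easy to spot.
    """
    by_version: Dict[str, List[str]] = defaultdict(list)
    for host, version in host_versions.items():
        by_version[version.strip()].append(host)
    if len(by_version) <= 1:
        return {}
    return {version: sorted(hosts) for version, hosts in sorted(by_version.items())}
```

For example, `find_jdk_mismatches({"node1": "1.7.0_67", "node2": "1.7.0_80"})` returns `{"1.7.0_67": ["node1"], "1.7.0_80": ["node2"]}`, showing that the cluster is not on a single patch level.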
The Cloudera Manager repository is packaged with Oracle JDK 1.7.0_67 (for example) and can be automatically installed during a new installation or an upgrade.
For the full list of supported JDK versions, see CDH and Cloudera Manager Supported JDK Versions.
Supported Browsers
- Chrome: Version history
- Firefox: Version history
- Internet Explorer: Version history
- Safari (Mac only): Version history
Hue can display in older and other browsers, but you might not have access to all of its features.
Important: To see all icons in the Hue Web UI, users with Internet Explorer and HTTPS must add a load balancer.
Supported Internet Protocol
CDH requires IPv4. IPv6 is not supported.
See also Configuring Network Names.
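To spot IPv6 or malformed entries in a host inventory before installation, a check like the following can help. It is a minimal sketch using only Python's standard library; the helper names and any address list you feed it are hypothetical, not part of a Cloudera tool.

```python
import ipaddress
import socket
from typing import Iterable, List


def non_ipv4_entries(addresses: Iterable[str]) -> List[str]:
    """Return every entry that is not a valid IPv4 address (IPv6 or malformed)."""
    rejected = []
    for addr in addresses:
        try:
            if ipaddress.ip_address(addr).version != 4:
                rejected.append(addr)  # parsed, but it is IPv6
        except ValueError:
            rejected.append(addr)  # not an IP address at all
    return rejected


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses only (AF_INET lookups)."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

`resolve_ipv4` raises `socket.gaierror` when a name has no IPv4 record, which is itself a sign that the host is not ready for CDH.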
Multihoming CDH or Cloudera Manager is not supported outside specifically certified Cloudera partner appliances. Cloudera finds that current Hadoop architectures combined with modern network infrastructures and security practices remove the need for multihoming. Multihoming, however, is beneficial internally in appliance form factors to take advantage of high-bandwidth InfiniBand interconnects.
Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming. In addition, unknown issues may arise because multihoming is not covered by our test matrix outside the Cloudera-certified partner appliances.
Supported Transport Layer Security Versions
The following components are supported by the indicated versions of Transport Layer Security (TLS):
Components Supported by TLS
| Component | Role | Name | Port | TLS Versions |
|---|---|---|---|---|
| Cloudera Manager | Cloudera Manager Server | | 7182 | TLS 1.2 |
| Cloudera Manager | Cloudera Manager Server | | 7183 | TLS 1.2 |
| Flume | | Avro Source/Sink | | TLS 1.2 |
| Flume | | Flume HTTP Source/Sink | | TLS 1.2 |
| HBase | Master | HBase Master Web UI Port | 60010 | TLS 1.2 |
| HDFS | NameNode | Secure NameNode Web UI Port | 50470 | TLS 1.2 |
| HDFS | Secondary NameNode | Secure Secondary NameNode Web UI Port | 50495 | TLS 1.2 |
| HDFS | HttpFS | REST Port | 14000 | TLS 1.1, TLS 1.2 |
| Hive | HiveServer2 | HiveServer2 Port | 10000 | TLS 1.2 |
| Hue | Hue Server | Hue HTTP Port | 8888 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon Beeswax Port | 21000 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon HiveServer2 Port | 21050 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon Backend Port | 22000 | TLS 1.2 |
| Impala | Impala StateStore | StateStore Service Port | 24000 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | TLS 1.2 |
| Impala | Impala StateStore | StateStore HTTP Server Port | 25010 | TLS 1.2 |
| Impala | Impala Catalog Server | Catalog Server HTTP Server Port | 25020 | TLS 1.2 |
| Impala | Impala Catalog Server | Catalog Server Service Port | 26000 | TLS 1.2 |
| Oozie | Oozie Server | Oozie HTTPS Port | 11443 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTP Port | 8983 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTPS Port | 8985 | TLS 1.1, TLS 1.2 |
| Spark | History Server | | 18080 | TLS 1.2 |
| YARN | ResourceManager | ResourceManager Web Application HTTP Port | 8090 | TLS 1.2 |
| YARN | JobHistory Server | MRv1 JobHistory Web Application HTTP Port | 19890 | TLS 1.2 |
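Every row in the table accepts TLS 1.2, so a client that offers only TLS 1.2 can talk to all of these endpoints. The sketch below uses Python's standard `ssl` module to build such a client and report which protocol a server actually negotiates. The helper names are hypothetical, and any host you point it at (for example a Cloudera Manager Server on port 7183) is your own.

```python
import socket
import ssl
from typing import Optional


def tls12_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Certificate-verifying client context pinned to TLS 1.2."""
    context = ssl.create_default_context(cafile=cafile)
    # Offer only TLS 1.2: every component in the table supports it.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_version(host: str, port: int, context: ssl.SSLContext,
                       timeout: float = 10.0) -> str:
    """Connect to host:port and return the protocol the server accepted, e.g. 'TLSv1.2'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Pass `cafile` when the cluster uses an internal CA. A handshake failure against a port in this table usually points to a certificate or cipher mismatch rather than to the protocol version.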
What's New
The following sections describe new features introduced in 5.11.0.
- Supported Apache Tomcat TLS ciphers for HttpFS are configurable using the HTTPFS_SSL_CIPHERS environment variable.
- Supported Apache Tomcat TLS ciphers for the KMS are configurable using the KMS_SSL_CIPHERS environment variable.
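A quick sanity check of these variables before restarting HttpFS or the KMS can catch typos. The sketch assumes the value is a comma-separated list of JSSE (Java) cipher suite names, which is the form Tomcat's connector expects; the helper functions are hypothetical and only inspect the string, they do not validate against a JVM.

```python
import os
from typing import List, Mapping, Optional


def configured_ciphers(var_name: str,
                       environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Split a comma-separated cipher-suite variable into a clean list."""
    env = os.environ if environ is None else environ
    raw = env.get(var_name, "")
    # Drop whitespace and empty entries left by stray commas.
    return [name.strip() for name in raw.split(",") if name.strip()]


def suspicious_entries(ciphers: List[str]) -> List[str]:
    """Flag entries that do not look like JSSE suite names (TLS_... or SSL_...).

    OpenSSL-style names such as ECDHE-RSA-AES256-GCM-SHA384 are a common mix-up.
    """
    return [c for c in ciphers if not c.startswith(("TLS_", "SSL_"))]
```

For example, `suspicious_entries(configured_ciphers("KMS_SSL_CIPHERS"))` lists any OpenSSL-style names that Tomcat would not recognize.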
- Amazon S3 Consistency with Metadata Caching (S3Guard)
Data written to Amazon S3 buckets is subject to the "eventual consistency" guarantee provided by Amazon Web Services (AWS), which means that data written to S3 may not be immediately available for queries and listing operations. This can cause failures in multi-step ETL workflows, where data from a previous step is not available to the next step. To mitigate these consistency issues you can now configure metadata caching for data stored in Amazon S3 using S3Guard. S3Guard requires that you provision a DynamoDB database from Amazon Web Services and configure S3Guard using the Cloudera Manager Admin Console or command-line tools. See Configuring and Managing S3Guard.
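The consistency problem, and the way a metadata store masks it, can be illustrated with a toy in-memory model. This is not S3Guard's implementation: the classes are hypothetical, a plain set stands in for the DynamoDB table, and eventual consistency is simulated as a listing that lags behind writes by a fixed number of list calls.

```python
from typing import Dict, List, Set


class EventuallyConsistentStore:
    """Toy object store: a new key shows up in listings only after `lag` list calls."""

    def __init__(self, lag: int = 1) -> None:
        self._objects: Dict[str, bytes] = {}
        self._pending: List[List] = []  # entries are [remaining list calls, key]
        self._visible: Set[str] = set()
        self._lag = lag

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data  # the write itself succeeds immediately
        self._pending.append([self._lag, key])

    def list(self, prefix: str) -> List[str]:
        still_pending = []
        for entry in self._pending:
            if entry[0] == 0:
                self._visible.add(entry[1])  # listing has caught up with this write
            else:
                entry[0] -= 1
                still_pending.append(entry)
        self._pending = still_pending
        return sorted(k for k in self._visible if k.startswith(prefix))


class MetadataStore:
    """Stand-in for the DynamoDB table: records every key written through the client."""

    def __init__(self) -> None:
        self._keys: Set[str] = set()

    def record(self, key: str) -> None:
        self._keys.add(key)

    def list(self, prefix: str) -> Set[str]:
        return {k for k in self._keys if k.startswith(prefix)}


class GuardedClient:
    """Writes go to both stores; listings merge the store's view with the metadata."""

    def __init__(self, store: EventuallyConsistentStore, metadata: MetadataStore) -> None:
        self.store = store
        self.metadata = metadata

    def put(self, key: str, data: bytes) -> None:
        self.store.put(key, data)
        self.metadata.record(key)

    def list(self, prefix: str) -> List[str]:
        return sorted(set(self.store.list(prefix)) | self.metadata.list(prefix))
```

In the model, the next ETL step listing `etl/step1/` through `GuardedClient` sees the previous step's output immediately, while a direct listing of the store misses it on the first call.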
- Amazon S3 Server-side Encryption with SSE-KMS
Clusters that use Amazon S3 storage can now use Amazon Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS) to encrypt data, giving you two choices for data-at-rest encryption on Amazon S3: SSE-S3 and SSE-KMS. Use the Cloudera Manager Admin Console to configure the cluster to use this feature, as detailed in How to Configure Encryption for Amazon S3.
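Cloudera Manager applies this setting for you. As a hedged sketch of what it amounts to at the Hadoop S3A level, the snippet below renders the two upstream S3A properties, `fs.s3a.server-side-encryption-algorithm` and `fs.s3a.server-side-encryption.key`, as a `core-site.xml` fragment. The helper is hypothetical, the key ARN you pass is a placeholder, and you should confirm the property names against the Hadoop version in your CDH release.

```python
import xml.etree.ElementTree as ET


def sse_kms_properties(kms_key_arn: str) -> str:
    """Render a core-site.xml <configuration> fragment that selects SSE-KMS."""
    config = ET.Element("configuration")
    for name, value in (
        ("fs.s3a.server-side-encryption-algorithm", "SSE-KMS"),
        ("fs.s3a.server-side-encryption.key", kms_key_arn),  # KMS key to encrypt with
    ):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")
```

Switching the algorithm value to `AES256` would correspond to SSE-S3, the other data-at-rest option mentioned above.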
Hive on Amazon S3 performance optimizations for:
- HIVE-14204: Dynamic partitioning writes and the INSERT OVERWRITE statement
- HIVE-15546: Parallel input path listing
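Parallel input path listing helps on S3 because each listing is a network round trip, so listing many partitions one at a time adds up. The following is a generic illustration of the idea with a thread pool, not Hive's code; `list_fn` stands in for whatever filesystem listing call you use and is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def list_paths_parallel(
    paths: Iterable[str],
    list_fn: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Run list_fn on every path concurrently; return {path: listing}."""
    path_list = list(paths)  # materialize so results can be zipped back to paths
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with path_list.
        listings = list(pool.map(list_fn, path_list))
    return dict(zip(path_list, listings))
```

Threads suit this workload because each call spends most of its time waiting on the network; `max_workers` trades listing speed against request rate on the bucket.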
Support for Microsoft Azure Data Lake Store (ADLS) as a secondary filesystem for Hive on MapReduce2 (YARN). You can use Hive on MapReduce2 to read and write data stored on ADLS. Hive-on-Spark is not currently supported for ADLS data in CDH.
AWS cloud clusters can now share a single persistent instance of Amazon Relational Database Service (RDS) as the Hive metastore backend database, enabling persistent sharing of metadata beyond a cluster's life cycle.
Integrate Navigator with Hue: Phase 1, Metadata Discovery
- Search and tag partitions, databases, views, tables, columns.
- Off by default. Check both "Enable" fields in Hue > Configuration > Cloudera Navigator.
- See How to Enable and Use Navigator in Hue.
Embed new create table wizard within Editor and Assist
- Safely import multiple formats such as Kudu, Parquet, JSON, and CSV.
- More easily create table partitions.
- Continued SQL improvements
- Visually more pleasant colors and text.
- No more hanging spinner in the Editor.
- HUE-5742: Allow non-public PostgreSQL schemas.
- HUE-5608: Add the ability to run DESC on a table without TABLE-level privilege.
- Supported TLS ciphers for Apache Tomcat are configurable using the OOZIE_HTTPS_CIPHERS environment variable.
Spark blacklisting. This feature reduces the chance of application failure by not scheduling work on hosts that are experiencing intermittent disk failures. See this blog post for background information.
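The scheduling idea can be modeled in a few lines. The class below is a hypothetical toy, not Spark's scheduler: it counts failures per host and stops offering a host once the count reaches a threshold. In upstream Apache Spark 2.1 and later the real feature is switched on with the `spark.blacklist.enabled` property; check your release's documentation for the thresholds it actually applies.

```python
from collections import Counter
from typing import Iterable, List


class HostBlacklist:
    """Toy model: stop scheduling on a host after `max_failures` failures there."""

    def __init__(self, max_failures: int = 2) -> None:
        self.max_failures = max_failures
        self._failures: Counter = Counter()

    def record_failure(self, host: str) -> None:
        self._failures[host] += 1

    def is_blacklisted(self, host: str) -> bool:
        # Counter returns 0 for hosts that have never failed.
        return self._failures[host] >= self.max_failures

    def schedulable(self, hosts: Iterable[str]) -> List[str]:
        """Hosts that may still receive work, in their original order."""
        return [h for h in hosts if not self.is_blacklisted(h)]
```

A single failure leaves a host schedulable, which tolerates one-off errors; only repeated failures, such as those from a flaky disk, remove it.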
You can enable Kerberos authentication and TLS/SSL encryption for the Spark History Server through Cloudera Manager configuration settings, rather than including the password in clear text in an Advanced Configuration Snippet field. See these settings in the Cloudera Manager user interface:
- history_server_spnego_enabled - for Kerberos authentication
With authentication enabled, only Kerberos-authorized users can read data from the Spark History Server, and non-admin users can only see information about their own jobs.
With TLS/SSL enabled, you provide the location of the keystore and its password, similar to the security configuration for other components.
Navigator lineage. The former Spark lineage extractor, which was enabled through a safety valve, is superseded by a more robust lineage collection mechanism. See Apache Spark Known Issues for limitations and restrictions of this feature.
Support for Azure Data Lake Store (ADLS) as a secondary filesystem. You can use Spark jobs to read and write data stored on ADLS. Hive-on-Spark and Spark with Kudu are not currently supported for ADLS data.
- Supported TLS ciphers for Apache Tomcat are configurable using the SOLR_CIPHERS_CONFIG environment variable.
Server-Server Mutual Authentication
All ZooKeeper servers in an ensemble can now be configured to support quorum peer (server-server) mutual authentication, mitigating risk of spoofing by a rogue server on an unsecured network. The feature leverages Kerberos authentication through the SASL framework, so Kerberos is required.
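As a hedged sketch, upstream ZooKeeper (3.4.10 and later) exposes this feature through `quorum.auth.*` properties in `zoo.cfg`. The hypothetical helper below renders the three switches for a simplified staged rollout: enable SASL on every server without requiring it, roll-restart the ensemble, then require it. The JAAS and Kerberos principal configuration each server also needs is not shown, and in a Cloudera Manager deployment you would normally set this through the ZooKeeper service configuration instead.

```python
from typing import Dict


def quorum_sasl_settings(require: bool) -> str:
    """Render zoo.cfg lines for quorum (server-server) SASL authentication."""
    flag = "true" if require else "false"
    settings: Dict[str, str] = {
        "quorum.auth.enableSasl": "true",         # servers attempt SASL with peers
        "quorum.auth.learnerRequireSasl": flag,   # learner side insists on SASL
        "quorum.auth.serverRequireSasl": flag,    # server side rejects unauthenticated peers
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())
```

Rolling out with `require=False` first lets upgraded and not-yet-upgraded servers keep forming a quorum; switching to `require=True` only after every server has SASL enabled avoids splitting the ensemble.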