
 

Long-term component architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components are standards, you can build long-term architecture on them with confidence.
 

Important: In order to be covered by Cloudera Support:

  • All CDH hosts in a logical cluster must run on the same major OS release.
  • Cloudera temporarily allows a mixed OS configuration during an OS upgrade project.
  • Cloudera Manager must run on the same OS release as one of the CDH clusters it manages.

Cloudera recommends running the same minor release cross-cluster. However, the risk caused by running different minor OS releases is considered lower than the risk of running different major OS releases.
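
The OS rules above can be checked against a host inventory before an installation or upgrade. The following is a minimal Python sketch, not a Cloudera tool: the inventory format (cluster name, OS name, version string) and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def major_release(version):
    """Return the major part of a version string such as '7.3' or '12 SP2'."""
    return version.replace(" ", ".").split(".")[0]


def mixed_major_releases(hosts):
    """Return the clusters whose hosts span more than one OS or major release.

    `hosts` is an iterable of (cluster, os_name, version) tuples; this
    inventory format is hypothetical.
    """
    seen = defaultdict(set)
    for cluster, os_name, version in hosts:
        seen[cluster].add((os_name, major_release(version)))
    return {cluster: found for cluster, found in seen.items() if len(found) > 1}


inventory = [
    ("prod", "RHEL", "7.3"),
    ("prod", "RHEL", "7.2"),  # different minor release: lower risk
    ("dev", "RHEL", "6.9"),
    ("dev", "RHEL", "7.1"),   # different major release: flagged
]
print(mixed_major_releases(inventory))
```

Only the "dev" cluster is reported, because mixing minor releases within one major release is treated as the lower-risk case.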

 

Gateway hosts may use RHEL/CentOS 7.2, subject to some restrictions. See Operating System Support for Gateway Hosts (CDH 5.11 and higher only).

 

Other disclaimers:

  • RHEL / CentOS / OEL 6.9 is supported in C5.8 and higher. In C5.8-11, it has a known cipher issue.
  • RHEL / CentOS / OEL 7.0 is not supported.
  • Red Hat only supports specific upgrades from RHEL 6 to 7. Contact your OS vendor and review What are the supported use cases for upgrading to RHEL 7?
  • SLES hosts running Cloudera Manager agents must use SLES SDK 11 SP1.
  • Cloudera does not support CDH cluster deployments in Docker containers.
  • Cloudera Enterprise (without Cloudera Navigator) is supported on platforms with Security-Enhanced Linux (SELinux) enabled.

Important: Cloudera is not responsible for policy support nor policy enforcement. If you experience issues with SELinux, contact your OS support provider.

 

CDH 5.12.x Supported Operating Systems

Red Hat Enterprise Linux-compatible

  • RHEL / CentOS (max SELinux support: 7.2)

    • 7.3, 7.2, 7.1
    • 6.9, 6.8, 6.7, 6.6, 6.5, 6.4
    • 5.11, 5.10, 5.7

  • Oracle Enterprise Linux (OEL)

    • 7.3, 7.2, 7.1 (UEK default)
    • 6.9, 6.8 (UEK R4)
    • 6.7, 6.6, 6.5 (UEK R3)
    • 6.4 (UEK R2)
    • 5.11, 5.10, 5.7 (UEK R2)

SUSE Linux Enterprise Server

  • SLES

    • 12 SP2, 12 SP1
    • 11 SP4, 11 SP3, 11 SP2

Ubuntu/Debian

  • Ubuntu

    • 16.04 LTS (Xenial)
    • 14.04 LTS (Trusty)
    • 12.04 LTS (Precise)

  • Debian

    • 8.2, 8.4 (Jessie)
    • 7.0, 7.1, 7.8 (Wheezy)

 


Please see Cloudera Manager Supported Databases for a full list of supported databases for each version of Cloudera Manager.

 

Cloudera Manager and CDH come packaged with an embedded PostgreSQL database, but it is recommended that you configure your cluster with custom external databases, especially in production.

 

In most cases (but not all), Cloudera supports versions of MariaDB, MySQL and PostgreSQL that are native to each supported Linux distribution.

 

After installing a database, upgrade to the latest patch and apply appropriate updates. Available updates may be specific to the operating system on which it is installed.

 

Notes:

  • Use UTF8 encoding for all custom databases.
  • Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
  • Hue requires the default MySQL/MariaDB version (if used) of the operating system on which it is installed. See Hue Databases.
  • Both the Community and Enterprise versions of MySQL are supported, as well as MySQL configured by the AWS RDS service.
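
The first two notes can be turned into a quick pre-installation check. The sketch below is a hedged illustration only: it assumes you have already read the MySQL server variables (for example, from `SHOW VARIABLES`) into a dictionary, and it assumes that `gtid_mode` and `character_set_server` are the relevant settings. Which variables Cloudera Manager itself inspects is not stated in this document.

```python
def database_warnings(variables):
    """Return human-readable warnings for a dict of MySQL server variables.

    The choice of variables is an assumption made for this sketch.
    """
    warnings = []
    # Any value other than OFF is treated conservatively as "GTID enabled".
    if variables.get("gtid_mode", "OFF").upper() != "OFF":
        warnings.append("GTID-based replication appears to be enabled")
    # Only plain 'utf8' is accepted here; variants are not considered.
    charset = variables.get("character_set_server", "").lower()
    if charset != "utf8":
        warnings.append(
            "server character set is %s, expected utf8" % (charset or "unknown")
        )
    return warnings


print(database_warnings({"gtid_mode": "ON", "character_set_server": "latin1"}))
```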

Important: When you restart processes, the configuration for each of the services is redeployed using information saved in the Cloudera Manager database. If this information is not available, your cluster does not start or function correctly. You must schedule and maintain regular backups of the Cloudera Manager database to recover the cluster in the event of the loss of this database.


Only 64-bit JDKs from Oracle are supported. Oracle JDK 7 is supported across all versions of Cloudera Manager 5 and CDH 5. Oracle JDK 8 is supported in C5.3.x and higher.

 

Unless specifically excluded, support for a minor JDK release begins from the Cloudera major release in which support for the major JDK release was added. For example, 8u102 was released in time for C5.9 but is actually supported from C5.3 because that is when support for JDK 1.8 was added. Cloudera excludes or removes support for select Java updates when security is jeopardized.

 

Running CDH nodes within the same cluster on different JDK releases is not supported. The JDK release must match across the cluster, down to the patch level.

  • All nodes in your cluster must run the same Oracle JDK version.
  • All services must be deployed on the same Oracle JDK version.

 

JDK 7

All JDK 7 updates, from the minimum required version, are supported in CM/CDH 5.0 and higher unless specifically excluded. Updates above the minimum that are not listed are supported but not tested.

 

The Cloudera Manager repository is packaged with an Oracle JDK (for example, 1.7.0_67), which can be installed automatically during a new installation or an upgrade.

 

JDK 7 updates that are supported and tested

JDK 7 Supported in all C5.x
1.7u80 Recommended / Latest version supported
1.7u75 Recommended
1.7u67 Recommended
1.7u55 Minimum required

 

 

JDK 8

All JDK 8 updates, from the minimum required version, are supported in CM/CDH 5.3 and higher unless specifically excluded. Updates above the minimum that are not listed are supported but not tested.

 

Warning: JDK 8u40, 8u45, and 8u60 are excluded from support due to a security risk: HTTP authentication can fail for web-based UI components such as HDFS, YARN, Solr, and Oozie.

Important: JDK 8u75 is supported but has a Known Issue: Oozie Web Console returns 500 error when Oozie server runs on JDK 8u75 or higher.

 

JDK 8 updates that are supported and tested

JDK 8 Supported in C5.3 and Higher
1.8u121 Recommended / Latest version supported
1.8u111 Recommended
1.8u102 Recommended
1.8u91 Recommended
1.8u74 Recommended
1.8u31 Minimum required
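
The JDK 8 rules in this section (minimum update 31, with 8u40, 8u45, and 8u60 excluded) can be expressed as a small check. This is a sketch under stated assumptions, not an official validator: the function names and the accepted version string formats (`1.8.0_121`, `1.8u121`, `8u121`) are chosen for illustration.

```python
import re

JDK8_MINIMUM_UPDATE = 31            # "1.8u31 Minimum required"
JDK8_EXCLUDED_UPDATES = {40, 45, 60}  # excluded in the warning above


def jdk8_update(version):
    """Extract the update number from a JDK 8 version string, else None."""
    match = re.fullmatch(r"(?:1\.8\.0_|1\.8u|8u)(\d+)", version.strip())
    return int(match.group(1)) if match else None


def is_supported_jdk8(version):
    """Apply the minimum-update and exclusion rules from this section."""
    update = jdk8_update(version)
    if update is None:
        return False  # not recognized as a JDK 8 version string
    return update >= JDK8_MINIMUM_UPDATE and update not in JDK8_EXCLUDED_UPDATES


for candidate in ("1.8.0_121", "8u45", "1.8u25"):
    print(candidate, is_supported_jdk8(candidate))
```

Updates that pass this check but are not listed in the table are, per the text above, supported but not tested.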

 


Hue

Hue works with the two most recent LTS (long term support) or ESR (extended support release) browsers. Cookies and JavaScript must be on.

Hue can display in older, and other, browsers, but you might not have access to all of its features.

 

Important: To see all icons in the Hue Web UI, users with IE and HTTPS must add a Load Balancer.


CDH requires IPv4. IPv6 is not supported.
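
Because only IPv4 is supported, it can help to screen a list of planned host addresses. The sketch below uses Python's standard `ipaddress` module; the function name and the sample addresses are hypothetical.

```python
import ipaddress


def non_ipv4_addresses(addresses):
    """Return the entries that do not parse as IPv4 addresses."""
    rejected = []
    for address in addresses:
        try:
            ipaddress.IPv4Address(address)
        except ipaddress.AddressValueError:
            rejected.append(address)
    return rejected


print(non_ipv4_addresses(["10.0.0.5", "fe80::1", "192.168.1.20"]))
```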

 

See also Configuring Network Names.

Multihoming CDH or Cloudera Manager is not supported outside specifically certified Cloudera partner appliances. Cloudera finds that current Hadoop architectures combined with modern network infrastructures and security practices remove the need for multihoming. Multihoming, however, is beneficial internally in appliance form factors to take advantage of high-bandwidth InfiniBand interconnects.

 

Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming. In addition, unknown issues may arise because multihoming is not covered by our test matrix outside the Cloudera-certified partner appliances.

 


The following components are supported by the indicated versions of Transport Layer Security (TLS):

 

Components Supported by TLS

Component | Role | Name | Port | Version
Cloudera Manager | Cloudera Manager Server | | 7182 | TLS 1.2
Cloudera Manager | Cloudera Manager Server | | 7183 | TLS 1.2
Flume | | | 9099 | TLS 1.2
Flume | | Avro Source/Sink | | TLS 1.2
Flume | | Flume HTTP Source/Sink | | TLS 1.2
HBase | Master | HBase Master Web UI Port | 60010 | TLS 1.2
HDFS | NameNode | Secure NameNode Web UI Port | 50470 | TLS 1.2
HDFS | Secondary NameNode | Secure Secondary NameNode Web UI Port | 50495 | TLS 1.2
HDFS | HttpFS | REST Port | 14000 | TLS 1.1, TLS 1.2
Hive | HiveServer2 | HiveServer2 Port | 10000 | TLS 1.2
Hue | Hue Server | Hue HTTP Port | 8888 | TLS 1.2
Impala | Impala Daemon | Impala Daemon Beeswax Port | 21000 | TLS 1.2
Impala | Impala Daemon | Impala Daemon HiveServer2 Port | 21050 | TLS 1.2
Impala | Impala Daemon | Impala Daemon Backend Port | 22000 | TLS 1.2
Impala | Impala StateStore | StateStore Service Port | 24000 | TLS 1.2
Impala | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | TLS 1.2
Impala | Impala StateStore | StateStore HTTP Server Port | 25010 | TLS 1.2
Impala | Impala Catalog Server | Catalog Server HTTP Server Port | 25020 | TLS 1.2
Impala | Impala Catalog Server | Catalog Server Service Port | 26000 | TLS 1.2
Oozie | Oozie Server | Oozie HTTPS Port | 11443 | TLS 1.1, TLS 1.2
Solr | Solr Server | Solr HTTP Port | 8983 | TLS 1.1, TLS 1.2
Solr | Solr Server | Solr HTTPS Port | 8985 | TLS 1.1, TLS 1.2
Spark | History Server | | 18080 | TLS 1.2
YARN | ResourceManager | ResourceManager Web Application HTTP Port | 8090 | TLS 1.2
YARN | JobHistory Server | MRv1 JobHistory Web Application HTTP Port | 19890 | TLS 1.2
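
A client that talks to these ports can refuse anything older than TLS 1.2. The following is a minimal sketch using Python's standard `ssl` module; the helper names are hypothetical, and the default context verifies certificates against the system trust store, so a cluster that uses a private CA would need that CA loaded as well.

```python
import socket
import ssl


def tls12_client_context():
    """Build a client context that rejects protocol versions below TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host, port, timeout=5.0):
    """Connect to host:port and return the negotiated version string."""
    context = tls12_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_tls_version` with a real host name and one of the ports listed above reports what the endpoint actually negotiates; the handshake fails if the server offers only TLS 1.1 or lower.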

 


What's New in CDH 5.12.x

 

Apache Hive / Hive-on-Spark

  • Support for Microsoft Azure Data Lake Store (ADLS) as a secondary filesystem for both Hive on MapReduce2 (YARN) and Hive-on-Spark. You can now use both Hive on MapReduce2 and Hive-on-Spark to read and write data stored on ADLS.

    See Configuring ADLS Connectivity

  • The Hive schematool is integrated with Cloudera Manager where you can use it to upgrade or validate the Hive metastore schema.

    See Using the Hive Schema Tool for details.

  • HIVE-1575: Added support for JSON arrays at the root level by the get_json_object function. For example:

    SELECT get_json_object('[1,2,3]', '$[0]')...
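
    To illustrate what indexing a root-level array means, here is a rough Python analogue of the `$[0]` lookup. It is not Hive code and does not reproduce Hive's return types (Hive returns a string). Returning None for non-array input or an out-of-range index is an assumption of this sketch.

```python
import json


def root_array_element(json_text, index):
    """Return element `index` of a JSON document whose root is an array."""
    value = json.loads(json_text)
    if not isinstance(value, list) or not 0 <= index < len(value):
        return None  # assumption: no match yields a null-like result
    return value[index]


print(root_array_element("[1,2,3]", 0))
```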

     

Hue

Hue 4 is out and jam-packed with great new features.

New Layout in Hue 4!

  • Apps are consolidated under the blue button; set your favorite as the default landing page
  • Top search bar lets you search for saved queries and other data
  • Left and right assist panels let you search and filter schema objects
  • Cursor position determines which of multiple queries to run
  • New Pig editor, Job Designer, and Job Browser
  • Access the old Hue 3 layout from the user drop-down, or remove "hue" from the URL.

 


 

Load Balancer Added by Default

  • During a new installation of CDH/Hue, one Load Balancer is automatically promoted to ensure optimal performance. It can reduce the Hue server load by up to 90%! In existing clusters, administrators are prompted to add a load balancer role and users are then guided on how to enable it. See the Cloudera Blog on Automatic HA.

 

Test LDAP Configuration

  • Verify your LDAP configuration on the fly with this new feature in Cloudera Manager under Hue > Actions > Test LDAP Configuration. See Authenticate Hue with LDAP.

 

Navigator Optimizer Integrated (Phase 1)

  • With Navigator Optimizer enabled in Hue, popular tables, columns, joins, and filters are displayed in the autocompleter. Risky statements, such as missing filters on partitioned tables, trigger an alert.

 

Navigator Search & Tag Enabled by Default

 

Other Cool Features

  • You can create partitioned tables from files
  • Impala metadata is refreshed automatically
  • SQL autocompleter handles more advanced corner cases
  • Remote Load balancer works with SSL
  • Query history is paginated!

 

Apache Impala (incubating)

The following are some of the most significant new Impala features in this release:

  • Impala can now read and write data stored on the Microsoft Azure Data Lake Store (ADLS).

  • New built-in functions:

    • A new string function, replace(), which is faster than regexp_replace() for simple string substitutions. See Impala String Functions for details.

    • A new conditional function, nvl2(), which offers more flexibility than the nvl() function. It lets you return one value for NOT NULL arguments, and a different value for NULL arguments. See Impala Conditional Functions for details.

  • New syntax, REFRESH FUNCTIONS db_name, lets Impala recognize newly added functions, such as UDFs created through Hive. Impala scans the metadata for a specified database to locate the new functions, which is faster and more convenient than doing a full INVALIDATE METADATA operation.

  • Startup flags for the impalad daemon, is_executor and is_coordinator, let you divide the work on a large, busy cluster between a small number of hosts acting as query coordinators, and a larger number of hosts acting as query executors. By default, each host can act in both roles, potentially introducing bottlenecks during heavily concurrent workloads. See Controlling which Hosts are Coordinators and Executors for details.

  • A new query option, DEFAULT_JOIN_DISTRIBUTION_MODE, lets you change the default assumption about how join queries should handle tables with no statistics. This can help to avoid out-of-memory conditions for join queries, without manual tuning to add the /* +SHUFFLE */ hint for queries on large tables with missing statistics.

  • The SORT BY clause lets you create Parquet files with more efficient compression and smaller ranges of values for specified columns. This allows Impala to skip reading Parquet files that contain no values matching the equality and range operators in the WHERE clause. See CREATE TABLE Statement for details.

  • The max_audit_event_log_files startup option lets you perform log rotation for the audit event log files, similar to the rotation for regular Impala log files.

  • Kudu enhancements:

    • The ALTER TABLE statement can specify more attributes for a Kudu table with the ADD COLUMNS clause. Now you can specify [NOT] NULL, ENCODING, COMPRESSION, DEFAULT, and BLOCK_SIZE. See ALTER TABLE Statement for details.

    • The TIMESTAMP type is now available for Kudu tables.

      Note: See Handling Date, Time, or Timestamp Data with Kudu for information about the tradeoffs between performance and convenience when using this data type. For high-performance applications, you might continue to use the BIGINT type to represent date/time values.

    • The INSERT and CREATE TABLE AS SELECT statements are more efficient when writing to Kudu tables. Formerly, the overhead for the write operations could result in timeouts when writing large numbers of rows in a single operation.
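
The semantics of the two new built-in functions in the list above can be sketched in Python. This is an analogy only, not Impala code: None stands in for SQL NULL, and the function names other than nvl2 are hypothetical.

```python
import re


def nvl2(value, if_not_null, if_null):
    """Return one value for a NOT NULL argument and another for NULL."""
    return if_not_null if value is not None else if_null


def replace_literal(text, target, replacement):
    """Plain substring substitution, analogous to replace()."""
    return text.replace(target, replacement)


def replace_with_regex(text, target, replacement):
    """Same result through a regex engine, analogous to regexp_replace().

    The target must be escaped so that characters like '.' match literally.
    """
    return re.sub(re.escape(target), lambda _match: replacement, text)


print(nvl2(None, "has value", "missing"))
print(replace_literal("a.b.c", ".", "-"))
```

The literal version needs no pattern compilation or escaping, which is the kind of overhead the text refers to when it calls replace() faster for simple substitutions.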

 

Apache HBase

  • Apache HBase now has ADLS support and recommendations for Azure deployment.
  • Outside of cloud, HBase now has support for long-lived Spark applications via token renewal.

 

Apache Spark

Spark can read and write data on the Azure Data Lake Store (ADLS) cloud service. See Accessing Data Stored in Azure Data Lake Store (ADLS) through Spark for details.

