
Long-term component architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components become standards, you can build long-term architectures on them with confidence.

 

PLEASE NOTE:

With the exception of DSSD support, Cloudera Enterprise 5.6.0 is identical to CDH 5.5.2/Cloudera Manager 5.5.3. If you do not need DSSD support and are already using the latest 5.5.x release, you do not need to upgrade.

 

Note: To be covered by Cloudera Support, all CDH hosts that make up a logical cluster need to run on the same major OS release, and Cloudera Manager needs to run on the same OS release as one of the CDH clusters it manages. The risk of issues caused by running different minor OS releases is considered lower than the risk of running different major OS releases. Cloudera recommends running the same minor release cross-cluster, because it simplifies issue tracking and supportability.

 

CDH 5 provides 64-bit packages for RHEL-compatible, SLES, Ubuntu, and Debian systems as listed below.

 

Red Hat Enterprise Linux (RHEL)-compatible
  • RHEL (+ SELinux mode in available versions): 7.2, 7.1, 6.8, 6.7, 6.6, 6.5, 6.4, 5.11, 5.10, 5.7
  • CentOS (+ SELinux mode in available versions): 7.2, 7.1, 6.8, 6.7, 6.6, 6.5, 6.4, 5.11, 5.10, 5.7
  • Oracle Enterprise Linux (OEL) with Unbreakable Enterprise Kernel (UEK): 7.2 (UEK R2), 7.1, 6.8 (UEK R3), 6.7 (UEK R3), 6.6 (UEK R3), 6.5 (UEK R2, UEK R3), 6.4 (UEK R2), 5.11, 5.10, 5.7

SLES
  • SUSE Linux Enterprise Server (SLES): 12 with Service Pack 1; 11 with Service Pack 4; 11 with Service Pack 3; 11 with Service Pack 2
  • Hosts running Cloudera Manager Agents must use SUSE Linux Enterprise Software Development Kit 11 SP1.

Ubuntu/Debian
  • Ubuntu: Trusty 14.04 (LTS), Precise 12.04 (LTS)
  • Debian: Jessie 8.4, 8.2; Wheezy 7.8, 7.1, 7.0

 

Important: Cloudera supports RHEL 7 with the following limitations:

 

Note:

  • Cloudera Enterprise is supported on platforms with Security-Enhanced Linux (SELinux) enabled. Cloudera is not responsible for policy support or policy enforcement. If you experience issues with SELinux, contact your OS provider.
  • CDH 5.9 DataNode hosts with EMC® DSSD™ D5™ are supported on RHEL 6.6, 7.1, and 7.2.

 

Supported Databases

Component: supported versions of MariaDB, MySQL, SQLite, PostgreSQL, Oracle, and Derby (see Note 5)

  • Cloudera Manager: MariaDB 5.5, 10; MySQL 5.6, 5.5, 5.1; PostgreSQL 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1; Oracle 12c, 11gR2
  • Oozie: MariaDB 5.5, 10; MySQL 5.6, 5.5, 5.1; PostgreSQL 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3); Oracle 12c, 11gR2; Derby (default)
  • Flume: Derby (default; for the JDBC Channel only)
  • Hue: MariaDB 5.5, 10; MySQL 5.6, 5.5, 5.1 (see Note 6); SQLite (default); PostgreSQL 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3); Oracle 12c, 11gR2
  • Hive/Impala: MariaDB 5.5, 10; MySQL 5.6, 5.5, 5.1 (see Note 1); PostgreSQL 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3); Oracle 12c, 11gR2; Derby (default)
  • Sentry: MariaDB 5.5, 10; MySQL 5.6, 5.5, 5.1 (see Note 1); PostgreSQL 9.4, 9.3, 9.2, 9.1, 8.4, 8.3, 8.1 (see Note 3); Oracle 12c, 11gR2
  • Sqoop 1: MariaDB 5.5, 10; MySQL, PostgreSQL, Oracle: see Note 4
  • Sqoop 2: MariaDB 5.5, 10; see Note 9; Derby (default)
 

 

Note:

  1. Cloudera supports the databases listed above provided they are supported by the underlying operating system on which they run.
  2. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and higher. The InnoDB storage engine must be enabled in the MySQL server.
  3. Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
  4. PostgreSQL 9.2 is supported on CDH 5.1 and higher. PostgreSQL 9.3 is supported on CDH 5.2 and higher. PostgreSQL 9.4 is supported on CDH 5.5 and higher.
  5. For purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  6. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation guide for recommendations.
  7. CDH 5 Hue requires the default MySQL version of the operating system on which it is being installed, which is usually MySQL 5.1, 5.5, or 5.6.
  8. When installing a JDBC driver, only the ojdbc6.jar file is supported for both Oracle 11g R2 and Oracle 12c; the ojdbc7.jar file is not supported.
  9. Sqoop 2 lacks some of the features of Sqoop 1. Cloudera recommends you use Sqoop 1. Use Sqoop 2 only if it contains all the features required for your use case.
  10. MariaDB 10 is supported only on CDH 5.9 and higher.
CDH 5.9.x is supported with the JDK versions shown in the following table:

Minimum Required Version | Excluded Version(s) | Comments
JDK 1.8_31 | JDK 1.8_40, JDK 1.8_45 | N/A
JDK 1.7_55 | N/A |

Note: Using JDK 1.7.0_80 with CDH 5.1 and CDH 5.2 causes Kerberos authentication failures with HDFS clients.

Hue

Hue works with the two most recent versions of the following browsers. Cookies and JavaScript must be enabled.

  • Chrome
  • Firefox
  • Safari (not supported on Windows)
  • Internet Explorer

Hue might display in older browser versions or in other browsers, but you might not have access to all of its features.

 


CDH requires IPv4. IPv6 is not supported.

 

See also Configuring Network Names.

 

Multihoming CDH or Cloudera Manager is not supported outside specifically certified Cloudera partner appliances. Cloudera finds that current Hadoop architectures combined with modern network infrastructures and security practices remove the need for multihoming. Multihoming, however, is beneficial internally in appliance form factors to take advantage of high-bandwidth InfiniBand interconnects.

Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming. In addition, unknown issues may arise because multihoming is not covered by our test matrix outside the Cloudera-certified partner appliances.

 


The following components support the indicated versions of Transport Layer Security (TLS):

 

Components Supported by TLS

Component | Role | Name | Port | Version
Cloudera Manager | Cloudera Manager Server | | 7182 | TLS 1.2
Cloudera Manager | Cloudera Manager Server | | 7183 | TLS 1.2
Flume | | | 9099 | TLS 1.2
Flume | | Avro Source/Sink | | TLS 1.2
Flume | | Flume HTTP Source/Sink | | TLS 1.2
HBase | Master | HBase Master Web UI Port | 60010 | TLS 1.2
HDFS | NameNode | Secure NameNode Web UI Port | 50470 | TLS 1.2
HDFS | Secondary NameNode | Secure Secondary NameNode Web UI Port | 50495 | TLS 1.2
HDFS | HttpFS | REST Port | 14000 | TLS 1.1, TLS 1.2
Hive | HiveServer2 | HiveServer2 Port | 10000 | TLS 1.2
Hue | Hue Server | Hue HTTP Port | 8888 | TLS 1.2
Impala | Impala Daemon | Impala Daemon Beeswax Port | 21000 | TLS 1.2
Impala | Impala Daemon | Impala Daemon HiveServer2 Port | 21050 | TLS 1.2
Impala | Impala Daemon | Impala Daemon Backend Port | 22000 | TLS 1.2
Impala | Impala StateStore | StateStore Service Port | 24000 | TLS 1.2
Impala | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | TLS 1.2
Impala | Impala StateStore | StateStore HTTP Server Port | 25010 | TLS 1.2
Impala | Impala Catalog Server | Catalog Server HTTP Server Port | 25020 | TLS 1.2
Impala | Impala Catalog Server | Catalog Server Service Port | 26000 | TLS 1.2
Oozie | Oozie Server | Oozie HTTPS Port | 11443 | TLS 1.1, TLS 1.2
Solr | Solr Server | Solr HTTP Port | 8983 | TLS 1.1, TLS 1.2
Solr | Solr Server | Solr HTTPS Port | 8985 | TLS 1.1, TLS 1.2
Spark | History Server | | 18080 | TLS 1.2
YARN | ResourceManager | ResourceManager Web Application HTTP Port | 8090 | TLS 1.2
YARN | JobHistory Server | MRv1 JobHistory Web Application HTTP Port | 19890 | TLS 1.2

What's New In CDH 5.9.x

Apache Hadoop

 

  • CDH 5.9 allows you to use temporary credentials to log in to Amazon S3. You can obtain temporary credentials from Amazon's Security Token Service (STS).
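
    For illustration, the sketch below shows one way a Java client could hand STS session credentials to the S3A connector. It is only a sketch: the bucket name and credential values are placeholders, and it assumes the S3A filesystem and its org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider class are available in your CDH build.

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class S3aTemporaryCredentialsExample {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Use the S3A credentials provider that understands STS session tokens.
                conf.set("fs.s3a.aws.credentials.provider",
                         "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
                // Temporary credentials obtained from STS (placeholder values;
                // in practice, read them from a secure source rather than hard-coding).
                conf.set("fs.s3a.access.key", "ASIA-EXAMPLE-ACCESS-KEY");
                conf.set("fs.s3a.secret.key", "example-temporary-secret-key");
                conf.set("fs.s3a.session.token", "example-session-token");

                // List the bucket to confirm the temporary credentials work.
                FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
                for (FileStatus status : fs.listStatus(new Path("s3a://example-bucket/"))) {
                    System.out.println(status.getPath());
                }
            }
        }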

 

 

Apache HBase

 

  • A tool, org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues, has been added to dump existing replication peers, configurations, and queues when using HBase replication. The tool includes two flags:
    • --distributed - Polls each replication server for information about the replication queues being processed on this replication server. By default, this is not enabled, and the information about the replication queues and configuration is obtained from ZooKeeper.
    • --hdfs - When --distributed is used, this flag attempts to calculate the total size of the WAL files used by the replication queues. Because multiple peers can be configured, this value can be overestimated.

    For more information, see Class DumpReplicationQueues.

  • Metrics have been added that expose the amount of replayed work occurring in the HBase replication system. For more information on these metrics, see Replication Metrics in the Apache HBase Reference Guide.

 

 

Apache Hive

 

 

 

Hue

 

  • HUE-2915: Integrates Hue with Amazon S3. You can now access both S3 and HDFS in the File Browser, create tables from files in S3, and save query results in S3. See how to Enable S3 Cloud Storage.

  • HUE-4039: Improves SQL Autocompleter. The new Autocompleter deeply understands Hive and Impala SQL dialects and provides smart suggestions based on your statement structure and cursor position. See how to manually Enable and Disable Autocompleter.

  • HUE-3877: Adds support for Amazon RDS. You can now deploy Hue against an Amazon RDS database instance with MySQL, PostgreSQL, and Oracle engines.

  • Rebase of Hue on upstream Hue 3.11.

 

 

Apache Impala (incubating)

 

  • Performance improvements:

    • [IMPALA-3206] Speedup for queries against DECIMAL columns in Avro tables. The code that parses DECIMAL values from Avro now uses native code generation.

    • [IMPALA-3674] Improved efficiency in LLVM code generation can reduce codegen time, especially for short queries.

    • [IMPALA-2979] Improvements to scheduling on worker nodes, enabled by the REPLICA_PREFERENCE query option. See REPLICA_PREFERENCE Query Option (CDH 5.9 or higher only) for details.

  • [IMPALA-1683] The REFRESH statement can be applied to a single partition, rather than the entire table. See REFRESH Statement and Refreshing a Single Partition for details, and the sketch following this list for an example.

  • Improvements to the Impala web user interface:

    • [IMPALA-2767] You can now force a session to expire by clicking a link in the web UI, on the /sessions tab.

    • [IMPALA-3715] The /memz tab includes more information about Impala memory usage.

    • [IMPALA-3716] The Details page for a query now includes a Memory tab.

  • [IMPALA-3499] Scalability improvements to the catalog server. Impala handles internal communication more efficiently for tables with large numbers of columns and partitions, where the size of the metadata exceeds 2 GiB.

  • [IMPALA-3677] You can send a SIGUSR1 signal to any Impala-related daemon to write a Breakpad minidump. For advanced troubleshooting, you can now produce a minidump without triggering a crash. See Breakpad Minidumps for Impala (CDH 5.8 or higher only) for details about the Breakpad minidump feature.

  • [IMPALA-3687] The schema reconciliation rules for Avro tables have changed slightly for CHAR and VARCHAR columns. Now, if the definition of such a column is changed in the Avro schema file, the column retains its CHAR or VARCHAR type as specified in the SQL definition, but the column name and comment from the Avro schema file take precedence. See Creating Avro Tables for details about column definitions in Avro tables.

  • [IMPALA-3575] Some network operations now have additional timeout and retry settings. The extra configuration helps avoid query failures caused by transient network problems, avoid hangs when a sender or receiver fails in the middle of a network transmission, and make cancellation requests more reliable despite network issues.
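
For example, the single-partition REFRESH mentioned above could be issued through the HiveServer2 JDBC interface of an Impala daemon (default port 21050). This is a minimal sketch, assuming an unsecured (noSasl) cluster; the host, table, and partition names are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RefreshSinglePartitionExample {
        public static void main(String[] args) throws Exception {
            // Hive JDBC driver; Impala speaks the HiveServer2 protocol on port 21050.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://impalad.example.com:21050/default;auth=noSasl");
                 Statement stmt = conn.createStatement()) {
                // CDH 5.9 / IMPALA-1683: refresh metadata for one partition
                // instead of reloading metadata for the entire table.
                stmt.execute("REFRESH web_logs PARTITION (year=2016, month=11)");
            }
        }
    }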

 

 

Apache Sentry

 

  • Sentry adds support for securing data on Amazon RDS. As a result, Sentry will now be able to secure URIs with an RDS scheme.
  • SENTRY-1233 - Logging improvements for SentryConfigToolSolr.
  • SENTRY-1119 - Allow data engines to obtain the ActionFactory directly from the configuration, instead of having hardcoded component-specific classes. This will allow external data engines to integrate with Sentry easily.
  • SENTRY-1229 - Added a basic configurable cache to SentryGenericProviderBackend.

 

 

Apache Spark

 

  • You can now set up AWS credentials for Spark with the Hadoop credential provider, to avoid exposing the AWS secret key in configuration files.
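
    As a rough sketch of that approach, a Spark job can point its Hadoop configuration at a JCEKS credential provider (created ahead of time with the hadoop credential command) so that fs.s3a.access.key and fs.s3a.secret.key are resolved from the keystore rather than written into configuration files. The keystore path and bucket name below are placeholders.

        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;

        public class SparkS3aCredentialProviderExample {
            public static void main(String[] args) {
                SparkConf sparkConf = new SparkConf().setAppName("s3a-credential-provider-example");
                JavaSparkContext sc = new JavaSparkContext(sparkConf);

                // Resolve fs.s3a.access.key / fs.s3a.secret.key from a JCEKS keystore
                // instead of plain-text configuration (placeholder path).
                sc.hadoopConfiguration().set(
                    "hadoop.security.credential.provider.path",
                    "jceks://hdfs/user/example/aws-credentials.jceks");

                // The S3A connector consults the credential provider when this path is read.
                JavaRDD<String> lines = sc.textFile("s3a://example-bucket/input/*.txt");
                System.out.println("Line count: " + lines.count());

                sc.stop();
            }
        }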

 

 

Apache Sqoop

 

  • The mainframe import module extension has been added to support data sets on tape.

 

 

Cloudera Search

 

  • The Solr watchdog is now configured to use the fully qualified domain name (FQDN) of the host on which the Solr process is running (instead of 127.0.0.1). You can override this configuration by setting the SOLR_HOSTNAME environment variable to an appropriate value before starting the Solr server.
  • Cloudera Search adds support for index snapshots. For more information on how to back up, migrate, or restore your indexed data, see Backing Up and Restoring Cloudera Search.

 

