Long term component architecture
As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components are standards, you can build long-term architecture on them with confidence.
With the exception of DSSD support, Cloudera Enterprise 5.6.0 is identical to CDH 5.5.2/Cloudera Manager 5.5.3. If you do not need DSSD support and are already using the latest 5.5.x release, you do not need to upgrade.
- System Requirements
- What's New
- Supported Operating Systems
- Supported Databases
- Supported JDK Versions
- Supported Browsers
- Supported Internet Protocol
- Supported Transport Layer Security Versions
Supported Databases
| Component | MariaDB | MySQL | SQLite | PostgreSQL | Oracle | Derby (see Note 6) |
|---|---|---|---|---|---|---|
| Oozie | 5.5 | 5.5, 5.6 | - | 9.2, 9.3, 9.4 (see Note 3) | | |
| Flume | - | - | - | - | - | Default (for the JDBC Channel only) |
| Hue | 5.5 | 5.1, 5.5, 5.6 (see Note 7) | Default | 9.2, 9.3, 9.4 (see Notes 1, 3) | | |
| | | | - | 9.2, 9.3, 9.4 (see Notes 1, 3) | | |
| | | | - | 9.2, 9.3, 9.4 (see Note 3) | | |
| Sqoop 1 | 5.5 | See Note 4 | - | See Note 4 | See Note 4 | - |
| Sqoop 2 | 5.5 | See Note 5 | - | See Note 5 | See Note 5 | Default |
1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and higher. The InnoDB storage engine must be enabled in the MySQL server.
2. Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
3. PostgreSQL 9.2 is supported on CDH 5.1 and higher. PostgreSQL 9.3 is supported on CDH 5.2 and higher. PostgreSQL 9.4 is supported on CDH 5.5 and higher.
4. For purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
5. Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby and PostgreSQL.
6. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.
7. CDH 5 Hue requires the default MySQL version of the operating system on which it is being installed, which is usually MySQL 5.1, 5.5, or 5.6.
Supported JDK Versions
Important: There is one exception to the minimum supported and recommended JDK versions in the following table. If Oracle releases a security patch that affects server-side Java before the next minor release of Cloudera products, the Cloudera support policy covers customers using the patch.
CDH 5.5.x is supported with the versions shown in the following table:
| Minimum Supported Version | Recommended Version | Exceptions |
|---|---|---|
| 1.8.0_31 | 1.8.0_60 | Cloudera recommends that you not use JDK 1.8.0_40. |
Supported Browsers
- Safari (not supported on Windows)
- Internet Explorer
Supported Internet Protocol
CDH requires IPv4. IPv6 is not supported.
Supported Transport Layer Security Versions
The following components support the indicated versions of Transport Layer Security (TLS):
| Component | Role | Port Name | Port | TLS Version |
|---|---|---|---|---|
| Flume | Avro Source/Sink | | 9099 | TLS 1.2 |
| HBase | Master | HBase Master Web UI Port | 60010 | TLS 1.2 |
| HDFS | NameNode | Secure NameNode Web UI Port | 50470 | TLS 1.2 |
| HDFS | Secondary NameNode | Secure Secondary NameNode Web UI Port | 50495 | TLS 1.2 |
| HDFS | HttpFS | REST Port | 14000 | TLS 1.0 |
| Hive | HiveServer2 | HiveServer2 Port | 10000 | TLS 1.2 |
| Hue | Hue Server | Hue HTTP Port | 8888 | TLS 1.2 |
| Cloudera Impala | Impala Daemon | Impala Daemon Beeswax Port | 21000 | TLS 1.2 |
| Cloudera Impala | Impala Daemon | Impala Daemon HiveServer2 Port | 21050 | TLS 1.2 |
| Cloudera Impala | Impala Daemon | Impala Daemon Backend Port | 22000 | TLS 1.2 |
| Cloudera Impala | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | TLS 1.2 |
| Cloudera Impala | Impala StateStore | StateStore Service Port | 24000 | TLS 1.2 |
| Cloudera Impala | Impala StateStore | StateStore HTTP Server Port | 25010 | TLS 1.2 |
| Cloudera Impala | Impala Catalog Server | Catalog Server HTTP Server Port | 25020 | TLS 1.2 |
| Cloudera Impala | Impala Catalog Server | Catalog Server Service Port | 26000 | TLS 1.2 |
| Oozie | Oozie Server | Oozie HTTPS Port | 11443 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTP Port | 8983 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTPS Port | 8985 | TLS 1.1, TLS 1.2 |
| YARN | ResourceManager | ResourceManager Web Application HTTP Port | 8090 | TLS 1.2 |
| YARN | JobHistory Server | MRv1 JobHistory Web Application HTTP Port | 19890 | TLS 1.2 |
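A client that should only ever speak TLS 1.2 to the ports above can enforce that on its own side. A minimal Python sketch using only the standard library (the choice of endpoint to connect to is up to you; this only configures the context):

```python
import ssl

# Build a client context that refuses anything older than TLS 1.2.
# A connection to, for example, HiveServer2 on port 10000 would then
# fail unless the server actually negotiates TLS 1.2 or newer.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

print(ctx.minimum_version)
```

This is a client-side check only; the server's supported versions are governed by the table above.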
Operating System and Database Support
- Operating Systems - Support for RHEL/CentOS 6.6 (in SELinux mode), 6.7, and 7.1, and Oracle Enterprise Linux 7.1.
Important: Cloudera supports RHEL 7 with the following limitations:
- Only RHEL 7.1 is supported. RHEL 7.0 is not supported.
- Only a new installation of RHEL 7.1 is supported. Upgrades from RHEL 6 to RHEL 7.1 are not supported. For more information, see Does Red Hat support upgrades between major versions of Red Hat Enterprise Linux?
- Navigator Encrypt is not supported on RHEL 7.1.
- Databases - Support for MariaDB 5.5, Oracle 12c, and PostgreSQL 9.4.
- Flume is rebased on Flume 1.6.
- FLUME-2498 Taildir source.
- FLUME-2215 ResettableFileInputStream support for UCS-4 characters.
- FLUME-2729 PollableSource backoff times made configurable.
- FLUME-2628 Netcat source support for different source encodings.
- FLUME-2753 Support for empty replace string in Search and Replace interceptor.
- FLUME-2763 flume-env script support for handling JVM parameters.
- FLUME-2095 JMS source support for username and password.
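As an illustration of the Taildir source (FLUME-2498) listed above, a minimal agent configuration sketch; the agent name, file paths, and channel wiring here are assumptions, not values from this release:

```properties
# Hypothetical agent "a1" tailing application logs with the Taildir source.
# The position file lets the source resume from where it left off.
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/app/.*\.log
a1.channels.c1.type = memory
```

A sink would still need to be attached to c1 for a complete agent.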
- HDFS-8828 - DistCp leverages HDFS snapshot diff to more easily build file and directory lists. The snapshot diff report provides diff information between two snapshots or between a snapshot and a non-HDFS directory.
- HADOOP-11827 - DistCp buildListing() now uses a threadpool to improve performance. To use, pass --numListstatusThreads <numThreads> to the distcp command. The default value is 1.
- HADOOP-1540 - DistCp supports file exclusions with a new -filters option to prevent files from being copied. The option takes a file that contains a list of Java regex patterns (one per line). If an exclusion pattern matches a file, that file is not copied. To use, pass -filters <pathToFilterFile> to the distcp command.
- HADOOP-11219, HADOOP-7280 - WebImageViewer was upgraded to Netty 4. This does not affect the external classpath of Hadoop.
- HADOOP-8989 - The Hadoop shell now has a find utility, like that in UNIX, that allows users to search for files by name. Run hadoop fs -help find for more information.
- HDFS-6133 - HDFS balancer supports the exclusion of subtrees, because running the HDFS balancer can break data locality that is important for applications such as the HBase RegionServer.
- Improvements to HDFS scalability and performance:
- HDFS-7435 and HDFS-8867 add more efficient over-the-wire encoding.
- HDFS-7923 adds rate-limiting for block reports so that the NameNode is not swamped by DataNodes sending too many block reports at once.
- HDFS-9107 fixes a bug that could limit scalability on larger clusters by causing the NameNode to falsely consider DataNodes to be dead.
- HDFS-8792 and HDFS-7609 optimize data structures on the NameNode side.
- HDFS-7923 and HDFS-7999 eliminate some cases on the DataNode side where I/O errors lead to scans being repeated on the local disks.
- HDFS-8581 fixes some cases where a lock is held for too long.
- Other bug fixes include HADOOP-11785, HADOOP-12172, HADOOP-11659, and HDFS-8845.
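The -filters matching described above can be sketched as follows. This uses Python's re module to stand in for DistCp's Java regex engine, and the paths and pattern are hypothetical:

```python
import re

def load_filters(lines):
    # One regex pattern per line, as in the file passed to -filters
    return [re.compile(p.strip()) for p in lines if p.strip()]

def excluded(path, filters):
    # DistCp skips a file when any exclusion pattern matches it
    return any(f.match(path) for f in filters)

filters = load_filters([r".*/tmp/.*"])
paths = ["/data/tmp/scratch.txt", "/data/final/report.txt"]
to_copy = [p for p in paths if not excluded(p, filters)]
print(to_copy)  # only /data/final/report.txt survives the filter
```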
- CDH now includes a scanner heartbeat check, which enforces a time limit on the execution of scan RPC requests. When the server receives a scan RPC request, a time limit is calculated to be half of the smaller of the two values hbase.client.scanner.timeout.period and hbase.rpc.timeout. When the time limit is reached, the server returns the results it has accumulated up to that point. For more information, see Configuring the HBase Scanner Heartbeat.
- Cloudera Manager exposes new configuration options for the following HBase settings:
- Enabling TLS/SSL for HBase Thrift Server over HTTP
- Enabling TLS/SSL for HBase REST Server
- You can now limit the speed of compactions by configuring hbase.regionserver.throughput.controller and other hbase.hstore.compaction.throughput.* options. See Limiting the Speed of Compactions.
- HBase includes support for using native Hadoop libraries to calculate checksums, which enable corruption detection. This optimization increases CPU efficiency. The optimization is enabled by default, but can be disabled by setting the hbase.regionserver.checksum.verify property to false in the RegionServer Advanced Configuration Snippet (Safety Valve) for hbase-site.xml if you use Cloudera Manager, or in hbase-site.xml if you do not.
- CDH includes the HBase configuration option hbase.use.dynamic.jars, which, if set to false, disables the automatic creation of the directory pointed to by hbase.dynamic.jars.dir (which defaults to the lib/ directory under the HBase root directory). hbase.use.dynamic.jars defaults to true, which causes the directory to be created if it does not exist. To configure this option, add the property to the advanced configuration snippet for hbase-site.xml if you use Cloudera Manager, or to hbase-site.xml otherwise.
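The time-limit rule for the scanner heartbeat described above can be expressed directly. The property names come from the text; the 60-second values below are illustrative inputs, not values asserted by this release:

```python
def scan_rpc_time_limit_ms(scanner_timeout_ms, rpc_timeout_ms):
    # Half of the smaller of hbase.client.scanner.timeout.period
    # and hbase.rpc.timeout, per the heartbeat check description
    return min(scanner_timeout_ms, rpc_timeout_ms) // 2

print(scan_rpc_time_limit_ms(60000, 60000))  # 30000 ms
```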
- HIVE-10761 - Create Coda Hale-based system of metrics for Hive. The current system connects to JMX reporting, but all measurements and models are custom. The advantages of a Coda Hale-based metrics system are:
- Well-defined model for common metrics (for example, JVM metrics)
- Well-defined measurements (for example, max, mean, stddev, and mean_rate)
- Built-in reporting framework (for example, JMX, console, log, and JSON webserver).
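What a Coda Hale-style registry adds over custom JMX counters is a standard set of measurements per metric. A rough Python sketch of such a snapshot (the class and field names are illustrative, not Hive's actual API):

```python
import statistics

class MetricSnapshot:
    """Illustrative stand-in for a Coda Hale histogram snapshot."""
    def __init__(self, samples):
        self.samples = list(samples)

    def report(self):
        # A subset of the well-defined measurements mentioned above
        return {
            "max": max(self.samples),
            "mean": statistics.mean(self.samples),
            "stddev": statistics.pstdev(self.samples),
        }

print(MetricSnapshot([10, 20, 30]).report())
```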
- HIVE-11139 - Emit more lineage information. HIVE-1131 emits some column lineage information, but it does not support INSERT statements or CTAS statements. It does not emit the predicate information either. In Cloudera Navigator, this feature adds lineage diagrams for:
- CREATE TABLE AS SELECT
- INSERT INTO TABLE SELECT
- HIVE-10650 - Improve the sum() function over windowing to support additional range formats. Supports queries with windowing specifications such as "x PRECEDING AND y PRECEDING" and "x FOLLOWING AND y FOLLOWING".
- HIVE-12287 - Lineage for lateral view shows wrong dependencies. INSERT statements with lateral views are not parsed correctly.
- HUE-2530 - Cloudera Manager supports high availability for Hue. Hue can now be load-balanced directly from Cloudera Manager.
- HUE-2288 - Metrics are added to monitor Hue usage in Cloudera Manager, specifically thread count, multi-processing processes, garbage collection, logged-in users, active requests, failed requests, and response times.
- HUE-2852 - Autocomplete is added for Hive and Impala nested queries. Also added are samples with nested columns for the customers table, three sample queries for Hive and Impala, and nested struct autocompletion for Hive and Impala editors.
- Cloudera Navigator now supports auditing of Hue. Administrators can track Hue user logins, logouts, and admin operations on users and groups.
- OOZIE-2187 - Users can now configure YARN ResourceManager and NameNode, or MapReduce JobTracker, in the Oozie server configuration file, oozie-site.xml. Set the property oozie.actions.default.job-tracker for either ResourceManager or JobTracker; set oozie.actions.default.name-node for NameNode. These properties are used when not specified by an action and not defined in the global section of workflow.xml. Cloudera Manager configures this automatically.
- OOZIE-2130 - A new Expression Language (EL) function, dateTzOffset, computes a relative date offset with regard to Daylight Saving Time. dateTzOffset is like dateOffset, but instead of using a hardcoded offset, it uses the difference between the given timezone (accounting for Daylight Saving Time) and the Oozie processing time. dateTzOffset returns a date in the Oozie processing timezone.
- OOZIE-2160 - Oozie email action now supports attachments with an <attachment> element.
- OOZIE-2174 - The Oozie API and CLI now support administrator commands that were previously available only from the REST API and Web UI. See Administrator Operations for details on these newly available commands.
- OOZIE-1963 - An example for the Hive2 action was added to the Oozie tarball.
- OOZIE-2332 - The Hive and Hive2 actions now allow <query> to be used instead of <script>, which lets you inline Hive queries directly in your workflow instead of referencing a separate script.
- OOZIE-2356 - Users can temporarily disable Action Credentials for a specific action, all actions in a workflow, or all workflows. See the documentation for details.
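As an illustration of OOZIE-2332 above, a workflow action fragment using an inline <query> rather than a <script> reference; the action name, schema version, and query here are assumptions:

```xml
<!-- Hypothetical Hive action with an inline query instead of a script -->
<action name="hive-inline-query">
  <hive xmlns="uri:oozie:hive-action:0.6">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <query>SELECT COUNT(*) FROM sample_table;</query>
  </hive>
  <ok to="end"/>
  <error to="fail"/>
</action>
```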
- Cloudera Search adds support for Kerberos authentication for hosts running Solr behind a proxy server. For additional information, see:
- Cloudera Search adds support for using LDAP and Active Directory for authentication. For additional information, see:
- solrctl supports the Config API.
solrctl includes a config command that uses the Config API to directly manage configurations represented in Config objects. Config objects represent collection configuration information as specified by the solrctl collection --create -c configName command. instancedirs and Config objects handle the same information, meeting the same need from the Solr server perspective, but there are a number of differences between these two implementations.
Table 1. Config and instancedir Comparison
| Attribute | Config | instancedir |
|---|---|---|
| Security | Security support provided. In a Kerberos-enabled cluster, the ZooKeeper nodes associated with configurations created using the Config API automatically have the proper ZooKeeper ACLs. Because instancedir updates ZooKeeper directly, it is the client's responsibility to add the proper ACLs, which is cumbersome. Sentry can be used to control access to the Config API, providing access control. For more information, see Enabling Sentry Authorization for Search using the Command Line. | No ZooKeeper security support. Any user can create, delete, or modify instancedirs directly in ZooKeeper. |
| Creation method | Generated from existing configs or instancedirs in ZooKeeper using the ConfigSet API. | Manually edited locally and re-uploaded directly to ZooKeeper using solrctl instancedir. |
| Template support | Several predefined templates are available. These can be used as the basis for creating additional configs, and additional templates can be created by creating configs that are immutable. Mutable templates that use a Managed Schema can be modified using the Schema API as opposed to being manually edited. As a result, configs are less flexible, but they are also less error-prone than instancedirs. | One standard template. |
| Sentry support | Configs include a number of templates, each with Sentry-enabled and non-Sentry-enabled versions. To enable Sentry, choose a Sentry-enabled template. | instancedirs include a single template that supports enabling Sentry. To enable Sentry with instancedirs, overwrite the original solrconfig.xml file with solrconfig.xml.secure as described in Enabling Sentry in Cloudera Search for CDH 5. |
- Solr includes a set of built-in immutable configurations.
These templates are instantiated when Solr is initialized. This means these templates are not automatically available after an upgrade. To enable these templates on upgraded installations, use solrctl init or initialize Solr using Cloudera Manager.
Apache Sentry (incubating)
- Sentry is rebased on Apache Sentry 1.5.1.
- Sentry introduces column-level access control for tables in Hive and Impala. Previously, Sentry supported privilege granularity only at the table level. To restrict access to a column of sensitive data, you needed to first create a view for a subset of columns, and then grant privileges on that view. Sentry now allows you to assign the SELECT privilege on a subset of columns in a table. See Hive SQL Syntax for Use with Sentry.
- Support for enabling Kerberos authentication for the Sentry web server. See Using the Sentry Web Server.
- Spark is rebased on Apache Spark 1.5.0.
- Dynamic allocation is enabled by default. You can explicitly disable it by setting spark.dynamicAllocation.enabled to false. Dynamic allocation is implicitly disabled if --num-executors is specified for the job.
- The following Spark libraries are now supported:
- Sqoop is rebased on Apache Sqoop 1.4.6.
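The dynamic allocation default noted above can be overridden cluster-wide. A spark-defaults.conf sketch (the file location varies by deployment and is an assumption here):

```properties
# Explicitly turn off dynamic executor allocation
spark.dynamicAllocation.enabled  false
```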