Long-term component architecture
As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components become standards, you can build long-term architecture on them with confidence.
With the exception of DSSD support, Cloudera Enterprise 5.6.0 is identical to CDH 5.5.2/Cloudera Manager 5.5.3. If you do not need DSSD support and are already using the latest 5.5.x release, you do not need to upgrade.
- System Requirements
- What's New
- Supported Operating Systems
- Supported Databases
- Supported JDK Versions
- Supported Browsers
- Supported Internet Protocol
- Supported Transport Layer Security Versions
Supported Databases
Please see Cloudera Manager Supported Databases for a full list of supported databases for each version of Cloudera Manager.
Cloudera Manager and CDH come packaged with an embedded PostgreSQL database, but it is recommended that you configure your cluster with custom external databases, especially in production.
In most cases (but not all), Cloudera supports the versions of MariaDB, MySQL, and PostgreSQL that ship with each supported Linux distribution.
After installing a database, upgrade to the latest patch version and apply any appropriate updates. Available updates may be specific to the operating system on which the database is installed.
- Use UTF8 encoding for all custom databases.
- Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
- Hue requires the default MySQL/MariaDB version (if used) of the operating system on which it is installed. See Hue Databases.
- Both the Community and Enterprise versions of MySQL are supported, as well as MySQL configured by the AWS RDS service.
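As a sketch of the UTF8 requirement above, the helper below assembles the SQL statements that might be used to create a custom MySQL/MariaDB database for a Cloudera service. The database name, user, and password are illustrative assumptions, not values from this document.

```python
# Sketch: build CREATE DATABASE / GRANT statements for a custom
# Cloudera service database with UTF8 encoding. The names used in
# the example call are hypothetical.

def create_db_statements(db_name: str, user: str, password: str, host: str = "%"):
    """Return SQL statements creating a UTF8-encoded database and
    granting a dedicated user full access to it."""
    return [
        f"CREATE DATABASE {db_name} DEFAULT CHARACTER SET utf8;",
        f"GRANT ALL ON {db_name}.* TO '{user}'@'{host}' IDENTIFIED BY '{password}';",
    ]

for stmt in create_db_statements("scm", "scm_user", "changeme"):
    print(stmt)
```

The key detail is `DEFAULT CHARACTER SET utf8`, which applies the required encoding to every table created in the database.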
Important: When you restart processes, the configuration for each of the services is redeployed using information saved in the Cloudera Manager database. If this information is not available, your cluster does not start or function correctly. You must schedule and maintain regular backups of the Cloudera Manager database to recover the cluster in the event of the loss of this database.
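To make the backup requirement above concrete, the sketch below assembles a `mysqldump` command that could run on a schedule, assuming the Cloudera Manager database lives in MySQL. The database name and output directory are illustrative assumptions.

```python
# Sketch: assemble a mysqldump argv list for a scheduled backup of
# the Cloudera Manager database. "cmdb" and "/var/backups" are
# hypothetical values, not from this document.

import datetime

def backup_command(db_name: str, out_dir: str = "/var/backups") -> list:
    """Return a mysqldump command writing a timestamped dump file."""
    stamp = datetime.date.today().isoformat()
    dump_file = f"{out_dir}/{db_name}-{stamp}.sql"
    return ["mysqldump", "--single-transaction",
            f"--result-file={dump_file}", db_name]

cmd = backup_command("cmdb")  # e.g. pass to subprocess.run(cmd) from cron
```

`--single-transaction` takes a consistent snapshot of InnoDB tables without locking them, which suits a live Cloudera Manager database.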
Supported JDK Versions
A supported minor JDK release remains supported throughout the lifecycle of a Cloudera major release, from the time of its addition onward, unless specifically excluded.
Warning: JDK 1.8u40 and JDK 1.8u60 are excluded from support. Also, the Oozie Web Console returns 500 error when Oozie server runs on JDK 8u75 or higher.
Running CDH nodes within the same cluster on different JDK releases is not supported. The JDK release must match across the cluster, down to the patch level.
- All nodes in your cluster must run the same Oracle JDK version.
- All services must be deployed on the same Oracle JDK version.
The Cloudera Manager repository is packaged with a specific Oracle JDK release (for example, 1.7.0_67), which can be installed automatically during a new installation or an upgrade.
For a full list of supported JDK versions, see CDH and Cloudera Manager Supported JDK Versions.
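Because all nodes must run an identical JDK release down to the patch level, a pre-flight check can compare the version string each host reports (for example, from `java -version`). The host names and version strings below are illustrative assumptions.

```python
# Sketch: verify that every node in a cluster reports the same JDK
# release before starting CDH services. The version strings would be
# collected from each host; these hosts are hypothetical.

def consistent_jdk(versions_by_host: dict) -> bool:
    """Return True only if all hosts report the same version string."""
    return len(set(versions_by_host.values())) <= 1

cluster = {
    "node1.example.com": "1.7.0_67",
    "node2.example.com": "1.7.0_67",
    "node3.example.com": "1.7.0_55",  # differs at the patch level
}
print(consistent_jdk(cluster))  # False: node3 breaks the patch-level match
```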
Supported Browsers
- Chrome: Version history
- Firefox: Version history
- Internet Explorer: Version history
- Safari (Mac only): Version history
Hue can display in older and other browsers, but you might not have access to all of its features. Important: To see all icons in the Hue Web UI, users with Internet Explorer and HTTPS must add a Load Balancer.
Supported Internet Protocol
CDH requires IPv4. IPv6 is not supported.
See also Configuring Network Names.
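Since CDH requires IPv4, a configuration check can reject any host address that is not an IPv4 literal. This is a minimal sketch using the Python standard library's `ipaddress` module; the sample addresses are illustrative.

```python
# Sketch: flag non-IPv4 addresses in a cluster host configuration,
# since CDH supports IPv4 only. Sample addresses are hypothetical.

import ipaddress

def is_ipv4(addr: str) -> bool:
    """Return True only for a valid IPv4 address literal."""
    try:
        return isinstance(ipaddress.ip_address(addr), ipaddress.IPv4Address)
    except ValueError:
        return False

print(is_ipv4("10.1.2.3"))  # True
print(is_ipv4("fe80::1"))   # False: IPv6 is not supported
```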
Multihoming CDH or Cloudera Manager is not supported outside specifically certified Cloudera partner appliances. Cloudera finds that current Hadoop architectures combined with modern network infrastructures and security practices remove the need for multihoming. Multihoming, however, is beneficial internally in appliance form factors to take advantage of high-bandwidth InfiniBand interconnects.
Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming. In addition, unknown issues may arise because multihoming is not covered by our test matrix outside the Cloudera-certified partner appliances.
Supported Transport Layer Security Versions
The following components are supported by the indicated versions of Transport Layer Security (TLS):
Components Supported by TLS
| Component | Role | Port Name | Port | TLS Versions |
| --- | --- | --- | --- | --- |
| Cloudera Manager | Cloudera Manager Server | | 7182 | TLS 1.2 |
| Cloudera Manager | Cloudera Manager Server | | 7183 | TLS 1.2 |
| Flume | Avro Source/Sink | | | TLS 1.2 |
| Flume | Flume HTTP Source/Sink | | | TLS 1.2 |
| HBase | Master | HBase Master Web UI Port | 60010 | TLS 1.2 |
| HDFS | NameNode | Secure NameNode Web UI Port | 50470 | TLS 1.2 |
| HDFS | Secondary NameNode | Secure Secondary NameNode Web UI Port | 50495 | TLS 1.2 |
| HDFS | HttpFS | REST Port | 14000 | TLS 1.1, TLS 1.2 |
| Hive | HiveServer2 | HiveServer2 Port | 10000 | TLS 1.2 |
| Hue | Hue Server | Hue HTTP Port | 8888 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon Beeswax Port | 21000 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon HiveServer2 Port | 21050 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon Backend Port | 22000 | TLS 1.2 |
| Impala | Impala StateStore | StateStore Service Port | 24000 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | TLS 1.2 |
| Impala | Impala StateStore | StateStore HTTP Server Port | 25010 | TLS 1.2 |
| Impala | Impala Catalog Server | Catalog Server HTTP Server Port | 25020 | TLS 1.2 |
| Impala | Impala Catalog Server | Catalog Server Service Port | 26000 | TLS 1.2 |
| Oozie | Oozie Server | Oozie HTTPS Port | 11443 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTP Port | 8983 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTPS Port | 8985 | TLS 1.1, TLS 1.2 |
| Spark | History Server | | 18080 | TLS 1.2 |
| YARN | ResourceManager | ResourceManager Web Application HTTP Port | 8090 | TLS 1.2 |
| YARN | JobHistory Server | MRv1 JobHistory Web Application HTTP Port | 19890 | TLS 1.2 |
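A client talking to a component that supports only TLS 1.2 (for example, the Cloudera Manager Server ports above) can pin its protocol floor accordingly. This is a minimal sketch using the Python standard library's `ssl` module (Python 3.7+), not a Cloudera-specific API.

```python
# Sketch: restrict a client-side SSL context to TLS 1.2 and above,
# matching components in the table that list only TLS 1.2.

import ssl

context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0/1.1

# For components listed with "TLS 1.1, TLS 1.2", the floor could
# instead be ssl.TLSVersion.TLSv1_1.
print(context.minimum_version == ssl.TLSVersion.TLSv1_2)  # True
```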
The following upstream issues are fixed in CDH 5.11.2:
- FLUME-2752 - Fix AvroSource startup resource leaks
- FLUME-2905 - Fix NetcatSource file descriptor leak if startup fails
- HADOOP-12751 - While using kerberos Hadoop incorrectly assumes names with '@' to be non-simple
- HADOOP-14141 - Store KMS SSL keystore password in catalina.properties
- HADOOP-14242 - Make KMS Tomcat SSL property sslEnabledProtocols and clientAuth configurable
- HADOOP-14511 - WritableRpcEngine.Invocation#toString NPE on null parameters
- HDFS-6757 - Simplify lease manager with INodeID
- HDFS-8856 - Make LeaseManager#countPath O(1).
- HDFS-10220 - A large number of expired leases can make namenode unresponsive and cause failover
- HDFS-10506 - OIV's ReverseXML processor cannot reconstruct some snapshot details
- HDFS-11579 - Make HttpFS Tomcat SSL property sslEnabledProtocols and clientAuth configurable
- HDFS-11708 - Positional read will fail if replicas moved to different DNs after stream is opened
- HDFS-11741 - Long running balancer may fail due to expired DataEncryptionKey
- HDFS-11861 - ipc.Client.Connection#sendRpcRequest should log request name
- HDFS-11881 - NameNode consumes a lot of memory for snapshot diff report generation
- HDFS-11960 - Successfully closed files can stay under-replicated
- HDFS-12042 - Lazy initialize AbstractINodeDiffList#diffs for snapshots to reduce memory consumption
- HDFS-12139 - HTTPFS liststatus returns incorrect pathSuffix for path of file
- YARN-2780 - Log aggregated resource allocation in rm-appsummary.log
- YARN-6368 - Decommissioning an NM results in a -1 exit code
- YARN-6615 - AmIpFilter drops query parameters on redirect
- HBASE-15720 - Print row locks at the debug dump page
- HBASE-15837 - Memstore size accounting is wrong if postBatchMutate() throws exception
- HBASE-16033 - Add more details in logging of responseTooSlow/TooLarge
- HBASE-16630 - Fragmentation in long running Bucket Cache
- HBASE-16739 - Timed out exception message should include encoded region name
- HBASE-16977 - VerifyReplication should log a printable representation of the row keys
- HBASE-17131 - Avoid livelock caused by HRegion#processRowsWithLocks
- HBASE-17501 - guard against NPE while reading FileTrailer and HFileBlock
- HBASE-17587 - Do not Rethrow DoNotRetryIOException as UnknownScannerException
- HBASE-17673 - Monitored RPC Handler not shown in the WebUI
- HBASE-17688 - MultiRowRangeFilter not working correctly if given same start and stop RowKey
- HBASE-17710 - HBase in standalone mode creates directories with 777 permission
- HBASE-17731 - Fractional latency reporting in MultiThreadedAction
- HBASE-17798 - RpcServer.Listener.Reader can abort due to CancelledKeyException
- HBASE-17970 - Set yarn.app.mapreduce.am.staging-dir when starting MiniMRCluster
- HBASE-18096 - Limit HFileUtil visibility and add missing annotations
- HIVE-9567 - JSON SerDe not escaping special chars when writing char/varchar data
- HIVE-10209 - FetchTask with VC may fail because ExecMapper.done is true
- HIVE-11418 - Dropping a database in an encryption zone with CASCADE and trash enabled fails
- HIVE-11592 - ORC metadata section can sometimes exceed protobuf message size limit
- HIVE-11878 - ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
- HIVE-12274 - Increase width of columns used for general configuration in the metastore.
- HIVE-12551 - Fix several kryo exceptions in branch-1
- HIVE-12762 - Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true
- HIVE-13330 - ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary
- HIVE-13947 - HoS prints wrong number for hash table size in map join scenario
- HIVE-14178 - Hive::needsToCopy should reuse FileUtils::equalsFileSystem
- HIVE-14564 - Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.
- HIVE-15122 - Hive: Upcasting types should not obscure stats (min/max/ndv)
- HIVE-16004 - OutOfMemory in SparkReduceRecordHandler with vectorization mode
- HIVE-16060 - GenericUDTFJSONTuple's json cache could overgrow beyond its limit
- HIVE-16291 - Hive fails when unions a parquet table with itself
- HIVE-16413 - Create table as select does not check ownership of the location
- HIVE-16559 - Parquet schema evolution for partitioned tables may break if table and partition serdes differ
- HIVE-16593 - SparkClientFactory.stop may prevent JVM from exiting
- HIVE-16647 - Improve the validation output to make the output to stderr and stdout more consistent
- HIVE-16660 - Not able to add partition for views in hive when sentry is enabled
- HIVE-16665 - Race condition in Utilities.GetInputPathsCallable --> createDummyFileForEmptyPartition
- HIVE-16693 - beeline "source" command freezes if you have a comment in it.
- HIVE-16697 - Schema table validator should return a sorted list of missing tables
- HIVE-16869 - Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
- HIVE-16930 - HoS should verify the value of Kerberos principal and keytab file before adding them to spark-submit command parameters
- HIVE-16935 - Hive should strip comments from input before choosing which CommandProcessor to run.
- HIVE-17050 - Multiline queries that have comment in middle fail when executed via "beeline -e"
- HIVE-17052 - Remove logging of predicate filters
- HIVE-17149 - Hdfs directory is not cleared if partition creation failed on HMS
- HUE-5504 - [oozie] Only use JDBC URL from hive2 action when hardcoded
- HUE-6370 - [jobsub] Attempting to create and save any JobSub fails
- HUE-6398 - [jobsub] Fix loading edit design page
- HUE-6407 - [pig] Play button doesn't come back after killing the running pig job
- HUE-6446 - [oozie] User can't edit shared coordinator or bundle
- HUE-6604 - [oozie] Fix timestamp conversion to server timezone
- HUE-6694 - [aws] Gracefully handle bucket creation error for non-DNS compliant names
- HUE-6695 - [aws] Provide user-friendly message when accessing forbidden paths
- HUE-6696 - [aws] Display user friendly message when accessing non-DNS compliant bucket
- HUE-6702 - [about] About page is accessible without authentication
- HUE-6710 - [notebook] Application reachable directly by users without granted access
- HUE-6791 - [search] Protect against pivot facets conflicting with nested facets
- HUE-6813 - [metastore] Fix database comment last character truncation
- HUE-6814 - [search] Only return distinct usernames in security impersonate dropdown
- HUE-6819 - [oozie] Set generic widget for generic actions in oozie graph
- HUE-6856 - [search] Protect against reflected XSS in search query parameters
- HUE-6950 - [saml] Create home directory for new user login
- HUE-6995 - [oozie] Set minimum width of a workflow node
- IMPALA-4276 - Profile displays non-default query options set by planner
- IMPALA-4546 - Fix Moscow timezone conversion after 2014
- IMPALA-4631 - don't use floating point operations for time unit conversions
- IMPALA-4716 - Expr rewrite causes IllegalStateException
- IMPALA-4738 - STDDEV_SAMP should return NULL for single record input
- IMPALA-4962 - Fix SHOW COLUMN STATS for HS2
- IMPALA-5021 - Fix count(*) remaining rows overflow in Parquet.
- IMPALA-5056 - Ensure analysis uses 'fresh' catalog after metadata loading
- IMPALA-5154 - Handle 'unpartitioned' Kudu tables
- IMPALA-5172 - Buffer overrun for Snappy decompression
- IMPALA-5187 - Bump breakpad version to include the fix for Breakpad #681, re-enable the strict check that was disabled in IMPALA-3794.
- IMPALA-5189 - Pin version of setuptools-scm
- IMPALA-5197 - Erroneous corrupted Parquet file message
- IMPALA-5198 - Error messages are sometimes dropped before reaching client
- IMPALA-5217 - KuduTableSink checks null constraints incorrectly
- IMPALA-5223 - Add waiting for HBase Zookeeper nodes to retry loop
- IMPALA-5301 - Set Kudu minicluster memory limit
- IMPALA-5318 - Generate access events with fully qualified table names
- IMPALA-5355 - Fix the order of Sentry roles and privileges
- IMPALA-5363 - Reset probe_batch_ after reaching limit
- IMPALA-5419 - Check for cancellation when building hash tables
- IMPALA-5469 - Fix exception when processing catalog update
- IMPALA-5487 - Fix race in RuntimeProfile::toThrift()
- IMPALA-5524 - Fixes NPE during planning with DISABLE_UNSAFE_SPILLS=1
- IMPALA-5554 - sorter DCHECK on null column
- IMPALA-5580 - fix Java UDFs that return NULL strings
- IMPALA-5615 - Fix compute incremental stats for general partition exprs
- IMPALA-5623 - Fix lag() on STRING cols to release UDF mem
- IMPALA-5638 - Fix Kudu table set tblproperties inconsistencies
- IMPALA-5657 - Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS
- IMPALA-5172 - fix incorrect cast in call to LZO decompress
- KITE-1155 - Deleting an already deleted empty path should not fail the job
- OOZIE-2816 - Strip out the first command word from Sqoop action if its "sqoop"
- OOZIE-2844 - Increase stability of Oozie actions when log4j.properties is missing or not readable
- OOZIE-2872 - Address backward compatibility issue introduced by OOZIE-2748
- OOZIE-2908 - Fix typo in oozie.actions.null.args.allowed property in oozie-default.xml
- OOZIE-2923 - Improve Spark options parsing
- OOZIE-2984 - Parse spark-defaults.conf values with spaces without needing the quotes
- PARQUET-389 - Support predicate push down on missing columns.
- PIG-3567 - LogicalPlanPrinter throws OOM for large scripts
- PIG-3655 - BinStorage and InterStorage approach to record markers is broken
- SENTRY-1644 - Partition ACLs disappear after renaming Hive table with partitions.
- SENTRY-1646 - Unable to truncate table <database>.<tablename>; from "default" databases
- SENTRY-1759 - UpdatableCache leaks connections
- SENTRY-1811 - Optimize data structures used in HDFS sync
- SENTRY-1827 - Minimize TPathsDump thrift message used in HDFS sync
- SOLR-6673 - MDC-based logging of collection, shard, etc.
- SOLR-8836 - /update should return BAD REQUEST when invalid JSON provided.
- SOLR-9153 - Update beanutils version to 1.9.2
- SOLR-9527 - Solr RESTORE API doesn't distribute the replicas uniformly
- SOLR-10076 - Hiding keystore and truststore passwords from /admin/info/* outputs.
- SOLR-10889 - Stale zookeeper information is used during failover check.
- SPARK-13278 - Launcher fails to start with JDK 9 EA
- SPARK-15067 - YARN executors are launched with fixed perm gen size
- SPARK-16845 - org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering grows beyond 64 KB
- SPARK-19019 - PySpark does not work with Python 3.6.0
- SPARK-19688 - Spark on YARN Credentials File set to different application directory.
- SPARK-20393 - Strengthen Spark to prevent XSS vulnerabilities
- SPARK-20904 - Task failures during shutdown cause problems with preempted executors.
- SPARK-20922 - Unsafe deserialization in Spark LauncherConnection.
- ZOOKEEPER-1653 - zookeeper fails to start because of inconsistent epoch.
- ZOOKEEPER-2040 - Server to log underlying cause of SASL connection problems.