Long-term component architecture
As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components are standards, you can build long-term architecture on them with confidence.
With the exception of DSSD support, Cloudera Enterprise 5.6.0 is identical to CDH 5.5.2/Cloudera Manager 5.5.3. If you do not need DSSD support and are already using the latest 5.5.x release, you do not need to upgrade.
- System Requirements
- What's New
- Supported Operating Systems
- Supported Databases
- Supported JDK Versions
- Supported Internet Protocol
Supported Operating Systems
CDH 5 provides packages for Red-Hat-compatible, SLES, Ubuntu, and Debian systems as described below.
| Operating System | Version | Packages |
|---|---|---|
| **Red Hat Enterprise Linux (RHEL)-compatible** | | |
| Red Hat Enterprise Linux | 5.7 | 64-bit |
| | 6.4 in SE Linux mode | 64-bit |
| Oracle Linux with default kernel and Unbreakable Enterprise Kernel | 5.6 (UEK R2) | 64-bit |
| | 6.4 (UEK R2) | 64-bit |
| | 6.5 (UEK R2, UEK R3) | 64-bit |
| SUSE Linux Enterprise Server (SLES) | 11 with Service Pack 2 or later | 64-bit |
| Ubuntu | Precise (12.04) - Long-Term Support (LTS) | 64-bit |
| | Trusty (14.04) - Long-Term Support (LTS) | 64-bit |
| Debian | Wheezy (7.0, 7.1) | 64-bit |
- CDH 5 provides only 64-bit packages.
- Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
- If you are using an operating system that is not supported by Cloudera packages, you can also download source tarballs from Downloads.
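Before installing, it can be useful to confirm that a host meets the 64-bit requirement noted above. The following is a minimal sketch; the files it inspects and the messages it prints are our own illustration, not part of CDH:

```shell
#!/bin/sh
# Report the distribution this host identifies as, then verify the
# 64-bit requirement (CDH 5 provides only 64-bit packages).
if [ -f /etc/os-release ]; then
    # Most current distributions; sets NAME and VERSION_ID.
    . /etc/os-release
    echo "Detected: ${NAME} ${VERSION_ID}"
elif [ -f /etc/redhat-release ]; then
    # Older RHEL/CentOS releases keep the version string here.
    cat /etc/redhat-release
fi

arch=$(uname -m)
if [ "$arch" = "x86_64" ]; then
    echo "Architecture: $arch (OK)"
else
    echo "Architecture: $arch (unsupported: 64-bit required)"
fi
```

Run this on each candidate host before adding it to the cluster.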
Supported Databases
| Component | MySQL | SQLite | PostgreSQL | Oracle | Derby (see Note 5) |
|---|---|---|---|---|---|
| Oozie | 5.5, 5.6 | - | 8.4, 9.1, 9.2, 9.3 (see Note 2) | | |
| Flume | - | - | - | - | Default (for the JDBC Channel only) |
| Hue | See Note 1 | Default | 8.4, 9.1, 9.2, 9.3 (see Note 2) | | |
| Hive | See Note 1 | - | 8.4, 9.1, 9.2, 9.3 (see Note 2) | | |
| Sentry | See Note 1 | - | 8.4, 9.1, 9.2, 9.3 (see Note 2) | | |
| Sqoop 1 | See Note 3 | - | See Note 3 | See Note 3 | - |
| Sqoop 2 | See Note 4 | - | See Note 4 | See Note 4 | Default |
- Note 1: MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and later.
- Note 2: PostgreSQL 9.2 is supported on CDH 5.1 and later. PostgreSQL 9.3 is supported on CDH 5.2 and later.
- Note 3: For the purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
- Note 4: Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby.
- Note 5: Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.
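As an illustration of the Sqoop 1 data-transfer support described in Note 3, an import from MySQL might look like the sketch below. The host, database, credentials file, and HDFS paths are placeholders of our own, and the command must be run on a host where Sqoop 1 is installed:

```shell
# Sketch of a Sqoop 1 import over JDBC (placeholder connection details).
# --password-file avoids exposing the password on the command line.
SQOOP_IMPORT="sqoop import
  --connect jdbc:mysql://db.example.com:3306/sales
  --username etl
  --password-file /user/etl/mysql.password
  --table orders
  --target-dir /data/raw/orders"

# Show the assembled command; on a gateway host you would run it directly.
echo "$SQOOP_IMPORT"
```

The same shape applies to the other Note 3 databases by swapping the JDBC URL scheme (for example, `jdbc:postgresql://` or `jdbc:oracle:thin:@`).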
Supported JDK Versions
CDH 5 is supported with the JDK versions shown in the following table.
Table 1. Supported JDK Versions
| Latest Certified Version | Minimum Supported Version | Exceptions |
|---|---|---|
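To see which JDK a host would use, a quick check such as the following can help; this is our own sketch and assumes `java` is on the PATH when a JDK is installed:

```shell
#!/bin/sh
# Report the JDK version on this host so it can be compared against
# the supported versions for CDH 5.
if command -v java >/dev/null 2>&1; then
    # `java -version` prints to stderr; keep only the first line.
    jdk_line=$(java -version 2>&1 | head -n 1)
else
    jdk_line="no JDK found on PATH"
fi
echo "JDK: $jdk_line"
```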
Supported Internet Protocol
Known Issues Fixed in CDH 5.3.8
Upstream Issues Fixed
The following upstream issues are fixed in CDH 5.3.8:
- CRUNCH-525 - Correct (more) accurate default scale factors for built-in MapFn implementations
- CRUNCH-527 - Use hash smearing for partitioning
- CRUNCH-528 - Improve Pair comparison
- CRUNCH-535 - call initCredentials on the job
- CRUNCH-536 - Refactor CrunchControlledJob.Hook interface and make it client-accessible
- CRUNCH-539 - Fix reading WritableComparables bimap
- CRUNCH-540 - Make AvroReflectDeepCopier serializable
- CRUNCH-542 - Eliminate flaky Scrunch sampling test.
- CRUNCH-543 - Have AvroPathPerKeyTarget handle child directories properly
- CRUNCH-544 - Improve performance/serializability of materialized toMap.
- CRUNCH-547 - Properly handle nullability for Avro union types
- CRUNCH-548 - Have the AvroReflectDeepCopier use the class of the source object when constructing new instances instead of the target class
- CRUNCH-551 - Make the use of Configuration objects consistent in CrunchInputSplit and CrunchRecordReader
- CRUNCH-553 - Fix record drop issue that can occur w/From.formattedFile TableSources
- FLUME-1934 - Spooling Directory Source dies on encountering zero-byte files.
- FLUME-2095 - JMS source with TIBCO
- FLUME-2385 - Remove incorrect log message at INFO level in Spool Directory Source.
- FLUME-2753 - Error when specifying empty replace string in Search and Replace Interceptor
- HADOOP-11105 - MetricsSystemImpl could leak memory in registered callbacks
- HADOOP-11446 - S3AOutputStream should use shared thread pool to avoid OutOfMemoryError
- HADOOP-11463 - Replace method-local TransferManager object with S3AFileSystem#transfers.
- HADOOP-11584 - s3a file block size set to 0 in getFileStatus.
- HADOOP-11607 - Reduce log spew in S3AFileSystem.
- HADOOP-12317 - Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
- HADOOP-12404 - Disable caching for JarURLConnection to avoid sharing JarFile with other users when loading resource from URL in Configuration class
- HADOOP-12413 - AccessControlList should avoid calling getGroupNames in isUserInList with empty groups
- HDFS-7978 - Add LOG.isDebugEnabled() guard for some LOG.debug(..)
- HDFS-8384 - Allow NN to startup if there are files having a lease but are not under construction
- HDFS-8964 - When validating the edit log, do not read at or beyond the file offset that is being written
- HDFS-8965 - Harden edit log reading code against out of memory errors
- MAPREDUCE-5918 - LineRecordReader can return the same decompressor to CodecPool multiple times
- MAPREDUCE-5948 - org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
- MAPREDUCE-6277 - Job can post multiple history files if attempt loses connection to the RM
- MAPREDUCE-6439 - AM may fail instead of retrying if RM shuts down during the allocate call.
- MAPREDUCE-6481 - LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes
- MAPREDUCE-6484 - Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled
- YARN-3385 - Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes
- YARN-3469 - ZKRMStateStore: Avoid setting watches that are not required.
- YARN-3990 - AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
- HBASE-12639 - Backport HBASE-12565 Race condition in HRegion.batchMutate() causes partial data to be written when region closes
- HBASE-13217 - Procedure fails due to ZK issue
- HBASE-13388 - Handling NullPointer in ZKProcedureMemberRpcs while getting ZNode data
- HBASE-13437 - ThriftServer leaks ZooKeeper connections
- HBASE-13471 - Fix a possible infinite loop in doMiniBatchMutation
- HBASE-13684 - Allow mlockagent to be used when not starting as root
- HBASE-13885 - ZK watches leaks during snapshots.
- HBASE-14045 - Bumping thrift version to 0.9.2.
- HBASE-14302 - TableSnapshotInputFormat should not create back references when restoring snapshot
- HBASE-14354 - Minor improvements for usage of the mlock agent
- HIVE-4867 - Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
- HIVE-7012 - Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
- HIVE-8162 - Dynamic sort optimization propagates additional columns even in the absence of order by
- HIVE-8398 - ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc
- HIVE-8404 - ColumnPruner doesnt prune columns from limit operator
- HIVE-8560 - SerDes that do not inherit AbstractSerDe do not get table properties during initialize()
- HIVE-9195 - CBO changes constant to column type
- HIVE-9450 - [Parquet] Check all data types work for Parquet in Group
- HIVE-9613 - Left join query plan outputs wrong column when using subquery
- HIVE-9984 - JoinReorder's getOutputSize is exponential
- HIVE-10319 - Hive CLI startup takes a long time with a large number of databases
- HIVE-10572 - Improve Hive service test to check empty string
- HIVE-11077 - Exchange partition does not properly populate fields for post/pre execute hooks.
- HIVE-11172 - Vectorization wrong results for aggregate query with where clause without group by
- HIVE-11174 - Hive does not treat floating point signed zeros as equal (-0.0 should equal 0.0 according to IEEE floating point spec)
- HIVE-11203 - Beeline force option doesn't force execution when errors occurred in a script.
- HIVE-11216 - UDF GenericUDFMapKeys throws NPE when a null map value is passed in
- HIVE-11271 - java.lang.IndexOutOfBoundsException when union all with if function
- HIVE-11288 - Avro SerDe InstanceCache returns incorrect schema
- HIVE-11333 - ColumnPruner prunes columns of UnionOperator that should be kept
- HIVE-11590 - AvroDeserializer is very chatty
- HIVE-11657 - HIVE-2573 introduces some issues during metastore init (and CLI init)
- HIVE-11695 - If user have no permission to create LOCAL DIRECTORY, the Hql does not throw any exception and fail silently.
- HIVE-11696 - Exception when table-level serde is Parquet while partition-level serde is JSON
- HIVE-11816 - Upgrade groovy to 2.4.4
- HIVE-11824 - Insert to local directory causes staging directory to be copied
- HIVE-11995 - Remove repetitively setting permissions in insert/load overwrite partition
- HUE-2880 - [hadoop] Fix uploading large files to a kerberized HTTPFS
- HUE-2893 - [desktop] Backport CherryPy SSL file upload fix
- IMPALA-1929 - Avoiding a DCHECK of NULL hash table in spilled right joins
- IMPALA-2133 - Properly unescape string value for HBase filters
- IMPALA-2165 - Avoid cardinality 0 in scan nodes of small tables and low selectivity
- IMPALA-2178 - fix Expr::ComputeResultsLayout() logic
- IMPALA-2314 - LargestSpilledPartition was not checking if partition is closed
- IMPALA-2364 - Wrong DCHECK in PHJ::ProcessProbeBatch
- KITE-1053 - Fix int overflow bug in FS writer.
- KITE-1074 - Partial updates aka Atomic updates with loadSolr aren't recognized with Solrcloud
- MAHOUT-1771 - Cluster dumper omits indices and 0 elements for dense vector or sparse containing 0s, this closes apache/mahout#158
- PIG-4024 - TestPigStreamingUDF and TestPigStreaming fail on IBM JDK
- PIG-4326 - AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
- PIG-4338 - Fix test failures with JDK8
- SENTRY-799 - Fix TestDbEndToEnd flaky test - drop table/dbs before creating
- SENTRY-878 - collect_list missing from HIVE_UDF_WHITE_LIST
- SENTRY-893 - Synchronize calls in SentryClient and create sentry client once per request in SimpleDBProvider
- SOLR-5496 - Ensure all http CMs get shutdown.
- SOLR-7956 - There are interrupts on shutdown in places that can cause ChannelAlreadyClose
- SOLR-7999 - SolrRequestParserTest#testStreamURL started failing.
- SPARK-6480 - [CORE] histogram() bucket function is wrong in some simple edge cases
- SPARK-6880 - [CORE]Fixed null check when all the dependent stages are cancelled due to previous stage failure
- SPARK-8606 - Prevent exceptions in RDD.getPreferredLocations() from crashing DAGScheduler