
Long-term component architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components become standards, you can build long-term architecture on them with confidence.

 

PLEASE NOTE:

With the exception of DSSD support, Cloudera Enterprise 5.6.0 is identical to CDH 5.5.2/Cloudera Manager 5.5.3. If you do not need DSSD support and are already running the latest 5.5.x release, you do not need to upgrade.

 

Supported Operating Systems

CDH 5 provides packages for Red Hat-compatible, SLES, Ubuntu, and Debian systems, as described below.

Operating System                                               Versions (all 64-bit)
Red Hat Enterprise Linux (RHEL)-compatible
  Red Hat Enterprise Linux                                     5.7; 6.2; 6.4; 6.4 in SELinux mode; 6.5
  CentOS                                                       5.7; 6.2; 6.4; 6.4 in SELinux mode; 6.5
  Oracle Linux with default kernel and
  Unbreakable Enterprise Kernel (UEK)                          5.6 (UEK R2); 6.4 (UEK R2); 6.5 (UEK R2, UEK R3)
SLES
  SUSE Linux Enterprise Server (SLES)                          11 with Service Pack 2 or later
Ubuntu/Debian
  Ubuntu                                                       Precise (12.04) LTS; Trusty (14.04) LTS
  Debian                                                       Wheezy (7.0, 7.1)

Note:

  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera packages, you can also download source tarballs from Downloads.

 

Supported Databases

Component      MySQL                   SQLite    PostgreSQL                         Oracle       Derby (see Note 5)
Oozie          5.5, 5.6                —         8.4, 9.1, 9.2, 9.3 (see Note 2)    11gR2        Default
Flume          —                       —         —                                  —            Default (JDBC Channel only)
Hue            5.5, 5.6 (see Note 1)   Default   8.4, 9.1, 9.2, 9.3 (see Note 2)    11gR2        —
Hive/Impala    5.5, 5.6 (see Note 1)   —         8.4, 9.1, 9.2, 9.3 (see Note 2)    11gR2        Default
Sentry         5.5, 5.6 (see Note 1)   —         8.4, 9.1, 9.2, 9.3 (see Note 2)    11gR2        —
Sqoop 1        See Note 3              —         See Note 3                         See Note 3   —
Sqoop 2        See Note 4              —         See Note 4                         See Note 4   Default

Note:

  1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and later.
  2. PostgreSQL 9.2 is supported on CDH 5.1 and later. PostgreSQL 9.3 is supported on CDH 5.2 and later.
  3. For the purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  4. Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby.
  5. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.
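Before pointing a component such as Hive, Oozie, or Hue at one of the external databases above, a short JDBC probe can confirm connectivity and the server version. The sketch below is only an illustration: the MySQL URL, database name, credentials, and the driver assumed to be on the classpath are placeholders, not values from this documentation.

// Minimal JDBC connectivity sketch; URL, user, and password are hypothetical.
// Requires the matching JDBC driver (for example, MySQL Connector/J) on the classpath.
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;

public class MetastoreDbProbe {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://db.example.com:3306/metastore"; // assumed database host and schema
        try (Connection conn = DriverManager.getConnection(url, "hive", "secret")) {
            DatabaseMetaData md = conn.getMetaData();
            // Compare the reported version against the supported versions table above.
            System.out.println(md.getDatabaseProductName() + " " + md.getDatabaseProductVersion());
        }
    }
}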

CDH 5 is supported with the JDK versions shown in the following table.

Table 1. Supported JDK Versions

Latest Certified Version    Minimum Supported Version    Exceptions
1.7.0_67                    1.7.0_67                     None
1.8.0_11                    1.8.0_11                     None
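To check which JDK a given host will actually run against the table above, printing the JVM's own version properties is usually enough. This is a generic sketch, not a Cloudera-provided tool.

// Prints the version and install location of the JVM running this class,
// for comparison with the supported JDK versions listed above.
public class JdkVersionCheck {
    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}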


Supported Internet Protocol

CDH requires IPv4. IPv6 is not supported.

See also Configuring Network Names.
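As a quick, generic check that a node resolves its own name consistently over IPv4 (the kind of configuration that Configuring Network Names covers), something like the following sketch can be run on each host. It is an illustration only and not a substitute for the referenced documentation.

// Generic sketch: verify that the local hostname and its IPv4 address resolve
// to a consistent canonical name in both directions.
import java.net.InetAddress;

public class NetworkNameCheck {
    public static void main(String[] args) throws Exception {
        // On dual-stack hosts, the JVM flag -Djava.net.preferIPv4Stack=true is a
        // common way to keep Java networking on IPv4 (an illustration, not a
        // documented CDH setting).
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostname  = " + local.getHostName());
        System.out.println("address   = " + local.getHostAddress());
        System.out.println("canonical = " + local.getCanonicalHostName());

        // Forward and reverse lookups should agree on a correctly configured node.
        InetAddress reverse = InetAddress.getByName(local.getHostAddress());
        System.out.println("reverse   = " + reverse.getCanonicalHostName());
    }
}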


Issues Fixed in CDH 5.3.10

CDH 5.3.10 fixes the following issues.

 

Apache Hadoop

 

FSImage may get corrupted after deleting snapshot

 

Bug: HDFS-9406

When deleting a snapshot that contains the last record of a given INode, the fsimage may become corrupt because the create list of the snapshot diff in the previous snapshot and the child list of the parent INodeDirectory are not cleaned.
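For context, the operations involved are the standard FileSystem snapshot calls shown in the sketch below. The NameNode URI, directory, file name, and snapshot name are illustrative assumptions; this is the general create/delete sequence the description refers to, not Cloudera's reproduction case for the bug.

// Minimal sketch of an HDFS snapshot create/delete cycle; all paths and names
// are hypothetical. The directory must already be snapshottable, for example
// via: hdfs dfsadmin -allowSnapshot /data/snapshottable
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotCycle {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; in practice this comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path dir = new Path("/data/snapshottable");

            fs.createSnapshot(dir, "s0");                 // capture current state
            fs.delete(new Path(dir, "old-file"), false);  // an INode now recorded only in s0
            fs.deleteSnapshot(dir, "s0");                 // delete the snapshot holding its last record
        }
    }
}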

 

 

Apache HBase

 

The ReplicationCleaner process can abort if its connection to ZooKeeper is inconsistent.

 

Bug: HBASE-15234

If the connection with ZooKeeper is inconsistent, the ReplicationCleaner may abort, and the following event is logged by the HMaster:

WARN org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner: Aborting ReplicationLogCleaner
because Failed to get list of replicators

Unprocessed WALs accumulate.

Workaround: Restart the HMaster occasionally. The ReplicationCleaner restarts if necessary and processes the unprocessed WALs.

 

Reverse scans do not work when Bloom blocks or leaf-level INode blocks are present in HFile v2 or higher

The seekBefore() method calculates the size of the previous data block by assuming that data blocks are contiguous, but HFile v2 and higher store Bloom blocks and leaf-level INode blocks with the data. As a result, reverse scans do not work when Bloom blocks or leaf-level INode blocks are present and HFile v2 or higher is used.
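For reference, the kind of reverse scan affected is issued through the client Scan API with setReversed(true), as in the sketch below. It is written against the classic HTable client; the table name and configuration are assumptions, not part of the issue report.

// Minimal sketch of a reverse scan; "example_table" is a hypothetical table.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ReverseScanExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for cluster connection settings.
        Configuration conf = HBaseConfiguration.create();

        try (HTable table = new HTable(conf, "example_table")) {
            Scan scan = new Scan();
            scan.setReversed(true); // reverse scans rely on seekBefore() to step backwards
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(result);
                }
            }
        }
    }
}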

 

 

Upstream Issues Fixed

 

The following upstream issues are fixed in CDH 5.3.10:

  • HADOOP-7713 - dfs -count -q should label output column
  • HADOOP-8944 - Shell command fs -count should include human readable option
  • HADOOP-10406 - TestIPC.testIpcWithReaderQueuing may fail
  • HADOOP-12200 - TestCryptoStreamsWithOpensslAesCtrCryptoCodec should be skipped in non-native profile
  • HADOOP-12240 - Fix tests requiring native library to be skipped in non-native profile
  • HADOOP-12280 - Skip unit tests based on maven profile rather than NativeCodeLoader.isNativeCodeLoaded
  • HADOOP-12418 - TestRPC.testRPCInterruptedSimple fails intermittently
  • HADOOP-12464 - Interrupted client may try to fail over and retry
  • HADOOP-12468 - Partial group resolution failure should not result in user lockout
  • HADOOP-12559 - KMS connection failures should trigger TGT renewal
  • HADOOP-12604 - Exception may be swallowed in KMSClientProvider
  • HADOOP-12605 - Fix intermittent failure of TestIPC.testIpcWithReaderQueuing
  • HADOOP-12682 - Fix TestKMS#testKMSRestart* failure
  • HADOOP-12699 - TestKMS#testKMSProvider intermittently fails during 'test rollover draining'
  • HADOOP-12715 - TestValueQueue#testgetAtMostPolicyALL fails intermittently
  • HADOOP-12736 - TestTimedOutTestsListener#testThreadDumpAndDeadlocks sometimes times out
  • HADOOP-12788 - OpensslAesCtrCryptoCodec should log which random number generator is used
  • HDFS-6533 - TestBPOfferService#testBasicFunctionalitytest fails intermittently
  • HDFS-6673 - Add delimited format support to PB OIV tool
  • HDFS-6799 - The invalidate method in SimulatedFSDataset failed to remove (invalidate) blocks from the file system
  • HDFS-7423 - Various typos and message formatting fixes in nfs daemon and doc
  • HDFS-7553 - Fix the TestDFSUpgradeWithHA due to BindException
  • HDFS-7990 - IBR delete ack should not be delayed
  • HDFS-8211 - DataNode UUID is always null in the JMX counter
  • HDFS-8646 - Prune cached replicas from DatanodeDescriptor state on replica invalidation
  • HDFS-9092 - NFS silently drops overlapping write requests and causes data copying to fail
  • HDFS-9250 - Add Precondition check to LocatedBlock#addCachedLoc
  • HDFS-9347 - Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
  • HDFS-9358 - TestNodeCount#testNodeCount timed out
  • HDFS-9364 - Unnecessary DNS resolution attempts when creating NameNodeProxies
  • HDFS-9406 - FSImage may get corrupted after deleting snapshot
  • HDFS-9949 - Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared
  • MAPREDUCE-6302 - Incorrect headroom can lead to a deadlock between map and reduce allocations
  • MAPREDUCE-6387 - Serialize the recently added Task#encryptedSpillKey field at the end
  • MAPREDUCE-6460 - TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
  • YARN-2377 - Localization exception stack traces are not passed as diagnostic info
  • YARN-2785 - Fixed intermittent TestContainerResourceUsage failure
  • YARN-3024 - LocalizerRunner should give DIE action when all resources are localized
  • YARN-3074 - Nodemanager dies when localizer runner tries to write to a full disk
  • YARN-3464 - Race condition in LocalizerRunner kills localizer before localizing all resources.
  • YARN-3516 - Killing ContainerLocalizer action does not take effect when private localizer receives FETCH_FAILURE status
  • YARN-3727 - For better error recovery, check if the directory exists before using it for localization
  • YARN-3762 - FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
  • YARN-4204 - ConcurrentModificationException in FairSchedulerQueueInfo
  • YARN-4235 - FairScheduler PrimaryGroup does not handle empty groups returned for a user
  • YARN-4354 - Public resource localization fails with NPE
  • YARN-4380 - TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
  • YARN-4393 - Fix intermittent test failure for TestResourceLocalizationService#testFailedDirsResourceRelease
  • YARN-4613 - Fix test failure in TestClientRMService#testGetClusterNodes
  • YARN-4717 - TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
  • HBASE-10153 - Improve VerifyReplication to compute BADROWS more accurately
  • HBASE-11394 - Amend Replication can have data loss if peer id contains hyphen
  • HBASE-11394 - Replication can have data loss if peer id contains hyphen "-"
  • HBASE-11992 - Backport HBASE-11367 (Pluggable replication endpoint) to 0.98
  • HBASE-12136 - Race condition between client adding tableCF replication znode and server triggering TableCFsTracker
  • HBASE-12150 - Backport replication changes from HBASE-12145
  • HBASE-12336 - RegionServer failed to shutdown for NodeFailoverWorker thread
  • HBASE-12631 - Backport HBASE-12576 (Add metrics for rolling the HLog if there are too few DNs in the write pipeline) to 0.98
  • HBASE-12658 - Backport HBASE-12574 (Update replication metrics to not do so many map look ups) to 0.98
  • HBASE-12865 - WALs may be deleted before they are replicated to peers
  • HBASE-13035 - Backport HBASE-12867 Shell does not support custom replication endpoint specification
  • HBASE-13084 - Add labels to VisibilityLabelsCache asynchronously causes TestShell flakey
  • HBASE-13437 - ThriftServer leaks ZooKeeper connections
  • HBASE-13703 - ReplicateContext should not be a member of ReplicationSource
  • HBASE-13746 - list_replicated_tables command is not listing table in hbase shell
  • HBASE-14146 - Fix Once replication sees an error it slows down forever
  • HBASE-14501 - NPE in replication with TDE
  • HBASE-14621 - ReplicationLogCleaner gets stuck when a regionserver crashes
  • HBASE-14923 - VerifyReplication should not mask the exception during result comparison
  • HBASE-15019 - Replication stuck when HDFS is restarted
  • HBASE-15032 - hbase shell scan filter string assumes UTF-8 encoding
  • HBASE-15035 - bulkloading hfiles with tags that require splits do not preserve tags
  • HBASE-15052 - Use EnvironmentEdgeManager in ReplicationSource
  • HIVE-7524 - Enable auto conversion of SMBjoin in presence of constant propagate optimization
  • HIVE-7575 - GetTables thrift call is very slow
  • HIVE-8115 - Fixing test failures caused in CDH
  • HIVE-8115 - Hive select query hang when fields contain map
  • HIVE-8184 - Inconsistency between colList and columnExprMap when ConstantPropagate is applied to subquery
  • HIVE-9112 - Query may generate different results depending on the number of reducers
  • HIVE-9500 - Support nested structs over 24 levels
  • HIVE-9860 - MapredLocalTask/SecureCmdDoAs leaks local files
  • HIVE-10956 - Fallout fix from backport to CDH 5.3.x
  • HIVE-11977 - Hive should handle an external avro table with zero length files present
  • HIVE-12388 - GetTables cannot get external tables when TABLE type argument is given
  • HIVE-12406 - HIVE-9500 introduced incompatible change to LazySimpleSerDe public interface
  • HIVE-12713 - Miscellaneous improvements in driver compile and execute logging
  • HIVE-12790 - Metastore connection leaks in HiveServer2
  • HIVE-12946 - alter table should also add default scheme and authority for the location similar to create table
  • HUE-2767 - [impala] Issue showing sample data for a table
  • HUE-2941 - [hadoop] Cache the active RM HA
  • IMPALA-1702 - "invalidate metadata" can cause duplicate TableIds (issue not entirely fixed, but now fails gracefully)
  • IMPALA-2125 - Improve perf when reading timestamps from parquet files written by hive
  • IMPALA-2565 - Planner tests are flaky due to file size mismatches
  • IMPALA-3095 - Allow additional Kerberos users to be authorized to access internal APIs
  • OOZIE-2432 - TestPurgeXCommand fails
  • SENTRY-565 - Improve performance of filtering Hive SHOW commands
  • SENTRY-780 - HDFS Plugin should not execute path callbacks for views
  • SENTRY-835 - Drop table leaves a connection open when using metastorelistener
  • SENTRY-885 - DB name should be case insensitive in HDFS sync plugin.
  • SENTRY-936 - getGroup and getUser should always return original hdfs values for paths in prefix which are not sentry managed
  • SENTRY-944 - Setting HDFS rules on Sentry managed hdfs paths should not affect original hdfs rules
  • SENTRY-957 - Exceptions in MetastoreCacheInitializer should probably not prevent HMS from starting up
  • SENTRY-988 - It is better to let SentryAuthorization setter path always fall through and update HDFS
  • SENTRY-994 - SentryAuthorizationInfoX should override isSentryManaged
  • SENTRY-1002 - PathsUpdate.parsePath(path) will throw an NPE when parsing relative paths
  • SENTRY-1044 - Tables with non-hdfs locations break HMS startup
  • SPARK-12617 - [PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming

 

