What's New in CDH 5.0.x

What's New in CDH 5.0.0

This is a major release which includes new features, changes, and fixed issues. See also Issues Fixed in CDH 5.0.x and Known Issues in CDH 5.

For information about CDH 5 Beta releases, see What's New In CDH 5 Beta Releases.

New Features and Changes in CDH 5.0.0

CDH 5.0.0 introduces the following new features and changes, organized by component.

Apache Hadoop

Apache Hadoop MapReduce Version 1 (MRv1) and Version 2 (MRv2)
  • MapReduce 2.0 (MRv2): CDH 5 includes MapReduce 2.0 (MRv2) running on YARN. The YARN architecture splits up the two primary responsibilities of the JobTracker — resource management and job scheduling/monitoring — into separate daemons: a global ResourceManager (RM) and per-application ApplicationMasters (AM). With MRv2, the ResourceManager (RM) and per-node NodeManagers (NM) form the data-computation framework. The ResourceManager service effectively replaces the functions of the JobTracker, and NodeManagers run on worker nodes instead of TaskTracker daemons. The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManagers to execute and monitor tasks. See Apache Hadoop NextGen MapReduce (YARN) for more information.
  • MapReduce Version 1 (MRv1): For backward compatibility, CDH 5 continues to support the original MapReduce JobTracker and TaskTrackers, but you should migrate to MRv2. For more information, see Migrating from MapReduce 1 (MRv1) to MapReduce 2 (MRv2, YARN).
  • Deprecated properties:
    In Hadoop 2.0.0 and later (MRv2), several Hadoop and HDFS properties have been deprecated. (The change dates from Hadoop 0.23.1, on which the Beta releases of CDH 4 were based.) See Hadoop Deprecated Properties for a list of deprecated properties and their replacements.
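
As a rough illustration of checking a configuration for deprecated names, the following Python sketch scans a dictionary of properties and renames the old keys. The mapping is only a small subset of the renames; the full list is in Hadoop Deprecated Properties. The helper names (find_deprecated, migrate) and the rule that an explicitly set new-style name takes precedence are assumptions made for this example, not Hadoop behavior.

```python
# Illustrative subset of deprecated -> replacement property names.
# The authoritative list is the Hadoop Deprecated Properties reference.
DEPRECATED = {
    "fs.default.name": "fs.defaultFS",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.data.dir": "dfs.datanode.data.dir",
    "mapred.job.tracker": "mapreduce.jobtracker.address",
    "io.sort.mb": "mapreduce.task.io.sort.mb",
}


def find_deprecated(config):
    """Return {old_name: new_name} for each deprecated key present in config."""
    return {key: DEPRECATED[key] for key in config if key in DEPRECATED}


def migrate(config):
    """Return a copy of config with deprecated keys renamed.

    Assumption for this sketch: if both the old and the new name are set,
    the value given under the new name is kept.
    """
    result = {}
    for key, value in config.items():
        new_key = DEPRECATED.get(key, key)
        if new_key != key and new_key in config:
            continue  # the explicit new-style setting takes precedence
        result[new_key] = value
    return result


# Hypothetical sample configuration.
sample = {
    "fs.default.name": "hdfs://namenode:8020",
    "io.sort.mb": "256",
    "dfs.replication": "3",
}

for old, new in find_deprecated(sample).items():
    print(f"{old} is deprecated; use {new}")
```
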
HDFS
New Features:
  • HDFS-5776 - Hedged reads in HDFS for improved HBase MTTR.
  • HDFS-4685 - Implementation of extended file access control lists in HDFS.
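
The idea behind a hedged read (HDFS-5776) is that when a read from one replica is slow, a second read is started against another replica and whichever finishes first is used. The Python sketch below illustrates that general pattern only; it is not the HDFS client implementation, and the function names, the threshold value, and the simulated replicas are all hypothetical. Error handling is deliberately omitted.

```python
import concurrent.futures as cf
import time


def hedged_read(primary, backup, threshold_s, pool):
    """Run primary; if it is still running after threshold_s seconds,
    also run backup and return the result of whichever finishes first."""
    first = pool.submit(primary)
    done, _ = cf.wait([first], timeout=threshold_s)
    if done:
        return first.result()  # primary answered in time, no hedge needed
    second = pool.submit(backup)  # the hedged request
    done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
    return next(iter(done)).result()


# Simulated replicas (hypothetical): one slow, one fast.
def slow_replica():
    time.sleep(0.5)
    return "block from slow replica"


def fast_replica():
    return "block from fast replica"


with cf.ThreadPoolExecutor(max_workers=2) as pool:
    result = hedged_read(slow_replica, fast_replica, threshold_s=0.05, pool=pool)

print(result)
```

In this sketch the slower read is left to finish in the background; a real client would also need to decide how to cancel or discard it.
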
Notable Bug Fixes:
  • HDFS-5339 - WebHDFS URI does not accept logical nameservices when security is enabled.
  • HDFS-5898 - Allow NFS gateway to login/relogin from its Kerberos keytab.
  • HDFS-5921 - "Browse filesystem" on the Namenode UI does not work if any directory has the sticky bit set.
  • HDFS and Hive replication between different Kerberos realms now works.
  • HDFS-5922 - DataNode heartbeat thread can get stuck in a tight loop.
MapReduce & YARN
New Feature:
  • FairScheduler supports moving running applications between queues.
Notable Bug Fixes:
  • Several critical fixes to stabilize ResourceManager HA - Web UI, unmanaged ApplicationMasters and secure-cluster support.
  • Support for large values of mapreduce.task.io.sort.mb.
  • JobHistory Server now has information on failed MapReduce jobs.

Apache HBase

New Features:
  • HBASE-10436 - Restore RegionServer lists removed from HBase 0.96.0 JMX.

    Many of the metrics exposed in CDH 4/0.94 were removed when metrics were refactored in CDH 5/0.96. This patch restores the lists of live and dead RegionServers. In 0.94 this was a large nested structure, as shown below, which included the RegionServer lists and metrics from each region.

    {
        "name" : "hadoop:service=Master,name=Master",
        "modelerType" : "org.apache.hadoop.hbase.master.MXBeanImpl",
        "ZookeeperQuorum" : "localhost:2181",
        ....
        "RegionsInTransition" : [ ],
        "RegionServers" : [ {
            "key" : "localhost,48346,1390857257246",
            "value" : {
                "load" : 2,
        ....

    CDH 5 Beta 1 and Beta 2 did not contain this list; they displayed only counts of the live and dead RegionServers. As of CDH 5.0.0, the list is presented in a semicolon-separated field as follows:

    {
        "name" : "Hadoop:service=HBase,name=Master,sub=Server",
        "modelerType" : "Master,sub=Server",
        "tag.Context" : "master",
        "tag.liveRegionServers" : "localhost,56196,1391992019130",
        "tag.deadRegionServers" : "localhost,40010,1391035309673;localhost,41408,1391990380724;localhost,38682,1390950017735",
        ...
    }
  • Assorted usability and compatibility improvements as well as improvements to exporting snapshots.
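
Monitoring scripts that consume the restored RegionServer lists (HBASE-10436) need to split the semicolon-separated fields themselves. The following Python sketch shows one way to do that, assuming each entry has the form host,port,startcode as in the sample output. The function name parse_region_servers and the dictionary keys are hypothetical, and the JSON literal repeats the sample values from these notes rather than real cluster output.

```python
import json


def parse_region_servers(field):
    """Split a semicolon-separated list of 'host,port,startcode' entries."""
    servers = []
    for entry in field.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # an empty field means no servers in that state
        host, port, start_code = entry.split(",")
        servers.append(
            {"host": host, "port": int(port), "start_code": int(start_code)}
        )
    return servers


# Sample bean, using the values shown in the JMX output above.
bean = json.loads("""
{
  "name": "Hadoop:service=HBase,name=Master,sub=Server",
  "tag.liveRegionServers": "localhost,56196,1391992019130",
  "tag.deadRegionServers": "localhost,40010,1391035309673;localhost,41408,1391990380724;localhost,38682,1390950017735"
}
""")

live = parse_region_servers(bean["tag.liveRegionServers"])
dead = parse_region_servers(bean["tag.deadRegionServers"])
print(f"{len(live)} live, {len(dead)} dead RegionServers")
```
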

Apache Flume

New Feature:
  • The HBase Sink now supports coalescing multiple Increment RPCs into one (FLUME-2338).
Changed Behavior:
  • File Channel Write timeout has been removed and the configuration parameter is now ignored (FLUME-2307).
  • Syslog UDP source can now accept larger messages (FLUME-2130).
  • AsyncHBase Sink is now fully functional (FLUME-2334).
  • JMS Source now uses a standard lookup to find the queue or topic (FLUME-2311).
Notable Bug Fixes:
  • Deadlock fixed in Dataset sink (FLUME-2320).
  • FileChannel Dual Checkpoint Backup Thread is now released on application stop (FLUME-2328).
  • Spool Dir source now checks the interrupt flag before writing to the channel (FLUME-2283).
  • Morphline sink increments eventDrainAttemptCount when it takes an event from the channel (FLUME-2323).
  • BucketWriter is now permanently closed only on idle and roll timeouts (FLUME-2325).
  • BucketWriter#close now cancels idleFuture (FLUME-2305).

Apache Oozie

As of CDH 5.0.0, Oozie includes a glob pattern feature (OOZIE-1471) that lets you use wildcards in a move in the FS action. For example:
<action name="archive-files">
    <fs>
        <move source="hdfs://namenode/output/*"
              target="hdfs://namenode/archive"/>
    </fs>
    <ok to="next"/>
    <error to="fail"/>
</action>

By default, up to 1000 files can be matched; you can change this limit by setting the oozie.action.fs.glob.max parameter.
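
To make the effect of the match limit concrete, the following Python sketch expands a glob against a list of paths and enforces a maximum number of matches. It is a conceptual illustration only, not Oozie code: the function name is hypothetical, it handles a wildcard in the last path component only, and raising an error when the limit is exceeded is an assumption, since these notes do not say how Oozie reacts.

```python
import fnmatch
import posixpath


def expand_move_source(pattern, existing_paths, glob_max=1000):
    """Return the paths matching pattern, refusing to exceed glob_max.

    glob_max mirrors the documented default of oozie.action.fs.glob.max.
    Only a wildcard in the final path component is supported here.
    """
    parent, name_pattern = posixpath.split(pattern)
    matches = sorted(
        path
        for path in existing_paths
        if posixpath.dirname(path) == parent
        and fnmatch.fnmatchcase(posixpath.basename(path), name_pattern)
    )
    if len(matches) > glob_max:
        # Assumption: treat exceeding the limit as an error.
        raise ValueError(
            f"{len(matches)} paths match {pattern!r}; limit is {glob_max}"
        )
    return matches


# Hypothetical directory listing.
paths = [
    "/output/part-00000",
    "/output/part-00001",
    "/output/logs/app.log",
    "/archive/old",
]
matched = expand_move_source("/output/*", paths)
print(matched)
```
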

Cloudera Search

What's New in CDH 5.0.1

This is a maintenance release which fixes several issues. In addition, it introduces a change to the configuration for HTTPS communication between HDFS and YARN. See also Issues Fixed in CDH 5.0.1.

New Features and Changes in CDH 5.0.1

Enabling TLS/SSL in CDH 5: Enabling HTTPS communication in CDH 5 requires extra configuration properties to be added to YARN (yarn-site.xml and mapred-site.xml) and HDFS (hdfs-site.xml), in addition to the existing configuration settings described here. For additional information, see YARN and HDFS.

What's New in CDH 5.0.2

This is a maintenance release that fixes several issues. See Issues Fixed in CDH 5.0.2.

What's New in CDH 5.0.3

This is a maintenance release that fixes several issues. See Issues Fixed in CDH 5.0.3.

What's New in CDH 5.0.4

This is a maintenance release that fixes several issues. See Issues Fixed in CDH 5.0.4.

What's New in CDH 5.0.5

This is a maintenance release that fixes the “POODLE” and Apache Hadoop Distributed Cache vulnerabilities described in “POODLE” Vulnerability on TLS/SSL enabled ports, as well as other issues. All CDH 5.0.x users should upgrade to 5.0.5 as soon as possible. See Issues Fixed in CDH 5.0.5.

What's New in CDH 5.0.6

This is a maintenance release that fixes several issues. See Issues Fixed in CDH 5.0.6.