This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

What's New in CDH 5.0.0

The following topics describe new features introduced in CDH 5.0.0.

Apache Hadoop

HDFS

New Features:
  • HDFS-5776 - Hedged reads in HDFS for improved HBase MTTR.
  • HDFS-4685 - Implementation of extended file access control lists in HDFS.
Notable Bug Fixes:
  • HDFS-5339 - WebHDFS URI does not accept logical nameservices when security is enabled.
  • HDFS-5898 - Allow NFS gateway to login/relogin from its Kerberos keytab.
  • HDFS-5921 - "Browse filesystem" on the NameNode UI doesn't work if any directory has the sticky bit set.
  • HDFS and Hive replication between different Kerberos realms now works.
  • HDFS-5922 - DataNode heartbeat thread can get stuck in a tight loop.

MapReduce & YARN

New Feature:
  • FairScheduler supports moving running applications between queues.
Notable Bug Fixes:
  • Several critical fixes to stabilize ResourceManager HA, covering the Web UI, unmanaged ApplicationMasters, and secure-cluster support.
  • Support for large values of mapreduce.task.io.sort.mb.
  • JobHistory Server has information on failed MapReduce jobs.

Apache HBase

New Features:
  • HBASE-10436 - Restore RegionServer lists removed from HBase 0.96.0 JMX.

    Many of the metrics exposed in CDH 4 (HBase 0.94) were removed when metrics were refactored in CDH 5 (HBase 0.96). This patch restores the lists of live and dead RegionServers. In 0.94 this information was part of a large nested structure, excerpted below, which included the RegionServer lists and metrics from each region.

    {
        "name" : "hadoop:service=Master,name=Master",
        "modelerType" : "org.apache.hadoop.hbase.master.MXBeanImpl",
        "ZookeeperQuorum" : "localhost:2181",
        ....
        "RegionsInTransition" : [ ],
        "RegionServers" : [ {
            "key" : "localhost,48346,1390857257246",
            "value" : {
                "load" : 2,
        ....

    CDH 5 Beta 1 and Beta 2 did not contain this list; they displayed only the counts of live and dead RegionServers. As of CDH 5.0.0, each list is presented as a semicolon-separated field, as follows:

    {
        "name" : "Hadoop:service=HBase,name=Master,sub=Server",
        "modelerType" : "Master,sub=Server",
        "tag.Context" : "master",
        "tag.liveRegionServers" : "localhost,56196,1391992019130",
        "tag.deadRegionServers" : "localhost,40010,1391035309673;localhost,41408,1391990380724;localhost,38682,1390950017735",
        ...
    }
  • Assorted usability and compatibility improvements as well as improvements to exporting snapshots.
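
The semicolon-separated RegionServer fields described above can be split back into individual entries. The following is a minimal Python sketch, not an official client: the helper name parse_region_servers is hypothetical, the sample payload is the example shown above with the elided fields left out, and each entry is assumed to follow the host,port,startcode layout seen in that example.

```python
import json

# Sample bean copied from the CDH 5.0.0 example above; the fields elided
# there ("...") are simply omitted here.
SAMPLE_BEAN = """
{
    "name" : "Hadoop:service=HBase,name=Master,sub=Server",
    "modelerType" : "Master,sub=Server",
    "tag.Context" : "master",
    "tag.liveRegionServers" : "localhost,56196,1391992019130",
    "tag.deadRegionServers" : "localhost,40010,1391035309673;localhost,41408,1391990380724;localhost,38682,1390950017735"
}
"""


def parse_region_servers(field):
    """Split a semicolon-separated field into (host, port, start_code) tuples.

    Hypothetical helper: assumes every entry looks like "host,port,startcode".
    """
    servers = []
    for entry in field.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # an empty field yields an empty list
        # Split from the right so only the last two commas are consumed.
        host, port, start_code = entry.rsplit(",", 2)
        servers.append((host, int(port), int(start_code)))
    return servers


bean = json.loads(SAMPLE_BEAN)
live = parse_region_servers(bean.get("tag.liveRegionServers", ""))
dead = parse_region_servers(bean.get("tag.deadRegionServers", ""))
print(len(live), "live;", len(dead), "dead")
```

Monitoring scripts written against the nested 0.94 structure would need a step like this, because the 5.0.0 fields carry only the server names and not the per-region metrics.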

Apache Flume

New Feature:
  • The HBase Sink now supports coalescing multiple Increment RPCs into one (FLUME-2338).
Changed Behavior:
  • File Channel Write timeout has been removed and the configuration parameter is now ignored (FLUME-2307).
  • Syslog UDP source can now accept larger messages (FLUME-2130).
  • AsyncHBase Sink is now fully functional (FLUME-2334).
  • Use standard lookup to find queue/topic in JMS Source (FLUME-2311).
Notable Bug Fixes:
  • Deadlock fixed in Dataset sink (FLUME-2320).
  • FileChannel Dual Checkpoint Backup Thread is now released on application stop (FLUME-2328).
  • Spool Dir source now checks the interrupt flag before writing to the channel (FLUME-2283).
  • Morphline sink increments eventDrainAttemptCount when it takes an event from the channel (FLUME-2323).
  • BucketWriter is now permanently closed only on idle and roll timeouts (FLUME-2325).
  • BucketWriter#close now cancels idleFuture (FLUME-2305).

Apache Oozie

As of CDH 5.0.0, Oozie includes a glob pattern feature (OOZIE-1471) that lets you use wildcards in the source of a move operation in the FS action. For example:
<action name="archive-files">
  <fs>
    <move source="hdfs://namenode/output/*"
          target="hdfs://namenode/archive" />
  </fs>
  <ok to="next"/>
  <error to="fail"/>
</action>

By default, up to 1000 files can be matched; to change this limit, set the oozie.action.fs.glob.max parameter.
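
To illustrate how a glob limit of this kind behaves, here is a small Python sketch that matches a pattern against a directory listing and enforces a cap. It is not Oozie code: the function match_move_sources and the sample paths are hypothetical, and treating an over-limit match as an error is an assumption of the sketch rather than something stated above.

```python
from fnmatch import fnmatchcase

# Mirrors the documented default of oozie.action.fs.glob.max.
DEFAULT_GLOB_MAX = 1000


def match_move_sources(paths, pattern, glob_max=DEFAULT_GLOB_MAX):
    """Return the paths that match pattern, enforcing a maximum match count.

    Hypothetical helper. Raising on overflow is an assumption of this sketch.
    Note that fnmatch lets "*" match "/" as well, so the sample listing is
    kept one level deep to stay close to filesystem glob behavior.
    """
    matched = [p for p in paths if fnmatchcase(p, pattern)]
    if len(matched) > glob_max:
        raise ValueError(
            "pattern %r matched %d paths, limit is %d"
            % (pattern, len(matched), glob_max)
        )
    return matched


# Illustrative listing; these paths are made up for the example.
listing = [
    "/output/part-00000",
    "/output/part-00001",
    "/output/_SUCCESS",
    "/archive/old-part-00000",
]

sources = match_move_sources(listing, "/output/*")
print(sources)
```

Calling match_move_sources(listing, "/output/*", glob_max=2) raises ValueError, because three paths match and the cap is two.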

Cloudera Search

Page generated September 3, 2015.