This is the documentation for CDH 4.4.0.
Documentation for other versions is available at Cloudera Documentation.

New Features in CDH4

This section lists new features in CDH4. See the following sections for more information. For links to the detailed change lists that describe the bug fixes and improvements to all of the CDH4 projects, including bug-fix reports for the corresponding upstream Apache projects, see the packaging section of CDH Version and Packaging Information.

About Apache Hadoop MapReduce Version 1 (MRv1) and Version 2 (MRv2 or YARN)

  • MapReduce 2.0 (MRv2 or YARN): CDH4 includes (but does not require) MapReduce 2.0 (MRv2 or YARN). The fundamental idea of the YARN architecture is to split the two primary responsibilities of the JobTracker (resource management and job scheduling/monitoring) into separate daemons: a global ResourceManager (RM) and per-application ApplicationMasters (AM). With MRv2, the ResourceManager and the per-node NodeManagers (NM) form the data-computation framework. The ResourceManager service effectively replaces the functions of the JobTracker, and NodeManagers run on slave nodes instead of TaskTracker daemons. The per-application ApplicationMaster is, in effect, a framework-specific library, tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks. For details of the new architecture, see Apache Hadoop NextGen MapReduce (YARN).
  Important: Cloudera does not yet consider the current upstream MRv2 release stable, and it may change in non-backwards-compatible ways. MRv2 should not be considered production-ready; Cloudera recommends that you use MRv1 unless you have particular reasons for using MRv2.
  • MapReduce Version 1 (MRv1): CDH4 continues to support the original MapReduce framework (i.e. the JobTracker and TaskTrackers). This framework is referred to as MRv1. You can deploy either MRv1 or MRv2; Cloudera does not support running MRv1 and YARN daemons on the same nodes at the same time.
    • MRv1 in CDH4 is based on its counterpart in CDH3, with some changes to make the MR API compatible with Hadoop 2.0.0 (and Hadoop 0.23 and later). This means that users will need to recompile their applications when going from CDH3 to CDH4 (even when continuing to use MRv1). Recompilation will not be necessary when going from MRv1 to MRv2 within CDH4.
  • Deprecated properties:
    In Hadoop 2.0.0 and later (MRv2), a number of Hadoop and HDFS properties have been deprecated. (The change dates from Hadoop 0.23.1, on which the Beta releases of CDH4 were based.) A list of deprecated properties and their replacements can be found at http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html.
      Note: All of these deprecated properties continue to work in MRv1. Conversely, the new mapreduce* properties listed do not work in MRv1.
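
To illustrate the deprecated-properties item above, the following is a minimal sketch of how a site configuration file might be checked for deprecated property names. It is not a Cloudera or Apache tool: the function name find_deprecated, the SAMPLE configuration, and the host name in it are hypothetical, and the mapping holds only a handful of renames taken from the deprecated-properties list, so it is far from exhaustive.

```python
import xml.etree.ElementTree as ET

# A few renames from the Hadoop deprecated-properties list.
# Illustrative subset only; consult the full list for real use.
DEPRECATED = {
    "fs.default.name": "fs.defaultFS",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.data.dir": "dfs.datanode.data.dir",
    "mapred.job.tracker": "mapreduce.jobtracker.address",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
}


def find_deprecated(site_xml):
    """Return (old_name, new_name) pairs for each deprecated property
    found in the text of a *-site.xml file (hypothetical helper)."""
    root = ET.fromstring(site_xml)
    hits = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in DEPRECATED:
            hits.append((name, DEPRECATED[name]))
    return hits


# Hypothetical sample configuration; the host name is a placeholder.
SAMPLE = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://namenode.example.com:8020</value></property>
  <property><name>mapred.reduce.tasks</name><value>4</value></property>
  <property><name>io.file.buffer.size</name><value>65536</value></property>
</configuration>"""

if __name__ == "__main__":
    for old, new in find_deprecated(SAMPLE):
        print(f"{old} -> {new}")
```

A report like this is only a starting point. As the note above says, the deprecated names continue to work in MRv1 and the new mapreduce* names do not, so on an MRv1 deployment the mapred.* entries it flags should be left as they are.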