CDH4 and MapReduce
CDH4 introduces a new version of MapReduce: MapReduce 2.0 (MRv2) built on the YARN framework. In this document we usually refer to this new version as YARN. CDH4 also provides an implementation of the previous version of MapReduce, now referred to as MRv1. You can use the instructions on this page to install:
- MRv1 or
- YARN or
- both implementations.
MRv1 and YARN share a common set of configuration files, so it is safe to configure both of them as long as you run only one set of daemons at any one time. Cloudera does not support running MRv1 and YARN daemons on the same nodes at the same time; doing so degrades performance and may result in an unstable cluster deployment.
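The rule about running only one set of daemons on a node can be checked mechanically. The following is a minimal sketch in Python, not a Cloudera tool: the function names mixed_daemons and is_supported are hypothetical, and the input is assumed to be a list of daemon names you have already collected for one node by some other means. The daemon names themselves are the ones used on this page.

```python
# Daemon names per implementation, as named on this page.
MRV1_DAEMONS = {"JobTracker", "TaskTracker"}
YARN_DAEMONS = {"ResourceManager", "NodeManager"}


def mixed_daemons(running):
    """Return (mrv1, yarn): the MRv1 and YARN daemons found in `running`.

    `running` is any iterable of daemon names observed on one node
    (hypothetical input format; how you collect it is up to you).
    """
    names = set(running)
    return names & MRV1_DAEMONS, names & YARN_DAEMONS


def is_supported(running):
    """True if at most one implementation's daemons are running on the node."""
    mrv1, yarn = mixed_daemons(running)
    return not (mrv1 and yarn)


if __name__ == "__main__":
    # Hypothetical node that still runs a TaskTracker next to a NodeManager.
    node = ["DataNode", "TaskTracker", "NodeManager"]
    mrv1, yarn = mixed_daemons(node)
    print("MRv1:", sorted(mrv1))
    print("YARN:", sorted(yarn))
    print("supported:", is_supported(node))
```

A node for which is_supported returns False is running daemons from both implementations and should have one set stopped before it is used.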
Before deciding to deploy YARN, make sure you read the discussion below under MapReduce 2.0 (YARN).
MapReduce 2.0 (YARN)
MapReduce has undergone a complete overhaul, and CDH4 now includes MapReduce 2.0 (MRv2). The fundamental idea of MRv2's YARN architecture is to split the two primary responsibilities of the JobTracker (resource management and job scheduling/monitoring) into separate daemons: a global ResourceManager (RM) and per-application ApplicationMasters (AM). With MRv2, the ResourceManager and the per-node NodeManagers (NM) form the data-computation framework. The ResourceManager service effectively replaces the functions of the JobTracker, and NodeManagers run on slave nodes in place of the TaskTracker daemons. The per-application ApplicationMaster is, in effect, a framework-specific library; it is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks. For details of the new architecture, see Apache Hadoop NextGen MapReduce (YARN).
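The division of responsibilities described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the class names mirror the daemon names on this page, but the methods (allocate, release, run), the idea of a node having a fixed number of "slots", and the task names are all hypothetical and are not the YARN API. Real YARN daemons are separate processes with their own scheduling and resource accounting, none of which is modeled here.

```python
class NodeManager:
    """Per-node daemon: executes the tasks it is handed."""

    def __init__(self, name, slots):
        self.name = name
        self.free = slots  # toy stand-in for the node's capacity

    def run(self, task):
        return f"{task} on {self.name}"


class ResourceManager:
    """Global daemon: hands out capacity, knows nothing about job logic."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, count):
        """Grant up to `count` slots, returning the node behind each one."""
        granted = []
        for node in self.nodes:
            while node.free > 0 and len(granted) < count:
                node.free -= 1
                granted.append(node)
        return granted

    def release(self, node):
        node.free += 1


class ApplicationMaster:
    """Per-application: negotiates resources, then runs and tracks its tasks."""

    def __init__(self, rm, tasks):
        self.rm = rm
        self.pending = list(tasks)
        self.done = []

    def run(self):
        while self.pending:
            granted = self.rm.allocate(len(self.pending))
            if not granted:
                raise RuntimeError("cluster has no free capacity")
            for node in granted:
                task = self.pending.pop(0)
                self.done.append(node.run(task))  # execute and record the result
                self.rm.release(node)
        return self.done


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 1), NodeManager("nm2", 1)])
    # One ApplicationMaster per application; both share the single ResourceManager.
    for tasks in (["map-0", "map-1", "reduce-0"], ["map-0"]):
        print(ApplicationMaster(rm, tasks).run())
```

The point of the sketch is the shape of the design: the ResourceManager only tracks capacity, while everything specific to one job (which tasks remain, which have finished) lives in that job's ApplicationMaster. In MRv1 the JobTracker held both kinds of state.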
For more information about the two implementations (MRv1 and MRv2), see the discussion under Apache Hadoop MapReduce in the "What's New in Beta 1" section of New Features in CDH4.