Managing MapReduce and YARN

CDH supports two versions of the MapReduce computation framework, MRv1 and MRv2, which are implemented by the MapReduce (MRv1) and YARN (MRv2) services. YARN is backward compatible with MapReduce: all jobs that run against MapReduce also run in a YARN cluster.

The MRv2 YARN architecture splits the two primary responsibilities of the JobTracker (resource management and job scheduling/monitoring) into separate daemons: a global ResourceManager (RM) and per-application ApplicationMasters (AM). With MRv2, the ResourceManager and per-node NodeManagers (NM) form the data-computation framework. The ResourceManager service effectively replaces the functions of the JobTracker, and NodeManagers run on worker hosts instead of TaskTracker daemons. The per-application ApplicationMaster is, in effect, a framework-specific library that negotiates resources from the ResourceManager and works with the NodeManagers to execute and monitor tasks. For details of this architecture, see Apache Hadoop NextGen MapReduce (YARN).

Defaults and Recommendations

  • In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. You can create a YARN service in a CDH 4 cluster, but it is not considered production ready.
  • In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. In CDH 5, the MapReduce service has been deprecated. However, the MapReduce service is fully supported for backward compatibility through the CDH 5 life cycle.
  • For production use, Cloudera recommends running only one MapReduce framework at any given time. If development needs or other use cases require switching between MapReduce and YARN, both services can be configured at the same time, but only one should be running, so that the available hardware resources are fully used.

Migrating from MapReduce to YARN

Cloudera Manager provides a wizard, described in Importing MapReduce Configurations to YARN, that migrates MapReduce configurations to YARN. The wizard performs all the steps described on this page (Switching Between MapReduce and YARN Services, Updating Dependent Services, and Configuring Alternatives Priority).

For detailed information on migrating from MapReduce to YARN, see Migrating from MapReduce 1 (MRv1) to MapReduce 2 (MRv2, YARN).

Switching Between MapReduce and YARN Services

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

MapReduce and YARN use separate sets of configuration files. No files are removed or altered when you change to a different framework. To change from YARN to MapReduce (or vice versa):
  1. (Optional) Configure the new MapReduce or YARN framework service.
  2. Update dependent services to use the chosen framework.
  3. Configure the alternatives priority.
  4. Redeploy the Oozie ShareLib.
  5. Redeploy the client configuration.
  6. Start the framework service you are switching to.
  7. (Optional) Stop the unused framework service to free up the resources it uses.
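
The ordering of the steps above can be sketched as a small checklist generator. This is only an illustration of the sequence, not a Cloudera Manager API: the function name `switch_plan` and its parameters are hypothetical, and the code performs no action on a cluster.

```python
def switch_plan(target, configure_target=True, stop_unused=True):
    """Return the ordered checklist for switching to `target`.

    Hypothetical helper for illustration only; it mirrors the seven
    steps listed above and does not talk to Cloudera Manager.
    """
    if target not in ("MapReduce", "YARN"):
        raise ValueError("target must be 'MapReduce' or 'YARN'")
    other = "YARN" if target == "MapReduce" else "MapReduce"

    steps = []
    if configure_target:  # step 1 is optional
        steps.append(f"Configure the {target} service")
    steps += [
        f"Update dependent services to use {target}",
        f"Set the alternatives priority so that {target} is higher than {other}",
        "Redeploy the Oozie ShareLib",
        "Redeploy the client configuration",
        f"Start the {target} service",
    ]
    if stop_unused:  # step 7 is optional
        steps.append(f"Stop the {other} service")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(switch_plan("YARN"), start=1):
        print(f"{number}. {step}")
```

Keeping the two optional steps as flags makes it explicit that the middle five steps are always required, in that order.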

Updating Dependent Services

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

When you change the MapReduce framework, the dependent services that must be updated to use the new framework are:
  • Hive
  • Sqoop 2
  • Oozie
To update a service:
  1. Go to the service.
  2. Click the Configuration tab.
  3. Select Service-Wide.
  4. Click the MapReduce Service property and select the YARN or MapReduce service.
  5. Click Save Changes to commit the changes.
  6. Select Actions > Restart.
The Hue service is automatically reconfigured to use the same framework as Oozie and Hive. This cannot be changed. To update the Hue service:
  1. Go to the Hue service.
  2. Select Actions > Restart.
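
As a rough sketch of the procedure above, the following lists which action applies to which service after a framework change. The names `DEPENDENT_SERVICES` and `update_actions` are made up for this example, and the code only builds a list of actions; the actual changes are made in the Cloudera Manager Admin Console as described.

```python
# Services whose MapReduce Service property must be changed, per the list above.
DEPENDENT_SERVICES = ("Hive", "Sqoop 2", "Oozie")


def update_actions(target):
    """Return (service, action) pairs needed after switching to `target`.

    Hypothetical helper for illustration only.
    """
    actions = []
    for service in DEPENDENT_SERVICES:
        actions.append((service, f"set MapReduce Service property to {target}"))
        actions.append((service, "restart"))
    # Hue follows Oozie and Hive automatically, so it only needs a restart.
    actions.append(("Hue", "restart"))
    return actions


if __name__ == "__main__":
    for service, action in update_actions("YARN"):
        print(f"{service}: {action}")
```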

Configuring Alternatives Priority

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

The alternatives priority property determines which service, MapReduce or YARN, clients use to run MapReduce jobs. The service with the higher property value is used. In CDH 4, the MapReduce service alternatives priority is set to 92 and the YARN service is set to 91. In CDH 5, the values are reversed: the MapReduce service alternatives priority is set to 91 and the YARN service is set to 92.
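
The selection rule can be illustrated with a short sketch that uses the default values quoted above. The function `active_framework` is a hypothetical name for this example; it models only the "higher value wins" rule and does not read the actual alternatives configuration on a host. Behavior when both priorities are equal is not covered by the text above, so the sketch does not attempt to model it.

```python
# Default alternatives priorities quoted in the paragraph above.
DEFAULT_PRIORITIES = {
    "CDH 4": {"MapReduce": 92, "YARN": 91},
    "CDH 5": {"MapReduce": 91, "YARN": 92},
}


def active_framework(priorities):
    """Return the service name with the highest alternatives priority."""
    return max(priorities, key=priorities.get)


if __name__ == "__main__":
    for release, priorities in DEFAULT_PRIORITIES.items():
        print(release, "->", active_framework(priorities))
```

With the defaults, this selects MapReduce for CDH 4 and YARN for CDH 5. Raising the YARN priority above 92 in a CDH 4 cluster would make clients use YARN instead.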

To configure the alternatives priority:
  1. Go to the MapReduce or YARN service.
  2. Click the Configuration tab.
  3. Expand the Gateway Default Group node.
  4. In the Alternatives Priority property, set the priority value.
  5. Click Save Changes to commit the changes.
  6. Redeploy the client configuration.