Managing MapReduce

For an overview of computation frameworks, insight into their usage and restrictions, and examples of common tasks they perform, see Managing YARN (MRv2) and MapReduce (MRv1).

Configuring the MapReduce Scheduler

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

The MapReduce service is configured by default to use the Fair Scheduler. You can change the scheduler type to the FIFO Scheduler or the Capacity Scheduler. You can also modify the Fair Scheduler and Capacity Scheduler configuration. For further information on schedulers, see YARN (MRv2) and MapReduce (MRv1) Schedulers.

Configuring the Task Scheduler Type

  1. Go to the MapReduce service.
  2. Click the Configuration tab.
  3. Select Scope > JobTracker.
  4. Select Category > Classes.
  5. In the Task Scheduler property, select a scheduler.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  6. Click Save Changes to commit the changes.
  7. Restart the JobTracker to apply the new configuration:
    1. Click the Instances tab.
    2. Click the JobTracker role.
    3. Select Actions for Selected > Restart.

Modifying the Scheduler Configuration

  1. Go to the MapReduce service.
  2. Click the Configuration tab.
  3. Select Scope > JobTracker.
  4. Select Category > Jobs.
  5. Modify the configuration properties.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  6. Click Save Changes to commit the changes.
  7. Restart the JobTracker to apply the new configuration:
    1. Click the Instances tab.
    2. Click the JobTracker role.
    3. Select Actions for Selected > Restart.

Configuring the MapReduce Service to Save Job History

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Normally, job history is saved on the host where the JobTracker is running. You can configure the JobTracker to write information about every completed job to a specified HDFS location. By default, the information is retained for 7 days.

Enabling MapReduce Job History to Be Saved to HDFS

  1. Create a folder in HDFS to contain the history information. When creating the folder, set the owner and group to mapred:hadoop and the permissions to 775.
  2. Go to the MapReduce service.
  3. Click the Configuration tab.
  4. Select Scope > JobTracker.
  5. Select Category > Paths.
  6. Set the Completed Job History Location property to the location that you created in step 1.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  7. Click Save Changes.
  8. Restart the MapReduce service.
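
The folder described in step 1 can be created with the standard hdfs dfs shell commands (-mkdir, -chown, -chmod). The following is a minimal sketch that only assembles those commands and prints them; it does not run them. The path /user/history/done and the helper name history_dir_commands are hypothetical, and the sketch assumes the commands are later run by a user with HDFS superuser privileges.

```python
import shlex


def history_dir_commands(path, owner="mapred", group="hadoop", mode="775"):
    """Return the hdfs dfs commands that create and secure the history folder.

    The owner, group, and mode defaults mirror the values given in step 1.
    """
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],                # create the folder
        ["hdfs", "dfs", "-chown", f"{owner}:{group}", path],  # set owner and group
        ["hdfs", "dfs", "-chmod", mode, path],                # set permissions
    ]


if __name__ == "__main__":
    # Hypothetical location; substitute the folder you intend to use.
    for cmd in history_dir_commands("/user/history/done"):
        print(shlex.join(cmd))
```

The path passed to the helper is the same value you would enter for the Completed Job History Location property in step 6.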

Setting the Job History Retention Duration

  1. Select the JobTracker Default Group category.
  2. Set the Job History Files Maximum Age property (mapreduce.jobhistory.max-age-ms) to the length of time (in milliseconds, seconds, minutes, or hours) that you want job history files to be kept.
  3. Restart the MapReduce service.

The Job History Files Cleaner runs at regular intervals to check for job history files that are ready to be deleted. By default, the interval is 24 hours. To change the frequency with which the Job History Files Cleaner runs:

  1. Select the JobTracker Default Group category.
  2. Set the Job History Files Cleaner Interval property (mapreduce.jobhistory.cleaner.interval) to the desired frequency (in milliseconds, seconds, minutes, or hours).
  3. Restart the MapReduce service.
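
If you enter either value in milliseconds, it can help to compute the number rather than type it by hand. The following is a small sketch that converts the two defaults mentioned above (7 days of retention, 24 hours between cleaner runs) into milliseconds; the helper name to_millis is hypothetical.

```python
from datetime import timedelta


def to_millis(duration):
    """Convert a timedelta to a whole number of milliseconds."""
    return duration // timedelta(milliseconds=1)


# Defaults stated in this section.
DEFAULT_MAX_AGE = timedelta(days=7)
DEFAULT_CLEANER_INTERVAL = timedelta(hours=24)

if __name__ == "__main__":
    print(to_millis(DEFAULT_MAX_AGE))           # 604800000
    print(to_millis(DEFAULT_CLEANER_INTERVAL))  # 86400000
```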

Configuring Client Overrides

A configuration property qualified with (Client Override) is a server-side setting that takes precedence over any value a client tries to set for that property. It serves the same purpose as its unqualified counterpart, but it is applied to the service with the setting <final>true</final>, so client-supplied values are ignored.

For example, if you set the Map task heap property to 1 GB in the job configuration code, but the service's heap property qualified with (Client Override) is set to 500 MB, then 500 MB is applied.
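
The precedence rule in this example can be illustrated with a toy model. This is only a sketch of the behavior described above, not Hadoop's actual configuration code; the function effective_value and the property name map.task.heap.mb are hypothetical.

```python
def effective_value(name, service_props, client_props):
    """Resolve a property the way a final (Client Override) setting behaves.

    service_props maps a property name to a (value, is_final) tuple.
    client_props maps a property name to the value requested by the job.
    """
    value, is_final = service_props.get(name, (None, False))
    if is_final or name not in client_props:
        # A final service value wins; so does the service value when the
        # client does not set the property at all.
        return value
    return client_props[name]


if __name__ == "__main__":
    client = {"map.task.heap.mb": 1024}           # job asks for 1 GB
    override = {"map.task.heap.mb": (500, True)}  # Client Override: 500 MB
    plain = {"map.task.heap.mb": (500, False)}    # unqualified counterpart

    print(effective_value("map.task.heap.mb", override, client))  # 500
    print(effective_value("map.task.heap.mb", plain, client))     # 1024
```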