This is the documentation for Cloudera 5.2.x.
Documentation for other versions is available at Cloudera Documentation.

The MapReduce Service

For an overview of computation frameworks, their usage and restrictions, and common tasks, see The MapReduce and YARN Services.

Configuring the MapReduce Scheduler

The MapReduce service uses the Fair Scheduler by default. You can change the scheduler type to FIFO or Capacity Scheduler, and you can modify the Fair Scheduler and Capacity Scheduler configuration. For further information on schedulers, see Schedulers.

Configuring the Task Scheduler Type

  1. Go to the MapReduce service.
  2. Click the Configuration tab.
  3. Expand the JobTracker Default Group category and click the Classes subcategory.
  4. Click the Value field of the Task Scheduler row and select a scheduler.
  5. Click Save Changes to commit the changes.
  6. Restart the JobTracker to apply the new configuration:
    1. Click the Instances tab.
    2. Click the JobTracker role.
    3. Select Actions for Selected > Restart.
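
Cloudera Manager writes the scheduler choice into the JobTracker configuration for you. As a rough illustration of what that choice corresponds to, the following Python sketch maps a scheduler type to a class name and renders a mapred-site.xml style property. The property name mapred.jobtracker.taskScheduler and the three class names are the usual MRv1 names, but they are assumptions here and should be verified against your release; the helper scheduler_property is hypothetical.

```python
from xml.sax.saxutils import escape

# Assumed MRv1 scheduler class names; verify against your CDH release.
SCHEDULER_CLASSES = {
    "fair": "org.apache.hadoop.mapred.FairScheduler",
    "capacity": "org.apache.hadoop.mapred.CapacityTaskScheduler",
    "fifo": "org.apache.hadoop.mapred.JobQueueTaskScheduler",
}


def scheduler_property(kind):
    """Return an XML <property> fragment for the chosen scheduler type.

    `kind` is one of "fair", "capacity", or "fifo" (case-insensitive).
    """
    try:
        class_name = SCHEDULER_CLASSES[kind.lower()]
    except KeyError:
        raise ValueError("unknown scheduler type: %r" % kind)
    return (
        "<property>\n"
        "  <name>mapred.jobtracker.taskScheduler</name>\n"
        "  <value>%s</value>\n"
        "</property>" % escape(class_name)
    )


if __name__ == "__main__":
    # Print the fragment that selecting the Fair Scheduler would map to.
    print(scheduler_property("fair"))
```

On a cluster managed by Cloudera Manager, change the value through the Task Scheduler row as described above rather than by editing the file directly.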

Modifying the Scheduler Configuration

  1. Go to the MapReduce service.
  2. Click the Configuration tab.
  3. Click the Jobs subcategory of the JobTracker Default Group category.
  4. Click a property and modify the configuration.
  5. Click Save Changes to commit the changes.
  6. Restart the JobTracker to apply the new configuration:
    1. Click the Instances tab.
    2. Click the JobTracker role.
    3. Select Actions for Selected > Restart.

Configuring the MapReduce Service to Save Job History

By default, job history is saved on the host on which the JobTracker is running. You can also configure the JobTracker to write information about every completed job to a specified HDFS location. By default, this information is retained for 7 days.

Enabling MapReduce Job History to Be Saved to HDFS

  1. Create a folder in HDFS to contain the history information. When creating the folder, set the owner and group to mapred:hadoop and the permissions to 775.
  2. Go to the MapReduce service.
  3. Click the Configuration tab.
  4. Expand the JobTracker Default Group category and click the Paths subcategory.
  5. Set the Completed Job History Location property to the location that you created in step 1.
  6. Click Save Changes.
  7. Restart the MapReduce service.
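
Step 1 of the procedure above can be sketched as three hdfs dfs commands. The Python snippet below only builds and prints the commands; it does not run them. The path /jobhistory/done and the helper history_dir_commands are hypothetical, and the sketch assumes the commands would be run by a user allowed to change ownership in HDFS (typically the HDFS superuser).

```python
import shlex


def history_dir_commands(path, owner="mapred", group="hadoop", mode="775"):
    """Build the hdfs commands that create and secure the history folder.

    Returns a list of argument lists, one per command, in the order
    they should be run.
    """
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],
        ["hdfs", "dfs", "-chown", "%s:%s" % (owner, group), path],
        ["hdfs", "dfs", "-chmod", mode, path],
    ]


if __name__ == "__main__":
    # /jobhistory/done is a hypothetical location; use your own path.
    for cmd in history_dir_commands("/jobhistory/done"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

The same path is then the value to enter for the Completed Job History Location property.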

Setting the Job History Retention Duration

  1. Select the JobTracker Default Group category.
  2. Set the Job History Files Maximum Age property (mapreduce.jobhistory.max-age-ms) to the length of time (in milliseconds, seconds, minutes, or hours) that you want job history files to be kept.
  3. Restart the MapReduce service.

The Job History Files Cleaner runs at regular intervals to check for job history files that are ready to be deleted. By default, the interval is 24 hours. To change the frequency with which the Job History Files Cleaner runs:

  1. Select the JobTracker Default Group category.
  2. Set the Job History Files Cleaner Interval property (mapreduce.jobhistory.cleaner.interval) to the desired frequency (in milliseconds, seconds, minutes, or hours).
  3. Restart the MapReduce service.
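
Because both properties accept several time units, it can help to check what a value amounts to in milliseconds. The following Python sketch converts the two defaults mentioned above (7 days of retention, a 24-hour cleaner interval). The helper to_milliseconds is hypothetical, and the idea that the underlying values are stored in milliseconds is an assumption suggested by the -ms suffix of mapreduce.jobhistory.max-age-ms.

```python
# Milliseconds per unit, for the units the properties accept.
MS_PER_UNIT = {
    "milliseconds": 1,
    "seconds": 1000,
    "minutes": 60 * 1000,
    "hours": 60 * 60 * 1000,
}


def to_milliseconds(amount, unit):
    """Convert a non-negative integer amount in the given unit to milliseconds."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if unit not in MS_PER_UNIT:
        raise ValueError("unsupported unit: %r" % unit)
    return amount * MS_PER_UNIT[unit]


if __name__ == "__main__":
    # Default retention: 7 days, expressed here as 168 hours.
    print(to_milliseconds(7 * 24, "hours"))   # 604800000
    # Default cleaner interval: 24 hours.
    print(to_milliseconds(24, "hours"))       # 86400000
```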

Configuring Client Overrides

A configuration property qualified with (Client Override) is a server-side setting that takes precedence over any value a client tries to set for that property. It performs the same role as its unqualified counterpart, but it is applied to the service with the setting <final>true</final>, so clients cannot override it.

For example, if you set the Map task heap property to 1 GB in the job configuration code, but the service's heap property qualified with (Client Override) is set to 500 MB, then 500 MB is applied.
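
The precedence rule in this example can be modeled with a few lines of Python. This is a simplified sketch of how a final server-side value wins over a client value, not Hadoop's actual configuration loader. The function effective_value and the key map.task.heap.mb are hypothetical names used only for illustration.

```python
def effective_value(server_conf, client_conf, name):
    """Resolve the value applied for `name`.

    server_conf maps a property name to a (value, is_final) tuple.
    client_conf maps a property name to the value the client requested.
    A server value marked final always wins; otherwise the client value,
    if present, replaces the server value.
    """
    server_entry = server_conf.get(name)
    if server_entry is not None and server_entry[1]:
        return server_entry[0]          # final: client value is ignored
    if name in client_conf:
        return client_conf[name]        # not final: client may override
    return server_entry[0] if server_entry is not None else None


if __name__ == "__main__":
    # Hypothetical property key; heap sizes are in MB.
    server = {"map.task.heap.mb": (500, True)}   # (Client Override) set to 500 MB
    client = {"map.task.heap.mb": 1024}          # job code asks for 1 GB
    print(effective_value(server, client, "map.task.heap.mb"))  # 500
```

If the server-side value were not marked final, the same call would return the client's 1024.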