Job Submission and Monitoring

Job is the primary interface by which a user job interacts with the ResourceManager. Job provides facilities to submit jobs, track their progress, access component-task reports and logs, and obtain MapReduce cluster status information.

The job submission process includes:

  1. Checking the input and output specifications of the job.
  2. Computing the InputSplit values for the job.
  3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
  4. Copying the job's JAR file and configuration to the MapReduce system directory on the filesystem.
  5. Submitting the job to the ResourceManager and optionally monitoring its status.
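As an illustration, here is a minimal driver sketch that exercises these steps through the Job API. The class name MyDriver is hypothetical, and the identity Mapper and Reducer base classes stand in for the user classes a real job would supply:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MyDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "example-job");

        // Locates the job JAR from this class; the JAR and configuration
        // are copied to the MapReduce system directory at submission (step 4).
        job.setJarByClass(MyDriver.class);

        // Identity Mapper/Reducer base classes keep the sketch self-contained.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Input/output specifications are checked (step 1) and InputSplit
        // values computed from these paths (step 2) during submission.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submits the job to the ResourceManager and monitors it (step 5).
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }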

Job history files are also logged to user-specified directories, configured through the mapreduce.jobhistory.intermediate-done-dir and mapreduce.jobhistory.done-dir properties.

You can view a summary of the history logs in the specified directory by using the following command:
$ hadoop job -history output.jhist
This command prints job details, along with details of failed and killed tasks. You can view more details about the job, such as successful tasks and the attempts made for each task, by using the following command:
$ hadoop job -history all output.jhist

You can use OutputLogFilter to filter log files from the output directory listing.
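For example, here is one way to apply it when listing job output, assuming the org.apache.hadoop.mapred.Utils.OutputFileUtils.OutputLogFilter variant; the class name ListOutputs is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.Utils;

    public class ListOutputs {
      public static void main(String[] args) throws Exception {
        Path outputDir = new Path(args[0]);
        FileSystem fs = outputDir.getFileSystem(new Configuration());

        // OutputLogFilter is a PathFilter that excludes log files, so
        // only the actual output files of the job are listed.
        FileStatus[] outputs = fs.listStatus(
            outputDir, new Utils.OutputFileUtils.OutputLogFilter());
        for (FileStatus status : outputs) {
          System.out.println(status.getPath());
        }
      }
    }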

Normally, you create the application, describe various facets of the job, submit the job, and then monitor its progress.
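For the monitoring side, here is a sketch using the Job API's non-blocking submission; the names JobMonitor and submitAndPoll are hypothetical, and the Job is assumed to be fully configured beforehand:

    import org.apache.hadoop.mapreduce.Job;

    public class JobMonitor {
      // Submits a fully configured Job without blocking, then polls
      // its map/reduce progress until the job finishes.
      public static boolean submitAndPoll(Job job) throws Exception {
        job.submit(); // returns immediately, unlike waitForCompletion
        while (!job.isComplete()) {
          System.out.printf("map %.0f%%  reduce %.0f%%%n",
              job.mapProgress() * 100, job.reduceProgress() * 100);
          Thread.sleep(5000); // poll every five seconds
        }
        return job.isSuccessful();
      }
    }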

Job Control

You might need to chain MapReduce jobs to accomplish complex tasks that cannot be done with a single job. This is fairly straightforward, because the output of one job typically goes to the distributed filesystem, and that output can in turn be used as input for the next job.

However, clients must ensure that jobs are complete (success or failure). The job-control options are:

  * Job.submit(): submit the job to the cluster and return immediately.
  * Job.waitForCompletion(boolean): submit the job to the cluster and wait for it to finish.
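As a sketch of the blocking option applied to a two-job chain, the helper below wires the first job's output directory in as the second job's input; the names JobChain and runChain are hypothetical, and both jobs are assumed to be configured already:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class JobChain {
      // Runs 'first', then 'second', using 'intermediate' as the
      // handoff directory between the two jobs.
      public static boolean runChain(Job first, Job second, Path intermediate)
          throws Exception {
        FileOutputFormat.setOutputPath(first, intermediate);
        // waitForCompletion blocks, so the client knows the outcome
        // before deciding whether to start the next job in the chain.
        if (!first.waitForCompletion(true)) {
          return false; // stop the chain on failure
        }
        FileInputFormat.addInputPath(second, intermediate);
        return second.waitForCompletion(true);
      }
    }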

You can also use Oozie to implement chains of MapReduce jobs.