Important Note:

Cloudera Manager version 3 and CDH3 have reached End of Maintenance (EOM) as of June 20th, 2013. Cloudera will not support or provide patches for any of the Cloudera Manager version 3 and CDH3 releases. To view documentation related to later releases, click the Documentation link at the top of this page.

Job Designer

Introducing Hue Job Designer

The Job Designer application enables you to create and submit Hadoop Map/Reduce jobs to the Hadoop cluster. You can include variables in your jobs so that you and other users can enter values for them when the job is run. The Job Designer supports streaming and JAR jobs. For more information about Hadoop Map/Reduce, see the Hadoop Map/Reduce Tutorial.

  Note:

A job's input files must be uploaded to the cluster before you can submit the job.

Job Designer Installation and Configuration

Job Designer is one of the applications that can be installed as part of Hue. For more information about installing Hue, see Hue Installation.

Using Job Designer

The following sections describe how to start and use Job Designer.

Starting Job Designer

To start Job Designer, click this icon images/image30.jpeg in the application bar at the bottom of the Hue web page. The Job Design List window opens in the Hue web page.
images/image31.jpeg

Installing the Job Designer Samples

The Job Designer sample jobs can help you learn how to use Job Designer. To install the Job Designer samples, click Install Samples in the Job Design List window and then click Ok. The sample jobs are displayed in the Job Design List window. Job Designer removes the Install Samples button after the samples are installed, so you can install the samples only once.
images/image32.jpeg

Working with Job Designs

In the Job Designer, a job design specifies several meta-level properties of a Map/Reduce job, including the job design name, description, the Map/Reduce executable scripts or classes, and any parameters for those scripts or classes. You can create two types of job designs: a streaming job design and a JAR job design.

Creating a Streaming Job Design

Hadoop streaming jobs enable you to write Map/Reduce functions in languages other than Java, using any script or executable that reads from standard input and writes to standard output. For more information about Hadoop streaming jobs, see http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/streaming.html
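As an illustration of the standard input and standard output contract, the following Python sketch shows a word-count mapper and reducer. The function names map_lines, reduce_lines, and run are hypothetical and are not part of Hue or Hadoop. The sketch assumes tab-separated key and value records and assumes that the reducer receives its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one "word<TAB>1" record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Reducer: sum the counts for each word.

    Assumes the records arrive sorted by key, so that all records for
    one word are adjacent.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


def run(step, stdin=sys.stdin, stdout=sys.stdout):
    """Apply a mapper or reducer function to stdin and write records to stdout."""
    for record in step(stdin):
        stdout.write(record + "\n")
```

Under these assumptions, a mapper script would end with a call to run(map_lines) and a reducer script with a call to run(reduce_lines). Those two scripts are what you would then name in the Mapper Cmd and Reducer Cmd settings.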

To create a streaming job design:

  1. In the Job Design List window, click Streaming. The Job Design Editor:Streaming Job window opens to enable you to specify information about the streaming job.
    images/image33.jpeg
  2. In the Job Design Editor:Streaming Job window, specify the following information.

    Note

    You can use variables of the form $variable_name for the Input, Output, Mapper Cmd, and Reducer Cmd settings described in the following table. When the streaming job is run, a dialog box appears so that you or other users can specify the values of the variables.

    Setting

    Description

    Name

    Specify a name that identifies the streaming job design and its associated properties and parameters.

    Description

    Specify a description of the streaming job design. The description is displayed in the dialog box that appears if you specify variables for the job.

    Input

    Specify the path to the file or directory you want to use as the input data for the streaming job. If you specify a directory, all files in that directory are used for input. Equivalent to the Hadoop -input option.

    Output

    Specify the path to the directory where you want to save the output of the streaming job. The directory must not exist before you run the job; otherwise, the job will not run. (This requirement is a precaution to prevent overwriting data from other jobs.) Equivalent to the Hadoop -output option.

    Mapper Cmd

    Specify the path to the mapper script or class. If the mapper file is not already present on the machines in the cluster, use the Required Files option to package it with the job submission. Equivalent to the Hadoop -mapper option.

    Reducer Cmd

    Specify the path to the reducer script or class. If the reducer file is not already present on the machines in the cluster, use the Required Files option to package it with the job submission. Equivalent to the Hadoop -reducer option.

    Num Reduce Tasks

    Specify the number of reduce tasks you want to use. Specify zero if you do not want to run any reduce tasks. If you do not specify a value for this setting, the default specified in your cluster configuration takes effect. The recommended number of reduce tasks is the product of the following values:

    -- a factor of 0.95 or 1.75
    -- the number of nodes in your cluster multiplied by the value of the mapred.tasktracker.reduce.tasks.maximum property

    If your reduce tasks are not very big, use a factor of 0.95, which yields slightly fewer reduce tasks than the total number of reduce slots in your cluster. This factor allows for a small number of failed reduce tasks without increasing the time required to run the job.

    If your reduce tasks are very big, use a factor of 1.75, which yields more reduce tasks than there are reduce slots in your cluster. This factor allows for better load balancing, and failed reduce tasks do not significantly increase the time required to run the job.

    Required Files

    Specify any executable files that are not already present on the machines in the cluster. These files are packaged and sent with the job when it is submitted.

  3. Select Submit upon save to submit the job to the cluster immediately after you click Save.
  4. Click Save to save the job settings.
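The settings above can be sketched in Python as follows. This is a minimal illustration, not how Job Designer itself is implemented. The function names suggested_reduce_tasks and streaming_options are hypothetical. The -input, -output, -mapper, and -reducer options come from the settings table; mapping Num Reduce Tasks to -numReduceTasks and Required Files to -file is an assumption based on the standard Hadoop streaming options.

```python
def suggested_reduce_tasks(num_nodes, reduce_slots_per_node, factor=0.95):
    """Return factor * (nodes * reduce slots per node), rounded down.

    reduce_slots_per_node stands for the value of the
    mapred.tasktracker.reduce.tasks.maximum property.
    """
    if factor not in (0.95, 1.75):
        raise ValueError("factor is expected to be 0.95 or 1.75")
    total_slots = num_nodes * reduce_slots_per_node
    # Work in whole percentages to avoid floating-point rounding surprises.
    percent = int(round(factor * 100))
    return max(1, percent * total_slots // 100)


def streaming_options(input_path, output_path, mapper_cmd, reducer_cmd,
                      num_reduce_tasks=None, required_files=()):
    """Build a list of Hadoop streaming options that mirrors the settings."""
    options = [
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper_cmd,
        "-reducer", reducer_cmd,
    ]
    if num_reduce_tasks is not None:
        # Assumed mapping for the Num Reduce Tasks setting.
        options += ["-numReduceTasks", str(num_reduce_tasks)]
    for path in required_files:
        # Assumed mapping for the Required Files setting.
        options += ["-file", path]
    return options
```

For example, for a cluster of 10 nodes with 2 reduce slots per node, suggested_reduce_tasks(10, 2) returns 19 and suggested_reduce_tasks(10, 2, factor=1.75) returns 35. Either value could then be entered in the Num Reduce Tasks setting.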

Creating a JAR Job Design

A Hadoop JAR job runs Map/Reduce functions that are written in Java and packaged in a JAR file.

To create a JAR job design:

  1. In the Job Design List window, click Jar. The Job Design Editor:Jar Job window opens where you can specify information about the JAR job.
    images/image34.jpeg
  2. In the Job Design Editor:Jar Job window, specify the following information.

    Note

    You can use variables of the form $variable_name for the Arguments setting described in the following table. When the JAR job is run, a dialog box appears so that you or other users can specify the values of the variables.

    Setting

    Description

    Name

    Specify a name that identifies the JAR job and its collection of parameters.

    Description

    Specify a description of the JAR job. The description is displayed in the dialog box that appears if you specify variables for the job.

    Jarfile

    Specify the name of the JAR file, including the path.

    Arguments

    Specify the arguments you want to pass to the running JAR job.

  3. Select Submit upon save to submit the job to the cluster immediately after you click Save.
  4. Click Save to save the job settings.
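To illustrate how variables of the form $variable_name behave, the following Python sketch uses the standard library string.Template class, which recognizes the same $name syntax. This only models the idea of filling in values at submission time. It is not taken from the Job Designer implementation, and the helper names variable_names and fill_variables, as well as the example paths, are hypothetical.

```python
from string import Template


def variable_names(setting):
    """Return the $variable names found in a setting, in order, without repeats."""
    names = []
    for match in Template.pattern.finditer(setting):
        name = match.group("named") or match.group("braced")
        if name and name not in names:
            names.append(name)
    return names


def fill_variables(setting, values):
    """Replace each $variable in a setting with the value supplied for it.

    Raises KeyError if a variable has no value, which corresponds to
    leaving a field in the dialog box empty.
    """
    return Template(setting).substitute(values)


# Hypothetical Arguments setting for a JAR job design.
arguments = "$input_dir $output_dir 10"
prompted = variable_names(arguments)
filled = fill_variables(arguments, {"input_dir": "/user/demo/in",
                                    "output_dir": "/user/demo/out"})
```

In this sketch, prompted holds the names a user would be asked for (input_dir and output_dir), and filled holds the argument string with the supplied values substituted.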

Submitting a Job to a Cluster

To submit a job to a cluster:

  1. In the Job Design List window, click job designs in the upper left corner. Your jobs and other users' jobs are displayed in the Job Design List window.
    images/image35.jpeg
  2. In the Job Design List window, double-click the job you want to submit. You can also right-click and choose Submit to Cluster.
  3. If the job contains variables, enter the information requested in the dialog box that appears. For example, the sample streaming PI Calculator job displays the following dialog box to enable you to specify the settings for Iterations per Mapper and Num of mappers.
    images/image36.jpeg
  4. Click Ok to submit the job. After the job is complete, the Job Designer displays the results of the job, including the last 10 KB of stdout and stderr for a streaming job. For example, after the sample streaming PI Calculator job is complete, the following results appear.
    images/image37.jpeg
    For information about displaying job results, see Displaying Job Results.

Copying, Editing, and Deleting a Job Design

If you want to edit and use a job but you don't own it, you can make a copy of it and then edit and use the copied job.

To copy a job design:

  1. In the Job Design List window, click job designs. The jobs are displayed in the Job Design List window.
  2. In the Job Design List window, select the job, right-click, and choose Copy from the context menu.
    images/image38.jpeg
  3. In the Job Design Editor window, change the settings and then click Save to save the job settings.

To edit a job design:

  1. In the Job Design List window, click job designs. The jobs are displayed in the Job Design List window.
  2. In the Job Design List window, select the job, right-click, and choose Edit from the context menu.
  3. In the Job Design Editor window, change the settings and then click Save to save the job settings.

To delete a job design:

  1. In the Job Design List window, click job designs. The jobs are displayed in the Job Design List window.
  2. In the Job Design List window, select the job, right-click, and choose Delete from the context menu.
  3. Click Ok to confirm the deletion.

Filtering the Job Design List

You can filter the Job Design List by owner, by job name, or both.

To filter the Job Design List:

  1. In the Job Design List window, click job designs.
  2. Enter the name of the owner, job name, or both in the query search boxes at the top of the Job Design List window. Job Designer displays the jobs that match the filter criteria.
    images/image39.jpeg

Displaying Job Results

To display job results:

  1. In the Job Design List window, click history. The jobs are displayed in the Job History window.
    images/image40.jpeg
  2. To display details about a job, double-click it or click this icon images/image41.jpeg next to the task. The results of the job appear.
    images/image37.jpeg
  3. To display details about the job, click the job name under the Launched Jobs heading at the top of the results window. The following Job Browser screen appears with details about the job.
    images/image42.jpeg
  4. To display details about a task, double-click a task under Recent Tasks or click this icon images/image43.jpeg next to the task.
  5. To view other types of job information, click the Metadata and Counters tabs.
  6. To view the output of the job, click the link next to Output at the top of the screen.
  7. To view status information of all jobs, click the view all tasks link. For more information about using Job Browser, see Job Browser.