Running and Monitoring Jobs on the Console

You can submit a single job or a group of jobs on the Cloudera Altus console. When you view the list of jobs, you can file a support ticket with Cloudera for any job that you need help with.

Submitting a Job on the Console

You can submit a job to run on an existing cluster or create a cluster specifically for the job.

To submit a job on the console:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Jobs.

    By default, the Jobs page displays the list of all the Altus Data Engineering jobs in your Altus account. You can filter the list of jobs by Altus environment, the cluster on which the jobs run, or the time frame when the jobs run. You can also filter by the user who submitted the job, the job type, and the job status.

  3. Click Submit Jobs.
  4. On the Job Settings page, select Single job.
  5. Select the type of job you want to submit.
    You can select from the following types of jobs:
    • Hive
    • MapReduce2
    • PySpark
    • Spark
  6. Enter the job name.

    The job name is optional. If you do not specify a name, Altus sets the job name to be the same as the job ID.

  7. Specify the properties for the job based on the job type.
    Hive Job Properties

    Script: Required. The Hive script to execute. Select one of the following sources for the Hive script:
    • Script Path. Specify the path and file name of the file that contains the script.
    • File Upload. Upload a file that contains the script.
    • Direct Input. Type in the script.

      The Hive script can include parameters. Use the format ${Variable_Name} for each parameter. For example, a script that contains WHERE year = ${year} requires a parameter named year. If the script contains parameters, you must specify the variable name and value for each parameter in the Hive Script Parameters field.

    Hive Script Parameters: Required if the Hive script includes variables. Select the option and provide the definition of the variables used as parameters in the Hive script. You must define the value of all variables that you use in the script.

      Click + to add a variable to the list. Click - to delete a variable from the list.

    Job XML: Optional. XML document that defines the configuration settings for the job. Select the option and provide the job configuration: select File Upload to upload the configuration XML file or select Direct Input to type in the configuration settings.

    Spark Job Properties

    Main Class: Required. Main class and entry point of the Spark application.

    Jars: Required. Path and file names of jar files to be added to the classpath. You can include jar files that are stored in AWS S3 or Azure ADLS cloud storage or in HDFS.

      Click + to add a jar file to the list. Click - to delete a jar file from the list.

    Application Arguments: Optional. Arguments to pass to the main method of the main class of the Spark application.

      Click + to add an argument to the list. Click - to delete an argument from the list.

    Spark Arguments: Optional. A list of Spark configuration properties for the job. For example:
      --executor-memory 4G --num-executors 50

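    The Spark Arguments in the example above are standard spark-submit configuration flags. As a minimal, illustrative sketch (not an Altus-specific API), the following PySpark snippet shows how those flags surface as Spark configuration properties inside a running job; --executor-memory corresponds to spark.executor.memory and --num-executors corresponds to spark.executor.instances:

      # Sketch: inspect the effective Spark configuration from inside a job.
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # These values reflect the Spark Arguments the job was submitted with,
      # for example --executor-memory 4G --num-executors 50.
      print(spark.conf.get("spark.executor.memory", "not set"))
      print(spark.conf.get("spark.executor.instances", "not set"))
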
    MapReduce2 Job Properties

    Main Class: Required. Main class and entry point of the MapReduce2 application.

    Jars: Required. Path and file names of jar files to be added to the classpath. You can include jar files that are stored in AWS S3 or Azure ADLS cloud storage or in HDFS.

      Click + to add a jar file to the list. Click - to delete a jar file from the list.

    MapReduce Application Arguments: Optional. Arguments for the MapReduce2 application. The arguments are passed to the main method of the main class.

      Click + to add an argument to the list. Click - to delete an argument from the list.

    Java Options: Optional. A list of Java options for the JVM.

    Job XML: Optional. XML document that defines the configuration settings for the job. Select the option and provide the job configuration: select File Upload to upload the configuration XML file or select Direct Input to type in the configuration settings.

    PySpark Job Properties

    Main Python File: Required. Path and file name of the main Python file for the Spark application. This is the entry point for your PySpark application (see the sketch after this table). You can specify a file that is stored in cloud storage or in HDFS.

    Python File Dependencies: Optional. Files required by the PySpark job, such as .zip, .egg, or .py files. Altus adds the paths and file names of the files to the PYTHONPATH for Python applications. You can include files that are stored in cloud storage or in HDFS.

      Click + to add a file to the list. Click - to delete a file from the list.

    Application Arguments: Optional. Arguments to pass to the main method of the PySpark application.

      Click + to add an argument to the list. Click - to delete an argument from the list.

    Spark Arguments: Optional. A list of Spark configuration properties for the job. For example:
      --executor-memory 4G --num-executors 50
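
    The Main Python File is an ordinary PySpark program. The following is a minimal sketch of such a file, assuming a hypothetical job that receives a single input path as an Application Argument; the file name, argument, and path are illustrative only:

      # etl_job.py - hypothetical Main Python File for a PySpark job
      import sys

      from pyspark.sql import SparkSession


      def main():
          # Application Arguments configured for the job arrive as
          # command-line arguments; here the first one is an input path.
          input_path = sys.argv[1]

          spark = SparkSession.builder.appName("altus-pyspark-example").getOrCreate()

          # Read the input data and report the record count.
          df = spark.read.parquet(input_path)
          print(df.count())

          spark.stop()


      if __name__ == "__main__":
          main()
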
  8. In Action on Failure, specify the action that Altus takes when the job fails.
    Altus can perform the following actions:
    • None. When a job fails, Altus runs the subsequent jobs in the queue.
    • Interrupt Job Queue. When a job fails, Altus does not run any of the subsequent jobs in the queue. The jobs that do not run after a job fails are set to a status of Interrupted.
    For more information about the Action on Failure option, see Job Queue.
  9. In the Cluster Settings section, select the cluster on which the job will run:
    • Use existing. Select from the list of clusters available for your use.

      Altus displays only the clusters that you have access to and that can run the type of job you selected. The list shows the number of workers in each cluster.

    • Create new. Configure and create a cluster for the job. If the cluster creation process is not yet complete when you submit the job, Altus adds the job to the job queue and runs it after the cluster is created.
    • Clone existing. Select the cluster on which to base the configuration of a new cluster.
  10. If you create or clone a cluster, set the properties and select the options for the new cluster.

    Complete the following steps:

    1. To allow Altus to terminate the cluster after the job completes, select the Terminate cluster once jobs complete option.

      If you create a cluster specifically for this job and you do not need the cluster after the job runs, you can have Altus terminate the cluster when the job completes. If the Terminate cluster once jobs complete option is selected, Altus terminates the cluster after the job runs, whether the job completes successfully or fails. This option is selected by default. If you do not want Altus to terminate the cluster, clear the selection.

    2. You create a cluster within the Jobs page the same way that you create a cluster on the Clusters page.

      To create a cluster for AWS, follow the instructions from Step 4 to Step 7 in Creating a Data Engineering Cluster for AWS.

      To create a cluster for Azure, follow the instructions from Step 4 to Step 7 in Creating a Data Engineering Cluster for Azure.

  11. Verify that all required fields are set and click Submit.

    The Altus Data Engineering service submits the job to run on the selected cluster in your cloud provider account.

Submitting Multiple Jobs on the Console

You can group multiple jobs in one job submission. You can submit the group of jobs to run on an existing cluster or create a cluster specifically for the job group.

To submit a group of jobs on the console:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Jobs.

    By default, the Jobs page displays the list of all the Altus Data Engineering jobs in your Altus account. You can filter the list of jobs by Altus environment, the cluster on which the jobs run, or the time frame when the jobs run. You can also filter by the user who submitted the job, the job type, and the job status.

  3. Click Submit Jobs.
  4. On the Job Settings page, select Group of jobs.
  5. Select the type of job you want to submit.
    You can select from the following types of jobs:
    • Hive
    • MapReduce2
    • PySpark
    • Spark
  6. Enter a name for the job group.

    The job group name is optional. By default, Altus assigns an ID to the job group. If you do not specify a name, Altus sets the job group name to be the same as the job group ID.

  7. Click Add <Job Type>.
  8. On the Add Job window, enter the job name.

    The job name is optional. By default, Altus assigns an ID to the job. If you do not specify a name, Altus sets the job name to be the same as the job ID.

  9. Set the properties for the job.

    Altus displays job properties based on the job type.

    Hive Job Properties

    Script: Required. The Hive script to execute. Select one of the following sources for the Hive script:
    • Script Path. Specify the path and file name of the file that contains the script.
    • File Upload. Upload a file that contains the script.
    • Direct Input. Type in the script.

      The Hive script can include parameters. Use the format ${Variable_Name} for each parameter. If the script contains parameters, you must specify the variable name and value for each parameter in the Hive Script Parameters field.

    Hive Script Parameters: Required if the Hive script includes variables. Select the option and provide the definition of the variables used as parameters in the Hive script. You must define the value of all variables that you use in the script.

      Click + to add a variable to the list. Click - to delete a variable from the list.

    Job XML: Optional. XML document that defines the configuration settings for the job. Select the option and provide the job configuration: select File Upload to upload the configuration XML file or select Direct Input to type in the configuration settings.

    Spark Job Properties

    Main Class: Required. Main class and entry point of the Spark application.

    Jars: Required. Path and file names of jar files to be added to the classpath. You can include jar files that are stored in AWS S3 or Azure ADLS cloud storage or in HDFS.

      Click + to add a jar file to the list. Click - to delete a jar file from the list.

    Application Arguments: Optional. Arguments to pass to the main method of the main class of the Spark application.

      Click + to add an argument to the list. Click - to delete an argument from the list.

    Spark Arguments: Optional. A list of Spark configuration properties for the job. For example:
      --executor-memory 4G --num-executors 50

    MapReduce2 Job Properties

    Main Class: Required. Main class and entry point of the MapReduce2 application.

    Jars: Required. Path and file names of jar files to be added to the classpath. You can include jar files that are stored in AWS S3 or Azure ADLS cloud storage or in HDFS.

      Click + to add a jar file to the list. Click - to delete a jar file from the list.

    MapReduce Application Arguments: Optional. Arguments for the MapReduce2 application. The arguments are passed to the main method of the main class.

      Click + to add an argument to the list. Click - to delete an argument from the list.

    Java Options: Optional. A list of Java options for the JVM.

    Job XML: Optional. XML document that defines the configuration settings for the job. Select the option and provide the job configuration: select File Upload to upload the configuration XML file or select Direct Input to type in the configuration settings.

    PySpark Job Properties

    Main Python File: Required. Path and file name of the main Python file for the Spark application. This is the entry point for your PySpark application. You can specify a file that is stored in cloud storage or in HDFS.

    Python File Dependencies: Optional. Files required by the PySpark job, such as .zip, .egg, or .py files. Altus adds the paths and file names of the files to the PYTHONPATH for Python applications. You can include files that are stored in cloud storage or in HDFS (see the sketch after this table).

      Click + to add a file to the list. Click - to delete a file from the list.

    Application Arguments: Optional. Arguments to pass to the main method of the PySpark application.

      Click + to add an argument to the list. Click - to delete an argument from the list.

    Spark Arguments: Optional. A list of Spark configuration properties for the job. For example:
      --executor-memory 4G --num-executors 50
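
    As a minimal sketch of how Python File Dependencies behave, the example below assumes a dependency file named helpers.py (hypothetical) is added to the job. Because Altus adds dependency files to the PYTHONPATH, the main Python file can import them directly:

      # main.py - hypothetical Main Python File
      from pyspark.sql import SparkSession

      # helpers.py is listed under Python File Dependencies, so it is importable.
      import helpers  # hypothetical module that provides a clean() function

      spark = SparkSession.builder.appName("altus-dependency-example").getOrCreate()

      df = spark.read.csv("hdfs:///tmp/input.csv", header=True)   # hypothetical path
      df = helpers.clean(df)                                      # hypothetical transformation
      df.write.mode("overwrite").parquet("hdfs:///tmp/output")    # hypothetical path

      spark.stop()
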
  10. In Action on Failure, specify the action that Altus takes when a job fails.
    Altus can perform the following actions:
    • None. When a job fails, Altus runs the subsequent jobs in the queue.
    • Interrupt Job Queue. When a job fails, Altus does not run any of the subsequent jobs in the queue. The jobs that do not run after a job fails are set to a status of Interrupted.
    For more information about the Action on Failure option, see Job Queue.
  11. Click OK.

    The Add Job window closes and the job is added to the list of jobs for the group. You can edit the job or delete the job from the group. To add another job to the group, click Add <Job Type> and set the properties for the new job.

    When you complete setting up all jobs in the group, specify the cluster on which the jobs will run.

  12. In the Cluster Settings section, select the cluster on which the jobs will run:
    • Use existing. Select from the list of clusters available for your use.

      Altus displays only the clusters that you have access to and that can run the type of job you selected. The list shows the number of workers in each cluster.

    • Create new. Configure and create a cluster for the job. If the cluster creation process is not yet complete when you submit the job, Altus adds the jobs to the job queue and runs them after the cluster is created.
    • Clone existing. Select the cluster on which to base the configuration of a new cluster.
  13. If you create or clone a cluster, set the properties and select the options for the new cluster:

    Complete the following steps:

    1. To allow Altus to terminate the cluster after the job completes, select the Terminate cluster once jobs complete option.

      If you create a cluster specifically for this job and you do not need the cluster after the job runs, you can have Altus terminate the cluster when the job completes. If the Terminate cluster once jobs complete option is selected, Altus terminates the cluster after the job runs, whether the job completes successfully or fails. This option is selected by default. If you do not want Altus to terminate the cluster, clear the selection.

    2. You create a cluster within the Jobs page the same way that you create a cluster on the Clusters page.

      To create a cluster for AWS, follow the instructions from Step 4 to Step 7 in Creating a Data Engineering Cluster for AWS.

      To create a cluster for Azure, follow the instructions from Step 4 to Step 7 in Creating a Data Engineering Cluster for Azure.

  14. Verify that all required fields are set and click Submit.

    The Altus Data Engineering service submits the jobs as a group to run on the selected cluster in your cloud provider account.

Viewing Job Status and Information

To view Altus Data Engineering jobs on the console:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Jobs.

    By default, the Jobs page displays the list of all the Altus Data Engineering jobs in your Altus account. You can filter the list of jobs by Altus environment, the cluster on which the jobs run, or the time frame when the jobs run. You can also filter by the user who submitted the job, the job type, and the job status.

    The jobs list displays the name of the group to which the job belongs and the name of the cluster on which the job runs. Click the group name to view the details of the job group and the jobs in the group. Click the cluster name to view the cluster details.

    The jobs list also displays the status of each job. For more information about the different statuses that a job can have, see Altus Data Engineering Jobs.

  3. You can click the Actions button for the job to perform the following tasks:
    • Clone a Job. To create a job of the same type as the job that you are viewing, select the Clone Job action. On the Submit Job page, you can submit a job with the same properties as the job you are cloning. You can modify or add to the properties before you submit the job.
    • Terminate a Job. If the job has a status of Queued, Running, or Submitting, you can select Terminate Job to stop the process. If you terminate a job with a status of Running, the job run is aborted. If you terminate a job with a status of Queued or Submitting, the job will not run.

      If the job status is Complete, the Terminate Job selection does not appear.

Viewing the Job Details

To view the details of a job on the console:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Jobs.

    By default, the Jobs page displays the list of all the Altus Data Engineering jobs in your Altus account. You can filter the list of jobs by Altus environment, the cluster on which the jobs run, or the time frame when the jobs run. You can also filter by the user who submitted the job, the job type, and the job status.

  3. Click the name of a job.
    The Job details page displays information about the job, including the job type and the properties and status of the job.
    • The Job Settings section displays the properties configured for the job, according to the type of job. The section also shows the action that Altus takes if a job in the queue fails.
    • The Job details page shows the name of the cluster on which the job runs and the user account that submitted the job. If the cluster is not terminated, the cluster name is a link to the cluster details page, where you can see more information about the cluster.
    • The Job details page also shows the timeline of status changes as the job moved through the job execution process, including the amount of time the job spent in the queue and the amount of time it took to complete.
    • The Job details page displays the job ID and CRN. The job CRN is a long string of characters. If you need to include the job CRN when you run a command or create a support case, you can copy the CRN from the Job details page and paste it on the command line or into the support case.