Altus Data Engineering Jobs

You can use the Cloudera Altus console or the command-line interface to run and monitor jobs. When you submit a job, configure it to run on a cluster that contains the service you require to run the job.

The following list describes the services available in Altus clusters and the types of jobs you can run with each service:
Service Type     Job Type
Hive             Hive
Hive on Spark    Hive
Spark 2.x        Spark or PySpark
Spark 1.6        Spark or PySpark
MapReduce2       MapReduce2
Multi            Hive, Spark, PySpark, MapReduce2

The Multi service cluster supports Spark 2.x. It does not support Spark 1.6.
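
The service-to-job-type mapping above can be expressed as a small lookup, which may be handy for checking a job against a cluster before you submit it. This is a minimal sketch: the names SUPPORTED_JOB_TYPES, can_run, and services_for are hypothetical helpers for illustration and are not part of the Altus console or CLI.

```python
# Job types that each Altus service type can run, transcribed from the table above.
# Note: on a Multi cluster, "Spark" and "PySpark" jobs run on Spark 2.x, not Spark 1.6.
SUPPORTED_JOB_TYPES = {
    "Hive": {"Hive"},
    "Hive on Spark": {"Hive"},
    "Spark 2.x": {"Spark", "PySpark"},
    "Spark 1.6": {"Spark", "PySpark"},
    "MapReduce2": {"MapReduce2"},
    "Multi": {"Hive", "Spark", "PySpark", "MapReduce2"},
}


def can_run(service_type: str, job_type: str) -> bool:
    """Return True if a cluster with this service type accepts the job type."""
    return job_type in SUPPORTED_JOB_TYPES.get(service_type, set())


def services_for(job_type: str) -> list:
    """List the service types that can run the given job type, in table order."""
    return [svc for svc, jobs in SUPPORTED_JOB_TYPES.items() if job_type in jobs]
```

For example, can_run("Hive", "Spark") returns False, while services_for("Hive") lists the Hive, Hive on Spark, and Multi service types.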

Altus creates a job queue for each cluster. When you submit a job, Altus adds the job to the queue of the cluster on which you configure the job to run. For more information about the job queue, see Job Queue.

Altus generates a job ID for every job that you submit. If you submit a group of jobs, Altus generates a group ID. If you do not specify a job name or a group name, Altus sets the job name to the job ID and the group name to the group ID.
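
The naming fallback described above amounts to a simple default rule. The sketch below only illustrates that rule; effective_name is a hypothetical helper, the ID strings in the usage note are made-up placeholders, and the sketch assumes that an empty name counts as "not specified".

```python
from typing import Optional


def effective_name(requested_name: Optional[str], generated_id: str) -> str:
    """Return the name a job or job group ends up with.

    If no name was requested, the generated ID doubles as the name.
    Assumption: an empty string is treated the same as an omitted name.
    """
    return requested_name if requested_name else generated_id
```

For example, effective_name(None, "job-id-placeholder") returns "job-id-placeholder", and effective_name("nightly-etl", "job-id-placeholder") returns "nightly-etl". The same rule applies to group names and group IDs.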

If the Altus environment has Workload Analytics enabled, you can view performance information for a job after it ends, including health checks, baselines, and other execution information. Use this information to analyze a job's current performance and compare it to past runs of the same job.

Job Status

A job changes status several times between the time that you submit it and the time that it completes or is terminated. The status can be affected by a user action, by the configuration of the job, or by the configuration of the cluster on which the job runs.

A data engineering job can have the following statuses:
  • Queued. The job is queued to run on the selected cluster.
  • Submitting. The job is being added to the job queue.
  • Running. The job is in progress.
  • Interrupted. A job is set to Interrupted status in the following situations:
    • If you create a cluster when you submit a job and the cluster is not successfully created, the job status is set to Interrupted. You can create a cluster and rerun the job on the new cluster.
    • If the job is queued to run on a cluster but the cluster is deleted, the job status is set to Interrupted. You can rerun the job on another cluster.
    • If the job does not run because a previous job in the queue has the Action on Failure option set to Interrupt Job Queue, the job status is set to Interrupted.
  • Completed. The job completed successfully.
  • Terminating. You have initiated termination of the job and the job is in the process of being terminated.
  • Terminated. The job termination process is complete.
  • Failed. The job did not complete.
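
If you track jobs in your own scripts, the statuses listed above map naturally onto an enumeration. This is a sketch only: the JobStatus class and the is_active helper are hypothetical, and the set of statuses treated as "no longer active" is an assumption drawn from the descriptions above, not a statement of how Altus classifies them.

```python
from enum import Enum


class JobStatus(Enum):
    """Job statuses, using the display names from the list above."""
    QUEUED = "Queued"
    SUBMITTING = "Submitting"
    RUNNING = "Running"
    INTERRUPTED = "Interrupted"
    COMPLETED = "Completed"
    TERMINATING = "Terminating"
    TERMINATED = "Terminated"
    FAILED = "Failed"


# Assumption: a job in one of these statuses will not run any further.
NO_LONGER_ACTIVE = frozenset({
    JobStatus.INTERRUPTED,
    JobStatus.COMPLETED,
    JobStatus.TERMINATED,
    JobStatus.FAILED,
})


def is_active(status: JobStatus) -> bool:
    """Return True while the job is still waiting, running, or being terminated."""
    return status not in NO_LONGER_ACTIVE
```

A monitoring loop could keep polling while is_active(status) is true and stop once it returns false.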

Job Queue

When you create a cluster, Altus sets up a job queue for the jobs submitted to the cluster. Altus sets up one job queue for each cluster and adds all jobs that are submitted to a cluster to the same job queue.

Altus runs the jobs in the queue sequentially, in the order in which it receives the job requests. This applies whether you submit single jobs or groups of jobs to the cluster.
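
The first-in, first-out behavior can be pictured with a small model. This is an illustrative sketch, not Altus code: the JobQueue class and its methods are hypothetical, and it assumes that the jobs within a group run in the order in which they are listed in the request.

```python
from collections import deque
from typing import Deque, List


class JobQueue:
    """Toy model of the single job queue that belongs to one cluster."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def submit(self, *job_names: str) -> None:
        """Record one request: a single job, or a group of jobs.

        Assumption: jobs in a group keep the order given in the request.
        """
        self._pending.extend(job_names)

    def run_order(self) -> List[str]:
        """Drain the queue and return the jobs in the order they would run."""
        order = []
        while self._pending:
            order.append(self._pending.popleft())
        return order
```

Submitting a single job "a", then a group ("b", "c"), then a single job "d" yields the run order a, b, c, d.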

You can configure the following options to manage how jobs run in the queue:
  • Job Failure Action

    When you submit a job, you can specify the action that Altus takes when the job fails. The job failure action determines whether Altus runs the jobs that follow a failed job in the queue.

    This option is useful for handling job dependencies. If a job must complete before the next job can run, you can set the option to interrupt the job queue so that, if the job fails, Altus does not run the rest of the jobs in the queue.

    If a job failure does not affect subsequent jobs in the queue, you can set the option to an action of NONE so that, when a job fails, Altus continues to run the subsequent jobs in the queue.

    If you do not specify any action, Altus sets the option to interrupt the job queue by default.

    The following table shows the option on the console and parameter in the CLI that you can use to specify the action Altus takes when a job fails:
    Interface: Console
    Option/Parameter: Action on Failure
    Description: When you submit a job, you can set the Action on Failure option to one of the following actions:
      • None. When a job fails, Altus continues with job execution and performs no special action. Altus runs the next job in the queue.
      • Interrupt Job Queue. When the job fails, Altus does not run any of the subsequent jobs in the queue. The jobs that do not run after a job fails are set to a status of Interrupted.

    Interface: CLI
    Option/Parameter: failureAction
    Description: If you use the CLI to submit a job, you can set the failureAction parameter to one of the following actions:
      • NONE. When a job fails, Altus continues with job execution and performs no special action. Altus runs the next job in the queue.
      • INTERRUPT_JOB_QUEUE. When the job fails, Altus does not run any of the subsequent jobs in the queue. The jobs that do not run after a job fails are set to a status of Interrupted.
  • Cluster termination after all jobs are processed

    When you create a cluster, you can configure how Altus handles a cluster when all jobs sent to the cluster are processed and the job queue becomes empty.

    The following table shows the option on the console and parameter in the CLI that you can use to specify the condition by which Altus terminates a cluster:
    Interface: Console
    Option/Parameter: Terminate cluster once jobs complete
    Description: When you submit a job and you create a cluster on which to run the job, you can enable the Terminate cluster once jobs complete option to terminate the cluster when the job queue is empty.

    If you do not enable the option, Altus does not terminate the cluster when the job queue is empty. You must manually terminate the cluster if you do not plan to submit jobs to the cluster again.

    Interface: CLI
    Option/Parameter: --automatic-termination-condition
    Description: If you use the CLI to create a cluster, you can use the --automatic-termination-condition parameter to specify whether to terminate the cluster when the job queue is empty.
    You can set the parameter to one of the following conditions:
      • NONE. When the job queue is empty, Altus does not terminate the cluster.
      • EMPTY_JOB_QUEUE. When all jobs in the queue are processed and the queue is empty, Altus terminates the cluster.

        If you set the parameter to terminate the cluster, you must include the --jobs parameter and submit at least one job to the cluster.

    The --automatic-termination-condition parameter is optional. If you do not include the parameter, Altus does not terminate the cluster when the job queue is empty.
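
The two queue options can be illustrated with a short simulation. This is a sketch under stated assumptions rather than a description of how Altus is implemented: the Job class and the three functions are hypothetical, and a job is modeled as a callable that returns True on success. Only the values NONE, INTERRUPT_JOB_QUEUE, and EMPTY_JOB_QUEUE, and the defaults, are taken from the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    name: str
    run: Callable[[], bool]                      # hypothetical: True means success
    failure_action: str = "INTERRUPT_JOB_QUEUE"  # default when no action is given


def process_queue(jobs: List[Job]) -> List[Tuple[str, str]]:
    """Run jobs in order and return (job name, final status) pairs."""
    results = []
    interrupted = False
    for job in jobs:
        if interrupted:
            # Jobs that follow an interrupting failure never run.
            results.append((job.name, "Interrupted"))
        elif job.run():
            results.append((job.name, "Completed"))
        else:
            results.append((job.name, "Failed"))
            interrupted = job.failure_action == "INTERRUPT_JOB_QUEUE"
    return results


def validate_cluster_request(termination_condition: str, jobs: List[Job]) -> None:
    """Reject EMPTY_JOB_QUEUE when the cluster is created without any job."""
    if termination_condition not in ("NONE", "EMPTY_JOB_QUEUE"):
        raise ValueError("unknown termination condition: " + termination_condition)
    if termination_condition == "EMPTY_JOB_QUEUE" and not jobs:
        raise ValueError("EMPTY_JOB_QUEUE requires at least one job")


def should_terminate_cluster(termination_condition: str, queue_is_empty: bool) -> bool:
    """A cluster is terminated only if EMPTY_JOB_QUEUE is set and the queue is empty."""
    return termination_condition == "EMPTY_JOB_QUEUE" and queue_is_empty
```

With three jobs where the second fails, the default action leaves the third job Interrupted. If the second job instead sets failure_action="NONE", the third job still runs.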