Clusters

You can use the Cloudera Altus console or the command-line interface to create and manage data engineering clusters. The Data Engineering service provisions single-user, transient clusters.

By default, Altus creates a cluster that contains a master node and multiple worker nodes. Altus also creates a Cloudera Manager instance to manage the cluster. The Cloudera Manager instance provides visibility into the cluster but is not a part of the cluster. You cannot use the Cloudera Manager instance as a gateway node for the cluster.

Cloudera Manager configures the master node with roles that give it the capabilities of a gateway node. The master node has a resource manager, Hive server and metastore, Spark service, and other roles and client configurations that essentially turn it into a gateway node. You can use the master node as a gateway node in an Altus cluster to run Hive and Spark shell commands as well as Hadoop commands.
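
For example, after you connect to the master node (such as over SSH), you can run the standard CDH client tools directly. This is a minimal sketch; the HiveServer2 URL assumes the default port on the master node:

    spark-shell                                # open an interactive Spark shell (spark2-shell on some CDH versions)
    beeline -u jdbc:hive2://localhost:10000    # connect to HiveServer2 (default port assumed)
    hadoop fs -ls /user                        # run a Hadoop filesystem command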

Altus creates a read-only user account that you use to connect to the Cloudera Manager instance. When you create a cluster on the Altus console, you specify the user name and password for the read-only user account. Use these credentials to log in to Cloudera Manager.

When you create a cluster using the CLI and you do not specify a user name and password, the Data Engineering service creates a guest user account with a randomly generated password. You can use the guest user name and password to log in to Cloudera Manager.

For more information about the guest user account generated through the CLI, see Cloudera Manager Connection.

In addition to the worker nodes in the cluster, you can add compute worker nodes to improve job execution performance. For more information, see Worker Nodes.

Altus appends tags to each node of a cluster. You can use the tags to identify the nodes and the cluster that they belong to. For more information about the tags, see Altus Tags.

When you create a cluster, you specify which service runs in the cluster. Select the service appropriate for the type of job that you plan to run on the cluster.

The following table lists the services available in Altus clusters and the types of jobs you can run with each service:

    Service Type     Job Type
    Hive             Hive
    Hive on Spark    Hive
    Spark 2.x        Spark or PySpark
    Spark 1.6        Spark or PySpark
    MapReduce2       MapReduce2
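
For example, the following sketch shows how you might create a Spark 2.x cluster with the Altus CLI. The option values are placeholders, and the option names should be verified against the CLI help for your version:

    altus dataeng create-aws-cluster \
        --cluster-name example-spark-cluster \
        --service-type SPARK \
        --environment-name example-environment \
        --instance-type m4.xlarge \
        --workers-group-size 3 \
        --cdh-version CDH512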

Cluster Status

A cluster passes through several statuses from the time that you create it until the time that it is terminated.

An Altus cluster can have the following statuses:
  • Creating. The cluster creation process is in progress.
  • Created. The cluster was successfully created.
  • Failed. The cluster can be in a failed state at creation or at termination time. View the failure message to get more information about the failure.
  • Terminating. The cluster is in the process of being terminated.

    When the cluster is terminated, it is removed from the list of clusters displayed in the Clusters page on the console. It is also not included in the list of clusters displayed when you run the list-clusters command.
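
For example, you can check cluster status from the CLI. The list-clusters command is described above; the describe-cluster command shown here is an assumption and should be verified against the CLI help:

    altus dataeng list-clusters
    altus dataeng describe-cluster --cluster-name example-spark-cluster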

Worker Nodes

An Altus cluster can have the following types of worker nodes:
Worker node
A worker node runs both data storage and computational processes. Altus requires a minimum of three worker nodes in a cluster.
Compute worker node
A compute worker node is a type of worker node in an Altus cluster that runs only computational processes. It does not run data storage processes.

Altus does not require compute worker nodes in a cluster. You can configure compute worker nodes for a cluster to add compute power and improve cluster performance.

Compute worker nodes are stateless. They can be terminated and restarted without risking job execution.

A cluster can have a combined total of up to 50 worker and compute worker nodes. You determine the combination of worker and compute worker nodes that provides the best performance for your workload. The worker nodes and compute worker nodes use the same instance type.

If you add compute worker nodes to a cluster, Altus manages the provisioning of new instances to replace terminated or failed worker and compute worker instances in a cluster. For more information about reprovisioning cluster instances, see Instance Reprovisioning.

All compute worker nodes in a cluster use the same instance pricing. You can configure the compute worker nodes to use On-Demand instances or Spot instances. For more information about using Spot instances for compute worker nodes, see Spot Instances.
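
For example, Spot-backed compute worker nodes might be requested when you create the cluster. The --compute-workers-configuration parameter and its field names below are assumptions based on the CLI's conventions, and the other create-aws-cluster options are omitted; check the CLI help for the exact syntax:

    altus dataeng create-aws-cluster \
        --compute-workers-configuration '{"groupSize": 10, "useSpot": true, "bidUSDPerHr": 0.15}'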

Instance Reprovisioning

By default, if you add compute worker nodes to a cluster, Altus manages the provisioning of new instances to replace terminated or failed instances in the cluster.

Altus periodically attempts to replace failed or terminated worker nodes and compute worker nodes in the cluster. When an instance with a running job fails or terminates, Altus attempts to provision a new instance every 15 minutes.

Altus provisions new instances for failed or terminated cluster nodes in the following manner:
  • Altus provisions On-Demand instances to replace failed or terminated worker nodes and maintain the number of worker nodes configured for the cluster.
  • If compute worker nodes are configured to use On-Demand instances, Altus provisions On-Demand instances to replace failed or terminated compute worker nodes and maintain the number of compute worker nodes configured for the cluster.
  • If compute worker nodes are configured to use Spot instances, Altus provisions Spot instances to replace failed or terminated compute worker nodes and, as much as possible, maintain the number of compute worker nodes configured for the cluster.

    Altus attempts to provision a new Spot instance every 15 minutes. Depending on the availability of Spot instances, the number of compute worker nodes in the cluster might not always match the configured number.

Spot Instances

A Spot instance is an EC2 instance for which the hourly price fluctuates based on demand. The hourly price for a Spot instance is typically much lower than the hourly price of an On-Demand instance. However, you do not have control over when Spot instances are available for your cluster. When you bid on Spot instances, the instances run only while your bid price is higher than the current market price and terminate when the market price rises above your bid price.

If an increase in the number of nodes in your cluster can improve job performance, you might want to use Spot instances for compute worker nodes in your cluster. To ensure that jobs continue to run when Spot instances are terminated, Altus allows you to use Spot instances only for compute worker nodes. Compute worker nodes are stateless and can be terminated and restarted without risking job execution.

Altus manages the use of Spot instances in a cluster. When a Spot instance with a running job terminates, Altus attempts to provision a new instance every 15 minutes. Altus uses the new instance to accelerate the running job.

Use the following guidelines when deciding to use Spot instances for compute worker nodes in a cluster:
  • You can use Spot instances only for compute worker nodes. You cannot use Spot instances for worker nodes.

    You can configure compute worker nodes to use On-Demand or Spot instances. If you configure compute worker nodes to use Spot instances, and no Spot instances are available, jobs run on the worker nodes.

    To ensure that worker nodes are available in a cluster to run the processes required to complete a job, worker nodes must use On-Demand instances. You cannot configure worker nodes to use Spot instances.

  • Set your bid price for Spot instances high enough to have a good chance of exceeding the market price.

    A bid price of about 75% of the On-Demand instance price is generally a good starting point. As you gain experience with Spot instances, you can refine your bid price so that it is reasonable but still likely to exceed the market price.

  • Use fewer On-Demand instances than required and offset the shortfall with a larger number of Spot instances.

    For example, suppose that a job must run on a cluster of 10 On-Demand instances to meet a service level agreement. You might instead use 5 On-Demand instances and 15 Spot instances, increasing the number of instances that the job runs on at the same or lower cost, as the worked numbers after this list show.

    With this strategy, most of the job processes run on the cheaper Spot instances, which makes it a cost-effective way to meet the SLA.
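
    As a worked illustration with assumed prices (an On-Demand rate of $0.20 per hour and an average Spot rate of $0.06 per hour; actual rates vary by instance type and region):

        10 On-Demand instances:           10 x $0.20                 = $2.00 per hour
        5 On-Demand + 15 Spot instances:  (5 x $0.20) + (15 x $0.06) = $1.90 per hour

    The mixed cluster costs slightly less per hour while doubling the number of instances available to the job.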

For more information about AWS Spot instances, see the Spot Instances topic in the AWS documentation.