Clusters on AWS

If you create clusters on AWS, you can take advantage of the EC2 Spot instances that AWS offers at a discount. You can add compute worker nodes to your clusters and configure them to use Spot instances.

Worker Nodes

An Altus cluster on AWS can have the following types of worker nodes:
Worker node
A worker node runs both data storage and computational processes. Altus requires a minimum of three worker nodes in a cluster.
Compute worker node
A compute worker node is a type of worker node in an Altus cluster that runs only computational processes. It does not run data storage processes.

Altus does not require compute worker nodes in a cluster. You can configure compute worker nodes for a cluster to add compute power and improve cluster performance.

Compute worker nodes are stateless. They can be terminated and restarted without risking job execution.

A cluster can have a combined total of up to 50 worker nodes and compute worker nodes. You determine the combination of worker nodes and compute worker nodes that provides the best performance for your workload. Worker nodes and compute worker nodes use the same instance type.
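
The sizing rules above can be expressed as a quick check before you create a cluster. The following Python sketch is illustrative only: ClusterPlan and validate_plan are hypothetical names, not part of any Altus API, and the sketch reads the total of 50 as an upper limit on the combined node count.

```python
from dataclasses import dataclass

MIN_WORKER_NODES = 3   # from the text: a cluster needs at least three worker nodes
MAX_TOTAL_NODES = 50   # from the text, read here as a limit on the combined count


@dataclass(frozen=True)
class ClusterPlan:
    """Hypothetical description of a planned cluster; not an Altus API type."""
    worker_nodes: int
    compute_worker_nodes: int = 0


def validate_plan(plan: ClusterPlan) -> list[str]:
    """Return the problems found in the plan; an empty list means it fits the stated limits."""
    problems = []
    if plan.worker_nodes < MIN_WORKER_NODES:
        problems.append(
            f"need at least {MIN_WORKER_NODES} worker nodes, got {plan.worker_nodes}")
    if plan.compute_worker_nodes < 0:
        problems.append("compute worker node count cannot be negative")
    total = plan.worker_nodes + plan.compute_worker_nodes
    if total > MAX_TOTAL_NODES:
        problems.append(f"total of {total} nodes exceeds the limit of {MAX_TOTAL_NODES}")
    return problems
```

For example, validate_plan(ClusterPlan(worker_nodes=10, compute_worker_nodes=40)) returns an empty list, while a plan with 2 worker nodes or with 51 nodes in total returns one problem.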

If you add compute worker nodes to a cluster, Altus manages the provisioning of new instances to replace terminated or failed worker and compute worker instances in the cluster. For more information about reprovisioning cluster instances, see Instance Reprovisioning.

All compute worker nodes in a cluster use the same instance pricing. You can configure the compute worker nodes to use On-Demand instances or Spot instances. For more information about using Spot instances for compute worker nodes, see Spot Instances.

Spot Instances

A Spot instance is an EC2 instance for which the hourly price fluctuates based on demand. The hourly price for a Spot instance is typically much lower than the hourly price of an On-Demand instance. However, you do not have control over when Spot instances are available for your cluster. When you bid on Spot instances, your Spot instances run only when your bid price is higher than the current market price and terminate when your bid price becomes lower than the market price.
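
The bidding rule can be illustrated with a small Python sketch. It is a simplified model, not AWS or Altus code: the function names and the prices are made up for illustration, it ignores the time needed to start a new instance, and it treats a bid equal to the market price as not running, which the text does not specify.

```python
def spot_instance_runs(bid_price: float, market_price: float) -> bool:
    """True when the bid is higher than the current market price.

    The text does not say what happens when the two prices are equal;
    this sketch treats that case as "not running", which is an assumption.
    """
    return bid_price > market_price


def running_hours(bid_price: float, hourly_market_prices: list[float]) -> list[bool]:
    """For each hour, report whether a Spot instance would be running at this bid."""
    return [spot_instance_runs(bid_price, price) for price in hourly_market_prices]


# Hypothetical market prices in dollars per hour, made up for illustration.
market = [0.10, 0.12, 0.31, 0.28, 0.11]
print(running_hours(0.30, market))  # [True, True, False, True, True]
```

In this made-up series, a bid of 0.30 loses the instance only in the third hour, when the market price rises to 0.31.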

If an increase in the number of nodes in your cluster can improve job performance, you might want to use Spot instances for compute worker nodes in your cluster. To ensure that jobs continue to run when Spot instances are terminated, Altus allows you to use Spot instances only for compute worker nodes. Compute worker nodes are stateless and can be terminated and restarted without risking job execution.

Altus manages the use of Spot instances in a cluster. When a Spot instance with a running job terminates, Altus attempts to provision a new instance every 15 minutes. Altus uses the new instance to accelerate the running job.

Use the following guidelines when deciding to use Spot instances for compute worker nodes in a cluster:
  • You can use Spot instances only for compute worker nodes. You cannot use Spot instances for worker nodes.

    You can configure compute worker nodes to use On-Demand or Spot instances. If you configure compute worker nodes to use Spot instances, and no Spot instances are available, jobs run on the worker nodes.

    To ensure that worker nodes are available in a cluster to run the processes required to complete a job, worker nodes must use On-Demand instances. You cannot configure worker nodes to use Spot instances.

  • Set your bid price for Spot instances high enough to have a good chance of exceeding market price.

    Generally, a bid price that is 75% of the On-Demand instance price is a good convention to follow. As you use Spot instances more, you can develop a better standard for setting a bid price that is reasonable but still has a good chance of exceeding market price.

  • Use fewer On-Demand instances than required and offset the shortfall with a larger number of Spot instances.

    For example, suppose you know that a job must run on a cluster with 10 On-Demand instances to meet a service level agreement (SLA). You can use 5 On-Demand instances and 15 Spot instances to increase the number of instances on which the job runs at the same or lower cost.

    With this strategy, most of the job processes run on the cheaper instances, which makes it a cost-effective way to meet the SLA.
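
The cost comparison in the last guideline can be worked through in a few lines of Python. This is a sketch under stated assumptions: the On-Demand price of 1.00 per instance-hour and the Spot price of 0.25 are hypothetical, all Spot instances are assumed to be running, and only instance charges are counted.

```python
def hourly_cost(on_demand_count: int, spot_count: int,
                on_demand_price: float, spot_price: float) -> float:
    """Hourly instance cost of a cluster mix, given per-instance hourly prices."""
    return on_demand_count * on_demand_price + spot_count * spot_price


def break_even_spot_price(baseline_on_demand: int, on_demand_count: int,
                          spot_count: int, on_demand_price: float) -> float:
    """Highest Spot price at which the mix costs no more than the all-On-Demand baseline."""
    saved = (baseline_on_demand - on_demand_count) * on_demand_price
    return saved / spot_count


ON_DEMAND_PRICE = 1.00  # hypothetical dollars per instance-hour

baseline = hourly_cost(10, 0, ON_DEMAND_PRICE, 0.0)
limit = break_even_spot_price(10, 5, 15, ON_DEMAND_PRICE)
print(f"baseline: {baseline:.2f}")                                       # baseline: 10.00
print(f"break-even Spot price: {limit:.4f}")                             # break-even Spot price: 0.3333
print(f"mix at 0.25: {hourly_cost(5, 15, ON_DEMAND_PRICE, 0.25):.2f}")   # mix at 0.25: 8.75
```

Under these assumptions, the mix of 5 On-Demand and 15 Spot instances costs the same as or less than 10 On-Demand instances only while the Spot price actually paid averages no more than one third of the On-Demand price.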

For more information about AWS Spot instances, see Spot Instances on the AWS console.

Instance Reprovisioning

By default, if you add compute worker nodes to a cluster, Altus manages the provisioning of new instances to replace terminated or failed instances in the cluster.

Altus periodically attempts to replace failed or terminated worker nodes and compute worker nodes in the cluster. When an instance fails or terminates, Altus attempts to provision a new instance every 15 minutes.

Altus provisions new instances of worker nodes and compute worker nodes in the following manner:
  • Altus provisions On-Demand instances to replace failed or terminated worker nodes and maintain the number of worker nodes configured for the cluster.
  • If compute worker nodes are configured to use On-Demand instances, Altus provisions On-Demand instances to replace failed or terminated compute worker nodes and maintain the number of compute worker nodes configured for the cluster.
  • If compute worker nodes are configured to use Spot instances, Altus provisions Spot instances to replace failed or terminated compute worker nodes and, as much as possible, maintain the number of compute worker nodes configured for the cluster. Depending on the availability of Spot instances, the number of running compute worker nodes might not always match the number configured for the cluster.
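
The replacement rules above can be summarized as a single reconciliation step. The following Python sketch is not how Altus is implemented; ClusterState and replacement_requests are hypothetical names used only to show which pricing type each kind of node is replaced with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical snapshot of a cluster; not an Altus data structure."""
    configured_workers: int
    running_workers: int
    configured_compute_workers: int
    running_compute_workers: int
    compute_pricing: str  # "on-demand" or "spot"


def replacement_requests(state: ClusterState) -> dict[str, int]:
    """Count the instances to request, by pricing type, to restore the configured sizes."""
    requests = {"on-demand": 0, "spot": 0}
    if state.compute_pricing not in requests:
        raise ValueError(f"unknown pricing type: {state.compute_pricing!r}")
    # Worker nodes are always replaced with On-Demand instances.
    requests["on-demand"] += max(0, state.configured_workers - state.running_workers)
    # Compute worker nodes are replaced using the pricing type configured for them.
    requests[state.compute_pricing] += max(
        0, state.configured_compute_workers - state.running_compute_workers)
    return requests


state = ClusterState(configured_workers=5, running_workers=4,
                     configured_compute_workers=15, running_compute_workers=12,
                     compute_pricing="spot")
print(replacement_requests(state))  # {'on-demand': 1, 'spot': 3}
```

In this model, a pass like this would repeat every 15 minutes, as described above. A Spot request might go unfilled when no Spot instances are available, which is why the number of running compute worker nodes can stay below the configured number.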

System Volume

By default, when you create an Altus cluster for AWS, each node in the cluster includes a root device volume. In addition, Altus attaches an EBS volume to the node to store data generated by the cluster.

The EBS volume that Altus adds to the node is a system volume meant to hold logs and other data generated by Altus services and systems. Although Altus manages the system volume, it counts as a volume that you pay for as part of your instance. The system volume is deleted when the cluster is terminated.

Altus configures the cluster so that sensitive information is written to the system volume rather than to the root device volume. When you enable the secure cluster option for Altus clusters, Altus encrypts the system volume and the EBS volumes that you configure for the cluster. Altus does not need to encrypt the root device volume because it does not contain sensitive data.