Creating and Working with Clusters Using the CLI

You can use the Cloudera Altus client to create a cluster, view the properties of a cluster, or terminate a cluster. The commands shown here serve as examples of how to use the Altus CLI commands.

For more information about the commands available in the Altus client, run the following command:
altus dataeng help 

Creating a Cluster

You can use the following command to create a cluster:

altus dataeng create-aws-cluster \
    --compute-workers-configuration='{"groupSize": NumberOfComputeWorkers, "useSpot": true, "bidUSDPerHr": BidPrice}'
Guidelines for using the create-aws-cluster command:
  • You must specify the service to include in the cluster. In the service-type parameter, use one of the following service names to specify the service in the cluster:
    • HIVE
    • SPARK

      Use this service type for Spark 2.1 or Spark 2.2.

    • SPARK_16

      Use this service type only if your application specifically requires Spark version 1.6. If you specify SPARK_16 in the service-type parameter, you must specify CDH511 in the cdh-version parameter.

    • MR2
  • You must specify the version of CDH to include in the cluster. In the cdh-version parameter, use one of the following version names to specify the CDH version:
    • CDH513
    • CDH512
    • CDH511
    The CDH version that you specify can affect the service that runs on the cluster:
    Spark 2.x or Spark 1.6
    For a Spark service type, you must select the CDH version that supports the selected Spark version. Altus supports the following combinations of CDH and Spark versions:
    • CDH 5.13 and CDH 5.12 with Spark 2.2
    • CDH 5.11 with Spark 2.1 or Spark 1.6
    Hive on Spark
    On CDH 5.13, dynamic partition pruning (DPP) is enabled for Hive on Spark by default. For details, see Dynamic Partition Pruning for Hive Map Joins in the Cloudera Enterprise documentation set.
  • The public-key parameter requires the full path and file name of a .pub file prefixed with file://. For example: --public-key=file:///my/file/path/to/ssh/

    Altus adds the public key to the authorized_keys file on each node in the cluster.

    You can also provide a private SSH key instead of a public key with the ssh-private-key parameter. However, for security reasons, Cloudera does not recommend that you provide a private key and will discontinue its use in the near future. If you provide a private key, the key must be registered in the region specified for the Altus environment that you use for the cluster. Altus does not support password-protected SSH keys.

  • The --compute-workers-configuration parameter is optional. It adds compute worker nodes to the cluster in addition to worker nodes. Compute worker nodes run only computational processes. If you do not set the configuration for the compute workers, Altus creates a cluster with no compute worker nodes.
  • The response object for the create-aws-cluster command contains the credentials for a read-only account for the Cloudera Manager instance in the cluster. Make a note of the credentials in the response; they are not made available again after cluster creation.
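Putting these guidelines together, the following sketch shows what a complete create-aws-cluster invocation might look like. The cluster name, environment name, instance type, worker count, and key path are placeholder assumptions, and the parameter names should be verified against the output of altus dataeng create-aws-cluster help:

```
altus dataeng create-aws-cluster \
    --cluster-name my-spark-cluster \
    --environment-name my-altus-environment \
    --service-type SPARK \
    --cdh-version CDH513 \
    --instance-type m4.xlarge \
    --workers-group-size 3 \
    --public-key=file:///path/to/my-key.pub \
    --compute-workers-configuration='{"groupSize": 2, "useSpot": true, "bidUSDPerHr": 0.10}'
```

Because the service type is SPARK and the CDH version is CDH513, this sketch would run Spark 2.2, consistent with the supported combinations of CDH and Spark versions listed above.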

Example: Creating a Cluster for a PySpark Job

This example shows how to create a cluster with a bootstrap script and run a PySpark job on the cluster. The bootstrap script installs a custom Python environment in which to run the job. The Python script file is available in the Cloudera Altus S3 bucket of job examples.

The following command creates a cluster with a bootstrap script and runs a job to implement an alternating least squares (ALS) algorithm:

altus dataeng create-aws-cluster \
    --public-key YourPublicSSHKey \
    --jobs '{
        "name": "PySpark ALS Job",
        "pySparkJob": {
            "mainPy": "s3a://cloudera-altus-data-engineering-samples/pyspark/als/",
            "sparkArguments" : "--executor-memory 1G --num-executors 2 --conf spark.pyspark.python=/tmp/pyspark-env/bin/python"
        }
    }'
The bootstrap script in this example creates a Python environment using the default Python version shipped with Altus and installs the NumPy package. It has the following content:

#!/bin/bash

echo "Provisioning pyspark environment ..."

# Install the environment in the location referenced by spark.pyspark.python
# in the job's sparkArguments.
target=/tmp/pyspark-env
mypip=${target}/bin/pip

virtualenv ${target}
${mypip} install numpy

if [ $? -eq 0 ]; then
    echo "Successfully installed new python environment at ${target}"
else
    echo "Failed to install custom python environment at ${target}"
fi
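If you save the script to a local file, you supply it when you create the cluster. The parameter name below (instance-bootstrap-script) is an assumption to verify with altus dataeng create-aws-cluster help:

```
# Parameter name is an assumption; verify with: altus dataeng create-aws-cluster help
altus dataeng create-aws-cluster \
    --public-key YourPublicSSHKey \
    --instance-bootstrap-script=file:///path/to/bootstrap.sh
```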

Connecting to a Cluster

You can access a cluster created in Altus in the same way that you access other CDH clusters. You can use SSH to connect to a service port in the cluster. If you use SSH, you might need to modify your security group to allow an SSH connection to your instances from the public Cloudera IP addresses.

For more information about setting up an SSH connection to the cluster, see SSH Connection.
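For example, assuming the private key that matches the public key you provided at cluster creation and a login user of centos (the actual user name depends on the image your cluster instances use), an SSH connection to a cluster node might look like:

```
ssh -i /path/to/my-key.pem centos@203.0.113.10
```

Here 203.0.113.10 stands in for the public IP address of the node you want to reach.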

You can use the Altus client to set up a SOCKS proxy server to access the Cloudera Manager instance in the cluster. For more information, see SOCKS Proxy.
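A minimal sketch of starting the proxy, assuming the socks-proxy subcommand documented for the Altus client and a hypothetical cluster name:

```
altus dataeng socks-proxy \
    --cluster-name my-spark-cluster \
    --open-cloudera-manager
```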

Checking the Status of a Cluster

When you create a cluster, you can immediately check its status. If the cluster creation process is not yet complete, you can view information regarding the progress of cluster creation.

You can use the following command to display information about a cluster:

altus dataeng describe-cluster 

cluster-name is a required parameter.
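For example, with a hypothetical cluster name:

```
altus dataeng describe-cluster --cluster-name my-spark-cluster
```

The response includes the cluster status, which you can check while cluster creation is still in progress.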

Deleting a Cluster

You can use the following command to delete a cluster:

altus dataeng delete-cluster
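Assuming delete-cluster takes the same cluster-name parameter as describe-cluster (the cluster name here is a placeholder):

```
altus dataeng delete-cluster --cluster-name my-spark-cluster
```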