Creating and Working with Clusters on the Console

You can create a cluster on the Cloudera Altus Console. You can also view the status and configuration of all clusters created through Altus in your cloud provider account.

Creating a Data Engineering Cluster for AWS

To create an Altus Data Engineering cluster on the console for AWS:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Clusters.

    By default, the Clusters page displays the list of all the Altus Data Engineering clusters in your Altus account. The cloud icon next to the cluster name indicates the cloud service provider for the cluster. You can filter the list by environment and status. You can also search for clusters by name.

  3. Click Create Cluster.
  4. In the General Information section, specify the following information:
    Property Description
    Cluster Name The name to identify the cluster that you are creating. The cluster name is an alphanumeric string of any length. It can include dashes (-) and underscores (_). It cannot include a space.
    Service Type Indicates the service to be installed on the cluster. Select the service based on the types of jobs you plan to run on the cluster. You can select from the following service types:
    • Hive
    • Hive on Spark
    • Spark 2.x
    • Spark 1.6

      Select Spark 1.6 only if your application specifically requires Spark version 1.6. Altus supports Spark 1.6 only on CDH 5.11.

    • MapReduce2
    • Multi

      A cluster with service type Multi allows you to run different types of jobs. You can run the following types of jobs in a Multi cluster: Spark 2.x, Hive, MapReduce2.

    CDH Version The CDH version that the cluster will use.
    You can select from the following CDH versions:
    • CDH 6.1
    • Any version from CDH 5.11 to CDH 5.15
    The CDH version that you select can affect the service that runs on the cluster:
    Spark 2.x or Spark 1.6
    For a Spark service type, you must select the CDH version that supports the selected Spark version. Altus supports the following combinations of CDH and Spark versions:
    • CDH 6.1 with Spark 2.4
    • CDH 5.12 or later 5.x versions with Spark 2.2
    • CDH 5.11 with Spark 2.1 or Spark 1.6
    Hive on Spark
    On CDH version 5.13 or later, dynamic partition pruning (DPP) is enabled for Hive on Spark by default. For details, see Dynamic Partition Pruning for Hive Map Joins in the Cloudera Enterprise documentation set.
    The CDH version that you select affects the SDX namespace you can use with the cluster:
    CDH 6.1
    You can use a CDH 6.1 cluster only with a configured SDX namespace that points to version 6.1 of the Hive metastore and Sentry databases.
    CDH 5.x
    You can use a CDH 5.x cluster only with a configured SDX namespace that points to version 5.x of the Hive metastore and Sentry databases.

    The Cloudera Navigator integration option is not available for Altus Data Engineering clusters with CDH 5.11 or CDH 6.x.

    Environment Name of the Altus environment that describes the resources to be used for the cluster. The Altus environment specifies the network and instance settings for the cluster.

    If a lock icon appears next to the environment name, clusters that you create using this environment are secure.

    If you do not know which Altus environment to select, check with your Altus administrator.

  5. In the Node Configuration section, specify the number of workers to create and the instance type to use for the cluster.
    Property Description
    Worker The worker nodes in a cluster can run data storage and computational processes. For more information about worker nodes, see Worker Nodes.
    You can configure the following properties for the worker node:
    Instance Type
    Select the instance type from the list of supported instance types.

    Default: m4.xlarge (16.0 GB 4 vCPUs)

    Number of Nodes
    Select the number of worker nodes to include in the cluster. A cluster must have a minimum of 3 worker nodes.

    Default: 5

    EBS Storage
    In the EBS Volume Configuration window, configure the following properties for the EBS Volume:
    • Storage Type. Select the EBS volume type best suited for the job you want to run.
    • Storage Size. Set the storage size of the EBS volume, expressed in gibibytes (GiB).
    • Volumes per Instance. Set the number of EBS volumes for each instance in the worker node. All EBS volumes are configured with the same volume size and type.
    If you do not configure the EBS volumes, Altus sets the optimum configuration for the EBS volumes based on the service type and instance type.

    For more information about Amazon EBS, see Amazon EBS Product Details on the AWS website.

    Purchasing Option
    By default, the worker nodes use On-Demand instances. You cannot modify the worker nodes to use Spot instances.
    Compute Worker In addition to the worker nodes, an Altus cluster can have compute worker nodes. Compute worker nodes run only computational processes. For more information about compute worker nodes, see Worker Nodes.
    You can configure the following properties for the compute worker node:
    Instance Type
    You cannot directly modify the instance type for a compute worker node.
    Number of Nodes
    Select the number of compute worker nodes to include in the cluster.

    Default: 0

    EBS Storage
    In the EBS Volume Configuration window, configure the following properties for the EBS Volume:
    • Storage Type. Select the EBS volume type best suited for the job you want to run.
    • Storage Size. Set the storage size of the EBS volume, expressed in gibibytes (GiB).
    • Volumes per Instance. Set the number of EBS volumes for each instance in the compute worker node. All EBS volumes are configured with the same volume size and type.
    If you do not configure the EBS volumes, Altus sets the optimum configuration for the EBS volumes based on the service type and instance type.

    For more information about Amazon EBS, see Amazon EBS Product Details on the AWS website.

    Purchasing Option
    Select whether to use On-Demand instances or Spot instances. If you use Spot instances, you must specify the spot price.

    For more information about using Spot instances for compute worker nodes, see Spot Instances.

    Master Altus configures the master node for the cluster. You cannot modify the master node configuration.
    By default, Altus sets the following configuration for the master node:
    Instance Type
    m4.xlarge (16.0 GB 4 vCPUs)
    Number of Nodes
    1
    EBS Storage
    Altus sets the optimum configuration for the master node based on the service type and instance type.
    Purchasing Option
    On-Demand instance
    Cloudera Manager Altus configures the Cloudera Manager instance for the cluster. You cannot modify the Cloudera Manager instance configuration.
    By default, Altus sets the following configuration for the Cloudera Manager instance:
    Instance Type
    c4.2xlarge (15 GB 8 vCPUs)
    Number of Nodes
    1
    EBS Storage
    Altus sets the optimum configuration for the Cloudera Manager node based on the service type and instance type.
    Purchasing Option
    On-Demand instance
  6. In the Credentials section, provide the credentials for the user account to log in to Cloudera Manager.
    Property Description
    Public SSH Key You use an SSH key to access instances in the cluster that you are creating. You can provide a public key that Altus will add to the authorized_keys file on each node in the cluster. To connect to the cluster through SSH, use the private key that corresponds to the public key.

    Select File Upload to upload a file that contains the public key or select Direct Input to enter the full key code.

    If you select Skip and you do not provide an SSH public key, you cannot access the cluster through SSH or access the Cloudera Manager instance through a SOCKS proxy.

    For more information about connecting to Altus clusters through SSH, see SSH Connection.

    Cloudera Manager Access Altus creates a read-only user account that you can use to access the Cloudera Manager instance in the cluster. You can allow Altus to generate the user name and password for the user account, or you can specify the user name and password for the account.

    To allow Altus to generate the credentials, select Auto-generate. After you click Create Cluster, Altus displays a window with the user name and password for the Cloudera Manager instance. Save the credentials before you close the window.

    To specify the user credentials, click Customize. Specify the user name and password for the user account and then confirm the password. Take note of the user name and password that you specify for the Cloudera Manager user account.

  7. In the Advanced Settings section, set the following optional properties:
    Property Description
    Instance bootstrap script Bootstrap script that is executed on all the cluster instances immediately after start-up before any service is configured and started. You can use the bootstrap script to install additional OS packages or application dependencies.

    You cannot use the bootstrap script to change the cluster configuration.

    Select File Upload to upload a script file or select Direct Input to type the script on the screen.

    The bootstrap script must be a local file. It can be in any executable format, such as a Bash shell script or Python script. The size of the script cannot be larger than 4096 bytes.

    Resource Tags Tags that you define and that you want Altus to append to the cluster that you are creating. Altus appends the tags you define to the nodes and resources associated with the cluster.

    You create the tag as a name-value pair. Click + to add a tag name and set the value for that tag. Click - to delete a tag from the list.

    By default, Altus appends tags to the cluster instance to make it easy to identify nodes in a cluster. When you define tags for the cluster, Altus adds your tags in addition to the default tags.

    For more information about the tags that Altus appends to the cluster, see Altus Tags.

  8. Verify that all required fields are set and click Create Cluster.

    The Altus Data Engineering service creates a CDH cluster with the configuration you set. On the Clusters page, the new cluster displays at the top of the list of clusters.
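
The naming rule and the CDH and Spark combinations described in the General Information step can be checked before you fill in the form. The following Python sketch is illustrative only and is not part of Altus. The function names are hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits and that the CDH version is written as major.minor.

```python
import re
from typing import List, Optional

# Assumption: "alphanumeric" means ASCII letters and digits.
# Dashes and underscores are allowed; spaces are not.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, dashes, and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


def supported_spark_versions(cdh_version: str) -> List[str]:
    """Return the Spark versions listed for a CDH version on AWS."""
    major, minor = (int(part) for part in cdh_version.split(".")[:2])
    if (major, minor) == (6, 1):
        return ["2.4"]
    if major == 5 and 12 <= minor <= 15:
        return ["2.2"]
    if (major, minor) == (5, 11):
        return ["2.1", "1.6"]
    return []  # CDH version not offered for AWS clusters


def check_general_information(
    name: str, cdh_version: str, spark_version: Optional[str] = None
) -> List[str]:
    """Collect human-readable problems; an empty list means no problem was found."""
    problems = []
    if not is_valid_cluster_name(name):
        problems.append("cluster name may contain only letters, digits, '-' and '_'")
    if spark_version is not None:
        allowed = supported_spark_versions(cdh_version)
        if spark_version not in allowed:
            problems.append(
                f"CDH {cdh_version} does not pair with Spark {spark_version}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_general_information("nightly etl", "5.13", "1.6"):
        print(problem)
```

A check like this only mirrors the rules stated on this page. The console remains the authority on which combinations it accepts.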

Creating a Data Engineering Cluster for Azure

To create an Altus Data Engineering cluster on the console for Azure:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Clusters.

    By default, the Clusters page displays the list of all the Altus Data Engineering clusters in your Altus account. The cloud icon next to the cluster name indicates the cloud service provider for the cluster. You can filter the list by environment and status. You can also search for clusters by name.

  3. Click Create Cluster.
  4. In the General Information section, specify the following information:
    Property Description
    Cluster Name The name to identify the cluster that you are creating. The cluster name is an alphanumeric string of any length. It can include dashes (-) and underscores (_). It cannot include a space.
    Service Type Indicates the service to be installed on the cluster. Select the service based on the types of jobs you plan to run on the cluster. You can select from the following service types:
    • Hive
    • Hive on Spark

      Dynamic partition pruning (DPP) is enabled for Hive on Spark by default. For details, see Dynamic Partition Pruning for Hive Map Joins in the Cloudera Enterprise documentation set.

    • Spark 2.x

      Altus supports Spark 2.2 in clusters with CDH 5.x and Spark 2.4 in clusters with CDH 6.1.

    • Spark 1.6

      Select Spark 1.6 only if your application specifically requires Spark version 1.6.

    • MapReduce2
    • Multi

      A cluster with service type Multi allows you to run different types of jobs. You can run the following types of jobs in a Multi cluster: Spark 2.x, Hive, MapReduce2.

    CDH Version The CDH version that the cluster will use.

    Altus supports CDH 5.14, CDH 5.15, and CDH 6.1.

    The CDH version that you select affects how you use the cluster:
    CDH 6.1
    • You can use a CDH 6.1 cluster only with a configured SDX namespace that points to version 6.1 of the Hive metastore and Sentry databases.
    • For clusters with CDH 6.1, Altus archives logs to ADLS Gen1 or Gen2, based on the folder you specify.
    CDH 5.x
    • You can use a CDH 5.x cluster only with a configured SDX namespace that points to version 5.x of the Hive metastore and Sentry databases.
    • For clusters with CDH 5.x, Altus archives logs to ADLS Gen1.
    Environment Name of the Altus environment that describes the resources to be used for the cluster. The Altus environment specifies the network and instance settings for the cluster.

    If you do not know which Altus environment to select, check with your Altus administrator.

  5. In the Node Configuration section, specify the configuration of the nodes in the cluster.
    Property Description
    Worker The worker nodes in a cluster can run data storage and computational processes.
    You can configure the following properties for the worker nodes:
    Instance Type
    Select the instance type to use for the worker nodes in the cluster. You can use one of the following instance types:
    • Standard_D4S_v3 16 GiB with 4 vCPUs
    • Standard_D8S_v3 32 GiB with 8 vCPUs
    • Standard_D16S_v3 64 GiB with 16 vCPUs
    • Standard_D32S_v3 128 GiB with 32 vCPUs
    • Standard_D64S_v3 256 GiB with 64 vCPUs
    • Standard_DS12_v2 28 GiB with 4 vCPUs
    • Standard_DS13_v2 56 GiB with 8 vCPUs
    • Standard_DS14_v2 112 GiB with 16 vCPUs
    • Standard_DS15_v2 140 GiB with 20 vCPUs
    • Standard_E4S_v3 32 GiB with 4 vCPUs
    • Standard_E8S_v3 64 GiB with 8 vCPUs
    • Standard_E16S_v3 128 GiB with 16 vCPUs
    • Standard_E32S_v3 256 GiB with 32 vCPUs
    • Standard_E64S_v3 432 GiB with 64 vCPUs

    Altus uses the same instance type for all the worker nodes in the cluster.

    Number of Nodes
    Select the number of worker nodes to include in the cluster. A cluster must have a minimum of 3 worker nodes.

    Default: 5

    Disk Configuration
    In the Disk Configuration window, configure the following properties for the disk:
    • Storage Type. Select the storage type best suited for the job you want to run, premium or standard.
    • Storage Size. Set the storage size of the disk, expressed in gibibytes (GiB).
    • Disks per Instance. Set the number of disks for each instance in the worker node.
    If you do not change the disk configuration, Altus sets the optimum configuration for the disks based on the service type and instance type.

    For more information about Azure Managed Disks, see Managed Disks on the Azure website.

    Master Altus configures the master node for the cluster. You cannot modify the master node configuration.
    By default, Altus sets the following configuration for the master node:
    Instance Type
    Standard_DS12_v2 28 GiB with 4 vCPUs
    Number of Nodes
    1
    Disk Configuration
    Altus sets the optimum configuration for the master node based on the service type and instance type.
    Cloudera Manager Altus configures the Cloudera Manager node for the cluster. You cannot modify the Cloudera Manager node configuration.
    By default, Altus sets the following configuration for the Cloudera Manager node:
    Instance Type
    Standard_DS12_v2 28 GiB with 4 vCPUs
    Number of Nodes
    1
    Disk Configuration
    Altus sets the optimum configuration for the Cloudera Manager node based on the service type and instance type.
  6. In the Credentials section, provide the credentials for the user account to log in to Cloudera Manager.
    Property Description
    Public SSH Key You use an SSH key to access instances in the cluster that you are creating. You can provide a public key that Altus will add to the authorized_keys file on each node in the cluster. To connect to the cluster through SSH, use the private key that corresponds to the public key.

    Select File Upload to upload a file that contains the public key or select Direct Input to enter the full key code.

    If you select Skip and you do not provide an SSH public key, you cannot access the cluster through SSH or access the Cloudera Manager instance through a SOCKS proxy.

    For more information about connecting to Altus clusters through SSH, see SSH Connection.

    Cloudera Manager Access Altus creates a read-only user account that you can use to access the Cloudera Manager instance in the cluster. You can allow Altus to generate the user name and password for the user account, or you can specify the user name and password for the account.

    To allow Altus to generate the credentials, select Auto-generate. After you click Create Cluster, Altus displays a window with the user name and password for the Cloudera Manager instance. Save the credentials before you close the window.

    To specify the user credentials, click Customize. Specify the user name and password for the user account and then confirm the password. Take note of the user name and password that you specify for the Cloudera Manager user account.

  7. In the Advanced Settings section, set the following optional properties:
    Property Description
    Instance bootstrap script Bootstrap script that is executed on all the cluster instances immediately after start-up before any service is configured and started. You can use the bootstrap script to install additional OS packages or application dependencies.

    You cannot use the bootstrap script to change the cluster configuration.

    Select File Upload to upload a script file or select Direct Input to type the script on the screen.

    The bootstrap script must be a local file. It can be in any executable format, such as a Bash shell script or Python script. The size of the script cannot be larger than 4096 bytes.

    Resource Tags Tags that you define and that you want Altus to append to the cluster that you are creating. Altus appends the tags you define to the nodes and resources associated with the cluster.

    You create the tag as a name-value pair. Click + to add a tag name and set the value for that tag. Click - to delete a tag from the list.

    By default, Altus appends tags to the cluster instance to make it easy to identify nodes in a cluster. When you define tags for the cluster, Altus adds your tags in addition to the default tags.

    For more information about the tags that Altus appends to the cluster, see Altus Tags.

  8. Verify that all required fields are set and click Create Cluster.

    The Altus Data Engineering service creates a CDH cluster with the configuration you set. On the Clusters page, the new cluster displays at the top of the list of clusters.
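
Several limits in the Azure procedure are simple enough to check ahead of time: the supported CDH versions, the minimum of 3 worker nodes, the 4096-byte limit on the bootstrap script, and the log archive storage that depends on the CDH version. The following Python sketch is a hypothetical helper, not an Altus API. All names in it are assumptions made for illustration, and it assumes the script size is measured on the UTF-8 encoded text.

```python
from typing import List

# Values taken from the procedure above.
AZURE_CDH_VERSIONS = ("5.14", "5.15", "6.1")
MIN_WORKER_NODES = 3
MAX_BOOTSTRAP_SCRIPT_BYTES = 4096


def log_archive_storage(cdh_version: str) -> List[str]:
    """Return the ADLS generations that can hold archived logs for a CDH version."""
    if cdh_version == "6.1":
        return ["ADLS Gen1", "ADLS Gen2"]
    if cdh_version.startswith("5."):
        return ["ADLS Gen1"]
    raise ValueError(f"unsupported CDH version: {cdh_version}")


def check_azure_settings(
    cdh_version: str, worker_nodes: int, bootstrap_script: str = ""
) -> List[str]:
    """Collect problems with the requested settings; an empty list means none found."""
    problems = []
    if cdh_version not in AZURE_CDH_VERSIONS:
        problems.append(f"CDH {cdh_version} is not offered for Azure clusters")
    if worker_nodes < MIN_WORKER_NODES:
        problems.append(f"a cluster needs at least {MIN_WORKER_NODES} worker nodes")
    # Assumption: the size limit applies to the UTF-8 encoded script.
    script_size = len(bootstrap_script.encode("utf-8"))
    if script_size > MAX_BOOTSTRAP_SCRIPT_BYTES:
        problems.append(
            f"bootstrap script is {script_size} bytes; "
            f"the limit is {MAX_BOOTSTRAP_SCRIPT_BYTES}"
        )
    return problems


if __name__ == "__main__":
    script = "#!/bin/bash\nyum install -y jq\n"
    print(check_azure_settings("6.1", 5, script))
    print(log_archive_storage("5.15"))
```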

Viewing the Cluster Status

To view the status of clusters on the console:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Clusters.

    By default, the Clusters page displays the list of all the Altus Data Engineering clusters in your Altus account. The cloud icon next to the cluster name indicates the cloud service provider for the cluster. You can filter the list by environment and status. You can also search for clusters by name.

    The Clusters list shows the following information:
    • Cluster name
    • Status

      For more information about the different statuses that a cluster can have, see Cluster Status.

    • Service type for the cluster
    • Number of worker nodes
    • Instance type for the cluster
    • Date and time the cluster was created in Altus
    • Version of CDH that runs in the cluster
  3. You can click the Actions button for a cluster to perform the following tasks:
    • Submit Jobs. Select this action to submit one or more jobs to run on the cluster.
    • Clone Cluster. Select this action to create a cluster of the same type and characteristics as the cluster that you are viewing. On the Create Cluster page, you can create a cluster with the same properties as the cluster you are cloning. You can modify or add to the properties before you create the cluster.
    • Delete Cluster. Select this action to terminate the cluster.
  4. To view the details of a cluster, click the name of the cluster you want to view.

    The Cluster Details page displays information about the cluster in more detail, including the list of jobs in the cluster.
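
The filter and search behavior of the Clusters page can be pictured as a few predicates applied to each row of the list. The Python sketch below is a minimal model of that idea and does not reflect how the console is implemented. The ClusterSummary fields, the sample records, and the status strings are hypothetical, and the name search is assumed to be a case-insensitive substring match.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ClusterSummary:
    """Hypothetical record holding some of the columns shown in the Clusters list."""
    name: str
    status: str
    environment: str
    service_type: str
    worker_nodes: int
    cdh_version: str


def filter_clusters(
    clusters: Iterable[ClusterSummary],
    environment: Optional[str] = None,
    status: Optional[str] = None,
    name_query: Optional[str] = None,
) -> List[ClusterSummary]:
    """Keep the clusters that match every filter that is set."""
    matches = []
    for cluster in clusters:
        if environment is not None and cluster.environment != environment:
            continue
        if status is not None and cluster.status != status:
            continue
        # Assumption: searching by name is a case-insensitive substring match.
        if name_query and name_query.lower() not in cluster.name.lower():
            continue
        matches.append(cluster)
    return matches


# Illustrative sample data; names and statuses are made up for this sketch.
CLUSTERS = [
    ClusterSummary("etl-nightly", "Created", "prod-aws", "Spark 2.x", 5, "6.1"),
    ClusterSummary("etl-adhoc", "Creating", "dev-azure", "Hive", 3, "5.15"),
    ClusterSummary("reports", "Created", "prod-aws", "Multi", 8, "5.14"),
]

if __name__ == "__main__":
    for cluster in filter_clusters(CLUSTERS, environment="prod-aws", name_query="etl"):
        print(cluster.name, cluster.status)
```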

Viewing the Cluster Details

To view the details of a cluster on the console:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Clusters.

    By default, the Clusters page displays the list of all the Altus Data Engineering clusters in your Altus account. The cloud icon next to the cluster name indicates the cloud service provider for the cluster. You can filter the list by environment and status. You can also search for clusters by name.

  3. Click the name of a cluster.

    You can click Submit Jobs to run jobs on the cluster. Click View Jobs to go to the Jobs page and view the list of all jobs on the cluster. Clear the filter to view all jobs in the Altus account.

    The details page for the selected cluster displays the status of the cluster and the following information:
    Cluster Status
    The details page displays information appropriate for the status of the cluster. For example, if a cluster failed at creation time, the details page displays the failure message that explains the reason for the failure, but does not display a link to the Cloudera Manager instance.
    Submit Jobs and View Jobs
    The Submit Jobs and View Jobs links take you to the Jobs page. You can view the jobs on the cluster or create and submit jobs to run on the cluster. For more information about the Jobs page, see Running and Monitoring Jobs on the Console.
    Cloudera Manager Configuration
    The Cloudera Manager Configuration section displays the instance type and connection details for the Cloudera Manager instance.

    The cluster details page displays the private IP address assigned to the Cloudera Manager instance in the cluster. If the Public IPs option for the environment used to create the cluster is enabled, the page also displays the public IP addresses. You can log in to Cloudera Manager through the public or private IP. If the public IP addresses are available, you can click a link to view the Altus command to set up a SOCKS proxy server to access the Cloudera Manager instance in the cluster.

    The Cloudera Manager Configuration section appears only if the Cloudera Manager instance is accessible. The Cloudera Manager instance might not be accessible when the cluster status is Creating or when the cluster failed at creation time.

    Node Configuration
    The Node Configuration section displays the configuration of the nodes in the cluster.
    For a cluster on AWS, the section displays the configuration of the master node, worker nodes, and any compute worker node that you add to the cluster. The section displays the number of nodes and their instance types, the EBS volume configuration and the pricing option used to acquire the instance. If the cluster does not have compute worker nodes, the section displays zero for the number of compute worker nodes, but shows the default settings that the Altus Data Engineering service uses for compute worker nodes.
    For a cluster on Azure, the section displays the configuration of the master node and worker nodes. The section displays the number of nodes, their instance types and storage volume configuration, and the number of disks per instance.
    Cluster Details
    • Log Archive Location shows where the cluster and job logs are archived.
    • Termination condition shows the action that Altus takes when all jobs in the cluster complete.
    • Uses instance bootstrap script? shows whether a bootstrap script runs before cluster startup.
    • Security shows whether the cluster is secure or not, based on the setting for the Secure Clusters option in the environment.
    • Resource Tags shows the resource tags set up for the cluster.
    Service Type and other key information
    • Service Type shows the service that runs in the cluster.
    • Creation Time shows the time when a user created the cluster in Altus.
    • Total Nodes shows the number of nodes in the cluster.

      For a cluster on AWS, the total number of nodes includes the master node, worker nodes, and compute worker nodes. The number does not include the Cloudera Manager instance. If the compute worker nodes use Spot instances, the number of compute worker nodes available might not be equivalent to the number of compute worker nodes configured for the cluster. The section shows the number of nodes available in the cluster and the total number of nodes configured for the cluster.

      For a cluster on Azure, the total number of nodes includes the number of master and worker nodes but not the Cloudera Manager instance.

      To view information about the nodes, click View. The Instances window displays the list of instances in the cluster, their instance IDs and IP addresses, and their roles in the cluster. The list of instances does not include the Cloudera Manager instance.

    • Environment displays the name of the Altus environment used to create the cluster.
    • Region indicates the region where the cluster is created.
    • CDH Version shows the version of CDH in the cluster.
    • CRN shows the Cloudera Resource Name (CRN) assigned to the cluster. Because the CRN is a long string of characters, Altus provides a copy icon so you can easily copy the CRN for any purpose.
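
The Total Nodes arithmetic described above can be written out as a short calculation. The Python sketch below is illustrative only. The NodeCounts type and the total_nodes function are hypothetical names, and the code simply restates the counting rules on this page: the Cloudera Manager instance is never counted, and compute worker nodes apply only to clusters on AWS.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodeCounts:
    """Hypothetical container for the node counts of one cluster."""
    master: int = 1
    workers: int = 5
    compute_workers_configured: int = 0
    # With Spot instances, fewer compute workers may be available than configured.
    compute_workers_available: int = 0


def total_nodes(counts: NodeCounts, cloud: str) -> Tuple[int, int]:
    """Return (available, configured) totals, excluding the Cloudera Manager instance."""
    base = counts.master + counts.workers
    if cloud == "aws":
        return (
            base + counts.compute_workers_available,
            base + counts.compute_workers_configured,
        )
    if cloud == "azure":
        # Azure clusters on this page have only master and worker nodes.
        return (base, base)
    raise ValueError(f"unknown cloud provider: {cloud}")


if __name__ == "__main__":
    spot_cluster = NodeCounts(
        master=1, workers=5,
        compute_workers_configured=4, compute_workers_available=2,
    )
    available, configured = total_nodes(spot_cluster, "aws")
    print(f"{available} of {configured} nodes available")
```

For the sample values, one master node, five worker nodes, and two of four compute worker nodes give 8 available nodes out of 10 configured.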

Deleting a Cluster

To delete a cluster on the console:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. On the side navigation panel, click Clusters.

    By default, the Clusters page displays the list of all the Altus Data Engineering clusters in your Altus account. The cloud icon next to the cluster name indicates the cloud service provider for the cluster. You can filter the list by environment and status. You can also search for clusters by name.

  3. Click the name of the cluster to terminate.

    On the Cluster details page, review the cluster information to verify that it is the cluster that you want to terminate.

  4. Click Actions and select Delete Cluster.
  5. Click OK to confirm that you want to terminate the cluster.