Troubleshooting

To troubleshoot CDH clusters and jobs in Altus, you use the same tools you use to troubleshoot jobs in on-premises CDH deployments. You can use Cloudera Manager to view logs that help you understand how a job ran or why it failed.

SSH Connection

In AWS, the security group that you create and specify for your EC2 instances acts as a firewall that prevents unwanted access to your cluster and Cloudera Manager. To improve security, Cloudera recommends that you do not configure security groups to allow internet access through the public IP addresses of your instances. Instead, you can configure security groups to allow SSH connections from the public Cloudera Altus IP addresses to your instances.

For more information about the Cloudera Altus IP addresses, see AWS Resources and Services.

To connect to a CDH cluster, the security group inbound rules must be configured to allow SSH access from your machine. You can use the IP address of your machine or contact your IT department to get the range of IP addresses used by your organization.

To configure the security group to allow SSH access to a cluster:
  1. In the AWS console, browse to VPC > Security Groups and find the security group you created for the Altus environment.

    Verify that you are in the correct region.

  2. On the Inbound Rules tab, edit the security group and add another rule of type SSH.
  3. Set the port number to 22.
  4. Set the Source property to the IP address or range of IP addresses in your organization from which you want to connect.

    If you do not know your IP address, select My IP from the list.

  5. Save the rule.
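If you manage security groups with a script rather than the AWS console, the SSH rule from the steps above can be expressed as an `IpPermissions` entry in the shape accepted by the boto3 `EC2.Client.authorize_security_group_ingress` method. The following Python sketch only builds and validates that entry. The function name `ssh_ingress_permission`, the IPv4-only restriction, and the refusal to open port 22 to `0.0.0.0/0` are assumptions of this example, not Altus behavior.

```python
import ipaddress


def ssh_ingress_permission(source: str, description: str = "SSH access") -> dict:
    """Build an EC2 IpPermissions entry that allows SSH (TCP port 22) from `source`.

    `source` may be a single address ("203.0.113.10") or a CIDR range
    ("198.51.100.0/24"). A single address becomes a /32 range.
    """
    # strict=True rejects ranges with host bits set, such as "203.0.113.5/24".
    network = ipaddress.ip_network(source, strict=True)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 sources")
    if network.prefixlen == 0:
        # 0.0.0.0/0 would open SSH to the whole internet.
        raise ValueError("refusing to open SSH to 0.0.0.0/0")
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": str(network), "Description": description}],
    }
```

With boto3 installed, you would pass the result as `IpPermissions=[ssh_ingress_permission("203.0.113.10")]` together with the `GroupId` of the security group for your Altus environment. The addresses shown are documentation examples; use your own IP address or your organization's range.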

SOCKS Proxy

Cloudera recommends that you connect to your cluster using a SOCKS proxy server. A SOCKS proxy server allows a client to connect directly and securely to a server and, from there, to servers on other IP addresses and ports in the same subnet. For example, the SOCKS proxy server allows your browser to connect securely to the Cloudera Manager Admin Console, the YARN History Server, and the Spark History Server on the same subnet.

If you use a subnet that is connected to a corporate VPN, you can connect to the cluster directly through the private IP address of Cloudera Manager. However, if DNS resolution is not configured properly, you might not be able to navigate seamlessly between the different web interfaces. In that case, Cloudera recommends that you set up a SOCKS proxy as if the VPN tunnel did not exist. Your browser then forwards DNS resolution requests over the SOCKS connection.

You can use the Altus client to set up a SOCKS proxy server that connects through SSH to the EC2 instance that hosts Cloudera Manager. If you use the Chrome browser, you can include a parameter in the command to open Cloudera Manager in Chrome. If you are familiar with SOCKS proxy connections, you can also set up the proxy manually with the ssh command.
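For the manual route, OpenSSH provides dynamic port forwarding (`ssh -D`), which runs a SOCKS proxy on your machine. The following Python sketch only assembles the command lines and does not run them. The login user name, the key path, and the Chrome profile directory are hypothetical values that depend on your cluster image and machine. The Chrome flags `--proxy-server` and `--host-resolver-rules` are standard Chromium switches for sending traffic and host-name lookups through a SOCKS5 proxy.

```python
import shlex
from typing import List


def ssh_socks_command(private_key: str, user: str, host: str, local_port: int = 1080) -> List[str]:
    """Return an ssh argv that opens a SOCKS proxy on localhost:local_port."""
    if not 1 <= local_port <= 65535:
        raise ValueError("local_port must be between 1 and 65535")
    return [
        "ssh",
        "-i", private_key,                # private key that matches the key registered for the cluster
        "-N",                             # forward ports only; do not run a remote command
        "-D", f"localhost:{local_port}",  # dynamic (SOCKS) forwarding bound to the loopback address
        f"{user}@{host}",
    ]


def chrome_command(chrome_binary: str, profile_dir: str, local_port: int = 1080) -> List[str]:
    """Return an argv that starts Chrome with its traffic routed through the SOCKS proxy."""
    return [
        chrome_binary,
        f"--proxy-server=socks5://localhost:{local_port}",
        # Keep Chrome from resolving host names locally; the proxy side resolves them.
        "--host-resolver-rules=MAP * ~NOTFOUND , EXCLUDE localhost",
        # A separate profile makes sure the flags apply even if Chrome is already running.
        f"--user-data-dir={profile_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical key path, user, and Cloudera Manager host address.
    print(shlex.join(ssh_socks_command("/path/to/altus-key.pem", "example-user", "203.0.113.20")))
```

Port 1080 matches the port that the Altus client command uses, so a browser configured for one approach also works with the other.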

When you create a SOCKS proxy, you must add the appropriate IP address to the security groups associated with the EC2 instances in the Altus clusters in your AWS account, as defined in the Altus environment.

To set up the SOCKS proxy server for use with Altus, perform the following steps:
  1. Configure the security group associated with the cluster in your AWS account to allow connections through a SOCKS proxy.
  2. Set up a SOCKS proxy.

    You can use the Altus client command to create the SOCKS proxy or use the SSH command.

Configuring the Security Group to Allow a SOCKS Proxy Connection

Security groups control connections to EC2 instances. You must set up rules in the security group associated with an Altus cluster to allow connections to the cluster from the SOCKS proxy.

To configure the security group to allow a SOCKS proxy connection to a cluster:
  1. In the AWS console, browse to VPC > Security Groups and find the security group you created for the Altus environment.

    Verify that you are in the correct region.

  2. On the Inbound Rules tab, edit the security group and add another rule of type SSH.
  3. Set the port number to 22.
  4. Set the Source property to the IP address or range of IP addresses in your organization from which you want to connect.

    If you do not know your IP address, select My IP from the list.

  5. Save the rule.

Setting Up a SOCKS Proxy Using the Altus Client

When you run the command to set up a SOCKS proxy server to access the instance that hosts Cloudera Manager, you can optionally have the command open Cloudera Manager in a Chrome browser immediately.

Set up a SOCKS proxy connection to the Cloudera Manager instance for each cluster that you want to connect to.

To set up a SOCKS proxy connection to Cloudera Manager, use the following command:
altus dataeng socks-proxy --cluster-name NameOfYourCluster --ssh-private-key PathAndFileNameOfYourPrivateKey
To immediately open Cloudera Manager in a Google Chrome browser, include the following parameter:
--open-cloudera-manager yes

The command uses port 1080 for the connection to the proxy server.
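If the browser cannot reach Cloudera Manager, a quick first check is whether anything is answering as a SOCKS5 proxy on local port 1080. The following Python sketch sends the RFC 1928 greeting that offers only the "no authentication" method and checks for the matching reply. It assumes the proxy does not require authentication, which is the case for an `ssh -D` tunnel; the function name is hypothetical.

```python
import socket


def socks5_proxy_ready(host: str = "localhost", port: int = 1080, timeout: float = 3.0) -> bool:
    """Return True if a SOCKS5 server on host:port accepts the no-authentication method."""
    reply = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # RFC 1928 greeting: VER=5, NMETHODS=1, METHOD=0x00 (no authentication).
            sock.sendall(b"\x05\x01\x00")
            while len(reply) < 2:
                chunk = sock.recv(2 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Connection refused, timeout, or reset: no usable proxy on that port.
        return False
    # A SOCKS5 server that accepts the offer answers VER=5, METHOD=0x00.
    return reply == b"\x05\x00"
```

A `False` result usually means the proxy command is not running or exited; rerun the Altus client command and check again before looking at browser settings.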

Cloudera Manager Connection

Clusters created through Cloudera Altus are configured to allow read-only access to Cloudera Manager. When you run the command to create the cluster, you can set the username and password for the read-only account. If you do not provide a username and password, the Data Engineering service generates a guest username and password.

$ altus dataeng create-aws-cluster [...] --cloudera-manager-username guest --cloudera-manager-password <PASSWORD-FOR-GUEST-USER>
When Altus generates the credentials, the server response for the create-aws-cluster command returns the username and password in plain text. For example:
{
    "cluster": {
        "status": "CREATING",
        "serviceType": "HIVE",

   [...]
        "instanceType": "m4.xlarge",
        "cdhVersion": "CDH511"
    },
    "clouderaManagerPassword": "RANDOM_GENERATED_PASSWORD",
    "clouderaManagerUsername": "guest"
}

Save the generated credentials immediately after you run the create-aws-cluster command. The credentials are not exposed through the list-clusters or describe-cluster commands.
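One way to save the generated credentials right away is to capture the command output and extract the two fields shown in the example response. The following Python sketch assumes you have the complete response as JSON text (the example above contains `[...]` elisions, so it is not valid JSON as printed). It writes the credentials to a file that only your user can read. The function name and the output file format are hypothetical choices for this example.

```python
import json
import os


def save_cm_credentials(response_text: str, path: str) -> str:
    """Extract the generated Cloudera Manager credentials and write them to `path`.

    Returns the username that was saved.
    """
    response = json.loads(response_text)
    username = response["clouderaManagerUsername"]
    password = response["clouderaManagerPassword"]
    # O_EXCL refuses to overwrite an existing file; 0o600 means owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"username": username, "password": password}, f)
    return username
```

Because the credentials cannot be retrieved later through list-clusters or describe-cluster, refusing to overwrite an existing file avoids losing a previously saved password by accident.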

History Servers

The YARN and Spark history servers store information about the jobs that run on a cluster, and you can reach them through Cloudera Manager. You can use the read-only account created with the cluster to log in to Cloudera Manager and view information about the jobs running on the cluster.

Connecting to the YARN History Server

Use the YARN History Server to monitor the YARN jobs that run on your clusters. You can navigate to the YARN History Server from the Cloudera Manager Admin Console.

To go to the YARN History Server:
  1. Log in to Cloudera Manager with the read-only account generated when the cluster was created.
  2. On the Cloudera Manager Admin Console home page, click the YARN-1 service.

    This takes you to a page that shows the status of the YARN service processes and a set of useful metrics.

  3. On the YARN service page, click Web UI.

    The YARN History Server is also useful for debugging Spark applications.

  4. To view past jobs, click History Server Web UI.

    To view jobs that are currently running, click Resource Manager Web UI.

When you use the YARN History Server, you can drill down to individual task attempts and MapReduce jobs and see the related logs. Cloudera Manager provides additional log searching utilities and metrics that are useful for debugging.

For more information about YARN job monitoring, see Monitoring YARN Applications in the Cloudera Manager documentation.
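The application list shown in the Resource Manager Web UI is also available from the Apache Hadoop YARN ResourceManager REST API at `/ws/v1/cluster/apps`, which you can reach through the same SOCKS proxy. The following Python sketch assumes you have already fetched and decoded that JSON response, and it only picks out applications that did not succeed. The field names follow the Hadoop REST API documentation; the function name and sample values are hypothetical.

```python
from typing import Any, Dict, List


def failed_yarn_apps(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return id, name, and diagnostics for applications that failed or were killed."""
    # The ResourceManager returns {"apps": null} when there are no applications.
    apps = (response.get("apps") or {}).get("app", [])
    failed = []
    for app in apps:
        if app.get("finalStatus") in ("FAILED", "KILLED"):
            failed.append({
                "id": app["id"],
                "name": app.get("name", ""),
                "diagnostics": app.get("diagnostics", ""),
            })
    return failed
```

The `diagnostics` text is often enough to decide which application logs to open in the YARN History Server.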

Connecting to the Spark History Server

Use the Spark History Server to monitor Spark jobs that run on your clusters. You can navigate to the Spark History Server from the Cloudera Manager Admin Console.

To go to the Spark History Server:
  1. Log in to Cloudera Manager with the read-only account generated when the cluster was created.
  2. On the Cloudera Manager Admin Console home page, click the SPARK_ON_YARN-1 service.

    This takes you to a page that shows the status of the Spark service processes and a set of useful metrics.

  3. On the Spark on YARN service page, click History Server Web UI.

    The YARN History Server is also useful for debugging Spark applications.

For more information about Spark job monitoring, see Monitoring Spark Applications in the Cloudera Manager documentation.
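The Spark History Server also exposes its data through the Apache Spark monitoring REST API, where `/api/v1/applications` returns a list of applications, each with a list of `attempts`. The following Python sketch assumes you have already fetched and decoded that list, and it returns the applications that still have an attempt that has not completed, which can help you find jobs that hang. The function name and sample IDs are hypothetical.

```python
from typing import Any, Dict, List


def incomplete_spark_apps(applications: List[Dict[str, Any]]) -> List[str]:
    """Return the IDs of applications that have at least one attempt not marked completed."""
    ids = []
    for app in applications:
        attempts = app.get("attempts", [])
        if any(not attempt.get("completed", False) for attempt in attempts):
            ids.append(app["id"])
    return ids
```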

Debugging Jobs Executed on a Terminated Cluster

If you need to debug a job that ran on a cluster that was subsequently terminated, you can take the following steps:
Create a support case.
When a cluster is terminated, Cloudera automatically collects diagnostic data in a support bundle. Although the bundle does not include logs specific to the workload, it is useful for debugging cluster setup issues. When you create a support case, Cloudera can review the data in the support bundle and provide information about the cluster setup that might help your debugging efforts.
Inspect the cluster logs in the log archive S3 bucket.
If the Altus environment used for the cluster is configured with an S3 bucket for archiving logs, you can view the logs for the terminated cluster.

Run the describe-cluster command to get the log archive bucket name for the cluster where the job ran. Log in to the AWS console and view the job and service daemon logs in the S3 bucket.
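Because the names of the archived log files include the name of the cluster, you can narrow a bucket listing to one cluster before you open individual files. The following Python sketch filters the text output of `aws s3 ls s3://BUCKET/ --recursive`, where each line contains a date, a time, a size, and the object key. The key layout in the sample is hypothetical, and a plain substring match can also catch clusters whose names contain the name you search for.

```python
from typing import Iterable, List


def cluster_log_keys(ls_lines: Iterable[str], cluster_name: str) -> List[str]:
    """Pick the object keys for one cluster out of `aws s3 ls --recursive` output."""
    keys = []
    for line in ls_lines:
        parts = line.split(maxsplit=3)  # date, time, size, key
        if len(parts) == 4 and cluster_name in parts[3]:
            keys.append(parts[3])
    return keys
```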

Altus Tags

When Altus creates a data engineering cluster, it adds tags to each instance in the cluster. When you view the EC2 instances in your AWS account, you can use these tags to identify the nodes that belong to a cluster.

Altus appends the following tags to each node in a cluster:
  • Cloudera-Cluster-Role. Identifies the node type with one of the following values:
    • Master
    • Worker
    • Cloudera Manager
  • Cloudera-Resource-Name. A Cloudera resource name (CRN) identifies a resource created within Altus. The CRN that is appended to a node is the CRN of the cluster to which the node belongs. All nodes in the cluster have the same CRN.
  • Name. Internal identifier for the cluster to which the node belongs. This tag is used to track the node within Altus.
  • Cloudera-Altus-Id. Tag used internally by services within Altus.
  • Cloudera-Altus-Template-Name. Tag used internally by services within Altus.

To see the Altus tags for an instance in AWS, select an EC2 instance in the cluster and go to the Tags tab.
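With many instances in an account, it can be easier to group them by tag in a script. The following Python sketch takes instance records in the shape returned by the EC2 DescribeInstances API (each with `InstanceId` and a `Tags` list of `Key`/`Value` pairs) and groups the instances of one cluster by their `Cloudera-Cluster-Role` tag. The function name and the sample CRN are hypothetical; only the tag names and the `crn:altus:dataeng:` prefix come from this page.

```python
from collections import defaultdict
from typing import Any, Dict, List

ALTUS_CRN_PREFIX = "crn:altus:dataeng:"


def nodes_by_role(instances: List[Dict[str, Any]], cluster_crn: str) -> Dict[str, List[str]]:
    """Group the EC2 instance IDs of one Altus cluster by their Cloudera-Cluster-Role tag."""
    if not cluster_crn.startswith(ALTUS_CRN_PREFIX):
        raise ValueError("not an Altus Data Engineering CRN")
    roles: Dict[str, List[str]] = defaultdict(list)
    for instance in instances:
        # EC2 returns tags as a list of {"Key": ..., "Value": ...} pairs.
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        # All nodes of a cluster carry the cluster CRN in Cloudera-Resource-Name.
        if tags.get("Cloudera-Resource-Name") == cluster_crn:
            role = tags.get("Cloudera-Cluster-Role", "Unknown")
            roles[role].append(instance["InstanceId"])
    return dict(roles)
```

If you use the AWS CLI, you can narrow the listing on the AWS side first with a filter such as `--filters Name=tag:Cloudera-Resource-Name,Values=<cluster CRN>`.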

Altus Components

When you create an Altus environment using Quickstart or when you create a cluster, Altus creates components in your AWS account. When you run a job, Altus can create components in your AWS account depending on your environment or cluster configuration. You can view the components created by Altus in your AWS account.

When you delete a cluster, Altus deletes some of the components that it creates. You must manually delete the components that Altus does not delete by default. For example, Altus does not delete the CloudFormation stack that is created when you use Quickstart to create an Altus environment. If you no longer need the resources for the Altus environment, you can delete the stack.

Altus creates, but does not delete, the following components in your AWS account:
CloudFormation stack
If you use Quickstart to create an Altus environment, Altus uses the AWS CloudFormation service to create a stack of resources and builds the Altus environment from the stack. The name of the resource stack created by Altus is the AWS stack name that you specify in Quickstart.
Cluster and job logs
If you enable the option to archive workload logs, Altus writes cluster and job logs to the Amazon S3 bucket that you specify. The names of the log files include the name of the cluster for which the logs are created.
S3Guard DynamoDB table
If you enable the S3Guard option for Amazon S3 buckets, Altus requires an Amazon DynamoDB table to store metadata and ensure that data written to S3 is immediately available for processing. When you create a cluster, the cluster creates an Amazon DynamoDB table if a table is not available. The cluster assigns the DynamoDB table the name that you specify when you enable the option. If you do not specify a name, the cluster assigns the default name s3guard-metadata.
Altus creates the following components in your AWS account and deletes them when they are no longer needed:
EC2 instances
When you create a cluster, Altus creates EC2 instances in your AWS account for the nodes in the cluster. It creates a master node, multiple worker nodes, and a node for Cloudera Manager. To identify them as nodes in an Altus cluster, the instances that Altus creates have Cloudera resource names (CRN) that start with crn:altus:dataeng:....

The EC2 instances created for Altus clusters also have tags that identify their roles in the cluster. For more information about the Altus tags, see Altus Tags.

SSH key registration with EC2
When you create an Altus environment, you can enable the option for Altus to auto-register your private SSH key. When you create a cluster and you provide an unregistered private key, Altus imports your unregistered key into EC2 and registers it in the region where Altus creates the cluster.

If you provide a public key when you create a cluster, no registration is required. Cloudera recommends that you provide a public key instead of a private key when you create a cluster.
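Because Cloudera recommends providing a public key, it can help to check which kind of key material you are about to pass. The following Python sketch is a heuristic based on common formats: OpenSSH public keys start with an algorithm name such as `ssh-rsa` or `ssh-ed25519`, and PEM or OpenSSH private keys start with a `-----BEGIN ... PRIVATE KEY-----` line. The function name is hypothetical. If you only have a private key, `ssh-keygen -y -f KEYFILE` prints the matching public key.

```python
def ssh_key_kind(key_text: str) -> str:
    """Classify SSH key material as "public", "private", or "unknown"."""
    text = key_text.strip()
    public_prefixes = ("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-", "ssh-dss ")
    if text.startswith(public_prefixes):
        return "public"
    first_line = text.splitlines()[0] if text else ""
    # Covers "RSA PRIVATE KEY", "OPENSSH PRIVATE KEY", "EC PRIVATE KEY", and plain "PRIVATE KEY".
    if first_line.startswith("-----BEGIN") and "PRIVATE KEY" in first_line:
        return "private"
    return "unknown"
```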

If you encounter a problem where you need to delete the components created by Altus, you can search for and delete the components in your AWS account.
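Before you search your account, it can help to write down which of the components that Altus does not delete apply to your setup. The following Python sketch turns the conditions described above into a checklist: a CloudFormation stack only when Quickstart was used, logs only when log archiving is enabled, and a DynamoDB table named `s3guard-metadata` unless you specified another name. The function and parameter names are hypothetical.

```python
from typing import List, Optional

DEFAULT_S3GUARD_TABLE = "s3guard-metadata"


def leftover_components(stack_name: Optional[str], log_bucket: Optional[str],
                        s3guard_enabled: bool, s3guard_table: Optional[str] = None) -> List[str]:
    """List the components that Altus creates but does not delete."""
    items = []
    if stack_name:  # only when the environment was created with Quickstart
        items.append(f"CloudFormation stack: {stack_name}")
    if log_bucket:  # only when workload log archiving is enabled
        items.append(f"Cluster and job logs in S3 bucket: {log_bucket}")
    if s3guard_enabled:  # the cluster falls back to the default table name
        items.append(f"DynamoDB table: {s3guard_table or DEFAULT_S3GUARD_TABLE}")
    return items
```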

Creating a Support Case

From the Cloudera Altus console, you can file a ticket with Cloudera Support for help in troubleshooting. To expedite problem resolution, provide as much information as possible in the support ticket.

To file a support ticket:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. Click Support and select Support Center.
  3. On the Support Center page, click one of the following selections based on the type of support you require:
    • Billing. For questions regarding billing issues with your Altus account.
    • Service Limits. For requests to raise Altus resource limits, such as the number of users, clusters, and jobs. For more information about the Altus resource limits, see Altus Resource Limits.
    • Technical Support. For help with problems in Altus components, including creating environments, configuring authorization, and working with clusters or jobs.
      If your technical support issue relates to the Altus Data Engineering component, provide the following information:
      • Select the area in Altus where you encounter the problem.
      • Select whether to use the diagnostic logs that Altus collects for the clusters.

        If selected, Altus generates a diagnostic bundle for the cluster you specify and forwards the bundle to Cloudera Technical Support.

      • If you choose to use the diagnostic bundle, provide the CRN of the cluster that you need help with.

        The CRN (Cloudera Resource Name) is a unique identifier for a resource created in Altus. It includes details about the cluster and provides relevant information to support engineers.

  4. Click File a Ticket.
  5. On the Cloudera support window, fill out the details of the ticket.

    Describe the issue and provide your contact information.

  6. Click Submit.

On the Cloudera support window, you can access the Cloudera product documentation, Knowledge Base, and other resources. You can also access the Altus community page to post questions and comments.