Troubleshooting

To troubleshoot CDH clusters and jobs in Altus, you use the same tools you use to troubleshoot jobs in on-premises CDH deployments. You can use Cloudera Manager to view logs to help you understand how a job was executed or why it failed.

SOCKS Proxy

Cloudera recommends that you connect to your cluster using a SOCKS proxy server. A SOCKS proxy server allows a client to connect directly and securely to a server and, from there, to servers on other IP addresses and ports in the same subnet. For example, the SOCKS proxy server allows your browser to securely connect to the Cloudera Manager Admin Console and to the YARN History Server and the Spark History Server on the same subnet.

If you use a subnet that is connected to a corporate VPN, you can connect to the cluster directly through the private IP address of Cloudera Manager. However, if DNS resolution is not configured properly, you might not be able to navigate seamlessly between the different web interfaces. In that case, Cloudera recommends that you set up a SOCKS proxy as if the VPN tunnel did not exist. Your browser then forwards DNS resolution over the SOCKS connection.

You can use the Altus client to set up a SOCKS proxy server to access Cloudera Manager through SSH. If you use the Chrome browser, you can include a parameter in the command to open Cloudera Manager in Chrome. If you are familiar with SOCKS proxy connections, you can also manually set up the server with the SSH command.
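For the manual route, the OpenSSH invocation can be assembled from a script. This is a minimal sketch in Python, assuming the standard OpenSSH client (-D for dynamic SOCKS forwarding, -N to skip running a remote command). The host, login user, and key path are hypothetical placeholders; the sketch does not assume which login user your cluster image expects.

```python
import os
import shlex
from typing import List


def build_socks_ssh_command(host: str, user: str, key_path: str,
                            local_port: int = 1080) -> List[str]:
    """Return an OpenSSH argument list that opens a SOCKS proxy on localhost."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),  # private key for the cluster instances
        "-D", str(local_port),  # dynamic (SOCKS) forwarding on this local port
        "-N",  # forward only; do not run a remote command
        "-q",  # quiet mode: suppress most warnings and diagnostics
        f"{user}@{host}",  # login user and Cloudera Manager host (placeholders)
    ]


if __name__ == "__main__":
    # Hypothetical values: substitute your Cloudera Manager host, login user, and key.
    cmd = build_socks_ssh_command("203.0.113.10", "clusteruser", "~/.ssh/altus_key")
    print(shlex.join(cmd))
    # To open the tunnel from Python instead of a shell:
    # subprocess.run(cmd, check=True)
```

While the tunnel runs, point your browser's SOCKS proxy setting at localhost on the chosen port (1080 here, the same port the Altus client uses).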

When you create a SOCKS proxy, you must add the IP address (or address range) from which you connect to the security groups in your cloud provider account.

In AWS, add the IP address to the security groups associated with the EC2 instances in your Altus clusters, as defined in the Altus environment.

In Azure, add the IP address to the network security groups associated with the Altus clusters in your Azure subscription, as defined in the Altus environment.

To set up the SOCKS proxy server for use with Altus, perform the following steps:
  1. Configure the security group associated with the cluster to allow connections through a SOCKS proxy.
  2. Set up a SOCKS proxy.

    You can use the Altus client command to create the SOCKS proxy or use the SSH command.

Step 1. Configure the Security Group to Allow a SOCKS Proxy Connection

You configure the security group in your cloud provider account, using the AWS console or the Azure portal.

Configuring the Security Group in AWS

Security groups control connections to EC2 instances. You must set up rules in the security group associated with an Altus cluster to allow connections to the cluster from the SOCKS proxy.

To configure the security group to allow a SOCKS proxy connection to a cluster:
  1. In the AWS console, browse to VPC > Security Groups and find the security group you created for the Altus environment.

    Verify that you are in the correct region.

  2. On the Inbound Rules tab, edit the security group and add another rule of type SSH.
  3. Set the port number to 22.
  4. Set the Source property to the IP address or range of IP addresses in your organization from which you want to connect.

    If you do not know your IP address, select My IP from the list.

  5. Save the rule.
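If you manage security groups from scripts rather than the console, the same inbound rule can be expressed with the AWS SDK for Python (boto3). This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the security group ID and region are placeholders you supply. The helper turns a single address into a /32 range, which is what My IP fills in on the console.

```python
import ipaddress


def ssh_ingress_permission(source: str, description: str = "SSH for SOCKS proxy") -> dict:
    """Build one IpPermissions entry that allows TCP port 22 from `source`.

    `source` is a single IPv4 address ("198.51.100.7") or a CIDR range
    ("198.51.100.0/24"). A single address becomes a /32 range; a range with
    host bits set (for example "198.51.100.5/24") raises ValueError.
    """
    network = ipaddress.ip_network(source, strict=True)
    if network.version != 4:
        raise ValueError("this sketch handles IPv4 sources only")
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": str(network), "Description": description}],
    }


def add_ssh_rule(group_id: str, source: str, region: str) -> None:
    """Apply the rule to the security group (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    # The region must match the one that hosts the Altus environment.
    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=[ssh_ingress_permission(source)]
    )
```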

Configuring the Network Security Group in Azure

Network security groups control connections to your cluster instances. You must set up rules in the network security group associated with an Altus cluster to allow connections to the cluster from the SOCKS proxy.

For more information about creating rules in a network security group on the Azure portal, see the instructions in the Microsoft Azure documentation.

Step 2. Set Up a SOCKS Proxy

When you run the command to set up a SOCKS proxy server to access the instance that hosts Cloudera Manager, you can optionally have the command open Cloudera Manager in a Chrome browser immediately.

Set up a SOCKS proxy connection to the Cloudera Manager instance for each cluster that you want to connect to.

To set up a SOCKS proxy connection to Cloudera Manager, use the following command:
altus dataeng socks-proxy --cluster-name NameOfYourCluster --ssh-private-key PathAndFileNameOfYourPrivateKey
To immediately open Cloudera Manager in a Google Chrome browser, include the following parameter:
--open-cloudera-manager yes

The command uses port 1080 for the connection to the proxy server.
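To script this step, you can wrap the command and then wait until the local proxy port accepts connections. This is a minimal Python sketch: it only assembles the documented arguments, it assumes the altus client is on your PATH, and the readiness check is a plain TCP connection to port 1080, not a SOCKS handshake. The cluster name and key path in the usage comment are hypothetical.

```python
import socket
import time
from typing import List


def socks_proxy_command(cluster_name: str, private_key: str,
                        open_cloudera_manager: bool = False) -> List[str]:
    """Argument list for the altus dataeng socks-proxy command."""
    cmd = [
        "altus", "dataeng", "socks-proxy",
        "--cluster-name", cluster_name,
        "--ssh-private-key", private_key,
    ]
    if open_cloudera_manager:
        cmd += ["--open-cloudera-manager", "yes"]
    return cmd


def proxy_is_listening(port: int = 1080, host: str = "127.0.0.1",
                       timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds (no SOCKS handshake)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_proxy(port: int = 1080, deadline_s: float = 30.0) -> bool:
    """Poll the local proxy port until it accepts connections or time runs out."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if proxy_is_listening(port):
            return True
        time.sleep(0.5)
    return False

# Usage (hypothetical cluster name and key path):
#   proc = subprocess.Popen(socks_proxy_command("my-cluster", "/keys/altus.pem"))
#   if not wait_for_proxy():
#       proc.terminate()
```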

Cloudera Manager Connection

Clusters created through Cloudera Altus are configured to allow read-only access to Cloudera Manager. When you run the command to create the cluster, you can set the username and password for the read-only account. If you do not provide a username and password, the Altus Data Engineering service generates a guest username and password.

Use the cloudera-manager-username and cloudera-manager-password parameters to set the Cloudera Manager credentials:

AWS cluster:
$ altus dataeng create-aws-cluster [...] --cloudera-manager-username guest --cloudera-manager-password <PASSWORD-FOR-GUEST-USER>
Azure cluster:
$ altus dataeng create-azure-cluster [...] --cloudera-manager-username guest --cloudera-manager-password <PASSWORD-FOR-GUEST-USER>

When Altus generates the credentials, the server response to the create-aws-cluster or create-azure-cluster command returns the username and password in plain text.

For example, the response to the create-aws-cluster command would be similar to the following text:
{
    "cluster": {
        "status": "CREATING",
        "serviceType": "HIVE",

   [...]
        "instanceType": "m4.xlarge",
        "cdhVersion": "CDH513"
    },
    "clouderaManagerPassword": "RANDOM_GENERATED_PASSWORD",
    "clouderaManagerUsername": "guest"
}

Save the generated credentials immediately after you run the create cluster command. The credentials are not exposed through the list-clusters or describe-cluster commands.
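Because the generated password appears only in this one response, it can help to capture it programmatically. This is a minimal Python sketch, assuming you saved the complete JSON output of the create command (the elided [...] part of the example is real JSON in an actual response). The field names are the ones shown in the example response; the output file path is your choice.

```python
import json
import os
from typing import Optional, Tuple


def extract_cm_credentials(response_text: str) -> Optional[Tuple[str, str]]:
    """Return (username, password) from a create-aws-cluster/create-azure-cluster response.

    Returns None if the response does not contain both credential fields.
    """
    data = json.loads(response_text)
    username = data.get("clouderaManagerUsername")
    password = data.get("clouderaManagerPassword")
    if username is None or password is None:
        return None
    return username, password


def save_credentials(path: str, cluster_name: str, creds: Tuple[str, str]) -> None:
    """Write the credentials as JSON to a file created with owner-only permissions.

    On POSIX systems a new file gets mode 0600; an existing file keeps its mode.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump({"cluster": cluster_name, "username": creds[0], "password": creds[1]}, handle)
```

For example, redirect the output of the create command to a file, read it back, and pass the text to extract_cm_credentials right away.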

History Servers

Information about jobs is stored in history servers, which you can reach from Cloudera Manager. You can use the read-only user account created with the cluster to log in to Cloudera Manager and view information about the jobs running on the cluster.

Connecting to the YARN History Server

Use the YARN History Server to monitor the YARN jobs that run on your clusters. You can navigate to the YARN History Server from the Cloudera Manager Admin Console.

To go to the YARN History Server:
  1. Log in to Cloudera Manager with the read-only account generated when the cluster was created.
  2. On the Cloudera Manager Admin Console home page, click the YARN-1 service.

    This takes you to a page that shows the status of the YARN service processes and a set of useful metrics.

  3. On the YARN service page, click Web UI.

    The YARN History Server is also useful for debugging Spark applications.

  4. To view past jobs, click History Server Web UI.

    To view jobs that are currently running, click Resource Manager Web UI.

When you use the YARN History Server, you can drill down to individual task attempts and MapReduce jobs and see the related logs. Cloudera Manager provides additional log searching utilities and metrics that are useful for debugging.
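Besides the web UI, the MapReduce JobHistory Server exposes a REST API (/ws/v1/history/mapreduce/jobs) that returns the job list as JSON. This is a minimal Python sketch, assuming the Hadoop default history server port 19888 (check the port configured for your cluster in Cloudera Manager) and direct HTTP access to the host, for example over a VPN. Python's urllib does not route through a SOCKS proxy, so through the proxy you would need a SOCKS-capable HTTP client. The sketch lists jobs whose state is not SUCCEEDED.

```python
import json
import urllib.request
from typing import List


def history_jobs_url(host: str, port: int = 19888) -> str:
    """REST endpoint of the MapReduce JobHistory Server that lists jobs."""
    return f"http://{host}:{port}/ws/v1/history/mapreduce/jobs"


def unsuccessful_jobs(payload: dict) -> List[dict]:
    """Jobs from a /jobs response whose state is anything other than SUCCEEDED."""
    # Guard each level in case a key is missing or null for an empty history.
    jobs = (payload.get("jobs") or {}).get("job") or []
    return [job for job in jobs if job.get("state") != "SUCCEEDED"]


def fetch_jobs(host: str, port: int = 19888, timeout: float = 10.0) -> dict:
    """GET the job list; needs direct HTTP access to the history server."""
    request = urllib.request.Request(history_jobs_url(host, port),
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

# Usage (hypothetical private IP of the history server host):
#   for job in unsuccessful_jobs(fetch_jobs("10.0.0.5")):
#       print(job.get("id"), job.get("state"), job.get("name"))
```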

For more information about YARN job monitoring, see Monitoring YARN Applications in the Cloudera Manager documentation.

Connecting to the Spark History Server

Use the Spark History Server to monitor Spark jobs that run on your clusters. You can navigate to the Spark History Server from the Cloudera Manager Admin Console.

To go to the Spark History Server:
  1. Log in to Cloudera Manager with the read-only account generated when the cluster was created.
  2. On the Cloudera Manager Admin Console home page, click the SPARK_ON_YARN-1 service.

    This takes you to a page that shows the status of the Spark service processes and a set of useful metrics.

  3. On the Spark on YARN service page, click History Server Web UI.

    The YARN History Server is also useful for debugging Spark applications.

For more information about Spark job monitoring, see Monitoring Spark Applications in the Cloudera Manager documentation.
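The Spark History Server also serves its data through a REST API under /api/v1. This is a minimal Python sketch that ranks completed applications by the duration of their longest attempt, a quick way to spot slow jobs. It assumes port 18088 (the usual CDH default; verify the port in Cloudera Manager) and direct HTTP access to the host, since urllib does not use a SOCKS proxy. The attempt "duration" field (milliseconds) comes from the Spark monitoring REST API.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple


def spark_apps_url(host: str, port: int = 18088, status: str = "completed") -> str:
    """Spark History Server REST endpoint; status is "completed" or "running"."""
    return f"http://{host}:{port}/api/v1/applications?" + urllib.parse.urlencode({"status": status})


def slowest_apps(apps: List[dict], top: int = 5) -> List[Tuple[str, str, int]]:
    """(id, name, duration_ms) of the longest apps, using each app's longest attempt."""
    rows = []
    for app in apps:
        durations = [attempt.get("duration", 0) for attempt in app.get("attempts", [])]
        rows.append((app["id"], app.get("name", ""), max(durations, default=0)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


def fetch_apps(host: str, port: int = 18088, status: str = "completed",
               timeout: float = 10.0) -> List[dict]:
    """GET the application list; needs direct HTTP access to the history server."""
    with urllib.request.urlopen(spark_apps_url(host, port, status), timeout=timeout) as resp:
        return json.load(resp)

# Usage (hypothetical private IP of the history server host):
#   for app_id, name, ms in slowest_apps(fetch_apps("10.0.0.5")):
#       print(app_id, name, ms / 1000, "s")
```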

SSH Connection

The security groups that you create in your cloud provider account function as a firewall to prevent unwanted access to your cluster and Cloudera Manager. To prevent unauthorized access, Cloudera recommends that you do not configure security groups to allow internet access through the public IP addresses of your instances. You can configure security groups to allow SSH connections from the public Cloudera Altus IP addresses to your instances.

SSH Connection in AWS

In AWS, you can configure the security groups associated with your EC2 instances to control access to your cluster and to Cloudera Manager. Configure the security group inbound rules to allow SSH access from your machine to the Altus clusters. You can use the IP address of your machine or contact IT to get the range of IP addresses used by your organization.

To configure the security group in AWS to allow SSH access to a cluster:
  1. In the AWS console, browse to VPC > Security Groups and find the security group you created for the Altus environment.

    Verify that you are in the correct region.

  2. On the Inbound Rules tab, edit the security group and add another rule of type SSH.
  3. Set the port number to 22.
  4. Set the Source property to the IP address or range of IP addresses in your organization from which you want to connect.

    If you do not know your IP address, select My IP from the list.

  5. Save the rule.
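Before you try to connect, you can check whether your current address is actually covered by a port 22 rule. This is a minimal Python sketch that reads the structure returned by the EC2 DescribeSecurityGroups call (for example boto3's describe_security_groups) and tests an address against each allowed IPv4 range. It treats protocol "-1" (all traffic) as allowing SSH and ignores IPv6 ranges and rules that reference other security groups. The group ID and address in the usage comment are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def port22_cidrs(describe_response: dict) -> List[str]:
    """IPv4 CIDR ranges allowed to reach TCP port 22 in a DescribeSecurityGroups response."""
    cidrs: List[str] = []
    for group in describe_response.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            proto = perm.get("IpProtocol")
            all_traffic = proto == "-1"
            tcp_22 = proto in ("tcp", "6") and perm.get("FromPort", 0) <= 22 <= perm.get("ToPort", -1)
            if all_traffic or tcp_22:
                cidrs.extend(r["CidrIp"] for r in perm.get("IpRanges", []))
    return cidrs


def ssh_allowed(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """True if client_ip falls inside any of the allowed ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(c, strict=False) for c in allowed_cidrs)

# Usage (hypothetical group ID and address):
#   import boto3
#   resp = boto3.client("ec2").describe_security_groups(GroupIds=["sg-0123456789abcdef0"])
#   print(ssh_allowed("198.51.100.42", port22_cidrs(resp)))
```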

SSH Connection in Azure

In Azure, configure the inbound rules of the network security groups for Altus clusters to allow SSH access from your machine. You can use the IP address of your machine or contact IT to get the range of IP addresses used by your organization.

For more information about creating rules in a network security group on the Azure portal, see Create rules in an existing NSG in the Microsoft Azure documentation.

Altus Tags

When Altus creates a cluster in your cloud provider account, Altus appends tags to each cluster instance to make it easy to identify the nodes in a cluster. When you view Altus clusters in your AWS account or Azure subscription, you can use the tags to identify the nodes in each cluster.

Altus appends the following tags to each node in a cluster:
  • Cloudera-Cluster-Role. Identifies the node type with one of the following values:
    • Master
    • Worker
    • Cloudera Manager
  • Cloudera-Resource-Name. A Cloudera resource name (CRN) identifies a resource created within Altus. The CRN that is appended to a node is the CRN of the cluster to which the node belongs. All nodes in the cluster have the same CRN.
  • Name. Internal identifier for the cluster to which the node belongs. This tag is used to track the node within Altus.
  • Cloudera-Altus-Id. Tag used internally by services within Altus.
  • Cloudera-Altus-Template-Name. Tag used internally by services within Altus.

When you create a cluster in Altus, you can also define tags that you want to associate with the cluster instance. When Altus creates the cluster in your cloud provider account, Altus appends the tags that you define in addition to the default tags. For more information about adding custom tags to the cluster, see Creating a Cluster for AWS or Creating a Data Engineering Cluster for Azure.

To see the Altus tags for an instance in AWS, select an EC2 instance in the cluster and go to the Tags tab.
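Because every node carries the Cloudera-Resource-Name tag with the CRN of its cluster, you can also list a cluster's nodes by role from a script. This is a minimal Python sketch with boto3 (assumed installed and configured). The filter uses the EC2 tag:<key> filter syntax; the CRN you pass is the value you see in the Tags tab, not a format this sketch constructs.

```python
from collections import defaultdict
from typing import Dict, List

ROLE_TAG = "Cloudera-Cluster-Role"
CRN_TAG = "Cloudera-Resource-Name"


def nodes_by_role(reservations: List[dict], cluster_crn: str) -> Dict[str, List[str]]:
    """Group instance IDs of one Altus cluster by their Cloudera-Cluster-Role tag.

    `reservations` is the "Reservations" list of an EC2 DescribeInstances response.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(CRN_TAG) == cluster_crn:
                groups[tags.get(ROLE_TAG, "unknown")].append(instance["InstanceId"])
    return dict(groups)


def describe_cluster_instances(cluster_crn: str, region: str) -> List[dict]:
    """Fetch all reservations whose instances carry the given cluster CRN tag."""
    import boto3  # imported here so nodes_by_role works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    reservations: List[dict] = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=[{"Name": f"tag:{CRN_TAG}", "Values": [cluster_crn]}]):
        reservations.extend(page["Reservations"])
    return reservations
```

For example, nodes_by_role(describe_cluster_instances(crn, region), crn) returns a mapping such as Master, Worker, and Cloudera Manager to lists of instance IDs.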

To see the tags for the Altus clusters in Azure, select a virtual machine and click Tags.

Debugging Jobs Executed on a Terminated Cluster in AWS

If you need to debug a job that ran on a cluster that was subsequently terminated, you can take the following steps:
Create a support case.
When a cluster is terminated, Cloudera automatically collects diagnostic data in a support bundle. Although the bundle does not include logs specific to the workload, it is useful for debugging cluster setup issues. When you create a support case, Cloudera can review the data in the support bundle and provide information about the cluster setup that might help your debugging efforts.
Inspect the cluster logs in the log archive S3 bucket.
If the Altus environment used for the cluster is configured with an S3 bucket for archiving logs, you can view the logs for the terminated cluster.

Run the describe-cluster command to get the log archive bucket name for the cluster where the job ran. Log in to the AWS console and view the job and service daemon logs in the S3 bucket.
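Once you have the bucket name from describe-cluster, a short script can list the archived log objects for the cluster. This is a minimal Python sketch with boto3 (assumed installed, with read access to the bucket). It relies only on the fact that log file names include the cluster name; it does not assume any key layout inside the bucket, so the filter is a plain substring match (a cluster named etl also matches keys for etl-2). The bucket and cluster names in the usage comment are hypothetical.

```python
from typing import Iterable, List


def cluster_log_keys(keys: Iterable[str], cluster_name: str) -> List[str]:
    """Object keys whose names contain the cluster name, sorted for reading."""
    return sorted(key for key in keys if cluster_name in key)


def list_bucket_keys(bucket: str, prefix: str = "") -> List[str]:
    """All object keys in the bucket (optionally under a prefix), across pages."""
    import boto3  # imported here so cluster_log_keys works without boto3

    s3 = boto3.client("s3")
    keys: List[str] = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys

# Usage (hypothetical bucket and cluster names):
#   for key in cluster_log_keys(list_bucket_keys("my-altus-log-bucket"), "my-cluster"):
#       print(key)
```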

Altus Components in AWS

When you create an Altus environment using Quickstart or when you create a cluster, Altus creates components in your AWS account. When you run a job, Altus can also create components in your AWS account, depending on your environment or cluster configuration. You can view the components created by Altus in your AWS account.

When you delete a cluster, Altus deletes some of the components that it creates. You must manually delete the components that Altus does not delete by default. For example, Altus does not delete the CloudFormation stack that is created when you use Quickstart to create an Altus environment. If you no longer need the resources for the Altus environment, you can delete the resource stack.

Altus creates, but does not delete, the following components in your AWS account:
CloudFormation stack
If you use Quickstart to create an Altus environment, Altus uses the AWS CloudFormation service to create a stack of resources and builds the Altus environment from the stack. The name of the resource stack created by Altus is the AWS stack name that you specify in Quickstart.
Cluster and job logs
If you enable the option to archive workload logs, Altus writes cluster and job logs to the Amazon S3 bucket that you specify. The names of the log files include the name of the cluster for which the logs are created.
S3Guard DynamoDB table
If you enable the S3Guard option for Amazon S3 buckets, Altus requires an Amazon DynamoDB table to store metadata and ensure that data written to S3 is immediately available for processing. When you create a cluster, the cluster creates an Amazon DynamoDB table if a table is not available. The cluster assigns the DynamoDB table the name that you specify when you enable the option. If you do not specify a name, the cluster assigns the default name s3guard-metadata.
Altus creates the following components in your AWS account and deletes them when they are no longer needed:
EC2 instances
When you create a cluster, Altus creates EC2 instances in your AWS account for the nodes in the cluster. It creates a master node, multiple worker nodes, and a node for Cloudera Manager. To identify them as nodes in an Altus cluster, the instances that Altus creates have Cloudera resource names (CRN) that start with crn:altus:dataeng:....

The EC2 instances created for Altus clusters also have tags that identify their roles in the cluster. For more information about the Altus tags, see Altus Tags.

If you encounter a problem where you need to delete the components created by Altus, you can search for and delete the components in your AWS account.
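When you clean up by script, it helps to list the leftover resources explicitly before deleting anything. This is a minimal Python sketch with boto3 (assumed installed and configured). The stack name is the one you entered in Quickstart, and the DynamoDB table defaults to s3guard-metadata as described above. Delete the S3Guard table only if no remaining cluster uses it, and keep in mind that deleting the CloudFormation stack removes the resources the Altus environment was built from. Archived logs are left out on purpose because they live in your own bucket.

```python
from typing import List, Optional, Tuple

DEFAULT_S3GUARD_TABLE = "s3guard-metadata"


def cleanup_plan(stack_name: Optional[str], s3guard_enabled: bool,
                 s3guard_table: Optional[str] = None) -> List[Tuple[str, str]]:
    """List (service, resource name) pairs that Altus does not delete by itself."""
    plan: List[Tuple[str, str]] = []
    if stack_name:
        plan.append(("cloudformation", stack_name))
    if s3guard_enabled:
        plan.append(("dynamodb", s3guard_table or DEFAULT_S3GUARD_TABLE))
    return plan


def apply_plan(plan: List[Tuple[str, str]], region: str) -> None:
    """Delete each planned resource. Review the plan first; this is irreversible."""
    import boto3  # imported here so cleanup_plan works without boto3

    for service, name in plan:
        if service == "cloudformation":
            boto3.client("cloudformation", region_name=region).delete_stack(StackName=name)
        elif service == "dynamodb":
            boto3.client("dynamodb", region_name=region).delete_table(TableName=name)
```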

Using Impala JDBC Connector Version 2.6.4 or Older with MicroStrategy

Impala JDBC Connector version 2.6.4 or older does not use the Altus credentials to connect to an Altus Data Warehouse cluster. You must provide the IP address of the coordinator node of the Data Warehouse cluster and the user login credentials for the cluster.

To connect to an Altus Data Warehouse cluster through the Impala JDBC Connector, the cluster must have public IP addresses. You can create clusters with public IP addresses if you use an environment with the Public IPs option enabled. For more information, see Enable Public IPs. If the Altus Data Warehouse cluster does not have public IPs set up, you can connect to the cluster using private IP addresses if you set up a network connection between the client tool and the coordinator node in the cluster.

To set up a JDBC connection from MicroStrategy to an Altus Data Warehouse cluster, complete the following steps:
  1. Download and install the Cloudera JDBC database driver for your operating system.

    From the Cloudera JDBC driver download page, download the version of Impala JDBC Connector for Cloudera Enterprise that you want to use.

    For more information about downloading the JDBC driver, see the Cloudera Enterprise Connector Documentation for the version of Cloudera JDBC driver for Impala that you want to use.

  2. Open MicroStrategy.
  3. On the Datasets page, click Add New Data.
  4. On the Connect to Your Data window, go to Hadoop.
  5. From the All Datasets list, select Impala.
  6. On the Select Import Options window, select Build a Query and click Next.
  7. On the Import From Tables page, click Add.
  8. On the Data Source window, configure the following properties:
    • Database. Select Impala.
    • Version. Select Impala 2.x.
  9. Click Show Connection String.
  10. Select Edit connection string.

    The Cloudera Impala driver displays as the selected JDBC driver.

  11. In the connection string entry field, enter the following connection parameters:
    • Driver. Name of the Impala JDBC driver: com.cloudera.impala.jdbc41.Driver
    • URL. Connection URL for the Impala JDBC driver:
      jdbc:impala://DataWarehouseClusterCoordinatorIP:21050/default;
      Impala uses port number 21050. Include the following attributes in the URL:
      • AuthMech. Set the value to 3 so the connection uses username and password authentication.
      • SSL. Set the value to 1 to specify that the connection to Impala goes through an SSL-enabled socket.
      • CAIssuedCertNamesMismatch. Set the value to 1 so that the Common Name (CN) in the self-signed certificate does not need to match the host name of the Impala server.
      • AllowSelfSignedCerts. Set the value to 1 so that the driver accepts the self-signed certificate that the Impala server presents.
      For example:
      URL={jdbc:impala://DataWarehouseClusterCoordinatorIP:21050/default;AuthMech=3;SSL=1;CAIssuedCertNamesMismatch=1;AllowSelfSignedCerts=1;}
  12. Enter the cluster credentials and assign a name to the data source:
    • User Name. Enter the user name to use for the cluster that you want to access.

      You can get user credentials for the Altus Data Warehouse cluster in Altus. For more information about getting the Altus Data Warehouse cluster credentials, see Getting the Cluster Credentials on the Console and Getting the Cluster Credentials Using the CLI.

    • Password. Enter the password for the user account to access the cluster.
    • Data Source Name. Assign a name to the data source.
  13. Click OK.

Create test queries to verify your connection to the Data Warehouse cluster.
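If you set up several data sources, the connection URL from step 11 can be generated instead of typed. This is a minimal Python sketch that assembles the URL with exactly the attributes listed above and wraps it in the URL={...} form used in the MicroStrategy connection string. The coordinator IP in the test is a documentation placeholder, and the optional extra dictionary is a hypothetical hook for any further driver properties you need.

```python
from typing import Dict, Optional


def impala_jdbc_url(coordinator_ip: str, port: int = 21050, database: str = "default",
                    extra: Optional[Dict[str, str]] = None) -> str:
    """Impala JDBC URL with the attributes from step 11 (SSL, user/password auth)."""
    attrs = {
        "AuthMech": "3",                   # username and password authentication
        "SSL": "1",                        # connect through an SSL-enabled socket
        "CAIssuedCertNamesMismatch": "1",  # certificate CN need not match the host name
        "AllowSelfSignedCerts": "1",       # accept the server's self-signed certificate
    }
    if extra:
        attrs.update(extra)  # additional driver properties, if you need any
    properties = "".join(f"{key}={value};" for key, value in attrs.items())
    return f"jdbc:impala://{coordinator_ip}:{port}/{database};{properties}"


def microstrategy_url_entry(jdbc_url: str) -> str:
    """Wrap the URL in the URL={...} form used in the MicroStrategy connection string."""
    return f"URL={{{jdbc_url}}}"
```

For example, impala_jdbc_url("203.0.113.20") returns jdbc:impala://203.0.113.20:21050/default;AuthMech=3;SSL=1;CAIssuedCertNamesMismatch=1;AllowSelfSignedCerts=1; which matches the example in step 11.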

Upgrading a Configured SDX Namespace to Work with CDH 6.1 Clusters

Altus SDX supports CDH 6.1. You can use configured SDX namespaces with CDH 6.1 clusters.

CDH 5.x clusters and CDH 6.1 clusters cannot share a configured SDX namespace. To use a configured SDX namespace for CDH 6.1 clusters, create a configured SDX namespace that points to CDH 6.1 Hive metastore and Sentry databases. If you have a configured SDX namespace that uses CDH 5.x Hive metastore and Sentry databases and you want to use it with CDH 6.1 clusters, you must upgrade the Hive metastore and Sentry database schemas to be compatible with CDH 6.1.

To upgrade a configured SDX namespace to use CDH 6.1 clusters:
  1. Terminate the CDH 5.x clusters that use the configured SDX namespace.

    Do not use the configured SDX namespace with any cluster while you upgrade it.

  2. Upgrade the Hive metastore and Sentry databases to version 6.1.

    Upgrade the database schemas in the HMS and Sentry databases to version 6.1. You can use Cloudera Manager or Altus Director to upgrade the cluster as you would any other CDH cluster.

    For more information about upgrading to CDH 6.1, see the Cloudera Enterprise Upgrade Guide.

  3. Use the upgraded configured SDX namespace with CDH 6.1 clusters.

    After the upgrade, you cannot use the configured namespace with CDH 5.x clusters. You can use the configured namespace only with CDH 6.1 clusters.

Creating a Support Case

From the Cloudera Altus console, you can file a ticket with Cloudera Support for help with troubleshooting. To expedite problem resolution, provide as much information as possible in the support ticket.

To file a support ticket:
  1. Sign in to the Cloudera Altus console:

    https://console.altus.cloudera.com/

  2. Click Support and select Support Center.
  3. On the Support Center page, click one of the following selections based on the type of support you require:
    • Billing. For questions regarding billing issues with your Altus account.
    • Service Limits. For requests to raise Altus resource limits, such as the number of users, clusters, and jobs. For more information about the Altus resource limits, see Altus Resource Limits.
    • Technical Support. For requests for help on problems with Altus components, including creating environments, configuring authorization, clusters, or jobs.
      If your technical support issue relates to the Data Engineering component, provide the following information:
      • Select the area in Altus where you encounter the problem.
      • Select whether to use the diagnostic logs that Altus collects for the clusters.

        If selected, Altus generates a diagnostic bundle for the cluster you specify and forwards the bundle to Cloudera Technical Support.

      • If you choose to use the diagnostic bundle, provide the CRN of the cluster that you need help with.

        The CRN (Cloudera Resource Name) is a unique identifier for a resource created in Altus. It includes details about the cluster and provides relevant information to support engineers.

  4. Click File a Ticket.
  5. On the Cloudera support window, fill out the details of the ticket.

    Describe the issue and provide your contact information.

  6. Click Submit.

On the Cloudera support window, you can access the Cloudera product documentation, Knowledge Base, and other resources. You can also access the Altus community page to post questions and comments.