Troubleshooting Altus Director

This topic contains information on problems that can occur when you set up, configure, or use Altus Director, their causes, and their solutions.

Viewing Altus Director Logs

To help you troubleshoot problems, you can view the Altus Director logs. Log files are in the following locations:
  • Altus Director client
    • One shared log file per user account:
      $HOME/.cloudera-director/logs/application.log
  • Altus Director server
    • One file for all clusters:
      /var/log/cloudera-director-server/application.log
Altus Director normally logs only error, warning, and informational messages. To configure it to log debug level messages, edit the file logback.xml, which can be found at the following locations:
  • Altus Director client: /etc/cloudera-director-client/logback.xml
  • Altus Director server: /etc/cloudera-director-server/logback.xml
The XML file configures the logback logging library. To turn on all debug logging for Altus Director and its libraries, change the "root" element as follows:
<root level="DEBUG">
Enabling debug logging significantly increases the size of the logs, and can include more information than needed for troubleshooting. Once you discover specific loggers that carry information you care most about, you can narrow the scope of debug logging to those only. For example, if after turning on all debug logging you find that the messages emitted from Altus Director itself are most important, then you can set the root level back to INFO and then add a new logger element like this example, along with the other similar elements.
<logger name="com.cloudera.launchpad" level="DEBUG"/>

The logback.xml file can be reconfigured in many other ways to adjust how logging is performed. See Logback configuration in the Logback project documentation to learn more. Note that major changes to log format and contents will hamper the effectiveness of Cloudera Support, if you should need to forward logs to them as part of troubleshooting.

Routing Server Logs to Separate Log Files

By default, the Altus Director server sends all log messages to a single file: /var/log/cloudera-director-server/application.log. This can make troubleshooting challenging when Director works with multiple clusters at once because log messages about all of the different cluster activities are combined and interlaced in a single file.

When the name of the environment, deployment, or cluster is available, the Director server records the name in each thread that handles an API call. You can configure server logging so that log messages generated by specific API calls are routed to separate log files based on the environment, deployment, and cluster names included in the thread.

The Director server also records the names of certain background tasks, such as those that periodically refresh its deployment and cluster states, in threads that execute those tasks. You can also configure server logging so that log messages pertaining to those tasks are routed to separate log files.

The Director server uses the logback logging library to route log messages to files. You can use the server logback.xml file to define a discriminator that determines where to send a message. You can set only one discriminator at a time, but the value is selected from one of several configured subkeys. The first subkey whose target content is recorded in a thread determines the value of the discriminator.

Threads that do not include a discriminator are sent to the standard application.log file. All logs files are created in the directory configured for the main application.log file.

You can use one or more of the following items as subkeys for the discriminator to send log messages to separate files:
Item Description
environmentName

Environment name included in the thread.

Example: env123

deploymentName

Deployment name included in the thread.

Example: depl456

clusterName

Cluster name included in the thread.

Example: cluster789

deploymentKey

The combination of the environment and deployment names included in the thread, separated by double underscores (__).

Example: env123__depl456

clusterKey

The combination of the environment, deployment, and cluster names included in the thread, separated by double underscores (__).

Example: env123__depl456__cluster789

task

The name of the task included in the thread.

Example: refreshDeployments

The server logback.xml file includes an example sifting appender that logs task and cluster activity into separate log files named after each task and cluster. Uncomment that appender, and comment the non-sifting appender, and then restart the server to start logging to separate files by task and cluster name. Edit the appender configuration to sift logs by a different discriminator.

Configuring Tag-on-create for AWS GovCloud (US) and China (Beijing) Regions

In most AWS regions, Altus Director assigns a tag during the creation of each instance it creates to facilitate instance management. The GovCloud (US) and China (Beijing) regions do not support tagging of instances on creation, so for instances in these regions, the tag is created after the instance is created.

If you are running Altus Director in the GovCloud (US) or China (Beijing) regions, you must turn off useTagOnCreate in the Altus Director AWS plugin. To do this add the following line to the end of the aws-plugin.conf file, after the last closing bracket:
useTagOnCreate: false

The aws-plugin.conf file can be found at /var/lib/cloudera-director-plugins/aws-provider-plugin_version/etc/ on your Altus Director EC2 instance.

Backing Up the H2 Embedded Database

By default, Altus Director uses an H2 embedded database to store environment and cluster data. The H2 embedded database file is located at:
/var/lib/cloudera-director-server/state.h2.db

Back up the state.h2.db file to avoid losing environment and cluster data. To ensure that your backup copy can be restored, use the H2 backup tools instead of simply copying the file. For more information, see the H2 Tutorial.

Manual Modifications to the Altus Director Database

Manual modifications to the Altus Director database are not supported. Modifications made outside of Altus Director control can lead to permanent data loss and unrecoverable errors in Altus Director.

Bootstrap fails in Azure when custom image has an attached data disk and dataDiskCount is not 0

Symptom

Bootstrap fails in Azure when a custom image is used that has an attached data disk and dataDiskCount is not set to 0. The error message displayed is, "Cannot specify user image overrides for a disk already defined in the specified image reference."

Cause

This error originates in Azure. It occurs because the Azure image has a data disk attached, while the dataDiskCount value wrongly indicates that Altus Director is trying to attach an additional disk or disks. The conflict causes an error to be thrown.

Solution

If you deploy a cluster in Azure with a custom image that has a data disk attached, you must set dataDiskCount to 0. You can use the Azure Portal to check if your custom image has a data disk attached. If you simply comment out the dataDiskCount setting, it will default to 5. Bootstrap fails if the dataDiskCount value is not 0. See Deploying Clusters with Custom Images.

Slow or Failed OS Updates in Some AWS Regions

Symptom

In AWS, Altus Director triggers operating system updates and performs software downloads on instances it allocates in your chosen region. Depending on the local network configuration, these updates and downloads might go slowly or fail.

Solutions

Consider trying one or more of the following steps:
  • Disable instance normalization. This causes Altus Director to not perform usual automated, general work on new instances. You should replace that work with your own, either by building a custom AMI with the work already accomplished, or by using a bootstrap script. Normalization can be disabled using a configuration file.

  • Create a preloaded AMI. Altus Director can avoid downloading Cloudera Manager and CDH software if it is already present in expected locations on instances. This also speeds up deployment and cluster bootstrap processes, even when download speeds from Cloudera repositories are reasonable. See the documentation for more information.

  • Mirror Cloudera repositories. Instead of preloading an AMI with Cloudera software, you can host them at local mirrors, and point Altus Director to them as alternative download locations. As with preloaded AMIs, taking this step can speed up bootstrap processes, and make your architecture less vulnerable to network problems. See the documentation for more information.

Altus Director Bootstrap Fails with DNS Error

Symptom

Altus Director fails to bootstrap with the error message, "DNS is not configured correctly on at least one instance."

Cause

Needs to be diagnosed.

Solution

Verify that DNS is configured properly. Check the server logs, which might contain additional warning messages and information about why DNS detection failed. For example, this error can appear when an invalid ssh user has been set.

Altus Director Bootstrap Fails with IAM Permissions Error

Symptom

Altus Director fails to bootstrap with the error message:
ErrorInfo{code=PROVIDER_EXCEPTION, properties={message=User: 
arn:aws:sts::code:assumed-role/ClouderaDirector-Director-instance 
is not authorized to perform: iam:GetInstanceProfile on resource: instance profile test}

Cause

User failed to configure the required IAM permissions.

Solution

Configure the required IAM permissions. Check the list of required IAM permissions: Creating AWS Identity and Access Management (IAM) Policies.

Cloudera Manager API Call Fails

Symptom

A Cloudera Manager API call fails in Altus Director.

Cause

Needs to be diagnosed. (See Solution immediately below.)

Solution

Enable API debugging in Cloudera Manager by going to Settings on the Administration tab in Cloudera Manager and clicking the checkbox Enable Debugging of API. Then look at the Cloudera Manager server logs to get more information on why the API call failed.

Altus Director Cannot Manage a Cluster That Was Kerberized Through Cloudera Manager

Symptom

Altus Director cannot manage a cluster after Cloudera Manager is used to enable Kerberos on the cluster.

Cause

Once a cluster is deployed through Altus Director, some changes to the cluster that are made using Cloudera Manager cause Altus Director to be out of sync and unable to manage the cluster. See Altus Director and Cloudera Manager Usage.

Solution

Deploy a new kerberized cluster, use distcp to transfer data from the old cluster to the new one, and then destroy the old cluster.

RDS Name Conflicts

Symptom

RDS name conflicts occur when creating multiple clusters with the same configuration file.

Cause

Most often, deletion of an older RDS instance has not completed when you try to launch a new cluster using the same configuration file, and therefore the same RDS name.

Solution

Allow more time for an RDS instance to be completely removed before creating a new cluster with the same configuration file, or change the name of the RDS instance in the configuration files for new clusters.

New Cluster Fails to Start Because of Missing Roles

Symptom

A new cluster will not start because roles are missing.

Cause

Altus Director does not validate that all required roles are assigned when provisioning a cluster. This can lead to failures during the intial run of a new cluster. For example, if the gateway instance group was removed, but the Flume Agent and Kafka Broker were assigned to roles in that group, the cluster fails to start.

Solution

Ensure that all required role types for the CDH services included in the cluster are assigned to instances before starting the cluster.

Altus Director Server Will Not Start with Unsupported Java Version

Symptom

Altus Director server will not start, and /var/log/cloudera-director-server/cloudera-director-server.out has the following error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/cloudera/launchpad/Server : Unsupported major.minor version 51.0

Cause

You are running Altus Director server against an older, unsupported version of the Oracle Java SE Development Kit (JDK).

Solution

Update to a supported version of the Oracle JDK. For information on supported versions, see Supported Software and Distributions.

Error Occurs if Tags Contain Unquoted Special Characters

Symptom

When using the bootstrap-remote command to set up a cluster with Altus Director server, an error message is displayed. This applies to HOCON characters, and includes periods. If the added configuration is in the form x.y, for example, the following error message might be displayed: "com.typesafe.config.ConfigException$WrongType: ... <x> has type OBJECT rather than STRING". This means that x.y must be in quotes, as in "x.y".
com.typesafe.config.ConfigException$WrongType: ... <x> has type OBJECT rather than STRING

Cause

Altus Director validation checks to ensure that special characters in configurations are enclosed in double quotes.

Solution

Use double quotes for special characters in configurations. An example of a configuration that would require double quotes is "log.dirs" in Kafka.

DNS Issues

Symptom

Altus Director fails to bootstrap a cluster with a DNS error. The Cloudera Agent log (cloudera-scm-agent.log) will show an entry similar to the following:
[27/Mar/2017 20:26:16 +0000] 12596 Thread-13 https ERROR Failed to retrieve/store URL:
http://ip-10-202-202-109.ec2.internal:7180/cmf/parcel/download/CDH-5.10.0-1.cdh5.10.0.p0.41-el7.parcel.torrent -> 
/opt/cloudera/parcel-cache/CDH-5.10.0-1.cdh5.10.0.p0.41-el7.parcel.torrent 
<urlopen error [Errno -2] Name or service not known>

Cause

This can be caused by one the following:
  • DNS Hostnames is not set to Yes in the Edit DNS Hostnames VPC configuration setting.
  • The Amazon Virtual Private Cloud (VPC) is not set up for forward and reverse hostname resolution. Forward and reverse DNS resolution is a requirement for many components of the Cloudera EDH platform, including Altus Director.

Solutions

In the AWS Management Console, go to Services > Networking and click VPC. In the VPC Dashboard, select your VPC and click Action. In the shortcut menu, click Edit DNS Hostnames and click Yes. If this does not fix the issue, continue with the instructions that follow to configure forward and reverse hostname resolution.

Configure the VPC for forward and reverse hostname resolution. You can verify if DNS is working as expected on a host by issuing the following one-line Python command:
python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"

For more information on DNS and Amazon VPCs, see DHCP Options Sets in the Amazon VPC documentation.

If you are using Amazon-provided DNS, perform these steps to configure DHCP options:
  1. Log in to the AWS Management Console.

  2. Select VPC from the Services navigation list box.

  3. In the left pane, click Your VPCs. A list of currently configured VPCs is displayed.

  4. Select the VPC you are using and note the DHCP options set ID.

  5. In the left pane, click DHCP Option Sets. A list of currently configured DHCP Option Sets is displayed.

  6. Select the option set used by the VPC.

  7. Check for an entry similar to the following and make sure the domain-name is specified. For example:

    domain-name = ec2.internal
    domain-name-servers = AmazonProvidedDNS 
  8. If it is not configured correctly, create a new DHCP option set for the specified region and assign it to the VPC. For information on how to specify the correct domain name, see the AWS Documentation.

Server Does Not Start

Symptom

The Altus Director server does not start or quickly exits with an Out of Memory exception.

Cause

The Altus Director server is running on a machine with insufficient memory.

Solution

Run Altus Director on an instance that has at least 1 GB of free memory. See Resource Requirements for more details on Altus Director hardware requirements.

Problem When Removing Hosts from a Cluster

Symptom

A Modify Cluster operation fails to complete.

Cause

You are trying to shrink the cluster below the HDFS replication factor. See How to Remove Instances from a Cluster (Note) for more information about replication factors.

Solution

Do not attempt to shrink a cluster below the HDFS replication factor. Doing so can result in a loss of data.

Problems Connecting to Altus Director Server

Symptom

You are unable to connect to the Altus Director server.

Cause

Configuration of security group and iptables settings. For more information about configuring security groups, see Preparing Your AWS EC2 Resources. For commands to turn off iptables, see either Installing Altus Director Server and Client on the EC2 Instance or Installing Altus Director Server and Client on Google Compute Engine. Some operating systems have IP tables turned on by default, and they must be turned off.

Solution

Check security group and iptables settings and reconfigure if necessary.

Shrinking an H2 Database

If you use an H2 database for Altus Director's data, the database should not be larger than a few megabytes. The H2 database grows when Altus Director runs over a long period of time because the database is not able to reclaim disk space when entries are deleted. As the database file grows larger, the risk of database corruption increases.

You can reduce the size of the H2 database by exporting and importing all the data using H2’s Script and RunScript commands.
  1. Back up the existing H2 database file.
  2. Stop Altus Director.
  3. Make a backup script using H2's Script command:
    # Make a backup script (backup.zip)
    # NOTE: Do not include ‘.h2.db’ when specifying the db file
    $ java -cp <h2 jar file> org.h2.tools.Script -url \
      "jdbc:h2:/var/lib/cloudera-director-server/state;MV_STORE=false; \
      MVCC=true;DB_CLOSE_ON_EXIT=TRUE;AUTO_SERVER=TRUE;TRACE_LEVEL_FILE=4;TRACE_LEVEL_SYSTEM_OUT=0" \
      -user sa -password sa -script backup.zip -options compression zip
  4. Delete the old database.
  5. Create a new, smaller database from the script using H2's RunScript command:
    # Create a new database from the script (backup.zip)
    # NOTE: Do not include ‘.h2.db’ when specifying the db file
    $ java -cp <h2 jar file> org.h2.tools.RunScript -url \
      "jdbc:h2:/var/lib/cloudera-director-server/state;MV_STORE=false; \
      MVCC=true;DB_CLOSE_ON_EXIT=TRUE;AUTO_SERVER=TRUE;TRACE_LEVEL_FILE=4;TRACE_LEVEL_SYSTEM_OUT=0" \
      -user sa -password sa -script backup.zip -options compression zip
  6. Start Altus Director.

For more information on H2 databases, see the H2 documentation at H2 Database Engine.

Migrating the Altus Director Database from H2 to MySQL Without Using the copy-database Script

Altus Director provides a script to move Altus Director data from one database to another database. You can use the copy-database script to move data from the default H2 database to a MySQL database. For more information about the copy-database script, see Migrating the Altus Director Database.

If you cannot or prefer not to use the copy-database script, you can migrate to a MySQL database by following the steps in this section. The SQL dialect that H2 uses is not compatible with MySQL, so this process requires the use of SQuirreL SQL, which translates the SQL dialect of H2 into that of MySQL. Altus Director will be used to create the schema in the MySQL database because SQuirrel SQL’s translation will not create the database schema exactly as Altus Director requires.

Step 1: Prepare databases

  1. Stop Altus Director and back up the H2 database file, state.h2.db. The H2 database file is located at /var/lib/cloudera-director-server/state.h2.db.
  2. Create a user on the MySQL server for Altus Director:
    CREATE USER director IDENTIFIED BY password;
  3. Create a database on the MySQL server for exporting the data from H2:
    CREATE DATABASE directorexport CHARACTER SET utf8;
    GRANT ALL PRIVILEGES ON directorexport.* TO 'director'@'%';
  4. Create a database on the MySQL server for use by Altus Director:
    CREATE DATABASE director CHARACTER SET utf8;
    GRANT ALL PRIVILEGES ON director.* TO 'director'@'%';

Step 2: Export data from H2 database

  1. Install SQuirreL SQL.
    1. Download SQuirreL SQL from the SQuirreL SQL web site.
    2. Install SQuirreL SQL with the H2 and MySQL database plugins enabled.
  2. Enable the H2 Embedded driver in SQuirreL SQL.
    1. Download H2 from http://www.h2database.com/html/download.html.
    2. Edit the H2 Embedded driver, adding the downloaded jar file to the Extra Class Path.

  3. Enable the “MySQL Driver” driver in SQuirreL SQL.
    1. Download the MySQL driver from the MySQL Downloads site.
    2. Edit the “MySQL Driver” driver, adding the MySQL connector jar to the Extra Class Path.

  4. Create an alias for the H2 database. Test the connection to make sure you can connect to the database.

  5. Create alias for MySQL data export database. Test the connection to make sure you can connect to the database.

  6. Prepare the H2 database for data export. H2 generates names for indexes that are longer than what is permitted in MySQL, so you must rename the indexes in H2 to ensure that they do not violate the MySQL length limit. For example:
    ALTER INDEX DEPLOYMENTS_UNIQUE_PER_ENVIRONMENT_BY_DEPLOYMENT_NAME_INDEX_8 RENAME TO DEPLOYMENTS_UNIQUE_PER_ENVIRONMENT_BY_DEPLOYMENT_NAME 
    ALTER INDEX UK_EXTERNAL_DATABASE_SERVERS_UNIQUE_PER_ENVIRONMENT_BY_NAME_INDEX_8 RENAME TO UK_EXTERNAL_DATABASE_SERVERS_UNIQUE_PER_ENVIRONMENT_BY_NAME
  7. Prepare MySQL for data export.
    1. Depending on the Altus Director and MySQL versions you are running, you might need the following step. If you see the error message invalid default value after performing the next step, you need to do this. Otherwise, this step is optional.
      select @@global.sql_mode
      // Keep the value somewhere if you would like to restore the value 
      // at the end of this procedure. (See step 9c below.)
      set @@global.sql_mode = "..."; 
      // remove NO_ZERO_IN_DATE,NO_ZERO_DATE from previous value.
    2. Reconnect to the MySQL alias for this change to apply to your session.
  8. Export data from H2 to MySQL.
    1. Go to the Alias pane in SQuirreL SQL, and connect to H2.
    2. Go to h2 > STATE > PUBLIC > TABLE. Select all tables except for the PIPELINES* and *schema_versions tables, right click and select Copy Table.

    3. Go to the Alias pane in SQuirreL SQL and connect to MySQL.
    4. Go to mysql > directorexport > TABLE, right click TABLE and select Paste Table.

  9. Restore sql_mode, if desired (for more about sql_mode, see Prepare MySQL for data export above).

Step 3: Prepare MySQL database for data import

Start Altus Director with MySQL.
  1. Configure Altus Director to use MySQL, as described in Configuring Altus Director Server to use the MySQL Database.
  2. Start Altus Director with MySQL. When Altus Director starts, the database schema will be created.
  3. Stop Altus Director to prevent modification of the database during data import.
  4. Delete all values from the AUTHORITIES, USERS, and SERVER_CONFIGS tables. Altus Director populates these tables with some values by default. These values should be deleted so they will not conflict with the imported data.
    DELETE FROM AUTHORITIES;
    DELETE FROM USERS;
    DELETE FROM SERVER_CONFIGS;

Step 4: Import data to MySQL

  1. Dump the data only (no schema) from the export database. You can use the -h option if running mysqldump against a remote host.
    $ mysqldump -u [user] -p --no-create-info directorexport > directorexport.sql
  2. Import the data into Altus Director’s database. You may use the -h option if running mysql against a remote host.
    $ mysql -u [user] -p director < directorexport.sql