Troubleshooting CDH Upgrades

"Access denied" in install or update wizard

"Access denied" in install or update wizard during database configuration for Activity Monitor or Reports Manager.

Possible Reasons

Hostname mapping or permissions are not set up correctly.

Possible Solutions

  • For hostname configuration, see Configure Network Names.
  • For permissions, make sure the values you enter in the wizard match those you used when you configured the databases. In particular, the database hostname you enter in the wizard must match the host (if any) you specified when you granted access to the database.

    For example, if you entered the following when you created the database:

    grant all on activity_monitor.* TO 'amon_user'@'myhost1.myco.com' IDENTIFIED BY 'amon_password';

    the value you enter here for the database hostname must be myhost1.myco.com. If you did not specify a host, or used a wildcard to allow access from any host, you can enter either the fully qualified domain name (FQDN) or localhost. For example, if you entered:

    grant all on activity_monitor.* TO 'amon_user'@'%' IDENTIFIED BY 'amon_password';

    the value you enter for the database hostname can be either the FQDN or localhost.
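
    To double-check the permissions before re-entering them in the wizard, you can connect to the database as the configured user from the Cloudera Manager Server host. The following is a quick sketch for MySQL; substitute your own user, password, and hostname:

    mysql -h myhost1.myco.com -u amon_user -pamon_password \
          -e "SHOW GRANTS FOR CURRENT_USER();"

    If the connection fails, or the grant does not list the host you plan to enter in the wizard, adjust the grant or the wizard value accordingly.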

Cluster hosts do not appear

Some cluster hosts do not appear when you click Find Hosts in the install or update wizard.

Possible Reasons

You might have network connectivity problems.

Possible Solutions

  • Make sure all cluster hosts have SSH port 22 open.
  • Check other common causes of loss of connectivity such as firewalls and interference from SELinux.
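
A quick connectivity check, run from the Cloudera Manager Server host, can narrow down the cause; this is a sketch, with myhost1.myco.com standing in for a host that does not appear:

    ping -c 3 myhost1.myco.com     # confirm DNS resolution and basic reachability
    nc -zv myhost1.myco.com 22     # confirm SSH port 22 is reachable (if nc/ncat is installed)

On the host that does not appear, getenforce shows whether SELinux is enforcing, and sudo iptables -L -n lists the active firewall rules.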

Cannot start services after upgrade

You have upgraded the Cloudera Manager Server, but now cannot start services.

Possible Reasons

You might have mismatched versions of the Cloudera Manager Server and Agents.

Possible Solutions

Make sure you have upgraded the Cloudera Manager Agents on all hosts. (The previous version of the Agents will heartbeat with the new version of the Server, but you cannot start HDFS and MapReduce with this combination.)
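
One way to confirm that the versions match is to compare the installed packages. The commands below assume an RPM-based system; on Debian or Ubuntu hosts, use dpkg -l instead:

    rpm -q cloudera-manager-server                           # on the Cloudera Manager Server host
    rpm -q cloudera-manager-agent cloudera-manager-daemons   # on each cluster host

The Agent package version reported on every host should match the upgraded Server version.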

HDFS DataNodes fail to start

After upgrading, HDFS DataNodes fail to start with exception:

Exception in secureMain
java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 4294967296 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.

Possible Reasons

HDFS caching, which is enabled by default in CDH 5 and higher, requires new memlock functionality from Cloudera Manager Agents.

Possible Solutions

Do the following:

  1. Stop all CDH and managed services.
  2. On all hosts with Cloudera Manager Agents, hard-restart the Agents. Before performing this step, ensure you understand the semantics of the hard_restart command by reading Hard Stopping and Restarting Agents.
    RHEL 7, SLES 12, Debian 8, Ubuntu 16.04 and higher:

      sudo systemctl stop supervisord
      sudo systemctl start cloudera-scm-agent

    RHEL 5 or 6, SLES 11, Debian 6 or 7, Ubuntu 12.04 or 14.04:

      sudo service cloudera-scm-agent hard_restart
  3. Start all services.
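
After the hard restart, you can confirm that a DataNode process inherited a memlock limit at least as large as dfs.datanode.max.locked.memory. This is a sketch that assumes the DataNode JVM can be found by its main class name:

    DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -1)
    grep "Max locked memory" /proc/${DN_PID}/limits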

Cloudera services fail to start

Cloudera services fail to start.

Possible Reasons

Java might not be installed or might be installed at a custom location.

Possible Solutions

See Configuring a Custom Java Home Location for more information on resolving this issue.
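
As a quick check, you can look for the JDK on an affected host; the paths below are common install locations, and a custom JDK may live elsewhere:

    ls -d /usr/java/* /usr/lib/jvm/* 2>/dev/null   # common JDK install locations
    readlink -f "$(which java)" 2>/dev/null        # where the java on the PATH actually resides

If Java is installed in a non-default location, point Cloudera Manager at it as described in Configuring a Custom Java Home Location.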

Upgrading to CDH 5.8.0 or CDH 5.8.1 When Using the Flume Kafka Client

Due to the change of offset storage from ZooKeeper to Kafka in the CDH 5.8 Flume Kafka client, data might not be consumed by the Flume agents, or might be duplicated (if kafka.auto.offset.reset=smallest) during an upgrade to CDH 5.8.0 or CDH 5.8.1. To prevent this, perform the steps described below before you upgrade your system.

The upgrade process is based on this example configuration:

    tier1.sources  = source1
    tier1.channels = channel1
    tier1.sinks = sink1

    tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
    tier1.sources.source1.zookeeperConnect = zkhost:2181
    tier1.sources.source1.topic = flumetopic
    tier1.sources.source1.groupId = flume
    tier1.sources.source1.channels = channel1
    tier1.sources.source1.interceptors = i1 i2
    tier1.sources.source1.interceptors.i1.type = timestamp
    tier1.sources.source1.interceptors.i2.type = host
    tier1.sources.source1.kafka.consumer.timeout.ms = 100

    tier1.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
    tier1.channels.channel1.brokerList = broker1:9092,broker2:9092
    tier1.channels.channel1.zookeeperConnect = zkhost:2181
    tier1.channels.channel1.topic = flumechannel1
    tier1.channels.channel1.groupId = flumechannel
    tier1.channels.channel1.capacity = 10000
    tier1.channels.channel1.transactionCapacity = 1000

    tier1.sinks.sink1.type = hdfs
    tier1.sinks.sink1.hdfs.path = /tmp/kafka/%{topic}/
    tier1.sinks.sink1.hdfs.filePrefix = %{host}-
    tier1.sinks.sink1.hdfs.rollInterval = 60
    tier1.sinks.sink1.hdfs.rollSize = 0
    tier1.sinks.sink1.hdfs.rollCount = 0
    tier1.sinks.sink1.hdfs.fileType = DataStream
    tier1.sinks.sink1.channel = channel1
    tier1.sinks.sink1.hdfs.kerberosKeytab = $KERBEROS_KEYTAB
    tier1.sinks.sink1.hdfs.kerberosPrincipal = $KERBEROS_PRINCIPAL

Perform the following steps to upgrade CDH.

  1. If you are using a version lower than CDH 5.7, first upgrade to CDH 5.7. If for some reason you cannot upgrade to CDH 5.7, contact your Cloudera Sales Engineer for assistance, or file a support case specifying the versions you are upgrading from and to.
  2. Add the following properties to the source and channel sections of the Flume configuration:

    # Added for source upgrade compatibility
    tier1.sources.source1.kafka.bootstrap.servers = broker1:9092,broker2:9092
    tier1.sources.source1.kafka.offsets.storage = kafka
    tier1.sources.source1.kafka.dual.commit.enabled = true
    tier1.sources.source1.kafka.consumer.group.id = flume
    tier1.sources.source1.kafka.topics = flumetopic

    # Added for channel upgrade compatibility
    tier1.channels.channel1.kafka.topic = flumechannel1
    tier1.channels.channel1.kafka.bootstrap.servers = broker1:9092,broker2:9092
    tier1.channels.channel1.kafka.consumer.group.id = flumechannel
    tier1.channels.channel1.kafka.offsets.storage = kafka
    tier1.channels.channel1.kafka.dual.commit.enabled = true
  3. Restart (or rolling restart) the Flume agents. This switches offsets.storage to Kafka but keeps both the Kafka and ZooKeeper offsets updated, because the dual.commit.enabled property is set to true. Confirm that Kafka messages are flowing through the Flume servers. Offsets are updated only when new messages are consumed, so at least one Kafka message must be consumed by the Flume agent, or one event must pass through the Flume channel. Use the following commands to verify that Flume is properly updating the offsets in Kafka (the egrep command matches the topic names used in this example, flumetopic and flumechannel1):
    echo "exclude.internal.topics=false" > /tmp/consumer.config
                kafka-console-consumer --consumer.config /tmp/consumer.config
                --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter"
                --zookeeper zkhost:2181 --topic __consumer_offsets  |egrep -e "flumetopic|flumechannel1"

    Output should be similar to the following and show that the offsets for the Flume source and/or channel topics are being incremented:

    [flume,flumetopic,0]::[OffsetMetadata[70,cf9e5630-214e-4689-9869-5e077c936ffb],CommitTime 1469827951129,ExpirationTime 1469914351129]
    [flumechannel,flumechannel1,0]::[OffsetMetadata[61,875e7a82-1c22-43be-acaa-eb4d63e7f71e],CommitTime 1469827951128,ExpirationTime 1469914351128]
    [flumechannel,flumechannel1,0]::[OffsetMetadata[62,66bda888-0a70-4a02-a286-7e2e7d14050d],CommitTime 1469827951131,ExpirationTime 1469914351131]
  4. Perform the upgrade to CDH 5.8.0 or CDH 5.8.1. The Flume agents are restarted during the process. Flume continues to consume the source topic where it left off, and the sinks continue draining from the Kafka channels where they left off. After the upgrade, remove the following deprecated properties from flume.conf because they are no longer used in CDH 5.8.0 and higher:
    tier1.sources.source1.zookeeperConnect = zkhost:2181
    tier1.sources.source1.topic = flumetopic
    tier1.sources.source1.groupId = flume

    tier1.channels.channel1.zookeeperConnect = zkhost:2181
    tier1.channels.channel1.topic = flumechannel1
    tier1.channels.channel1.groupId = flumechannel