Upgrading the CDH Cluster

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)


Back Up Cloudera Manager

Before you upgrade a CDH cluster, back up Cloudera Manager. Even if you backed up Cloudera Manager just before upgrading it, back up your upgraded Cloudera Manager deployment again now. See Backing Up Cloudera Manager.

Enter Maintenance Mode

To avoid unnecessary alerts during the upgrade process, enter maintenance mode on your cluster before you start the upgrade. Maintenance mode stops email alerts and SNMP traps from being sent, but does not stop checks and configuration validations. Be sure to exit maintenance mode when the upgrade is finished to re-enable Cloudera Manager alerts.

On the Home > Status tab, click the drop-down arrow next to the cluster name and select Enter Maintenance Mode.

Complete Pre-Upgrade Migration Steps

  • YARN

    Decommission and then recommission the YARN NodeManagers, but do not start them.

    Decommissioning is required so that the NodeManagers stop accepting new containers, kill any running containers, and then shut down.

    1. Ensure that new applications, such as MapReduce or Spark applications, will not be submitted to the cluster until the upgrade is complete.
    2. In the Cloudera Manager Admin Console, navigate to the YARN service for the cluster you are upgrading.
    3. On the Instances tab, select all the NodeManager roles. This can be done by filtering for the roles under Role Type.
    4. Click Actions for Selected (number) > Decommission.

      If the cluster runs CDH 5.9 or higher and is managed by Cloudera Manager 5.9 or higher, and you configured graceful decommission, the countdown for the timeout starts.

      A Graceful Decommission provides a timeout before starting the decommission process. The timeout creates a window of time to drain already running workloads from the system and allow them to run to completion. Search for the Node Manager Graceful Decommission Timeout field on the Configuration tab for the YARN service, and set the property to a value greater than 0 to create a timeout.

    5. Wait for the decommissioning to complete. The NodeManager State is Stopped and the Commission State is Decommissioned when decommissioning completes for each NodeManager.
    6. With all the NodeManagers still selected, click Actions for Selected (number) > Recommission.
  • Cloudera Search

    The CDH upgrade process does not delete Solr data from HDFS, and the recreated collections fail to initialize because of the existing indexes.

    Before starting the CDH upgrade, delete the Solr data files from the HDFS directory specified by the HDFS Data Directory configuration property of the Solr service in Cloudera Manager.

  • Hive

    There are changes to query syntax, DDL syntax, and the Hive API. You might need to edit the HiveQL code in your application workloads before upgrading.

    See Incompatible Changes for Apache Hive/Hive on Spark/HCatalog

  • Pig

    DataFu is no longer supported. Your Pig scripts will require modification for use with CDH 6.x.

    See Incompatible Changes for Apache Pig

  • Sentry

    If your cluster uses Sentry policy file authorization, you must migrate the policy files to the database-backed Sentry service before you upgrade to CDH 6.

    See Migrating from Sentry Policy Files to the Sentry Service.
  • Cloudera Search

    If your cluster uses Cloudera Search, you must migrate the configuration to Apache Solr 7.

    See Migrating Cloudera Search Configuration Before Upgrading to CDH 6.

  • Spark

    If your cluster uses Spark or Spark Standalone, there are several steps you must perform to ensure that the correct version is installed.

    See Migrating Apache Spark Before Upgrading to CDH 6.

  • Kafka
    In CDH 5.x, Kafka was delivered as a separate parcel and could be installed along with CDH 5.x using Cloudera Manager. Starting with CDH 6.0, Kafka is part of the CDH distribution and is deployed as part of the CDH 6.x parcel.
    1. Explicitly set the Kafka protocol version to match the version currently used by the brokers and clients. Update server.properties on all brokers as follows:
      1. Log in to the Cloudera Manager Admin Console
      2. Choose the Kafka service.
      3. Click Configuration.
      4. Use the Search field to find the Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties configuration property.
      5. Add the following properties to the snippet:
        • inter.broker.protocol.version = current_Kafka_version
        • log.message.format.version = current_Kafka_version
        Make sure you enter full Kafka version numbers with three values, such as 0.10.0. Otherwise, you will see an error message similar to the following:
        2018-06-14 14:25:47,818 FATAL kafka.Kafka$:
        java.lang.IllegalArgumentException: Version `0.10` is not a valid version
                at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
                at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
                at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    2. Save your changes. The information is automatically copied to each broker.
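For the Kafka step above, the resulting safety-valve snippet looks like the following fragment. The version 0.10.2 is only a placeholder standing in for whatever your brokers currently run; substitute your actual three-part version number:

```
inter.broker.protocol.version=0.10.2
log.message.format.version=0.10.2
```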
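The Cloudera Search cleanup described above can also be scripted. This is a minimal sketch, not a definitive procedure: SOLR_HDFS_DIR is an assumption that defaults to /solr, so verify the actual HDFS Data Directory value in Cloudera Manager first. With DRY_RUN=1 (the default here) the script only prints the commands instead of executing them:

```shell
# Hedged sketch: remove pre-upgrade Solr index data from HDFS.
# SOLR_HDFS_DIR is an assumption -- check the "HDFS Data Directory"
# property of the Solr service in Cloudera Manager before running.
SOLR_HDFS_DIR=${SOLR_HDFS_DIR:-/solr}
DRY_RUN=${DRY_RUN:-1}

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Review what is there, then delete it (bypassing the trash).
run sudo -u hdfs hdfs dfs -ls "$SOLR_HDFS_DIR"
run sudo -u hdfs hdfs dfs -rm -r -skipTrash "$SOLR_HDFS_DIR"
```

Set DRY_RUN=0 only after reviewing the printed commands against your cluster's configuration.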

Establish Access to the Software

  • If the Cloudera Manager hosts have internet access, you can use the publicly available repositories from https://archive.cloudera.com. Fill in the form at the top of this page to generate the contents of the repository file for your system. The package manager uses this file to download and install a new version of the Cloudera Manager software. You will copy the contents of this file to your clipboard and then create the file on the hosts in a later step.
  • If the Cloudera Manager hosts do not have internet access, configure a local package repository hosted on your network, replace the default repository URL below, and click Apply to update the contents of the repository file in the text box below. You will copy the contents of this file to your clipboard and then create the file on the hosts in a later step. For example: http://MyWebServer:1234/cloudera-repos


Log in to each host in the cluster using ssh and perform the following steps:
  1. Back up the existing repository directory.
    RHEL / CentOS
    sudo cp -rpf /etc/yum.repos.d $HOME/yum.repos.d-`date +%F`-CM-CDH
    SLES
    sudo cp -rpf /etc/zypp/repos.d $HOME/repos.d-`date +%F`-CM-CDH
    Debian / Ubuntu
    sudo cp -rpf /etc/apt/sources.list.d $HOME/sources.list.d-`date +%F`-CM-CDH
  2. Remove any older files in the existing repository directory:
    RHEL / CentOS
    sudo rm /etc/yum.repos.d/cloudera*cdh.repo*
    SLES
    sudo rm /etc/zypp/repos.d/cloudera*cdh.repo*
    Debian / Ubuntu
    sudo rm /etc/apt/sources.list.d/cloudera*cdh.list*
  3. RHEL / CentOS

    Create a file named /etc/yum.repos.d/cloudera-cdh.repo with the following content:

    [cdh]
    # Packages for CDH
    name=CDH
    baseurl=https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5.15
    gpgkey=https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/RPM-GPG-KEY-cloudera
    gpgcheck=1
    SLES

    Create a file named /etc/zypp/repos.d/cloudera-cdh.repo with the following content:

    [cdh]
    # Packages for CDH
    name=CDH
    baseurl=https://archive.cloudera.com/cdh5/sles/12/x86_64/cdh/5.15
    gpgkey=https://archive.cloudera.com/cdh5/sles/12/x86_64/cdh/RPM-GPG-KEY-cloudera
    gpgcheck=1
    Debian / Ubuntu

    Create a file named /etc/apt/sources.list.d/cloudera-cdh.list with the following content:

    # Packages for CDH
    deb https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh/ jessie-cdh5.15 contrib
    deb-src https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh/ jessie-cdh5.15 contrib

    Then update the package index:

    sudo apt-get update

    The repository file, as created, specifies an upgrade to the most recent maintenance release of the specified minor release. If you would like to use a specific maintenance version, for example 5.15.1, replace 5.15 with 5.15.1 in the generated repository file shown above.
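As a sketch of that substitution on RHEL/CentOS, the following replaces the minor-release path with a specific maintenance release. The file path and the target version 5.15.1 are examples; adjust both for your platform and release. By default the demo below works on a temporary copy, so it is safe to try before pointing REPO_FILE at the real file:

```shell
# Hedged sketch: pin a generated cloudera-cdh.repo to a specific
# maintenance release (5.15 -> 5.15.1). Point REPO_FILE at the real
# file (/etc/yum.repos.d/cloudera-cdh.repo on RHEL/CentOS); when unset,
# this demo creates and edits a temporary copy instead.
if [ -z "${REPO_FILE:-}" ]; then
  REPO_FILE=$(mktemp)
  echo 'baseurl=https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5.15' > "$REPO_FILE"
fi
# Rewrite a trailing "cdh/5.15" to "cdh/5.15.1" on the baseurl line.
sed -i 's|cdh/5\.15$|cdh/5.15.1|' "$REPO_FILE"
grep '^baseurl' "$REPO_FILE"
```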

Stop the Cluster

Stop the cluster before proceeding to upgrade CDH using packages:
  1. Open the Cloudera Manager Admin console.
  2. Click the drop-down list next to the cluster name and select Stop.

Install CDH Packages

  1. Log in to each host in the cluster using ssh.
  2. Run the following command:
    RHEL / CentOS
    sudo yum clean all
    sudo yum install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper
    sudo yum clean all
    sudo yum remove hadoop-0.20\* hue-\* crunch llama mahout sqoop2 whirr sqoop2-client
    sudo yum install avro-tools flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase-solr hive-hbase hive-webhcat hue impala impala-shell kafka kite kudu oozie pig search sentry sentry-hdfs-plugin solr-crunch solr-mapreduce spark-core spark-python sqoop zookeeper parquet hbase solr
    SLES
    sudo zypper clean --all
    sudo zypper install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper
    sudo zypper clean --all
    sudo zypper remove hadoop-0.20\* hue-\* crunch llama mahout sqoop2 whirr sqoop2-client
    sudo zypper install avro-tools flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase-solr hive-hbase hive-webhcat hue impala impala-shell kafka kite kudu oozie pig search sentry sentry-hdfs-plugin solr-crunch solr-mapreduce spark-core spark-python sqoop zookeeper parquet hbase solr
    Debian / Ubuntu
    sudo apt-get update
    sudo apt-get install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper
    sudo apt-get update
    sudo apt-get remove hadoop-0.20\* crunch llama mahout sqoop2 whirr sqoop2-client
    sudo apt-get update
    sudo apt-get install avro-tools flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase-solr hive-hbase hive-webhcat hue impala impala-shell kafka kite kudu oozie pig search sentry sentry-hdfs-plugin solr-crunch solr-mapreduce spark-core spark-python sqoop zookeeper parquet hbase solr
  3. Restart the Cloudera Manager Agent.
    RHEL 7, SLES 12, Debian 8, Ubuntu 16.04
    sudo systemctl restart cloudera-scm-agent
    If the agent starts without errors, no response displays.
    RHEL 5 or 6, SLES 11, Debian 6 or 7, Ubuntu 12.04, 14.04
    sudo service cloudera-scm-agent restart
    You should see the following:
    Starting cloudera-scm-agent: [ OK ]
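The two restart variants above can be folded into one snippet that picks the right command for the host's init system. Detecting systemd by the presence of systemctl is an assumption that holds for the OS versions listed; the snippet prints the command so you can review it before running it as root:

```shell
# Choose the agent restart command based on the available init system.
if command -v systemctl >/dev/null 2>&1; then
  RESTART_CMD="systemctl restart cloudera-scm-agent"   # systemd hosts
else
  RESTART_CMD="service cloudera-scm-agent restart"     # SysV-init hosts
fi
echo "Run as root: $RESTART_CMD"
```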

Download and Distribute Parcels

  1. Log in to the Cloudera Manager Admin Console.
  2. Click Hosts > Parcels. The Parcels page displays.
  3. Update the Parcel Repository for CDH using one of the following remote parcel repository URLs, depending on the version you are upgrading to:
    CDH 5.15: https://archive.cloudera.com/cdh5/parcels/5.15/
    CDH 6.0.0: https://archive.cloudera.com/cdh6/6.0.0/parcels/
    1. Click the Configuration button.
    2. In the Remote Parcel Repository URLs section, click the + icon to add the parcel URL above, then click Save Changes. See Parcel Configuration Settings for more information.
    3. Locate the row in the table that contains the new CDH parcel and click the Download button.
    4. After the parcel is downloaded, click the Distribute button.
  4. If your cluster has GPLEXTRAS installed, update the version of the GPLEXTRAS parcel to match the CDH version using one of the following remote parcel repository URLs:
    GPL Extras for CDH 5.15: https://archive.cloudera.com/gplextras5/parcels/5.15/
    GPL Extras for CDH 6.0.0: https://archive.cloudera.com/gplextras6/6.0.0/parcels/
    1. Click the Configuration button.
    2. In the Remote Parcel Repository URLs section, click the + icon to add the parcel URL above, then click Save Changes. See Parcel Configuration Settings for more information.
    3. Locate the row in the table that contains the new CDH parcel and click the Download button.
    4. After the parcel is downloaded, click the Distribute button.
  5. If your cluster has Spark 2.0 or Spark 2.1 installed and you want to upgrade to CDH 5.13 or higher, you must download and install Spark 2.1 release 2 or later.

    To install these versions of Spark, do the following before running the CDH Upgrade Wizard:
    1. Install the Custom Service Descriptor (CSD) file.
    2. Download, distribute, and activate the Parcel for the version of Spark that you are installing: See Managing Parcels.
  6. If your cluster has Kudu 1.4.0 or lower installed and you want to upgrade to CDH 5.13 or higher, deactivate the existing Kudu parcel. Starting with Kudu 1.5.0 / CDH 5.13, Kudu is part of the CDH parcel and does not need to be installed separately.
  7. After all the parcels are distributed, click the Upgrade button next to the chosen CDH parcel. The chosen CDH version is then selected automatically in the Upgrade Wizard.

Run the Upgrade CDH Wizard

  1. If you are using packages, or did not click Upgrade on the Parcels page, open the Upgrade CDH page from the Home > Status tab: click the drop-down arrow next to the cluster name and select Upgrade Cluster.
    Select the previously downloaded and distributed CDH version. If no qualifying CDH parcels are listed, or you want to upgrade to a different version of CDH:
    1. Click the Remote Parcel Repository URLs link and add the appropriate parcel URL. See Parcel Configuration Settings for more information.
    2. Click the Cloudera Manager logo to return to the Home page.
    3. From the Home > Status tab, click the drop-down arrow next to the cluster name and select Upgrade Cluster.

    If you were previously using packages and would like to switch to using parcels, select Use Parcels.

  2. Cloudera Manager 5.14 and lower:
    1. In the Choose CDH Version (Parcels) section, select the CDH version that you want to upgrade to.
    2. Click Continue.

      A page displays the version you are upgrading to and asks you to confirm that you have completed some additional steps.

    3. Click Yes, I have performed these steps.
    4. Click Continue.
    5. Cloudera Manager verifies that the agents are responsive and that the correct software is installed. When you see the No Errors Found message, click Continue.

      The selected parcels are downloaded, distributed, and unpacked.

    6. Click Continue.

      The Host Inspector runs. Examine the output and correct any reported errors.

    Cloudera Manager 5.15 and higher:
    1. In the Upgrade to CDH Version drop-down list, select the version of CDH you want to upgrade to.

      The Upgrade Wizard performs some checks on configurations, health, and compatibility and reports the results. Fix any reported issues before continuing.

    2. Click Run Host Inspector.

      The Host Inspector runs. Click Show Inspector Results to view the Host Inspector report (opens in a new browser tab). Fix any reported issues before continuing.

    3. Click Run Service Inspector. Click Show Inspector Results to view the output of the Service Inspector command (opens in a new browser tab). Fix any reported issues before continuing.
    4. Read the notices for steps you must complete before upgrading, select Yes, I have performed these steps after completing them, and click Continue.

      The selected parcels are downloaded, distributed, and unpacked. The Continue button turns blue when this process finishes.

  3. If you have a parcel that works with the existing CDH version, the Upgrade Wizard may display a message that this parcel conflicts with the new CDH version.
    1. Configure and download the newer version of this parcel before proceeding.
      1. Open the Cloudera Manager Admin Console from another browser tab, go to the parcels page, and configure the remote parcel repository for the newer version of this parcel.
      2. Download and distribute the newer version of this parcel.
    2. Click the Run All Checks Again button.
    3. Select the option to resolve the conflicts automatically.
    4. Cloudera Manager deactivates the old version of the parcel, activates the new version, and verifies that all hosts have the correct software installed.
  4. Click Continue.

    The Choose Upgrade Procedure screen displays. Select the upgrade procedure from the following options:

    • Rolling Restart

      Cloudera Manager upgrades services and performs a rolling restart. The Rolling Restart dialog box displays the impact of the restart on various services. Services that do not support rolling restart undergo a normal restart, and are not available during the restart process.

      Configure the following parameters for the rolling restart (optional):

      Roles to include

      Select which roles to restart as part of the rolling restart.

      Batch Size

      Number of roles to include in a batch. Cloudera Manager restarts the worker roles rack-by-rack, in alphabetical order, and within each rack, hosts are restarted in alphabetical order. If you use the default replication factor of 3, Hadoop tries to keep the replicas on at least 2 different racks. So if you have multiple racks, you can use a higher batch size than the default 1. However, using a batch size that is too high means that fewer worker roles are active at any time during the upgrade, which can cause temporary performance degradation. If you are using a single rack, restart one worker node at a time to ensure data availability during upgrade.

      Advanced Options > Sleep between batches

      Amount of time Cloudera Manager waits before starting the next batch. Applies only to services with worker roles.

      Advanced Options > Failed threshold

      The number of batch failures that cause the entire rolling restart to fail. For example, if you have a very large cluster, you can use this option to allow some failures when you are sure that the cluster will still be functional while some worker roles are down.

      Click the Rolling Restart button when you are ready to restart the cluster.

    • Full Cluster Restart

      Cloudera Manager performs all service upgrades and restarts the cluster.

    • Manual Upgrade

      Cloudera Manager configures the cluster to the specified CDH version but performs no upgrades or service restarts. A manual upgrade is difficult and intended for advanced users only. Manual upgrades allow you to selectively stop and restart services to prevent or mitigate downtime for services or clusters where rolling restarts are not available.

      To perform a manual upgrade, see Upgrading CDH Manually after an Upgrade Failure for the required steps.

  5. Click Continue.

    The Upgrade Cluster Command screen displays the results of the commands run by the wizard as it shuts down all services, activates the new parcels, upgrades services, deploys client configuration files, and restarts services, performing a rolling restart of the services that support it.

    If any of the steps fail, correct any reported errors and click the Resume button. Cloudera Manager skips restarting roles that have already restarted successfully. Alternatively, return to the Home > Status tab and perform the steps in Upgrading CDH Manually after an Upgrade Failure.

  6. Click Continue.
    If your cluster was previously installed or upgraded using packages, the wizard may indicate that some services cannot start because their parcels are not available. To download the required parcels:
    1. In another browser tab, open the Cloudera Manager Admin Console.
    2. Select Hosts > Parcels.
    3. Locate the row containing the missing parcel and click the button to Download, Distribute, and then Activate the parcel.
    4. Return to the upgrade wizard and click the Resume button.

      The Upgrade Wizard continues upgrading the cluster.

  7. Click Finish to return to the Home page.

Finalize the HDFS Upgrade

Follow the steps in this section if you are upgrading:
  • CDH 5.0 or 5.1 to 5.2 or higher
  • CDH 5.2 or 5.3 to 5.4 or higher

To determine if you can finalize, run important workloads and ensure that they are successful. Once you have finalized the upgrade, you cannot roll back to a previous version of HDFS without using backups. Verifying that you are ready to finalize the upgrade can take a long time.

Make sure you have enough free disk space, keeping in mind that the following behavior continues until the upgrade is finalized:
  • Deleting files does not free up disk space.
  • Using the balancer causes all moved replicas to be duplicated.
  • All on-disk data representing the NameNode's metadata is retained, which can more than double the amount of space required on the NameNode and JournalNode disks.
If you have enabled high availability for HDFS and have performed a rolling upgrade:
  1. Go to the HDFS service.
  2. Select Actions > Finalize Rolling Upgrade and click Finalize Rolling Upgrade to confirm.

If you have not performed a rolling upgrade:

  1. Go to the HDFS service.
  2. Click the Instances tab.
  3. Click the link for the NameNode instance. If you have enabled high availability for HDFS, click the link labeled NameNode (Active).

    The NameNode instance page displays.

  4. Select Actions > Finalize Metadata Upgrade and click Finalize Metadata Upgrade to confirm.


Complete Post-Upgrade Migration Steps

Several components require additional migration steps after you complete the CDH upgrade.

Exit Maintenance Mode

If you entered maintenance mode during this upgrade, exit maintenance mode.

On the Home > Status tab, click the drop-down arrow next to the cluster name and select Exit Maintenance Mode.