Upgrading CDH

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

This topic describes how to upgrade CDH from any version of CDH 5.x to a higher version of CDH 5.x (up to and including CDH 5.15), using Cloudera Manager parcels or RPM-based packages.
  • Parcels – This option uses Cloudera Manager to upgrade CDH and allows you to upgrade your cluster either via a full restart or a rolling restart if you have HDFS high availability enabled and have a Cloudera Enterprise license. The use of Parcels requires that your cluster be managed by Cloudera Manager. This includes any components that are not a regular part of CDH, such as Spark 2.
  • Packages – This option is the most time consuming and requires you to log in using ssh and execute a series of package commands on all hosts in your cluster. Cloudera recommends that you instead upgrade your cluster using parcels, which allows Cloudera Manager to distribute the upgraded software to all hosts in the cluster without having to log in to each host. If you installed the cluster using packages, you can upgrade using parcels and the cluster will use parcels for subsequent upgrades.

    You can migrate your cluster from packages to parcels before starting the upgrade.

The minor version of Cloudera Manager you use to perform the upgrade must be equal to or greater than the CDH minor version. To upgrade Cloudera Manager, see Overview of Upgrading Cloudera Manager.


Before You Begin

  1. You must have SSH access to the Cloudera Manager server hosts and be able to log in using the root account or an account that has password-less sudo permission on all the hosts.
  2. Review the Requirements and Supported Versions for the new versions you are upgrading to.
  3. Ensure that Java 1.7 or 1.8 is installed across the cluster. For installation instructions and recommendations, see Step 2: Install Java Development Kit or Upgrading to Oracle JDK 1.8.
  4. Review the CDH 5 Release Notes.
  5. Review the Cloudera Security Bulletins.
  6. Review the upgrade procedure and reserve a maintenance window with enough time allotted to perform all steps. For production clusters, Cloudera recommends allocating up to a full day maintenance window to perform the upgrade, depending on the number of hosts, the amount of experience you have with Hadoop and Linux, and the particular hardware you are using.
  7. If you are upgrading from CDH 5.1 or lower, and use Hive Date partition columns, you may need to update the date format. See Date partition columns.
  8. If the cluster uses Impala, check your SQL against the newest reserved words listed in incompatible changes. If upgrading across multiple versions, or in case of any problems, check against the full list of Impala keywords.
  9. Run the Security Inspector and fix any reported errors.

    Go to Administration > Security > Security Inspector.

  10. Log in to any cluster node as the hdfs user, run the following commands, and correct any reported errors:
    hdfs fsck /
    hdfs dfsadmin -report
    See HDFS Commands Guide in the Apache Hadoop documentation.
  11. Log in to any DataNode as the hbase user, run the following command, and correct any reported errors:
    hbase hbck 
    See Checking and Repairing HBase Tables.
  12. If you have configured Hue to use TLS/SSL and you are upgrading from CDH 5.2 or lower to CDH 5.3 or higher, Hue validates CA certificates and requires a truststore. To create a truststore, follow the instructions in Hue as a TLS/SSL Client.
  13. If your cluster uses the Flume Kafka client, and you are upgrading to CDH 5.8.0 or CDH 5.8.1, perform the extra steps described in Upgrading to CDH 5.8.0 or CDH 5.8.1 When Using the Flume Kafka Client and then continue with the procedures in this topic.
  14. The Llama role has been deprecated as of CDH 5.9. If your cluster uses Impala with Llama, you must remove the Llama role from the Impala service before starting the upgrade; otherwise, the upgrade wizard halts the upgrade.
    To determine if Impala uses Llama:
    1. Go to the Impala service.
    2. Select the Instances tab.
    3. Examine the list of roles in the Role Type column. If Llama appears, the Impala service is using Llama.
    To remove the Llama role:
    1. Go to the Impala service and select Actions > Disable YARN and Impala Integrated Resource Management.

      The Disable YARN and Impala Integrated Resource Management wizard displays.

    2. Click Continue.

      The Disable YARN and Impala Integrated Resource Management Command page displays the progress of the commands to disable the role.

    3. When the commands have completed, click Finish.
  15. If your cluster uses Sentry and you are upgrading from CDH 5.12 or lower, you may need to increase the Java heap memory for Sentry. See Performance Guidelines.
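
The command-line health checks in steps 10 and 11 above can be captured in one script so the reports are easy to review and keep. This is a minimal sketch, not from the upgrade guide: the log directory name and the use of sudo are assumptions; adapt them to your environment.

```shell
#!/bin/sh
# Hypothetical pre-upgrade check script; run on a cluster node that has the
# hdfs and hbase clients installed. Log location is an assumption.
LOGDIR="$HOME/preupgrade-checks-$(date +%F)"
mkdir -p "$LOGDIR"

# HDFS consistency and capacity (step 10); review fsck.log for CORRUPT or
# missing blocks before proceeding.
sudo -u hdfs hdfs fsck / > "$LOGDIR/fsck.log" 2>&1
sudo -u hdfs hdfs dfsadmin -report > "$LOGDIR/dfsadmin-report.log" 2>&1

# HBase table consistency (step 11); run this part on a DataNode.
sudo -u hbase hbase hbck > "$LOGDIR/hbck.log" 2>&1

echo "Reports written to $LOGDIR"
```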

Enter Maintenance Mode

To avoid unnecessary alerts during the upgrade process, enter maintenance mode on your cluster before you start the upgrade. Maintenance mode stops email alerts and SNMP traps from being sent, but does not stop checks and configuration validations. Be sure to exit maintenance mode when you have finished the upgrade to re-enable Cloudera Manager alerts.

On the Home > Status tab, click the actions menu next to the cluster name and select Enter Maintenance Mode.

Back Up HDFS Metadata

[Not required for CDH maintenance release upgrades.]

The steps in this section are only required for the following upgrades:
  • CDH 5.0 or 5.1 to 5.2 or higher
  • CDH 5.2 or 5.3 to 5.4 or higher

Back up HDFS metadata using the following command:

hdfs dfsadmin -fetchImage myImageName
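
For example, you can fetch the current fsimage into a date-stamped local file so that successive backups do not overwrite each other. The backup directory is an assumption; run the command as a user with HDFS superuser privileges (for example, hdfs).

```shell
# Fetch the latest fsimage from the NameNode into a dated local file.
# BACKUP_DIR is an illustrative choice, not a required path.
BACKUP_DIR="$HOME/hdfs-metadata-backups"
mkdir -p "$BACKUP_DIR"
sudo -u hdfs hdfs dfsadmin -fetchImage "$BACKUP_DIR/fsimage-$(date +%F)"
ls -lh "$BACKUP_DIR"
```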


Back Up Databases

Gather the following information:

  • Type of database (PostgreSQL, Embedded PostgreSQL, MySQL, MariaDB, or Oracle)
  • Hostnames of the databases
  • Credentials for the databases
  • Sqoop, Oozie, and Hue – Go to Cluster Name > Configuration > Database Settings.
  • Hive Metastore – Go to the Hive service, select Configuration, and select the Hive Metastore Database category.
  • Sentry – Go to the Sentry service, select Configuration, and select the Sentry Server Database category.

To back up the databases

  1. If not already stopped, stop the service.
    1. On the Home > Status tab, click the actions menu to the right of the service name and select Stop.
    2. Click Stop in the next screen to confirm. When you see a Finished status, the service has stopped.
  2. Back up the database.
    MySQL
    mysqldump --databases database_name --host=database_hostname --port=database_port -u database_username -p > $HOME/database_name-backup-`date +%F`.sql
    PostgreSQL/Embedded
    pg_dump -h database_hostname -U database_username -W -p database_port database_name > $HOME/database_name-backup-`date +%F`.sql
    Oracle
    Work with your database administrator to ensure databases are properly backed up.
  3. Start the service.
    1. On the Home > Status tab, click the actions menu to the right of the service name and select Start.
    2. Click Start in the next screen to confirm. When you see a Finished status, the service has started.
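
If you have several MySQL-backed services, the mysqldump command shown above can be wrapped in a loop. This is a hypothetical sketch: the database names, host, port, and user are assumptions; substitute the values you recorded from the Cloudera Manager configuration pages.

```shell
# Back up each service database to a dated .sql file in $HOME.
# All connection values below are placeholders.
DB_HOST="db1.example.com"
DB_PORT=3306
DB_USER="backup_user"
for db in hue oozie metastore sentry; do
  mysqldump --databases "$db" --host="$DB_HOST" --port="$DB_PORT" \
    -u "$DB_USER" -p > "$HOME/${db}-backup-$(date +%F).sql"
done
```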

Establish Access to the Software

Package Repository URL:

  1. SSH into each host in the cluster.
  2. Redhat / CentOS

    Create a file named cloudera_cdh.repo with the following content:

    [cdh]
    # Packages for CDH
    name=CDH
    baseurl=https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5.15
    gpgkey=https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/RPM-GPG-KEY-cloudera
    gpgcheck=1
    SLES

    Create a file named cloudera_cdh.repo with the following content:

    [cdh]
    # Packages for CDH
    name=CDH
    baseurl=https://archive.cloudera.com/cdh5/sles/12/x86_64/cdh/5.15
    gpgkey=https://archive.cloudera.com/cdh5/sles/12/x86_64/cm/RPM-GPG-KEY-cloudera
    gpgcheck=1
    Debian / Ubuntu

    Create a file named cloudera_cdh.list with the following content:

    # Packages for CDH
    deb https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh/ jessie-cdh5.15 contrib
    deb-src https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh/ jessie-cdh5.15 contrib

    The repository file, as created, specifies an upgrade to the most recent maintenance release of the specified minor release. If you would like to upgrade to a specific maintenance version, for example 5.15.1, replace 5.15 with 5.15.1 in the repository file shown above.

  3. Back up the existing repository directory.
    Redhat / CentOS
    sudo cp -rf /etc/yum.repos.d $HOME/yum.repos.d-`date +%F`
    SLES
    sudo cp -rf /etc/zypp/repos.d $HOME/repos.d-`date +%F`
    Debian / Ubuntu
    sudo cp -rf /etc/apt/sources.list.d $HOME/sources.list.d-`date +%F`
  4. Remove any older files in the existing repository directory:
    Redhat / CentOS
    sudo rm /etc/yum.repos.d/cloudera*cdh.repo*
    SLES
    sudo rm /etc/zypp/repos.d/cloudera*cdh.repo*
    Debian / Ubuntu
    sudo rm /etc/apt/sources.list.d/cloudera*cdh.list*
  5. Copy the repository file created above to the repository directory:
    Redhat / CentOS
    sudo cp cloudera_cdh.repo /etc/yum.repos.d/
    SLES
    sudo cp cloudera_cdh.repo /etc/zypp/repos.d/
    Debian / Ubuntu
    sudo cp cloudera_cdh.list /etc/apt/sources.list.d/
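
Steps 1 through 5 above must be repeated on every host. A simple distribution loop can reduce the manual work; this sketch assumes a cluster_hosts.txt file (one hostname per line, a file you create yourself) and the password-less root SSH access listed in the prerequisites. Adjust the destination directory for SLES (/etc/zypp/repos.d) or Debian/Ubuntu (/etc/apt/sources.list.d).

```shell
# Copy the repository file to every host listed in cluster_hosts.txt.
# Shown for RHEL/CentOS; change the target directory for other platforms.
while read -r host; do
  scp cloudera_cdh.repo "root@${host}:/etc/yum.repos.d/"
done < cluster_hosts.txt
```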

Stop the Cluster

  1. Stop the cluster before proceeding with the package-based CDH upgrade.

Install CDH Packages

  1. SSH into each host in the cluster.
  2. Redhat / CentOS
    sudo yum clean all
    sudo yum install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper
    SLES
    sudo zypper clean --all
    sudo zypper install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper
    Debian / Ubuntu
    sudo apt-get update
    sudo apt-get install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper
  3. Restart the Cloudera Manager Agent.
    Redhat 7, SLES 12, Debian 8, Ubuntu 16.04
    sudo systemctl restart cloudera-scm-agent
    You should see no response if there are no errors starting the agent.
    Redhat 5 or 6, SLES 11, Debian 6 or 7, Ubuntu 12.04, 14.04
    sudo service cloudera-scm-agent restart
    You should see the following:
    Starting cloudera-scm-agent: [ OK ]
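
After the packages are installed, a quick sanity check confirms that the installed versions match the CDH release configured in your repository file. The exact version strings vary by release; this is a sketch, not expected output.

```shell
# The first line of "hadoop version" reports the CDH build, for example
# something of the form "Hadoop 2.6.0-cdh5.15.x".
hadoop version | head -1

# List installed CDH packages (RHEL/CentOS/SLES); use "dpkg -l 'hadoop*'"
# on Debian/Ubuntu instead.
rpm -qa 'hadoop*' | sort
```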

Run the Upgrade Wizard

  1. If your cluster has Kudu 1.4.0 or lower installed and you want to upgrade to CDH 5.13 or higher, deactivate the existing Kudu parcel. Starting with Kudu 1.5.0 / CDH 5.13, Kudu is part of the CDH parcel and does not need to be installed separately.
  2. If your cluster has Spark 2.0 or Spark 2.1 installed and you want to upgrade to CDH 5.13 or higher, you must download and install Spark 2.1 release 2 or later.

    To install these versions of Spark, do the following before running the CDH Upgrade Wizard:
    1. Install the Custom Service Descriptor (CSD) file.
    2. Download, distribute, and activate the parcel for the version of Spark that you are installing. See Managing Parcels.
  3. If your cluster has GPLEXTRAS installed, download and distribute the version of the GPLEXTRAS parcel that matches the version of CDH that you are upgrading to.
  4. From the Home > Status tab, click the actions menu next to the cluster name and select Upgrade Cluster.
  5. If the option to pick between packages and parcels displays, select Use Parcels.
    The Getting Started page of the upgrade wizard displays and lists the available versions of CDH that are available for upgrade. If no qualifying parcels are listed, or you want to upgrade to a different version of CDH:
    1. Click the Remote Parcel Repository URLs link and add the appropriate parcel URL. See Parcel Configuration Settings for more information.
    2. Click the Cloudera Manager logo to return to the Home page.
    3. From the Home > Status tab, click the actions menu next to the cluster name and select Upgrade Cluster.

      The Getting Started page of the upgrade wizard displays.

  6. Select the CDH version and download the parcels.
    • Cloudera Manager 5.14 and lower:
      1. In the Choose CDH Version (Parcels) section, select the CDH version that you want to upgrade to.
      2. Click Continue.

        A page displays the version you are upgrading to and asks you to confirm that you have completed some additional steps.

      3. Click Yes, I have performed these steps.
      4. Click Continue.
      5. Cloudera Manager verifies that the agents are responsive and that the correct software is installed. When you see the No Errors Found message, click Continue.

        The selected parcels are downloaded, distributed, and unpacked.

      6. Click Continue.

        The Host Inspector runs. Examine the output and correct any reported errors.

    • Cloudera Manager 5.15 and higher:
      1. In the Upgrade to CDH Version drop-down list, select the version of CDH you want to upgrade to.

        The Upgrade Wizard performs some checks on configurations, health, and compatibility and reports the results. Fix any reported issues before continuing.

      2. Click Run Host Inspector.

        The Host Inspector runs. Click Show Inspector Results to see the Host Inspector report (opens in a new browser tab).

      3. Read the notices for steps you must complete before upgrading. After completing those steps, select Yes, I have performed these steps and click Continue.

        The selected parcels are downloaded, distributed, and unpacked. The Continue button turns blue when this process finishes.

  7. If you downloaded a new version of the GPLEXTRAS parcel, the Upgrade Wizard displays a message stating that the GPLEXTRAS parcel conflicts with the version of the CDH parcel.

    Select the option to resolve the conflicts automatically.

    Cloudera Manager deactivates the old version of the GPLEXTRAS parcel, activates the new version and verifies that all hosts have the correct software installed.

    Cloudera Manager may also offer to resolve other parcels with conflicting versions.

    Click Continue when the parcel is installed.

  8. Click Continue.

    The Choose Upgrade Procedure screen displays. Select the upgrade procedure from the following options:

    • Rolling Restart

      Cloudera Manager upgrades services and performs a rolling restart.

      This option is only available if you have enabled high availability for HDFS and you are performing a minor/maintenance upgrade.

      Services that do not support rolling restart undergo a normal restart, and are not available during the restart process.

      Configure the following parameters for the rolling restart (optional):

      Batch Size

      Number of roles to include in a batch. Cloudera Manager restarts the worker roles rack-by-rack, in alphabetical order, and within each rack, hosts are restarted in alphabetical order. If you use the default replication factor of 3, Hadoop tries to keep the replicas on at least 2 different racks. So if you have multiple racks, you can use a higher batch size than the default 1. However, using a batch size that is too high means that fewer worker roles are active at any time during the upgrade, which can cause temporary performance degradation. If you are using a single rack, restart one worker node at a time to ensure data availability during upgrade.

      Advanced Options > Sleep between batches

      Amount of time Cloudera Manager waits before starting the next batch. Applies only to services with worker roles.

      Advanced Options > Failed threshold

      The number of batch failures that cause the entire rolling restart to fail. For example, if you have a very large cluster, you can use this option to allow some failures when you know that the cluster remains functional while some worker roles are down.

    • Full Cluster Restart

      Cloudera Manager performs all service upgrades and restarts the cluster.

    • Manual Upgrade

      Cloudera Manager configures the cluster to the specified CDH version but performs no upgrades or service restarts. Manually upgrading is difficult and for advanced users only. Manual upgrades allow you to selectively stop and restart services to prevent or mitigate downtime for services or clusters where rolling restarts are not available.

      To perform a manual upgrade, see Performing Upgrade Wizard Actions Manually for the required steps.

  9. Click Continue.

    The Upgrade Cluster Command screen displays the result of the commands run by the wizard as it shuts down all services, activates the new parcels, upgrades services, deploys client configuration files, and restarts services, performing a rolling restart of those services that support it.

    If any of the steps fail, correct the reported errors and click the Resume button. Cloudera Manager skips restarting roles that have already restarted successfully. Alternatively, return to the Home > Status tab and follow the steps in Performing Upgrade Wizard Actions Manually.

  10. Click Continue.
    If your cluster was previously installed or upgraded using packages, the wizard may indicate that some services cannot start because their parcels are not available. To download the required parcels:
    1. In another browser tab, open the Cloudera Manager Admin Console.
    2. Select Hosts > Parcels.
    3. Locate the row containing the missing parcel and click the button to Download, Distribute, and then Activate the parcel.
    4. Return to the upgrade wizard and click the Resume button.

      The Upgrade Wizard continues upgrading the cluster.

  11. Click Finish to return to the Home page.

Finalize the HDFS Metadata Upgrade

[Not required for CDH maintenance release upgrades.]

The steps in this section are only required for the following upgrades:
  • CDH 5.0 or 5.1 to 5.2 or higher
  • CDH 5.2 or 5.3 to 5.4 or higher

To determine if you can finalize, run important workloads and ensure that they are successful. Once you have finalized the upgrade, you cannot roll back to a previous version of HDFS without using backups. Verifying that you are ready to finalize the upgrade can take a long time.

Make sure you have enough free disk space, keeping in mind that the following behavior continues until the upgrade is finalized:
  • Deleting files does not free up disk space.
  • Using the balancer causes all moved replicas to be duplicated.
  • All on-disk data representing the NameNode's metadata is retained, which could more than double the amount of space required on the NameNode and JournalNode disks.
To finalize the metadata upgrade:
  1. Go to the HDFS service.
  2. Click the Instances tab.
  3. Select the NameNode instance. If you have enabled high availability for HDFS, select NameNode (Active).
  4. Select Actions > Finalize Metadata Upgrade and click Finalize Metadata Upgrade to confirm.
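
Before finalizing, it can help to check how much headroom remains on the NameNode metadata disk, since space is not reclaimed until finalization. The mount point below is an assumption; on clusters without HDFS high availability, the wizard action corresponds to the standard HDFS admin command shown, though Cloudera Manager's action is the preferred path.

```shell
# Check usage on the NameNode metadata disk; the mount point is an
# assumption -- substitute your dfs.namenode.name.dir location.
USED=$(df --output=pcent /var/lib/hadoop-hdfs 2>/dev/null | tail -1 | tr -dc '0-9')
[ -n "$USED" ] && echo "Metadata disk ${USED}% used"

# Standard HDFS admin equivalent of the Finalize Metadata Upgrade action
# (non-HA clusters); prefer the Cloudera Manager action when available.
sudo -u hdfs hdfs dfsadmin -finalizeUpgrade
```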


Finalize HDFS Rolling Upgrade

[Not required for CDH maintenance release upgrades.]

The steps in this section are only required for the following upgrades:
  • CDH 5.0 or 5.1 to 5.2 or higher
  • CDH 5.2 or 5.3 to 5.4 or higher

To determine if you can finalize, run important workloads and ensure that they are successful. Once you have finalized the upgrade, you cannot roll back to a previous version of HDFS without using backups. Verifying that you are ready to finalize the upgrade can take a long time.

  1. Go to the HDFS service.
  2. Select Actions > Finalize Rolling Upgrade and click Finalize Rolling Upgrade to confirm.
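
The same finalization can be performed from the command line with the HDFS rolling-upgrade admin command, and the query subcommand confirms that no rolling upgrade remains in progress. The grep pattern is an assumption about the exact output wording.

```shell
# CLI equivalent of the Finalize Rolling Upgrade action above.
sudo -u hdfs hdfs dfsadmin -rollingUpgrade finalize

# Confirm no rolling upgrade is still in progress; the matched phrase is an
# assumption about the command's output format.
sudo -u hdfs hdfs dfsadmin -rollingUpgrade query | grep -i 'no rolling upgrade' \
  && echo "Rolling upgrade finalized"
```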


Exit Maintenance Mode

If you entered maintenance mode during this upgrade, exit maintenance mode.

On the Home > Status tab, click the actions menu next to the cluster name and select Exit Maintenance Mode.