This is the documentation for Cloudera 5.2.x.
Documentation for other versions is available at Cloudera Documentation.

Upgrading from CDH 4 to CDH 5 Parcels

This topic covers upgrading a CDH 4 cluster to CDH 5 using the upgrade wizard, which installs CDH 5 parcels. Your CDH 4 cluster can be running either parcels or packages; in either case, you can use the cluster upgrade wizard to upgrade to parcels.

If you want to upgrade using CDH 5 packages, you can do so using a manual process. See Upgrading from CDH 4 Packages to CDH 5 Packages.

The steps to upgrade a CDH installation managed by Cloudera Manager using parcels are as follows.

  1. Before You Begin
  2. Stop All Services
  3. Perform Service-Specific Prerequisite Actions
  4. Remove CDH Packages
  5. Deactivate and Remove the GPL Extras Parcel
  6. Run the Upgrade Wizard
  7. Import MapReduce Configuration to YARN
  8. Upgrade the GPL Extras Parcel
  9. Restart the Reports Manager Role
  10. Recompile HBase Coprocessor and Custom JARs
  11. Finalize the HDFS Metadata Upgrade

Before You Begin

  • Read the CDH 5 Release Notes.
  • Read the Cloudera Manager 5 Release Notes.
  • Ensure that the Cloudera Manager version is greater than or equal to the CDH version.
  • If you are upgrading to CDH 5.0 or 5.1, make sure there are no Oozie workflows in RUNNING or SUSPENDED status; otherwise the Oozie database upgrade will fail and you will have to reinstall CDH 4 to complete or kill those running workflows.
  • Run the Host Inspector and fix every issue.
  • If using security, run the Security Inspector.
  • Run hdfs fsck / and hdfs dfsadmin -report and fix every issue.
  • If using HBase:
    • Run hbase hbck.
    • Before you can upgrade HBase from CDH 4 to CDH 5, your HFiles must be upgraded from the HFile v1 format to HFile v2, because CDH 5 no longer supports HFile v1. The procedure differs depending on whether you use Cloudera Manager or the command line, but the result is the same. The first step checks the HFiles for instances of HFile v1 and marks them to be upgraded to HFile v2; it also reports corrupted files and files with unknown versions, which you must remove manually. The next step rewrites the marked HFiles during the next major compaction. Once the HFiles are upgraded, you can continue the upgrade; after it completes, you must recompile custom coprocessors and JARs. To check and upgrade the files:
      1. In the Cloudera Manager Admin Console, go to the HBase service and run Actions > Check HFile Version.
      2. Check the output of the command in the stderr log.
        Your output should be similar to the following:
        Tables Processed:
        hdfs://localhost:41020/myHBase/.META.
        hdfs://localhost:41020/myHBase/usertable
        hdfs://localhost:41020/myHBase/TestTable
        hdfs://localhost:41020/myHBase/t
        
        Count of HFileV1: 2
        HFileV1:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
        hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512
        
        Count of corrupted files: 1
        Corrupted Files:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
        Count of Regions with HFileV1: 2
        Regions to Major Compact:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
        hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af
        In the example above, the script has detected two HFile v1 files, one corrupted file, and two regions that need a major compaction.
      3. Trigger a major compaction on each of the reported regions. This major compaction rewrites the files from HFile v1 to HFile v2 format. To run the major compaction, start HBase Shell and issue the major_compact command.
        $ bin/hbase shell
        hbase> major_compact 'usertable'
        You can also do this in a single step by using the echo shell built-in command.
        $ echo "major_compact 'usertable'" | bin/hbase shell
  • Review the upgrade procedure and reserve a maintenance window long enough to perform all the steps. For production clusters, Cloudera recommends allocating up to a full-day maintenance window, depending on the number of hosts, your experience with Hadoop and Linux, and the hardware you are using.
  • To avoid excessive alerts during the upgrade process, you can enable maintenance mode on your cluster before you start the upgrade. This stops email alerts and SNMP traps from being sent, but does not stop checks and configuration validations. Be sure to exit maintenance mode when you have finished the upgrade to re-enable Cloudera Manager alerts.
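When the HFile version check above reports several tables, the per-table major compaction can be scripted instead of typed one table at a time. A minimal sketch, assuming the `echo ... | bin/hbase shell` pattern shown above; the `gen_compact_cmds` helper and the table names are illustrative, not part of HBase:

```shell
#!/bin/sh
# Build one major_compact command per table name passed in.
# The table names used in the example call below are placeholders.
gen_compact_cmds() {
  for table in "$@"; do
    printf "major_compact '%s'\n" "$table"
  done
}

# Pipe the generated commands into the HBase shell, e.g.:
#   gen_compact_cmds usertable TestTable | bin/hbase shell
gen_compact_cmds usertable TestTable
```

Each generated line is exactly the `major_compact 'table'` command the HBase shell expects, so the whole batch runs in a single shell session.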

Stop All Services

  1. Stop the cluster.
    1. On the Home page, click to the right of the cluster name and select Stop.
    2. Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services.

      When All services successfully stopped appears, the task is complete and you can close the Command Details window.

  2. Stop the Cloudera Management service:
    1. Do one of the following:
      • Select Clusters > Cloudera Management Service > Cloudera Management Service, then select Actions > Stop.
      • On the Home page, click to the right of Cloudera Management Service and select Stop.
    2. Click Stop to confirm. The Command Details window shows the progress of stopping the roles.
    3. When Command completed with n/n successful subcommands appears, the task is complete. Click Close.

Perform Service-Specific Prerequisite Actions

  • Accumulo - if you have installed the Accumulo parcel, deactivate it following the instructions in Managing Parcels.
  • HDFS - Back up HDFS metadata on the NameNode:
    1. Stop the cluster. It is particularly important that the NameNode role process is not running so that you can make a consistent backup.
    2. Go to the HDFS service.
    3. Click the Configuration tab.
    4. In the Search field, search for "NameNode Data Directories". This locates the NameNode Data Directories property.
    5. From the command line on the NameNode host, back up the directory listed in the NameNode Data Directories property. If more than one is listed, then you only need to make a backup of one directory, since each directory is a complete copy. For example, if the data directory is /mnt/hadoop/hdfs/name, do the following as root:
      # cd /mnt/hadoop/hdfs/name
      # tar -cvf /root/nn_backup_data.tar .

      You should see output like this:

      ./
      ./current/
      ./current/fsimage
      ./current/fstime
      ./current/VERSION
      ./current/edits
      ./image/
      ./image/fsimage
        Warning: If you see a file containing the word lock, the NameNode is probably still running. Repeat the preceding steps, starting by shutting down the CDH services.
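The backup and the lock-file warning above can be combined into one guard, so the archive is only written when no lock file is present. This is a sketch, not part of CDH or Cloudera Manager; the `backup_namenode_dir` function name and the example paths are assumptions:

```shell
#!/bin/sh
# Sketch: back up a NameNode data directory, refusing to run if a lock
# file is present (which suggests the NameNode is still running).
backup_namenode_dir() {
  name_dir=$1
  backup=$2
  if ls "$name_dir" | grep -qi lock; then
    echo "lock file present in $name_dir: NameNode may still be running" >&2
    return 1
  fi
  # Archive the directory, then list the archive as a sanity check.
  tar -C "$name_dir" -cf "$backup" . && tar -tf "$backup"
}

# Example (as root, on the NameNode host):
#   backup_namenode_dir /mnt/hadoop/hdfs/name /root/nn_backup_data.tar
```

Listing the archive with `tar -tf` immediately after creating it confirms the expected `current/` files made it into the backup.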

Remove CDH Packages

If your previous installation of CDH was done using packages, you must remove those packages on all hosts in the cluster being upgraded. This is necessarily the case if you are running a version of CDH earlier than CDH 4.1.3, because parcels were not available in those releases.
  1. Uninstall the CDH packages. On each host:
    Operating System  Command
    RHEL              $ sudo yum remove bigtop-jsvc bigtop-utils bigtop-tomcat hue-common sqoop2-client hbase-solr-doc solr-doc
    SLES              $ sudo zypper remove bigtop-jsvc bigtop-utils bigtop-tomcat hue-common sqoop2-client hbase-solr-doc solr-doc
    Ubuntu or Debian  $ sudo apt-get purge bigtop-jsvc bigtop-utils bigtop-tomcat hue-common sqoop2-client hbase-solr-doc solr-doc
  2. Restart all the Cloudera Manager Agents to force an update of the installed binaries reported by the Agent. On each host:
    $ sudo service cloudera-scm-agent restart
  3. Run the Host Inspector to verify that the packages have been removed:
    1. Click Hosts tab and then click the Host Inspector button.
    2. When the command completes, click Show Inspector Results.
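Because the removal command differs by operating system, scripting the cleanup across many hosts is easier if the command is selected programmatically. A sketch under stated assumptions: the `remove_cmd` helper and the `rhel`/`sles`/`debian` labels are illustrative, not a Cloudera Manager interface, and the package list is the one from the table above:

```shell
#!/bin/sh
# Package list from the table above.
PKGS="bigtop-jsvc bigtop-utils bigtop-tomcat hue-common sqoop2-client hbase-solr-doc solr-doc"

# Sketch: print the removal command for a given OS family.
remove_cmd() {
  case $1 in
    rhel)   echo "sudo yum remove $PKGS" ;;
    sles)   echo "sudo zypper remove $PKGS" ;;
    debian) echo "sudo apt-get purge $PKGS" ;;
    *)      echo "unknown OS family: $1" >&2; return 1 ;;
  esac
}

# On each host you would run the printed command, then restart the agent:
#   sudo service cloudera-scm-agent restart
remove_cmd rhel
```

Printing the command rather than executing it directly lets you review it before running it on each host (or feeding it to whatever remote-execution tool you use).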

Deactivate and Remove the GPL Extras Parcel

If you are using LZO, deactivate and remove the CDH 4 GPL Extras parcel.

Run the Upgrade Wizard

The first step of the upgrade process is to download and distribute the parcel for the version of CDH that you want to install. CDH 5 parcels include Impala and Search, so it is not necessary to add Impala or Search parcels separately:
  1. Log into the Cloudera Manager Admin console.
  2. From the Status tab of the Home page, click to the right of the cluster name and select Upgrade Cluster. The Upgrade Wizard starts.
  3. Click the checkbox to acknowledge that you have backed up all your databases and click Continue.
  4. The next step shows you the hosts that the Upgrade Wizard has detected as needing to be upgraded.
  5. Select Use Parcels as your install method, and select the parcel you want to install. Do not select Use Packages: that option works only if you have previously installed the CDH 5 packages, and if those packages are not present, the upgrade wizard cannot continue. To upgrade to CDH 5 using packages, see Upgrading from CDH 4 Packages to CDH 5 Packages.
  6. Click Continue to initiate the parcel download and distribution step.
  7. When your parcels have been downloaded and distributed successfully, click Continue.
  8. The next page notifies you that the services on your cluster will be shut down. Rolling upgrade is not available. You can select whether to have all your services restarted and client configurations deployed automatically after the upgrade has finished. Click Continue to proceed.
  9. The upgrade wizard proceeds to execute the various steps involved in upgrading your cluster, which includes:
    • Waiting for the Cloudera Manager Agent to recognize the new CDH version
    • Converting your configuration parameters
    • Upgrading HDFS metadata, Sqoop server, Hive metastore, and various databases
    • Deploying client configuration and restarting services, if you elected those options
      Note: If you encounter errors during these steps:
    • If the upgrade reports the error "Could not find a healthy host with CDH5 on it to create HiveServer2", wait 30 seconds and retry the upgrade.
    • If the converting configuration parameters step fails, Cloudera Manager rolls back all configurations to CDH 4. Fix any reported problems and retry the upgrade.
    • If the upgrade command fails at any point after the convert configuration step, there is no retry support in Cloudera Manager. You must first correct the error, then manually re-run the individual commands. You can view the remaining commands in the Recent Commands page.
    • If the HDFS metadata upgrade step fails, you cannot revert to CDH 4 unless you restore a backup of Cloudera Manager.
  10. When the upgrade has finished, the Host Inspector runs. This should now show that the hosts are running CDH 5. Click Continue to proceed.

    If your cluster name includes the string "CDH 4" the upgrade procedure changes the string to "CDH 5". Otherwise, it leaves the cluster name unchanged. If you want to rename the cluster, you can do so by clicking the cluster name, which displays a pop-up where you can change the name.

Import MapReduce Configuration to YARN

In CDH 5 and Cloudera Manager 5, YARN, rather than MapReduce (MRv1), is the default MapReduce computation framework. If you had the MapReduce service configured in CDH 4, you can import that configuration into YARN; the import does not modify the existing MapReduce service configuration.

  Warning: In addition to importing configuration settings, the import process:
  • Configures services to use YARN as the MapReduce computation framework instead of the MapReduce (MRv1) service.
  • Overwrites existing YARN configuration and role assignments.
  1. To import the existing configuration from your MapReduce service, select OK, set up YARN to add the YARN service and import the MapReduce settings. To skip the import, select Skip this step now. If you choose to skip this step, you can perform it at a later time from the YARN service.
  2. Click Continue to proceed. Cloudera Manager stops the YARN service (if running) and its dependencies.
  3. Click Continue to proceed. The next page indicates some additional configuration required by YARN.
  4. Verify or modify the configurations and click Continue. The Switch Cluster to MR2 step proceeds.
  5. When all steps have completed, click Continue.

Upgrade the GPL Extras Parcel

If you are using LZO:
  1. Install the CDH 5 GPL Extras parcel. See Installing GPL Extras.
  2. Reconfigure and restart services that use the parcel. See Configuring Services to Use the GPL Extras Parcel.

Restart the Reports Manager Role

  1. Do one of the following:
    • Select Clusters > Cloudera Management Service > Cloudera Management Service.
    • On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.
  2. Click the Instances tab.
  3. Check the checkbox next to Reports Manager.
  4. Select Actions for Selected > Restart and then Restart to confirm.

Recompile HBase Coprocessor and Custom JARs

Before using any HBase applications that use coprocessor or custom JARs, you must recompile the JARs.

Finalize the HDFS Metadata Upgrade

After ensuring that the CDH 5 upgrade has succeeded and that everything is running smoothly, finalize the HDFS metadata upgrade. It is not unusual to wait days or even weeks before finalizing the upgrade.
  1. Go to the HDFS service.
  2. Click the Instances tab.
  3. Click the NameNode instance.
  4. Select Actions > Finalize Metadata Upgrade and click Finalize Metadata Upgrade to confirm.