This is the documentation for Cloudera Manager 4.8.3.
Documentation for other versions is available at Cloudera Documentation.

Upgrading CDH3 to CDH4 in a Cloudera Manager Deployment

  Important:

The following instructions describe how to upgrade components managed by Cloudera Manager from a CDH3 release to the latest CDH4 release. This involves uninstalling the CDH3 packages and installing the CDH4 packages.

If you are upgrading from a CDH4 release, use the instructions under Upgrading to the Latest Version of CDH4 in a Cloudera Managed Deployment instead.

For instructions on upgrading components, such as Flume, that Cloudera Manager does not manage, see the CDH4 Installation Guide.

  Note:

As of Cloudera Manager 4.5, you can upgrade to CDH4.1.3 (or later) within the Cloudera Manager Admin Console, using parcels and an upgrade wizard. This vastly simplifies the upgrade process. In addition, this will enable Cloudera Manager to automate the deployment and rollback of CDH versions. Electing to upgrade using packages means that future upgrades and rollbacks will still need to be done manually.

If you are running Cloudera Manager 4.5, and want to upgrade to CDH4.1.3 or later, see Upgrading from CDH 3 to CDH4.1.3 or Later Using the Upgrade Wizard for instructions. If you want to upgrade to a version of CDH4 earlier than 4.1.3, you will still need to follow the instructions below.

Before You Begin

  Important:

Hive has undergone major version changes between CDH3 and CDH4.0, between CDH4.0 and CDH4.1, and between CDH4.1 and CDH4.2 (CDH4.0 included Hive 0.8.0, CDH4.1 included Hive 0.9.0, and CDH4.2 or later includes Hive 0.10.0). You must therefore manually back up and upgrade your Hive metastore database when upgrading between major Hive versions.

In Cloudera Manager, if you are upgrading from a version of CDH prior to CDH4.2, you must follow the steps in the appropriate CDH upgrade procedure for upgrading the metastore BEFORE you start the Hive service. This applies whether you are upgrading to packages or parcels.

  Important:

Before upgrading, be sure to read about the latest Incompatible Changes and Known Issues and Workarounds in the CDH4 Release Notes.

Plan Downtime

If you are upgrading a cluster that is part of a production system, be sure to plan ahead. As with any operational work, be sure to reserve a maintenance window with enough extra time allotted in case of complications. The Hadoop upgrade process is well understood, but it is best to be cautious. For production clusters, Cloudera recommends allocating up to a full day maintenance window to perform the upgrade, depending on the number of hosts, the amount of experience you have with Hadoop and Linux, and the particular hardware you are using.

If Security is Enabled, put the NameNode into Safemode

  Important:

If you have security enabled, you must do this prior to stopping services (if you are upgrading with packages) or prior to starting the Upgrade Wizard to upgrade with parcels.

If security is enabled, put the NameNode into Safemode and save the Namespace (see the CDH3 Security Guide for more information about CDH3 security):

  1. In the Cloudera Manager Admin console, go to the HDFS service's NameNode role instance.
  2. From the NameNode role's Actions menu, click Enter Safemode... and confirm that you want to do this.
  3. After the NameNode has successfully entered Safemode, from the Actions menu, click Save Namespace... and confirm that you want to do this.

    This will result in a new fsimage being written out with no edit log entries.

  4. Leave the NameNode in safemode while you proceed with the instructions to do the upgrade.

To upgrade CDH in multiple clusters, repeat this process for each cluster.
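
If you want to confirm from the command line that the NameNode really is in Safemode before proceeding, a check along the following lines should work on the CDH3 NameNode host. This is a minimal sketch; whether you need to kinit first, and which user you run it as, depend on your security setup.

$ sudo -u hdfs hadoop dfsadmin -safemode get
Safe mode is ON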

Upgrading from CDH 3 to CDH4.1.3 or Later Using the Upgrade Wizard

  Important:

If you have security enabled, make sure you have run the commands to put the NameNode into safemode, as described above in Before You Begin.

  Important:

If you set up your secure CDH3 cluster using Cloudera Manager 4.1, and have subsequently upgraded your Cloudera Manager to version 4.5, upgrading your CDH3 cluster will fail due to a missing HTTP principal in the NameNode's keytab. This is not a problem if you installed CDH3 using Cloudera Manager 4.5. See Known Issues and Workarounds in Cloudera Manager 4 for more information.

You can upgrade from CDH3 packages to CDH4 parcels from within the Cloudera Manager Admin Console. Not only is this process more streamlined than doing it manually with packages, but it also allows Cloudera Manager to automate the deployment and rollback of CDH versions in the future.

  1. In the Cloudera Manager Admin Console, click the Parcels indicator in the top navigation bar to go to the Parcels page.
  2. On the Parcels page, click Download for the version you want to download.
  3. When the download has completed, click Distribute for the version you downloaded.
  4. When the parcel has been distributed and unpacked, the button will change to say Upgrade.
  5. BEFORE YOU CLICK UPGRADE, verify that the /user/oozie directory exists.
    • If it does not exist, create it (in HDFS) before continuing with the upgrade wizard. You must create it as the oozie Unix user to ensure that the directory has the correct permissions. One way to check and create it is shown in the sketch after this list.
  6. Click the Upgrade button and in the pop-up that appears, read the information and click Upgrade Cluster to proceed.
      Important:

    If you are using Hive, DO NOT elect to have the upgrade wizard start the services — you must upgrade your Hive metastore before you restart Hive.

    If you are not using Hive, you can also elect to have the upgrade start your services and deploy the client configuration as part of the upgrade.

    The upgrade process will execute the commands to stop your services, convert your configuration parameters, and upgrade your HDFS metadata and the Oozie database and ShareLib. When the upgrade has finished, the All Services page will appear.

  7. If Hue is configured to use SQLite as its database, back up the desktop.db kept at /usr/share/hue/desktop/desktop.db to a temporary location.
      Important: Removing the Hue Common package will remove your Hue database; if you do not back it up you may lose all your Hue user account information.

    Start the new Hue service before you remove your CDH3 packages.

  8. Uninstall CDH3 on each host. (Note which version of Hive you were using, as you will need that information to upgrade your Hive metastore in step 10 below.)
    OS Command

    RHEL

    $ sudo yum remove hadoop-0.20 hue-common hadoop-pig oozie-client hadoop-hive hadoop-hbase hadoop-zookeeper bigtop-utils

    SLES

    $ sudo zypper remove hadoop-0.20 hue-common hadoop-pig oozie-client hadoop-hive hadoop-hbase hadoop-zookeeper bigtop-utils

    Ubuntu or Debian

    $ sudo apt-get purge hadoop-0.20 hue-common hadoop-pig oozie-client hadoop-hive hadoop-hbase hadoop-zookeeper bigtop-utils
  9. Restart all the Cloudera Manager agents to force an update of the symlinks to point to the newly installed components.

    To restart the Cloudera Manager agents:

    On each host:

    $ sudo service cloudera-scm-agent restart
  10. Go to Step 9. Upgrade your Hive Metastore below and follow the instructions there to upgrade the Hive metastore.
  11. Restart the Services you Stopped
    1. In the Cloudera Manager Admin Console, click the Services tab.
    2. Click the top Actions button that corresponds to the cluster and choose Restart. The Command Details window shows the progress of starting services.
  12. Redeploy the client configuration files.
    1. From the top Actions button that corresponds to the cluster, choose Deploy Client Configuration....
    2. Click the Deploy Client Configuration button in the confirmation pop-up that appears.
  13. Go to Step 12: Finalize the HDFS Metadata Upgrade and follow the instructions there to finish the installation process.
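
For the /user/oozie check in step 5 above, a minimal command-line sketch follows. Running the commands as the hdfs and oozie users is an assumption that matches the permission requirement described in that step; adjust for your environment.

$ sudo -u hdfs hadoop fs -ls /user/oozie      # check whether the directory exists
$ sudo -u oozie hadoop fs -mkdir /user/oozie  # if it does not, create it as the oozie user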

Upgrading Using Packages

Use the instructions that follow to upgrade to CDH4.

Step 1: Back Up Important Items

  Important:

Do this step now if your cluster includes any Ubuntu or Debian systems running CDH3u3 or earlier.

Otherwise, you can perform this step later if you prefer – any time before you use Cloudera Manager to upgrade the cluster.

  1. Back up the databases. For instructions, see Database Considerations for Cloudera Manager Upgrades.
  2. If the directory /usr/lib/oozie/libext exists, move it to a temporary location before you proceed.
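
For item 2, something like the following is sufficient; the backup location /root/oozie-libext-backup is an arbitrary example, not a required path.

$ if [ -d /usr/lib/oozie/libext ]; then sudo mv /usr/lib/oozie/libext /root/oozie-libext-backup; fi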

Step 2: Stop All CDH Components

  Important:

If you have security enabled, make sure you have run the commands to put the NameNode into safemode, as described above in Before You Begin.

To stop all services

  1. In the Cloudera Manager Admin Console, select Services > All Services.
  2. Click the top Actions button that corresponds to the cluster and choose Stop.... Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services. When All services successfully stopped appears, the task is complete and you may close the Command Details window.
  3. For each Cloudera Management Service entry, click Actions and click Stop.... Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services. When All services successfully stopped appears, the task is complete and you may close the Command Details window.

Step 3: Back up the HDFS Metadata

Back up the HDFS metadata on the NameNode machine.

  Important:

Do the following when you are sure that all Hadoop services have been shut down. It is particularly important that the NameNode service is not running so that you can make a consistent backup.

  Note: Cloudera recommends backing up HDFS metadata on a regular basis, as well as before a major upgrade.
  1. On the Services page of Cloudera Manager, click the link for the HDFS service. Click the Configuration tab and click Edit. On that page, find the value of the NameNode Data Directories property (under NameNode (Default)).
  2. From the command line on the NameNode machine, back up that directory; for example, if the data directory is /mnt/hadoop/hdfs/name, do the following as root:
    # cd /mnt/hadoop/hdfs/name
    # tar -cvf /root/nn_backup_data.tar .

    You should see output like this:

    ./
    ./current/
    ./current/fsimage
    ./current/fstime
    ./current/VERSION
    ./current/edits
    ./image/
    ./image/fsimage
  3. Check the output.
      Warning:

    If you see a file containing the word lock, the NameNode is probably still running. Repeat the preceding steps, starting by shutting down the Hadoop services.

If you need to restore HDFS metadata, refer to Cloudera's Knowledge Base article, "How do I recover a failed NameNode?"
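
Before creating the archive, you can also check explicitly for the lock file mentioned in the warning above. This sketch assumes the example data directory used in step 2.

$ ls /mnt/hadoop/hdfs/name | grep -i lock && echo "lock file present - the NameNode may still be running"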

Step 4: Uninstall CDH3

Uninstall CDH3 on each host.

  • On Red Hat-compatible systems:
$ sudo yum remove hadoop-0.20 hue-common hadoop-pig oozie-client hadoop-hive hadoop-hbase hadoop-zookeeper bigtop-utils
  • On SUSE systems:
$ sudo zypper remove hadoop-0.20 hue-common hadoop-pig oozie-client hadoop-hive hadoop-hbase hadoop-zookeeper bigtop-utils
  • On Ubuntu and Debian systems:
$ sudo apt-get purge hadoop-0.20 hue-common hadoop-pig oozie-client hadoop-hive hadoop-hbase hadoop-zookeeper bigtop-utils
  Warning:

If you are upgrading an Ubuntu or Debian system from CDH3u3 or earlier, you must use apt-get purge (rather than apt-get remove) to make sure the re-install succeeds, but be aware that apt-get purge removes all your configuration data. If you have modified any configuration files, DO NOT PROCEED before backing them up.
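
If you have modified configuration files to back up, a simple tar archive before the purge is enough. The directories below are an assumption; adjust the list to match the components you actually installed and configured.

$ sudo tar -cvf /root/cdh3-config-backup.tar /etc/hadoop-0.20 /etc/hive /etc/hue /etc/oozie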

Step 5: Download CDH4

On Red Hat-compatible systems:

  1. Download the CDH4 Package:
    1. Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).

      For OS Version

      Click this Link

      Red Hat/CentOS/Oracle 5

      Red Hat/CentOS/Oracle 5 link

      Red Hat/CentOS 6 (32-bit)

      Red Hat/CentOS 6 link (32-bit)

      Red Hat/CentOS 6 (64-bit)

      Red Hat/CentOS 6 link (64-bit)

    2. Install the RPM. For Red Hat/CentOS/Oracle 5:
      $ sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm

      For Red Hat/CentOS 6 (32-bit):

      $ sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.i386.rpm

      For Red Hat/CentOS 6 (64-bit):

      $ sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
      Note:

    For instructions on how to add a CDH4 yum repository or build your own CDH4 yum repository, see the topic Installing CDH4 On Red Hat-compatible systems in the CDH4 Installation Guide.

  2. (Optionally) add a repository key on each system in the cluster. Add the Cloudera Public GPG Key to your repository by executing one of the following commands:
    • For Red Hat/CentOS/Oracle 5 systems:
    $ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera 
    • For Red Hat/CentOS 6 systems:
    $ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera 

On SUSE systems:

  1. Download the CDH4 Package:
    1. Click this link, choose Save File, and save it to a directory to which you have write access (it can be your home directory).
    2. Install the RPM:
      $ sudo rpm -i cloudera-cdh-4-0.x86_64.rpm
      Note:

    For instructions on how to add a repository or build your own repository, see the topic on Installing CDH4 on SUSE Systems in the CDH4 Installation Guide.

  2. Update your system package index by running:
    $ sudo zypper refresh
  3. (Optionally) add a repository key on each system in the cluster. Add the Cloudera Public GPG Key to your repository by executing the following command:
  • For all SLES systems:
$ sudo rpm --import http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera  

On Ubuntu and Debian systems:

  1. Download the CDH4 Package:
    1. Click one of the following: this link for a Squeeze system, this link for a Lucid system, or this link for a Precise system.
    2. Install the package. Do one of the following: choose Open with in the download window to use the package manager, or choose Save File, save the package to a directory to which you have write access (it can be your home directory), and install it from the command line, for example:
      $ sudo dpkg -i cdh4-repository_1.0_all.deb
      Note:

    For instructions on how to add a repository or build your own repository, see the topic on Installing CDH4 on Ubuntu Systems in the CDH4 Installation Guide.

  2. (Optionally) add a repository key on each system in the cluster. Add the Cloudera Public GPG Key to your repository by executing one of the following commands:

  • For Ubuntu Lucid systems:
$ curl -s http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh/archive.key | sudo apt-key add -
  • For Ubuntu Precise systems:
$ curl -s http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
  • For Debian Squeeze systems:
$ curl -s http://archive.cloudera.com/cdh4/debian/squeeze/amd64/cdh/archive.key | sudo apt-key add -
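
Before moving on to the next step, you can confirm that the package manager now sees the Cloudera repository. The following is a sketch for a Red Hat-compatible system; zypper and apt-cache offer equivalent checks on the other platforms.

$ yum list available hadoop    # a CDH4 version of the hadoop package should be listed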

Step 6: Re-Install HDFS, MapReduce, and the CDH4 Components

  • Use one of the following commands to install CDH4 packages on every host in your cluster: On Red Hat/CentOS/Oracle systems:
$ sudo yum -y install bigtop-utils bigtop-jsvc bigtop-tomcat hadoop hadoop-hdfs hadoop-httpfs hadoop-mapreduce hadoop-yarn hadoop-client hadoop-0.20-mapreduce hbase hive oozie oozie-client pig zookeeper mahout

On SUSE systems:

$ sudo zypper install bigtop-utils bigtop-jsvc bigtop-tomcat hadoop hadoop-hdfs hadoop-httpfs hadoop-mapreduce hadoop-yarn hadoop-client hadoop-0.20-mapreduce hbase hive oozie oozie-client pig zookeeper mahout

On Debian/Ubuntu systems:

$ sudo apt-get install bigtop-utils bigtop-jsvc bigtop-tomcat hadoop hadoop-hdfs hadoop-httpfs hadoop-mapreduce hadoop-yarn hadoop-client hadoop-0.20-mapreduce hbase hive oozie oozie-client pig zookeeper mahout
  • To install the hue-common package and all Hue applications on the Hue machine, install the hue meta-package.
  Important:

If you used the Hue Authorization Manager with CDH3, you must remove the hue-userman package, and disable or remove the Authorization Manager repository before installing the new version of Hue. The repository is the one you installed when you configured the Authorization Manager in Hue. For example, on a Red Hat system, the repository file is /etc/yum.repos.d/cloudera-authman.repo by default. Either remove this file or add a line that reads enabled=0 (or, if there is already a line that reads enabled=1, change the 1 to a 0).

To install the hue meta-package on Red Hat/CentOS/Oracle systems:

$ sudo yum install hue 

To install the hue meta-package on SUSE systems:

$ sudo zypper install hue 

To install the hue meta-package on Debian/Ubuntu systems:

$ sudo apt-get install hue 
  • If you moved /usr/lib/oozie/libext to a temporary location in Step 1, copy its contents (not the directory itself) back to the new /usr/lib/oozie/libext now.
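
For the final item, the following sketch copies the contents back without recreating the directory inside libext; it assumes the same temporary backup location used in the Step 1 sketch.

$ sudo cp -r /root/oozie-libext-backup/* /usr/lib/oozie/libext/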

Step 7: Disable Start on Boot for Hue, Oozie, and HttpFS

  • To prevent Hue from starting on the Hue machine:
  Note:

Preventing Hue from starting is only required on releases earlier than CDH3u5. If you are running CDH3u5 or later, or CDH4, you do not need to prevent Hue from starting; in those cases, executing the chkconfig command generates an error but has no other negative effects.

$ sudo /sbin/chkconfig hue off 
  • To prevent Oozie from starting on system boot on every machine on which it is installed, do the following:
    OS Command

    RHEL and SLES

    $ sudo /sbin/chkconfig oozie off

    Ubuntu or Debian

    $ sudo /usr/sbin/update-rc.d oozie disable
  • To prevent HttpFS from starting on system boot:
    OS Command

    RHEL and SLES

    $ sudo /sbin/chkconfig hadoop-httpfs off
    $ sudo service hadoop-httpfs stop

    Ubuntu or Debian

    $ sudo /usr/sbin/update-rc.d hadoop-httpfs disable
    $ sudo service hadoop-httpfs stop
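
On RHEL and SLES you can verify the result with chkconfig; repeat the check for hue and hadoop-httpfs. (update-rc.d systems have no equivalent listing; inspect the /etc/rc*.d directories instead.)

$ sudo /sbin/chkconfig --list oozie
oozie           0:off   1:off   2:off   3:off   4:off   5:off   6:off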

Step 8: Upgrade the HDFS Metadata and the Cluster Configuration

  Important:
  • If you have not already backed up your configuration data, do so now.
  • Before you proceed, click on the Hosts tab in Cloudera Manager and make sure that all hosts are up and running CDH4.

To upgrade the cluster

  1. In the Cloudera Manager Admin Console, click the Services tab, click Actions and click Upgrade Cluster.
      Note: If you are already on the Services page, Upgrade Cluster may not be available. If this occurs, refresh the page.
  2. Click Upgrade Cluster to confirm you want to upgrade the cluster.
      Important: If you are using Hive, DO NOT elect to have the upgrade wizard start the services – you must upgrade your Hive metastore before you restart Hive.

    If you are not using Hive, you can elect to have this process start your services and deploy the client configuration as part of the upgrade.

    The upgrade process will execute the commands to stop your services, convert your configuration parameters, and upgrade your HDFS metadata and the Oozie database and ShareLib. These changes are required when upgrading from CDH3 or CDH4 Beta 1.

Step 9. Upgrade your Hive Metastore

If you are upgrading from CDH4.2 to CDH4.3 or later, you do not need to perform this step. If you are upgrading from an earlier version of CDH to CDH4.2 or later, you DO need to do this.

  1. (Strongly recommended) Make a backup copy of your Hive metastore database.
  2. Run the metastore upgrade script. The script you run depends on whether you are upgrading to parcels or packages.
    • If you are upgrading to packages, the upgrade script is at /usr/lib/hive/scripts/metastore/upgrade/<database>.
    • If you are upgrading to parcels, then the upgrade script is located at /opt/cloudera/parcels/<parcel_name>/lib/hive/scripts/metastore/upgrade/<database>.

      <parcel_name> should be the name of the parcel to which you have upgraded.

      <database> is the type of database you are running (for example, mysql or postgres).

      For example, if you are installing a CDH4.2.0 parcel using the default location for the local repository, and using the default database (PostgreSQL), the script will be at: /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel/lib/hive/scripts/metastore/upgrade/postgres
    • You must cd to the directory the scripts are in.
    • Execute the script in the appropriate DB command shell. Note that there are multiple scripts in each directory; you must run the one that corresponds to the versions of Hive you are upgrading between. For example, if you are upgrading with MySQL from Hive 0.9 to 0.10, the command would be similar to:
      mysql -u hive1 -phive1 hive1 < upgrade-0.9.0-to-0.10.0.mysql.sql

      (with the appropriate substitutions for username, etc.).

      If your upgrade spans multiple versions of Hive (for example, upgrading from Hive 0.8 to Hive 0.10), you must run all the relevant scripts in the proper order, as shown in the sketch after this list.

      Important:

    You must know the password for the Hive metastore database; if you installed Cloudera Manager using the default (embedded PostgreSQL) database, the password was displayed on the Database Setup page during the Cloudera Manager installation wizard. If you do not know the password for your Hive metastore database, you can find it as follows:

    • Run cat /etc/cloudera-scm-server/db.properties to view Cloudera Manager's internal database credentials. Use the password from com.cloudera.cmf.db.password when the next command prompts for one.
    • Run the following command:
      psql -p 7432 -U cm cm -c "select s.display_name as hive_service_name, s.name as hive_internal_name, c.value as metastore_password from CONFIGS c, SERVICES s where attr='hive_metastore_database_password' and c.service_id = s.service_id"
    • This outputs the password for each Hive service metastore as follows:
       hive_service_name | hive_internal_name | metastore_password
      -------------------+--------------------+--------------------
       hive1             | hive1              | lF3Cv2zsvI
      (1 row)
  3. If you have multiple instances of Hive, run the upgrade script(s) on each metastore database.
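
As a sketch of the multi-version case described in step 2 above: for a MySQL metastore going from Hive 0.8 to Hive 0.10 with the hive1 example credentials, the scripts are run in sequence. Confirm the exact script file names in the upgrade directory before running them.

$ cd /usr/lib/hive/scripts/metastore/upgrade/mysql
$ mysql -u hive1 -phive1 hive1 < upgrade-0.8.0-to-0.9.0.mysql.sql
$ mysql -u hive1 -phive1 hive1 < upgrade-0.9.0-to-0.10.0.mysql.sql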

Step 10. Restart Stopped Services

  1. In the Cloudera Manager Admin Console, click the Services tab.
  2. Click the top Actions button that corresponds to the cluster and choose Start. The Command Details window shows the progress of starting services. When All services successfully started appears, the task is complete and you may close the Command Details window.

Step 11: Redeploy the Client Configuration Files

  1. From the top Actions button that corresponds to the cluster, choose Deploy Client Configuration....
  2. Click the Deploy Client Configuration button in the confirmation pop-up that appears.

Step 12: Finalize the HDFS Metadata Upgrade

After ensuring that the CDH4 upgrade has succeeded and that everything is running smoothly, finalize the HDFS metadata upgrade. It is not unusual to wait days or even weeks before finalizing the upgrade.

To finalize the HDFS metadata upgrade

  1. In the Cloudera Manager Admin Console, pull down the Services tab and go to the HDFS service.
  2. Go to the Instances tab and click on the NameNode instance.
  3. From the NameNode Status page, from the Actions menu click Finalize Metadata Upgrade.
  4. Click Finalize Metadata Upgrade to confirm you want to complete this process.

    Cloudera Manager finalizes the metadata upgrade. The upgrade is now complete.
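
One way to confirm that finalization completed is to look at the NameNode data directory: the previous directory that the metadata upgrade created there should be gone. This sketch assumes the example data directory used in Step 3.

$ ls /mnt/hadoop/hdfs/name
current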