This is the documentation for Cloudera Manager 4.8.5.
Documentation for other versions is available at Cloudera Documentation.

Upgrading CDH3 in a Cloudera Manager Deployment

  Important: Cloudera Manager version 3.x and CDH3 have reached End of Maintenance (EOM) as of June 20, 2013. Cloudera will not support or provide patches for any of the Cloudera Manager version 3.x and CDH3 releases. Even though Cloudera Manager 4.x will continue to support CDH3, it is strongly recommended that you upgrade to CDH4. See Upgrading existing installations of CDH3 to CDH4 for more details.

Before You Begin

  Important:

Before upgrading, be sure to read about the latest Incompatible Changes and Known Issues and Work Arounds in the CDH3 Release Notes.

  Note:

If you are upgrading a cluster that is part of a production system, plan ahead. As with any operational work, reserve a maintenance window with extra time allotted in case of complications. The Hadoop upgrade process is well understood, but it is best to be cautious. For production clusters, Cloudera recommends allocating up to a full-day maintenance window for the upgrade, depending on the number of hosts, your experience with Hadoop and Linux, and the hardware you are using.

Upgrading Unmanaged Components

Unmanaged components are upgraded separately from managed components, and must be upgraded first. For example, if you have unmanaged Flume installed, upgrade it before you upgrade the managed components. Components that you might have installed that are not managed by Cloudera Manager include:

  • Flume 0.9.x
  • Flume 1.x
  • Sqoop
  • Pig
  • Hive
  • Whirr
  • Mahout

For information on upgrading these unmanaged components, see the CDH3 Installation Guide.

Step 1. Stop all the CDH Services on All Hosts

You must stop all Hadoop services before upgrading CDH.

To stop all services

  1. In the Cloudera Manager Admin Console, select Services > All Services.
  2. Click the top Actions button that corresponds to the cluster and choose Stop.... Click Stop in the confirmation screen.

    The Command Details window shows the progress of stopping services.

    When All services successfully stopped appears, the task is complete and you may close the Command Details window.

  3. For each Cloudera Management Service entry, click Actions and click Stop.... Click Stop in the confirmation screen.

    The Command Details window shows the progress of stopping services.

    When All services successfully stopped appears, the task is complete and you may close the Command Details window.

Repeat this process for every cluster that contains CDH3 hosts to be upgraded.

Step 2. Back up the HDFS Metadata on the NameNode

  Important:

Do the following when you are sure that all Hadoop services have been shut down. It is particularly important that the NameNode service is not running so that you can make a consistent backup.

  Note: Cloudera recommends backing up HDFS metadata on a regular basis, as well as before a major upgrade.
  1. On the Services page of Cloudera Manager, click the HDFS service, then the Configuration tab. Navigate to the NameNode category and find NameNode Data Directories.
  2. From the command line on the NameNode machine, back up all the directories listed in that property; for example, if the data directory is /mnt/hadoop/hdfs/name, do the following as root:
    # cd /mnt/hadoop/hdfs/name
    # tar -cvf /root/nn_backup_data.tar .

    You should see output like this:

    ./
    ./current/
    ./current/fsimage
    ./current/fstime
    ./current/VERSION
    ./current/edits
    ./image/
    ./image/fsimage
  3. Check the output.

    Warning: If you see a file containing the word lock, the NameNode is probably still running. Repeat the preceding steps, starting by shutting down the Hadoop services.
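The backup and the lock check above can be combined in a small script. The following is a minimal sketch, assuming a POSIX shell with tar and find available; the function name backup_nn_dir is illustrative and not part of Cloudera Manager. It refuses to create the archive if any file with lock in its name is present, and it lists the finished archive to confirm that it is readable.

```shell
#!/bin/sh
# Sketch: back up one NameNode data directory, refusing to run if a lock
# file suggests the NameNode is still up. Names here are illustrative.

backup_nn_dir() {
    nn_dir=$1      # e.g. /mnt/hadoop/hdfs/name
    archive=$2     # absolute path, e.g. /root/nn_backup_data.tar

    if [ ! -d "$nn_dir" ]; then
        echo "error: $nn_dir is not a directory" >&2
        return 1
    fi

    # Any file whose name contains "lock" suggests a running NameNode.
    locks=$(find "$nn_dir" -type f -name '*lock*')
    if [ -n "$locks" ]; then
        echo "error: lock file found; stop the NameNode first:" >&2
        echo "$locks" >&2
        return 2
    fi

    # Archive relative to the directory, as in the manual steps.
    (cd "$nn_dir" && tar -cf "$archive" .) || return 3

    # Listing the archive confirms that it can be read back.
    tar -tf "$archive" >/dev/null
}

# Example (run as root on the NameNode host):
# backup_nn_dir /mnt/hadoop/hdfs/name /root/nn_backup_data.tar
```

If NameNode Data Directories lists more than one directory, call the function once per directory with a different archive name each time.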

Step 3. Upgrade Managed Components

There are a variety of strategies that you can use to upgrade to the latest version of CDH3.

  • You can use your operating system's package management tools to update all packages to the latest version using standard repositories. This approach works well because it minimizes the amount of configuration required and uses the simplest commands. Be aware that this can take a considerable amount of time if you have not upgraded the system recently.
  • You can target the cloudera.com repository that is added during a typical install, only updating Cloudera components. This limits the scope of updates to be completed, so the process takes less time. This will not work if you created and used a custom repository.
  • You can use a custom repository. This process can be more complicated, but enables updating Cloudera components for CDH machines that are not connected to the Internet.

Updating Everything

You can update all components on your system, including Cloudera components. Note that this may take a significant amount of time. To update all packages on your system, use the following command:

Operating System    Command

RHEL                $ sudo yum update
SLES                $ sudo zypper up
Ubuntu or Debian    $ sudo apt-get upgrade
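On a mixed fleet it can help to pick the command automatically. The following is a minimal sketch, assuming that exactly one of yum, zypper, or apt-get is installed on the host; the function name update_all_cmd is illustrative. It only prints the matching command from the table above and does not run it.

```shell
#!/bin/sh
# Sketch: print the full-system update command for whichever package
# manager is installed. The function name is illustrative.

update_all_cmd() {
    if command -v yum >/dev/null 2>&1; then
        echo "sudo yum update"          # RHEL
    elif command -v zypper >/dev/null 2>&1; then
        echo "sudo zypper up"           # SLES
    elif command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get upgrade"     # Ubuntu or Debian
    else
        echo "error: no supported package manager found" >&2
        return 1
    fi
}

# Example:
# update_all_cmd
```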

Once you complete the process of updating all components, proceed to Step 4. Start the Services you Stopped.

Updating Cloudera Components Using Default Repositories

To install the new version, you can upgrade from Cloudera's repository by adding an entry to your operating system's package management configuration file. The repository location varies by operating system.

Operating System    Configuration File Repository Entry

Red Hat             http://archive.cloudera.com/redhat/cdh/3/
SLES                http://archive.cloudera.com/sles/11/x86_64/cdh/3/
Debian Squeeze      deb http://archive.cloudera.com/debian/ squeeze-cdh3 contrib
Ubuntu Lucid        deb http://archive.cloudera.com/debian/ lucid-cdh3 contrib
Ubuntu Maverick     deb http://archive.cloudera.com/debian/ maverick-cdh3 contrib

For example, under Red Hat, to upgrade from Cloudera's repository you can run commands such as the following on the CDH host to update only CDH:

$ sudo yum clean all
$ sudo yum update 'cloudera-*'

  Note:

  • cloudera-cdh3 is the name of the repository on your system; the name is usually in square brackets on the first line of the repo file, in this example /etc/yum.repos.d/cloudera-cdh3.repo:

    [chris@ca727 yum.repos.d]$ more cloudera-cdh3.repo
    [cloudera-cdh3]
    ...

  • yum clean all cleans up yum's cache directories, ensuring that you download and install the latest versions of the packages.
  • If your system is not up to date, and any underlying system components need to be upgraded before this yum update can succeed, yum will tell you what those are.
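To confirm the repository name without paging through the file, you can extract the bracketed names directly. The following is a minimal sketch, assuming a POSIX sed; the function name list_repo_ids is illustrative. It prints every name that appears alone in square brackets on a line of the given repo file.

```shell
#!/bin/sh
# Sketch: list the repository IDs (the names in square brackets) defined
# in a yum .repo file. The function name is illustrative.

list_repo_ids() {
    repo_file=$1
    # Keep only section-header lines such as [cloudera-cdh3], then strip
    # the surrounding brackets.
    sed -n 's/^\[\([^]]*\)][[:space:]]*$/\1/p' "$repo_file"
}

# Example:
# list_repo_ids /etc/yum.repos.d/cloudera-cdh3.repo
```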

On a SLES system, use commands such as the following to clean cached repository information and then update only the CDH components:

$ sudo zypper clean --all
$ sudo zypper up -r http://archive.cloudera.com/sles/11/x86_64/cdh/

The apt source files specify repository information. These are typically the /etc/apt/sources.list file and the files in the /etc/apt/sources.list.d/ directory. Review the contents of those files to find the Cloudera repository.

On a Debian/Ubuntu system, use commands like this to clean cached repository information and then update only the CDH components. First:

$ sudo apt-get clean

After cleaning the cache, use one of the following upgrade commands to upgrade CDH.

Maverick:

$ sudo apt-get upgrade -t maverick-cdh3

Lucid:

$ sudo apt-get upgrade -t lucid-cdh3

Squeeze:

$ sudo apt-get upgrade -t squeeze-cdh3
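The three commands differ only in the release codename, so the target can be derived from it. The following is a minimal sketch; the function name cdh3_target is illustrative, and the commented usage assumes that lsb_release is installed on the host. Codenames other than the three listed above are rejected rather than guessed.

```shell
#!/bin/sh
# Sketch: map a distribution codename to the matching CDH3 target release.
# Only the three codenames listed in this section are accepted.

cdh3_target() {
    case $1 in
        maverick|lucid|squeeze)
            echo "$1-cdh3"
            ;;
        *)
            echo "error: no CDH3 target listed for '$1'" >&2
            return 1
            ;;
    esac
}

# Example, assuming lsb_release is installed:
# target=$(cdh3_target "$(lsb_release -cs)") &&
#     sudo apt-get clean &&
#     sudo apt-get upgrade -t "$target"
```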

At the end of this process you should have the most recent versions of the CDH packages installed on the host and you can now proceed to Step 4. Start the Services you Stopped.

Updating Cloudera Components Using Custom Repositories

You can create your own repository, as described in Appendix A - Understanding Custom Installation Solutions. Creating your own repository is necessary if you are upgrading a cluster that does not have access to the Internet.

If you used a custom repository for the original installation and now want to update from a custom repository, the exact steps depend on how that repository is set up.

In general, begin by updating the existing custom repository with the installation files for the new version. You can do this in a variety of ways; for example, you might use wget to copy the necessary installation files. Once the installation files have been updated, use the custom repository you established for the initial installation to update CDH.

Red Hat

On a Red Hat system, ensure that you have a custom repo file that is configured to use your internal repository. For example, you could have a custom repo file in /etc/yum.repos.d/ called cdh_custom.repo in which you specified a local repository. In such a case, you might use the following commands:

$ sudo yum clean all
$ sudo yum update 'cloudera-*'  
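For reference, a custom repo file such as cdh_custom.repo could be generated as follows. This is a minimal sketch under stated assumptions: the function name write_custom_repo, the repository ID cdh_custom, and the descriptive name are placeholders, and the URL is whatever your internal server exposes. Signature checking is turned off here only to keep the sketch short; if your internal repository is signed, set gpgcheck=1 and add a gpgkey entry.

```shell
#!/bin/sh
# Sketch: write a minimal yum repo file pointing at an internal repository.
# The repo ID, name, and URL are placeholders, not values from Cloudera.

write_custom_repo() {
    repo_file=$1   # e.g. /etc/yum.repos.d/cdh_custom.repo
    base_url=$2    # e.g. http://internalserver.example.com/path_to_cdh_repo
    cat > "$repo_file" <<EOF
[cdh_custom]
name=Internal CDH3 repository
baseurl=$base_url
enabled=1
gpgcheck=0
EOF
}

# Example (run as root):
# write_custom_repo /etc/yum.repos.d/cdh_custom.repo \
#     http://internalserver.example.com/path_to_cdh_repo
```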

SLES

On a SLES system, use commands such as the following to clean cached repository information and then update only the CDH components:

$ sudo zypper clean --all
$ sudo zypper up -r http://internalserver.example.com/path_to_cdh_repo

Debian/Ubuntu

Use a command that targets upgrade of your CDH distribution using the custom repository specified in your apt source files. These are typically the /etc/apt/sources.list file and the files in the /etc/apt/sources.list.d/ directory. Information about your custom repository must be included in these files. The general form of entries in Debian/Ubuntu is:

deb http://server.example.com/directory/ dist-name pool

For example, the entry for the default repo is:

deb http://us.archive.ubuntu.com/ubuntu/ precise universe

On a Debian/Ubuntu system, use commands such as the following to clean cached repository information and then update only the CDH components:

$ sudo apt-get clean
$ sudo apt-get upgrade -t your_cdh_repo

Step 4. Start the Services you Stopped

You can now start the services that you stopped in Step 1. Proceed as follows:

  1. In the Cloudera Manager Admin Console, click the Services tab.
  2. Click the top Actions button that corresponds to the cluster and choose Start.

    The Command Details window shows the progress of starting services.

    When All services successfully started appears, the task is complete and you may close the Command Details window.

  3. For each Cloudera Management Service entry, click Actions and click Start. Click Start in the confirmation screen.

    The Command Details window shows the progress of starting services.

    When All services successfully started appears, the task is complete and you may close the Command Details window.

Repeat this process for all clusters that you previously stopped.