This is the documentation for Cloudera Manager 5.1.x.
Documentation for other versions is available at Cloudera Documentation.

Upgrading CDH 4 Using Packages

If you originally used Cloudera Manager to install CDH using packages, you can upgrade to a new version of CDH 4 using either packages or parcels. Parcels are the recommended way to upgrade, because the upgrade wizard provided for parcels handles the process almost completely automatically. If you want to continue using packages, follow the instructions presented here.

To upgrade CDH using packages, complete the following steps:

  1. Before You Begin
  2. Upgrading Unmanaged Components
  3. Upgrade Managed Components
  4. Upgrade the Hive Metastore Database
  5. Upgrade the Oozie ShareLib
  6. Upgrade Sqoop
  7. Restart the Services
  8. Configure Cluster CDH Version for Package Installs
  9. Deploy the New Client Configuration Files

Before You Begin

  • Read the Cloudera Manager 5 Release Notes.
  • Make sure there are no Oozie workflows in RUNNING or SUSPENDED status; otherwise the Oozie database upgrade will fail and you will have to reinstall CDH 4 to complete or kill those running workflows.
  • Run the Host Inspector and fix every issue.
  • If using security, run the Security Inspector.
  • Run hdfs fsck / and hdfs dfsadmin -report and fix any issues.
  • If using HBase:
    • Run hbase hbck to make sure there are no inconsistencies.
    • Before you can upgrade HBase from CDH 4 to CDH 5, your HFiles must be upgraded from HFile v1 format to HFile v2, because CDH 5 no longer supports HFile v1. The procedure differs depending on whether you use Cloudera Manager or the command line, but the result is the same. First, check the HFiles for instances of HFile v1 and mark them for upgrade to HFile v2; the check also reports corrupted files and files with unknown versions, which you must remove manually. Next, the HFiles are rewritten during the next major compaction. After the HFiles are upgraded, you can continue the upgrade. To check and upgrade the files:
      1. In the Cloudera Manager Admin Console, go to the HBase service and run Actions > Check HFile Version.
      2. Check the output of the command in the stderr log.
        Your output should be similar to the following:
        Tables Processed:
        hdfs://localhost:41020/myHBase/.META.
        hdfs://localhost:41020/myHBase/usertable
        hdfs://localhost:41020/myHBase/TestTable
        hdfs://localhost:41020/myHBase/t
        
        Count of HFileV1: 2
        HFileV1:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
        hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512
        
        Count of corrupted files: 1
        Corrupted Files:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
        Count of Regions with HFileV1: 2
        Regions to Major Compact:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
        hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af
        In the example above, you can see that the script has detected two HFile v1 files, one corrupt file and the regions to major compact.
      3. Trigger a major compaction on each of the reported regions. This major compaction rewrites the files from HFile v1 to HFile v2 format. To run the major compaction, start HBase Shell and issue the major_compact command.
        $ bin/hbase shell
        hbase> major_compact 'usertable'
        You can also do this in a single step by using the echo shell built-in command.
        $ echo "major_compact 'usertable'" | bin/hbase shell
  • Review the upgrade procedure and reserve a maintenance window with enough time to perform all steps. For production clusters, Cloudera recommends a maintenance window of up to a full day, depending on the number of hosts, your experience with Hadoop and Linux, and the particular hardware you are using.
  • To avoid generating many alerts during the upgrade process, you can enable maintenance mode on your cluster before you start the upgrade. Be sure to exit maintenance mode when you have finished the upgrade, in order to re-enable Cloudera Manager alerts.
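
As a rough illustration of step 3 under the HBase item above, the following shell sketch reads saved Check HFile Version output and prints one major_compact command per affected table. The function name tables_to_compact and the file name hfile-check.log are hypothetical, and the sketch assumes that region paths have the form <hbase-root>/<table>/<encoded-region>, as in the sample output.

```shell
#!/bin/sh
# Print one "major_compact '<table>'" line for each table that owns a region
# listed under "Regions to Major Compact:" in the Check HFile Version output.
# Assumption: region paths look like <hbase-root>/<table>/<encoded-region>.
tables_to_compact() {
  awk '
    /Regions to Major Compact:/ { in_regions = 1; next }
    in_regions && /^[[:space:]]*hdfs:\/\// {
      n = split($1, parts, "/")
      table = parts[n - 1]            # second-to-last path component
      if (!(table in seen)) {
        seen[table] = 1
        printf "major_compact \047%s\047\n", table   # \047 is a single quote
      }
      next
    }
    in_regions { in_regions = 0 }     # any other line ends the region list
  ' "$@"
}

# Example (hypothetical file name), piping the commands into HBase Shell:
#   tables_to_compact hfile-check.log | bin/hbase shell
```

The function only prints commands, so you can review them before piping them into HBase Shell.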

Upgrading Unmanaged Components

Upgrading unmanaged components is a process that is separate from upgrading managed components. Upgrade the unmanaged components before proceeding to upgrade managed components. Components that you might have installed that are not managed by Cloudera Manager include:

  • Pig
  • Whirr
  • Mahout

For information on upgrading these unmanaged components, see the CDH 4 Installation Guide.

Upgrade Managed Components

Use one of the following strategies to upgrade CDH 4:
  • Use your operating system's package management tools to update all packages to the latest version using standard repositories. This approach works well because it minimizes the amount of configuration required and uses the simplest commands. Be aware that this can take a considerable amount of time if you have not upgraded the system recently. To update all packages on your system, use the following command:
    Operating System    Command
    RHEL                $ sudo yum update
    SLES                $ sudo zypper up
    Ubuntu or Debian    $ sudo apt-get upgrade
  • Use the cloudera.com repository that is added during a typical installation, updating only Cloudera components. This limits the scope of the update, so the process takes less time. However, this process does not work if you created and used a custom repository. To install the new version, upgrade from Cloudera's repository by adding an entry to your operating system's package management configuration file. The repository location varies by operating system:
    Operating System    Configuration File Repository Entry
    Red Hat             http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
    SLES                http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/4/
    Debian Squeeze      [arch=amd64] http://archive.cloudera.com/cdh4/debian/squeeze squeeze-cdh4 contrib
    Ubuntu Lucid        [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh lucid-cdh4 contrib
    Ubuntu Precise      [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh precise-cdh4 contrib

    For example, under Red Hat, to upgrade from Cloudera's repository you can run commands such as the following on the CDH host to update only CDH:

    $ sudo yum clean all
    $ sudo yum update 'cloudera-*'

    Note:
    • cloudera-cdh4 is the name of the repository on your system; the name is usually in square brackets on the first line of the repo file, in this example /etc/yum.repos.d/cloudera-cdh4.repo:
      [chris@ca727 yum.repos.d]$ more cloudera-cdh4.repo
      [cloudera-cdh4]
      ...
    • yum clean all cleans up yum's cache directories, ensuring that you download and install the latest versions of the packages.
    • If your system is not up to date, and any underlying system components need to be upgraded before this yum update can succeed, yum will tell you what those are.

    On a SLES system, use commands such as the following to clean cached repository information and then update only the CDH components:

    $ sudo zypper clean --all
    $ sudo zypper up -r http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/4

    To verify the URL, open the Cloudera repo file in /etc/zypp/repos.d on your system (for example, /etc/zypp/repos.d/cloudera-cdh4.repo) and find the line that begins with:

    baseurl=

    Use that URL in your sudo zypper up -r command.

    On a Debian/Ubuntu system, clean cached repository information and then update only the CDH components. First, clean the cache:

    $ sudo apt-get clean

    After cleaning the cache, use the command for your release to upgrade CDH.

    Precise:

    $ sudo apt-get upgrade -t precise-cdh4

    Lucid:

    $ sudo apt-get upgrade -t lucid-cdh4

    Squeeze:

    $ sudo apt-get upgrade -t squeeze-cdh4
  • Use a custom repository. This process can be more complicated, but it is necessary if you are upgrading a cluster whose hosts do not have access to the Internet. You can create your own repository as described in Understanding Custom Installation Solutions.

    If you used a custom repository for your current installation and want to upgrade using a custom repository, the exact steps vary. In general, begin by updating the existing custom repository with the installation files for the new version. You can do this in a variety of ways; for example, you might use wget to copy the necessary installation files. Once the repository has been updated, use it to update CDH.

    RHEL

    Ensure that you have a custom repo file that is configured to use your internal repository. For example, you could have a custom repo file in /etc/yum.repos.d/ called cdh_custom.repo in which you specified a local repository. In that case, you might use the following commands:

    $ sudo yum clean all
    $ sudo yum update 'cloudera-*'

    SLES

    Use commands such as the following to clean cached repository information and then update only the CDH components:

    $ sudo zypper clean --all
    $ sudo zypper up -r http://internalserver.example.com/path_to_cdh_repo

    Ubuntu or Debian

    Use a command that targets upgrade of your CDH distribution using the custom repository specified in your apt source files. These are typically the /etc/apt/sources.list file or files in the /etc/apt/sources.list.d/ directory. Information about your custom repository must be included in these files. The general form of entries in Debian/Ubuntu is:

    deb http://server.example.com/directory/ dist-name pool

    For example, the entry for the default repo is:

    deb http://us.archive.ubuntu.com/ubuntu/ precise universe

    Use commands such as the following to clean cached repository information and then update only the CDH components:

    $ sudo apt-get clean
    $ sudo apt-get upgrade -t your_cdh_repo
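
The per-OS commands in this section can be collected into a small helper. The following is a minimal sketch, not a supported tool: the function name cdh_update_commands and its arguments are hypothetical, and it only prints the commands listed above so that you can review them before running anything. For SLES it reads the repository URL from the baseurl= line of the repo file you pass in.

```shell
#!/bin/sh
# Print (do not run) the package commands this section lists for one OS family.
# $1: rhel | sles | debian
# $2: for sles, the path to the Cloudera repo file;
#     for debian, the target release (for example precise-cdh4);
#     unused for rhel.
cdh_update_commands() {
  case "$1" in
    rhel)
      echo "sudo yum clean all"
      echo "sudo yum update 'cloudera-*'"
      ;;
    sles)
      # Take the repository URL from the first baseurl= line of the repo file.
      url=$(sed -n 's/^baseurl=//p' "$2" | head -n 1)
      [ -n "$url" ] || { echo "no baseurl= line found in: $2" >&2; return 1; }
      echo "sudo zypper clean --all"
      echo "sudo zypper up -r $url"
      ;;
    debian)
      [ -n "$2" ] || { echo "a target release is required" >&2; return 1; }
      echo "sudo apt-get clean"
      echo "sudo apt-get upgrade -t $2"
      ;;
    *)
      echo "unknown OS family: $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   cdh_update_commands sles /etc/zypp/repos.d/cloudera-cdh4.repo
```

Printing rather than executing keeps the helper safe to try on any host; pipe its output to sh only after checking it.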

Upgrade the Hive Metastore Database

Required if you are upgrading from an earlier version of CDH 4 to CDH 4.2 or later.

  1. Make a backup copy of your Hive metastore database.
  2. Go to the Hive service.
  3. Select Actions > Stop and click Stop to confirm.
  4. Select Actions > Upgrade Hive Metastore Database Schema and click Upgrade Hive Metastore Database Schema to confirm.
  5. If you have multiple instances of Hive, perform the upgrade on each metastore database.
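
How you back up the metastore in step 1 depends on the database that hosts it. As a hedged sketch covering only MySQL and PostgreSQL, the following helper prints a dump command using the standard mysqldump and pg_dump tools. The function name metastore_backup_command is hypothetical, and the database name, user, and output path in the example are placeholders to replace with your own values.

```shell
#!/bin/sh
# Print (do not run) a dump command for the Hive metastore database.
# $1: mysql | postgresql   $2: database name   $3: database user   $4: output file
metastore_backup_command() {
  case "$1" in
    mysql)
      # -p makes mysqldump prompt for the password.
      echo "mysqldump --single-transaction -u $3 -p $2 > $4"
      ;;
    postgresql)
      echo "pg_dump -U $3 -f $4 $2"
      ;;
    *)
      echo "unsupported database type: $1" >&2
      return 1
      ;;
  esac
}

# Example (placeholder names):
#   metastore_backup_command mysql metastore hive /tmp/metastore-backup.sql
```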

Upgrade the Oozie ShareLib

  1. Go to the Oozie service.
  2. Select Actions > Stop and click Stop to confirm.
  3. Select Actions > Install Oozie ShareLib and click Install Oozie ShareLib to confirm.
  4. When the command completes, click Close.

Upgrade Sqoop

  1. Go to the Sqoop service.
  2. Select Actions > Stop and click Stop to confirm.
  3. Select Actions > Upgrade Sqoop and click Upgrade Sqoop to confirm.
  4. When the command completes, click Close.

Restart the Services

  1. On the Home page, click the menu icon to the right of the cluster name and select Restart.
  2. Click the Restart button in the confirmation pop-up that appears. The Command Details window shows the progress of starting services.

Configure Cluster CDH Version for Package Installs

If you have installed CDH as a package, after an install or upgrade make sure that the cluster CDH version matches the package CDH version, using the procedure in Configuring the CDH Version for a Cluster in Managing Clusters with Cloudera Manager. If the cluster CDH version does not match the package CDH version, Cloudera Manager will incorrectly enable and disable service features based on the cluster's configured CDH version.

Deploy the New Client Configuration Files

  1. On the Home page, click the menu icon to the right of the cluster name and select Deploy Client Configuration.
  2. Click the Deploy Client Configuration button in the confirmation pop-up that appears.
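
If you prefer to script the cluster restart and the client configuration deployment, the Cloudera Manager REST API exposes cluster-level commands. The following sketch only builds the request URL. It assumes API version v7, the default port 7180 without TLS, and the command names restart and deployClientConfig; verify all of these against the API documentation for your Cloudera Manager version. The function name cm_cluster_command_url, the host name, the cluster name, and the user name are hypothetical.

```shell
#!/bin/sh
# Build the URL for a cluster-level command in the Cloudera Manager REST API.
# Assumptions: API version v7 and port 7180 over plain HTTP.
# $1: Cloudera Manager host   $2: cluster name   $3: command name
cm_cluster_command_url() {
  # Cluster names often contain spaces; encode them for use in a URL.
  cluster=$(printf '%s' "$2" | sed 's/ /%20/g')
  echo "http://$1:7180/api/v7/clusters/$cluster/commands/$3"
}

# Example (not run here; curl prompts for the password of admin_user):
#   curl -X POST -u admin_user \
#     "$(cm_cluster_command_url cm.example.com 'Cluster 1' deployClientConfig)"
```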