Before You Begin Upgrading to CDH 5 Using the Command Line

Before upgrading, be sure to read about the latest Incompatible Changes and Known Issues in CDH 5 in the CDH 5 Release Notes. If you are currently running MRv1, you should read CDH 5 and MapReduce before proceeding.

Plan Downtime

If you are upgrading a cluster that is part of a production system, plan ahead. As with any operational work, reserve a maintenance window with enough extra time allotted in case of complications. The Hadoop upgrade process is well understood, but it is best to be cautious. For production clusters, Cloudera recommends allocating a maintenance window of up to a full day for the upgrade, depending on the number of hosts, your experience with Hadoop and Linux, and the particular hardware you are using.

Install Java 1.7

CDH 5 requires Java 1.7 or higher. See Upgrading to Oracle JDK 1.7, and make sure you have read the Install and Upgrade Known Issues before you proceed with the upgrade.
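
A quick way to confirm the requirement on each host is to compare the installed Java version against 1.7. The following is a minimal sketch, not part of CDH: the function names java_version_ok and installed_java_version are hypothetical, and the sketch assumes the version string has the usual dotted form (for example 1.7.0_67).

```shell
# Hypothetical helper: succeed if a Java version string is 1.7 or higher.
java_version_ok() {
  version="$1"
  major=$(printf '%s' "$version" | cut -d. -f1)
  minor=$(printf '%s' "$version" | cut -d. -f2)
  # Legacy scheme: 1.7.0_67 means feature release 7.
  # Newer scheme: 11.0.2 means feature release 11.
  if [ "$major" = "1" ]; then
    feature="$minor"
  else
    feature="$major"
  fi
  # A non-numeric or empty value makes the test fail, which is treated as "not OK".
  [ "$feature" -ge 7 ] 2>/dev/null
}

# Hypothetical helper: print the quoted version reported by `java -version`,
# which writes its output to stderr.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, after defining both functions in the current shell:

    $ java_version_ok "$(installed_java_version)" && echo "Java version is sufficient"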

Delete Symbolic Links in HDFS

If there are symbolic links in HDFS when you upgrade from CDH 4 to CDH 5, the upgrade will fail and you will have to downgrade to CDH 4, delete the symbolic links, and start over. To prevent this, proceed as follows.

To check for symbolic links in CDH 4 HDFS:
  1. cd to the directory on the NameNode that contains the latest fsimage. The location of this directory is specified as the value of dfs.namenode.name.dir (or dfs.name.dir) in hdfs-site.xml.
  2. Use a command such as the following to write out the path names in the fsimage:
    $ hdfs oiv -i FSIMAGE -o /tmp/YYYY-MM-DD_FSIMAGE.txt
  3. Use a command such as the following to find the path names of any symbolic links listed in /tmp/YYYY-MM-DD_FSIMAGE.txt and write them out to the file /tmp/symlinks.txt:
    $ grep -- "->" /tmp/YYYY-MM-DD_FSIMAGE.txt > /tmp/symlinks.txt
  4. Delete any symbolic links listed in /tmp/symlinks.txt.
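
To review what needs to be deleted, it can help to reduce /tmp/symlinks.txt to a plain list of link paths. The sketch below assumes that each matching line has the form "... PATH -> TARGET" and that path names contain no whitespace; both are assumptions about the listing format, so check a few lines by hand first. The function name symlink_paths is hypothetical.

```shell
# Hypothetical helper: print the path that precedes each "->" token
# in the file given as the first argument.
# Assumes whitespace-separated fields and no whitespace inside path names.
symlink_paths() {
  awk '{ for (i = 2; i <= NF; i++) if ($i == "->") print $(i - 1) }' "$1"
}
```

For example:

    $ symlink_paths /tmp/symlinks.txt

The sketch only lists the paths; it deliberately does not delete anything.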

Check Hue Table Sizes and Clean Up if Necessary

When upgrading from CDH 4 to CDH 5, the Hue upgrade can take a very long time if the beeswax_queryhistory, beeswax_savedquery, and oozie_job tables contain more than 1000 records. You can reduce the upgrade time by running a script to reduce the size of the Hue database:
  1. Stop the Hue service.
  2. Back up the Hue database.
  3. Download the history cleanup script to the host running the Hue Server.
  4. Run the following as root:
    • parcel installation
      export HUE_CONF_DIR="/var/run/cloudera-scm-agent/process/$(ls -1 /var/run/cloudera-scm-agent/process | grep HUE | sort -n | tail -1)"
      /opt/cloudera/parcels/CDH/share/hue/build/env/bin/hue shell
    • package installation
      export HUE_CONF_DIR="/var/run/cloudera-scm-agent/process/$(ls -1 /var/run/cloudera-scm-agent/process | grep HUE | sort -n | tail -1)"
      /usr/share/hue/build/env/bin/hue shell
  5. Run the downloaded script in the Hue shell.
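
Whether the cleanup is worth running depends on the row counts of the three tables named above. The following sketch shows one way to check them against the 1000-record threshold. It assumes the Hue database is a SQLite file that the sqlite3 command-line tool can read; if Hue uses MySQL, PostgreSQL, or Oracle, the counting step would need the corresponding client instead. The function names and the database path argument are hypothetical.

```shell
# Tables and threshold taken from the text above.
HUE_TABLES="beeswax_queryhistory beeswax_savedquery oozie_job"
ROW_LIMIT=1000

# Hypothetical helper: read "TABLE COUNT" pairs on stdin and print the
# names of tables whose count is greater than ROW_LIMIT.
tables_over_limit() {
  awk -v limit="$ROW_LIMIT" '$2 + 0 > limit + 0 { print $1 }'
}

# Hypothetical helper: print "TABLE COUNT" for each table.
# Assumption: $1 is the path to a SQLite database file used by Hue.
hue_table_counts() {
  db="$1"
  for t in $HUE_TABLES; do
    printf '%s %s\n' "$t" "$(sqlite3 "$db" "SELECT COUNT(*) FROM $t;")"
  done
}
```

For example, with the path to the Hue database file supplied in place of HUE_DB_FILE:

    $ hue_table_counts HUE_DB_FILE | tables_over_limit

Any table name printed is over the threshold and is a candidate for cleanup.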

Considerations for Secure Clusters

If you are upgrading a cluster that has Kerberos security enabled, you must do the following:

  • Before starting the upgrade, make sure your installation is properly configured according to the instructions in the installation and configuration sections of the Cloudera Security guide.
  • Before shutting down Hadoop services, put the NameNode into safe mode and perform a saveNamespace operation; see the instructions on backing up the metadata.
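
The safe mode and saveNamespace steps can be sketched with the hdfs dfsadmin command as follows. This is an illustration rather than the documented backup procedure: it assumes the commands are run by a user with HDFS superuser privileges who already holds valid Kerberos credentials. The run wrapper and the DRY_RUN variable are hypothetical conveniences that let you preview the commands before executing them.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Put the NameNode into safe mode, then write the namespace to a new fsimage.
# saveNamespace is only attempted if entering safe mode succeeded.
save_namespace() {
  run hdfs dfsadmin -safemode enter &&
  run hdfs dfsadmin -saveNamespace
}
```

For example, to preview the commands and then run them:

    $ DRY_RUN=1 save_namespace
    $ save_namespace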

High Availability

In CDH 5, you can configure high availability for both the NameNode and the JobTracker or ResourceManager.