Upgrading HBase

Coprocessors and Custom JARs

When upgrading HBase from one major version to another (such as upgrading from CDH 4 to CDH 5), you must recompile coprocessors and custom JARs after the upgrade.

Never rely on HBase directory layout on disk.

The HBase directory layout is an implementation detail and is subject to change. Do not rely on the directory layout for client or administration functionality. Instead, access HBase using the supported APIs.

Upgrading HBase from CDH 4 to CDH 5

CDH 5.0 HBase is based on Apache HBase 0.96.1.1. Once a cluster has been upgraded to CDH 5, it cannot be reverted to CDH 4. This section guides you through the steps for upgrading HBase from CDH 4.x releases to CDH 5.

These instructions also apply to upgrading HBase from CDH 4.x directly to CDH 5.1.0, which is a supported path.

Prerequisites

HDFS and ZooKeeper should be available while upgrading HBase.

Overview of Upgrade Procedure

Before you can upgrade HBase from CDH 4 to CDH 5, your HFiles must be upgraded from HFile v1 format to HFile v2, because CDH 5 no longer supports HFile v1. The upgrade procedure differs depending on whether you use Cloudera Manager or the command line, but the result is the same. The first step is to check for HFile v1 files and mark them to be upgraded to HFile v2, and to report corrupted files or files with unknown versions, which must be removed manually. The next step is to rewrite the HFiles during the next major compaction. After the HFiles are upgraded, you can continue the upgrade. After the upgrade is complete, you must recompile custom coprocessors and JARs.

Upgrade HBase Using the Command Line

CDH 5 comes with an upgrade script for HBase. You can run /usr/lib/hbase/bin/hbase upgrade without options to see its Help section. The script runs in two modes: -check and -execute.

Step 1: Check for HFile v1 files and compact if necessary

  1. Run the upgrade command in -check mode, and examine the output.
    $ /usr/lib/hbase/bin/hbase upgrade -check
    Your output should be similar to the following:
    Tables Processed:
    hdfs://localhost:41020/myHBase/.META.
    hdfs://localhost:41020/myHBase/usertable
    hdfs://localhost:41020/myHBase/TestTable
    hdfs://localhost:41020/myHBase/t
    
    Count of HFileV1: 2
    HFileV1:
    hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
    hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512
    
    Count of corrupted files: 1
    Corrupted Files:
    hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
    Count of Regions with HFileV1: 2
    Regions to Major Compact:
    hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
    hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af
    In the example above, you can see that the script has detected two HFile v1 files, one corrupt file and the regions to major compact.

    By default, the script scans the root directory, as defined by hbase.rootdir. To scan a specific directory, use the -dir option. For example, the following command scans the /myHBase/testTable directory.

    $ /usr/lib/hbase/bin/hbase upgrade -check -dir /myHBase/testTable
  2. Trigger a major compaction on each of the reported regions. This major compaction rewrites the files from HFile v1 to HFile v2 format. To run the major compaction, start HBase Shell and issue the major_compact command.
    $ /usr/lib/hbase/bin/hbase shell
    hbase> major_compact 'usertable'
    You can also do this in a single step by using the echo shell built-in command.
    $ echo "major_compact 'usertable'" | /usr/lib/hbase/bin/hbase shell
  3. Once all the HFileV1 files have been rewritten, running the upgrade script with the -check option again will return a "No HFile v1 found" message. It is then safe to proceed with the upgrade.
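
The check-and-compact cycle above can be partly scripted. The following is a minimal sketch, assuming the -check output has been saved to a file and keeps the layout shown in the example. The helper name tables_to_compact and the file name check.out are hypothetical, not part of the upgrade script.

```shell
#!/bin/sh
# Sketch: derive table names from the "Regions to Major Compact:" section
# of saved `hbase upgrade -check` output. Assumes the output layout shown
# in the example above; tables_to_compact is a hypothetical helper name.
tables_to_compact() {
  awk '
    /Regions to Major Compact:/ { in_section = 1; next }
    in_section {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^hdfs:\/\//) {
          n = split($i, parts, "/")
          print parts[n - 1]   # parent directory of the region is the table
        }
      }
    }
  ' "$1" | sort -u
}

# Usage (not run here):
#   /usr/lib/hbase/bin/hbase upgrade -check > check.out 2>&1
#   for t in $(tables_to_compact check.out); do
#     echo "major_compact '$t'"
#   done | /usr/lib/hbase/bin/hbase shell
```

Review the generated major_compact commands before piping them into HBase Shell, then rerun the script in -check mode to confirm that no HFile v1 files remain.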

Step 2: Gracefully shut down CDH 4 HBase cluster

Shut down your CDH 4 HBase cluster before you run the upgrade script in -execute mode.

To shut down HBase gracefully:

  1. Stop the REST and Thrift servers and clients, then stop the cluster.
    1. Stop the Thrift server and clients:
      sudo service hbase-thrift stop
    2. Stop the REST server:
      sudo service hbase-rest stop
    3. Stop the cluster by shutting down the master and the RegionServers:
      1. Use the following command on the master node:
        sudo service hbase-master stop
      2. Use the following command on each node hosting a RegionServer:
        sudo service hbase-regionserver stop
  2. Stop the ZooKeeper Server:
    $ sudo service zookeeper-server stop
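
The shutdown order above can be captured in a small script. This is a sketch only: the names run, stop_hbase_services, and DRY_RUN are hypothetical, and the function assumes that every listed service is installed on the local host, which is typically not the case on a multi-node cluster. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: stop HBase-related services in the order described above.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"          # print the command instead of executing it
  else
    "$@"
  fi
}

stop_hbase_services() {
  # Thrift and REST first, then master and RegionServer, then ZooKeeper.
  for svc in hbase-thrift hbase-rest hbase-master hbase-regionserver zookeeper-server; do
    run sudo service "$svc" stop || return 1   # stop at the first failure
  done
}

# Usage (not run here):
#   stop_hbase_services              # print the plan
#   DRY_RUN=0 stop_hbase_services    # actually stop the services
```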

Step 3: Uninstall the old version of HBase and replace it with the new version.

  1. To remove HBase on Red-Hat-compatible systems:
    $ sudo yum remove hadoop-hbase

    To remove HBase on SLES systems:

    $ sudo zypper remove hadoop-hbase

    To remove HBase on Ubuntu and Debian systems:

    $ sudo apt-get purge hadoop-hbase
    CAUTION:
    On Ubuntu systems, make sure you remove HBase before removing ZooKeeper; otherwise your HBase configuration will be deleted. This is because hadoop-hbase depends on hadoop-zookeeper, and so purging hadoop-zookeeper will purge hadoop-hbase.
  2. Follow the instructions for installing the new version of HBase at HBase Installation.
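
If you script the removal across hosts with different distributions, the per-platform commands above can be selected automatically. The following sketch uses two hypothetical helper functions, hbase_remove_cmd and detect_pkg_manager. It prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: map a package manager to the removal command listed above.
# hbase_remove_cmd and detect_pkg_manager are hypothetical helper names.
hbase_remove_cmd() {
  case "$1" in
    yum)     echo "sudo yum remove hadoop-hbase" ;;
    zypper)  echo "sudo zypper remove hadoop-hbase" ;;
    apt-get) echo "sudo apt-get purge hadoop-hbase" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Report the first supported package manager found on this host.
detect_pkg_manager() {
  for pm in yum zypper apt-get; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

# Usage (not run here):
#   hbase_remove_cmd "$(detect_pkg_manager)"
```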

Step 4: Run the HBase upgrade script in -execute mode

This step executes the actual upgrade process. It includes a verification step that checks whether the Master, RegionServer, and backup Master znodes have expired. If they have not, the upgrade is aborted. This ensures that no upgrade occurs while an HBase process is still running. If your upgrade is aborted even after shutting down the HBase cluster, wait for the znodes to expire and retry. The default znode expiry time is 300 seconds.

As mentioned earlier, ZooKeeper and HDFS should be available. If ZooKeeper is managed by HBase, then use the following command to start ZooKeeper.

/usr/lib/hbase/bin/hbase-daemon.sh start zookeeper

The upgrade involves three steps:

  • Upgrade Namespace: This step upgrades the directory layout of HBase files.
  • Upgrade Znodes: This step upgrades /hbase/replication (znodes corresponding to peers, log queues, and so on) and table znodes (which keep table enable/disable information). It deletes other znodes.
  • Log Splitting: If the shutdown was not clean, there might be some Write Ahead Logs (WALs) to split. This step splits those WAL files. It runs in a “non distributed mode”, which can lengthen the upgrade if there are many logs to split. To expedite the upgrade, ensure you have completed a clean shutdown.

Run the upgrade command in -execute mode.
$ /usr/lib/hbase/bin/hbase upgrade -execute

Your output should be similar to the following:

Starting Namespace upgrade
Created version file at hdfs://localhost:41020/myHBase with version=7
Migrating table testTable to hdfs://localhost:41020/myHBase/.data/default/testTable
…..
Created version file at hdfs://localhost:41020/myHBase with version=8
Successfully completed NameSpace upgrade.
Starting Znode upgrade
….
Successfully completed Znode upgrade
Starting Log splitting
…
Successfully completed Log splitting

The -execute command either prints a success message as in the example above or, after a clean shutdown where no log splitting is required, a "No log directories to split, returning" message. Either message indicates that your upgrade was successful.
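
When the upgrade runs unattended, the two messages above can be checked in a saved log. This is a minimal sketch, assuming the -execute output was redirected to a file. The function name upgrade_succeeded and the file name execute.out are hypothetical.

```shell
#!/bin/sh
# Sketch: report success if a saved -execute log contains either of the
# two messages described above. upgrade_succeeded is a hypothetical name.
upgrade_succeeded() {
  grep -q -e "Successfully completed Log splitting" \
          -e "No log directories to split, returning" "$1"
}

# Usage (not run here):
#   /usr/lib/hbase/bin/hbase upgrade -execute > execute.out 2>&1
#   upgrade_succeeded execute.out && echo "upgrade finished"
```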

Step 5 (Optional): Move Tables to Namespaces

CDH 5 introduces namespaces for HBase tables. As a result of the upgrade, all tables are automatically assigned to namespaces. The root, meta, and acl tables are added to the hbase system namespace. All other tables are assigned to the default namespace.

To move a table to a different namespace, take a snapshot of the table and clone it to the new namespace. After the upgrade, do the snapshot and clone operations before turning the modified application back on.
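
The snapshot-and-clone approach can be expressed as HBase Shell commands generated from a small script, in the same style as the echo example used earlier for major_compact. This is a sketch under assumptions: the function name namespace_move_commands, the snapshot suffix _snap, and the example namespace my_ns are hypothetical, and the target namespace is assumed not to exist yet.

```shell
#!/bin/sh
# Sketch: print HBase Shell commands that snapshot a table and clone the
# snapshot into a namespace. namespace_move_commands is a hypothetical name.
namespace_move_commands() {
  table="$1"
  namespace="$2"
  snap="${table}_snap"                       # hypothetical snapshot name
  echo "create_namespace '${namespace}'"     # assumes the namespace is new
  echo "snapshot '${table}', '${snap}'"
  echo "clone_snapshot '${snap}', '${namespace}:${table}'"
}

# Usage (not run here):
#   namespace_move_commands usertable my_ns | /usr/lib/hbase/bin/hbase shell
```

The sketch leaves the original table in the default namespace untouched, so you can verify the clone before deciding what to do with the original.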

Step 6: Recompile coprocessors and custom JARs.

Recompile any coprocessors and custom JARs, so that they will work with the new version of HBase.

FAQ

In order to prevent upgrade failures because of unexpired znodes, is there a way to check/force this before an upgrade?

The upgrade script "executes" the upgrade when it is run with the -execute option. As part of the first step, it checks for any live HBase processes (RegionServer, Master and backup Master), by looking at their znodes. If any such znode is still up, it aborts the upgrade and prompts the user to stop such processes, and wait until their znodes have expired. This can be considered an inbuilt check.

The -check option has a different use case: it checks for HFile v1 files. Run it on live CDH 4 clusters to detect HFile v1 files, then major compact any regions that contain them.
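
Because an aborted upgrade can simply be retried once the znodes have expired, one option is to wrap the -execute run in a retry loop. The sketch below is generic: retry_with_wait is a hypothetical helper, and it assumes the wrapped command exits with a non-zero status when it fails, which this document does not confirm for the upgrade script.

```shell
#!/bin/sh
# Sketch: run a command up to N times, sleeping between failed attempts.
# Assumes the command signals failure with a non-zero exit status.
retry_with_wait() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      echo "attempt $i failed; waiting ${delay}s" >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (not run here), waiting the default 300-second znode expiry time:
#   retry_with_wait 3 300 /usr/lib/hbase/bin/hbase upgrade -execute
```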

What are the steps for Cloudera Manager to do the upgrade?

See Upgrade to CDH 5 for instructions on upgrading HBase within a Cloudera Manager deployment.

Upgrading HBase from a Lower CDH 5 Release

To upgrade HBase from a lower CDH 5 release, proceed as follows.

The instructions that follow assume that you are upgrading HBase as part of an upgrade to the latest CDH 5 release, and have already performed the steps under Upgrading from an Earlier CDH 5 Release to the Latest Release.

During a rolling upgrade from CDH 5.0.x to CDH 5.4.x, the HBase Master UI displays the URLs of the old HBase RegionServers with an incorrect info port number. Once the rolling upgrade completes, the HBase Master UI uses the correct port number.

Step 1: Perform a Graceful Cluster Shutdown

To shut HBase down gracefully:

  1. Stop the Thrift server and clients, then stop the cluster.
    1. Stop the Thrift server and clients:
      sudo service hbase-thrift stop
    2. Stop the cluster by shutting down the master and the RegionServers:
      • Use the following command on the master node:
        sudo service hbase-master stop
      • Use the following command on each node hosting a RegionServer:
        sudo service hbase-regionserver stop
  2. Stop the ZooKeeper Server:
    $ sudo service zookeeper-server stop

Step 2: Install the new version of HBase

To install the new version of HBase, follow the directions in the next section, HBase Installation.