This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.

Configuring Other CDH Components to Use HDFS HA

You can use the HDFS High Availability NameNodes with other components of CDH, including HBase, Oozie, and Hive.

Configuring HBase to Use HDFS HA

To configure HBase to use HDFS HA, proceed as follows.

Step 1: Shut Down the HBase Cluster

To shut HBase down gracefully, stop the Thrift server and clients, then stop the cluster:

  1. Stop the Thrift server and clients:
    sudo service hbase-thrift stop
  2. Stop the cluster by shutting down the master and the region servers:
    • Use the following command on the master node:
      sudo service hbase-master stop
    • Use the following command on each node hosting a region server:
      sudo service hbase-regionserver stop

Step 2: Configure hbase.rootdir

Change the distributed file system URI in hbase-site.xml to the name specified in the dfs.nameservices property in hdfs-site.xml. The clients must also have access to hdfs-site.xml's dfs.client.* settings to properly use HA.

For example, suppose the HDFS HA property dfs.nameservices is set to ha-nn in hdfs-site.xml. To configure HBase to use the HA NameNodes, specify that same value as part of your hbase-site.xml's hbase.rootdir value:

<!-- Configure HBase to use the HA NameNode nameservice -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://ha-nn/hbase</value>
</property>
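
A quick consistency check between the two files can catch a mistyped nameservice before HBase is restarted. The following is a minimal sketch, not part of CDH: the function names get_property and check_rootdir are hypothetical, and the parsing assumes each <value> sits on the line directly after its <name>, as in the example above.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style *-site.xml.
# Assumption: <name> and <value> are on consecutive lines.
get_property() {
  # $1 = property name, $2 = path to the XML file
  grep -A1 "<name>$1</name>" "$2" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1
}

# Compare hbase.rootdir against dfs.nameservices.
check_rootdir() {
  # $1 = hdfs-site.xml, $2 = hbase-site.xml
  ns=$(get_property dfs.nameservices "$1")
  rootdir=$(get_property hbase.rootdir "$2")
  if [ -z "$ns" ]; then
    echo "dfs.nameservices not found in $1"
    return 1
  fi
  case "$rootdir" in
    "hdfs://$ns/"*) echo "OK: $rootdir uses nameservice $ns" ;;
    *) echo "MISMATCH: $rootdir does not use nameservice $ns"; return 1 ;;
  esac
}
```

It could be called as check_rootdir /etc/hadoop/conf/hdfs-site.xml /etc/hbase/conf/hbase-site.xml; both paths are assumed defaults and may differ on your hosts.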

Step 3: Clean up /hbase/splitlogs

  Note: If you skip this step, HBase may fail to start because it tries to use the old copy of the namespace.

Do the following on the ZooKeeper node:

  1. Run /usr/lib/zookeeper/bin/zkCli.sh
  2. List the contents of /hbase/splitlogs:
    ls /hbase/splitlogs
  3. If this shows any content, do:
    rmr /hbase/splitlogs
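
The same check can be scripted by passing a single command to zkCli.sh instead of using the interactive prompt. This is a sketch under assumptions: the server address localhost:2181 is a placeholder, the function name clean_splitlogs is hypothetical, and the parsing assumes that ls prints the children as a bracketed list on its own line.

```shell
# Both values are assumptions; override them for your environment.
ZKCLI="${ZKCLI:-/usr/lib/zookeeper/bin/zkCli.sh}"
ZK_SERVER="${ZK_SERVER:-localhost:2181}"

clean_splitlogs() {
  # Keep only the last line that looks like a bracketed child list, e.g. [a, b]
  children=$("$ZKCLI" -server "$ZK_SERVER" ls /hbase/splitlogs 2>/dev/null \
    | grep '^\[.*\]$' | tail -n 1) || true
  case "$children" in
    ''|'[]')
      # No list printed, or an empty list: nothing to remove
      echo "nothing to clean"
      ;;
    *)
      # The znode has children: remove it recursively
      "$ZKCLI" -server "$ZK_SERVER" rmr /hbase/splitlogs >/dev/null 2>&1 \
        && echo "removed /hbase/splitlogs"
      ;;
  esac
}
```

Run clean_splitlogs while HBase is still shut down, and confirm the result with an interactive ls if the output format on your ZooKeeper version differs.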

Step 4: Restart HBase

  1. Start the HBase Master
  2. Start each of the HBase Region Servers
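
The restart mirrors the shutdown in Step 1, using the same service scripts with start in place of stop. The sketch below wraps that in a small helper; start_hbase_role and SERVICE_CMD are hypothetical names introduced here so the command can be previewed without touching a live cluster.

```shell
# SERVICE_CMD is an illustrative override; by default the real service script is used.
SERVICE_CMD="${SERVICE_CMD:-sudo service}"

start_hbase_role() {
  # $1 = master or regionserver
  case "$1" in
    master|regionserver)
      # Intentionally unquoted so "sudo service" splits into two words
      $SERVICE_CMD "hbase-$1" start
      ;;
    *)
      echo "unknown role: $1" >&2
      return 1
      ;;
  esac
}
```

Run start_hbase_role master on the master node first, then start_hbase_role regionserver on each node hosting a region server.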

HBase-HDFS HA Troubleshooting

Problem: HMasters fail to start.

Solution: Check for this error in the HMaster logs:

2012-05-17 12:21:28,929 FATAL master.HMaster (HMaster.java:abort(1317)) - Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-nn
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:431)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:161)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:126)
...

If so, verify that Hadoop's hdfs-site.xml and core-site.xml files are in your hbase/conf directory. This may be necessary if you put your configurations in non-standard places.
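
The verification can be sketched as a short check that both files are readable in the HBase configuration directory. The default path /etc/hbase/conf and the function name check_hbase_conf are assumptions; pass your own directory if your configuration lives elsewhere.

```shell
check_hbase_conf() {
  # $1 = HBase conf directory (assumed default: /etc/hbase/conf)
  conf_dir="${1:-/etc/hbase/conf}"
  missing=0
  for f in hdfs-site.xml core-site.xml; do
    if [ ! -r "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f"
      missing=1
    fi
  done
  # Exit status is 0 when both files are present, 1 otherwise
  return "$missing"
}
```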

Configuring Oozie to Use HDFS HA

To configure an Oozie workflow to use HDFS HA, use the HA HDFS URI instead of the NameNode URI in the <name-node> element of the workflow.

Example:

<action name="mr-node">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>hdfs://ha-nn</name-node>
    ...
  </map-reduce>
  ...
</action>

where ha-nn is the value of dfs.nameservices in hdfs-site.xml.
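
To find workflows that still point at a single NameNode, one option is to search for <name-node> elements whose URI carries an explicit port, since a nameservice URI such as hdfs://ha-nn has none. This is a heuristic sketch with a hypothetical function name; it does not catch workflows that take the URI from a parameter such as a job property.

```shell
find_single_namenode_uris() {
  # Arguments: one or more workflow XML files
  # Prints matching lines with line numbers; exit status is 1 when nothing matches
  grep -nE '<name-node>[[:space:]]*hdfs://[^<:/]+:[0-9]+' "$@"
}
```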

Upgrading the Hive Metastore to Use HDFS HA

For CDH 4.1 and later, the Hive Metastore can be configured to use HDFS High Availability. See Hive Installation.

To configure the Hive metastore to use HDFS HA, update the metastore records so that they reference the nameservice specified in the dfs.nameservices property. Use the Hive metatool to list the current locations and to change them.

  Note: Before attempting to upgrade the Hive metastore to use HDFS HA, shut down the metastore and back it up to a persistent store.

If you are unsure which version of Avro SerDe is used, use both the serdePropKey and tablePropKey arguments. For example:

$ metatool -listFSRoot
hdfs://oldnamenode.com/user/hive/warehouse
$ metatool -updateLocation hdfs://nameservice1 hdfs://oldnamenode.com -tablePropKey avro.schema.url \
    -serdePropKey schema.url
$ metatool -listFSRoot
hdfs://nameservice1/user/hive/warehouse

where:

  • hdfs://oldnamenode.com/user/hive/warehouse identifies the old NameNode location that the metastore records currently reference.
  • hdfs://nameservice1 specifies the new location and should match the value of the dfs.nameservices property.
  • tablePropKey is a table property key whose value field may reference the HDFS NameNode location and hence may require an update. To update the Avro SerDe schema URL, specify avro.schema.url for this argument.
  • serdePropKey is a SerDe property key whose value field may reference the HDFS NameNode location and hence may require an update. To update the Haivvero schema URL, specify schema.url for this argument.

  Note: The Hive MetaTool is a best-effort service that tries to update as many Hive metastore records as possible. If it encounters an error during the update of a record, it skips to the next record.
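
Because the tool skips records it cannot update, it can help to preview the change before persisting it. The sketch below assumes your metatool build supports the -dryRun option of -updateLocation, which displays the changes without saving them; check the tool's usage output before relying on it. The function name update_fs_root and the METATOOL variable are hypothetical.

```shell
# METATOOL is an illustrative override for the metatool command.
METATOOL="${METATOOL:-metatool}"

update_fs_root() {
  # $1 = new nameservice URI, $2 = old NameNode URI, $3 = "apply" to persist
  new_loc=$1
  old_loc=$2
  if [ "${3:-}" = "apply" ]; then
    "$METATOOL" -updateLocation "$new_loc" "$old_loc" \
      -tablePropKey avro.schema.url -serdePropKey schema.url
  else
    # Assumption: -dryRun is available and prints changes without saving them
    "$METATOOL" -updateLocation "$new_loc" "$old_loc" -dryRun \
      -tablePropKey avro.schema.url -serdePropKey schema.url
  fi
}
```

For example, update_fs_root hdfs://nameservice1 hdfs://oldnamenode.com previews the update, and adding apply as a third argument persists it. Run metatool -listFSRoot afterwards to confirm the result.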