This is the documentation for CDH 4.7.1.

Configuration Settings for HBase

This section contains information on configuring the Linux host and HDFS for HBase.

Using DNS with HBase

HBase uses the local hostname to report its IP address, so both forward and reverse DNS resolution should work. If your machine has multiple interfaces, HBase uses the interface that the primary hostname resolves to. If this is insufficient, set hbase.regionserver.dns.interface in the hbase-site.xml file to indicate the primary interface. For this setting to work properly, the cluster configuration must be consistent and every host must have the same network interface configuration. Alternatively, set hbase.regionserver.dns.nameserver in the hbase-site.xml file to use a name server other than the system-wide default.
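One way to confirm that forward and reverse resolution agree on a host is a short script like the following. This is a minimal sketch, not part of HBase: the function name check_dns and its injectable forward and reverse parameters are illustrative assumptions, and the comparison looks only at the short (unqualified) host name.

```python
import socket


def check_dns(hostname=None, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP address, then resolve that IP back to a name.

    Returns (hostname, ip, reverse_name, consistent). `consistent` is True
    when the short name from the reverse lookup matches the short form of
    hostname. The lookup functions are parameters so they can be replaced
    with stubs when no DNS is available.
    """
    hostname = hostname or socket.gethostname()
    ip = forward(hostname)            # forward lookup: name -> address
    reverse_name = reverse(ip)[0]     # reverse lookup: address -> primary name
    consistent = reverse_name.split(".")[0].lower() == hostname.split(".")[0].lower()
    return hostname, ip, reverse_name, consistent
```

Calling check_dns() with no arguments on a cluster host uses the local hostname and the system resolver. A socket error or a False in the last field suggests that DNS needs attention before HBase is started.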

Using the Network Time Protocol (NTP) with HBase

The clocks on cluster members should be in basic alignment. Some skew is tolerable, but excessive skew can cause odd behavior. Run NTP, or an equivalent, on your cluster. If you have problems querying data or notice unusual cluster operations, verify the system time. For more information about NTP, see the NTP site.
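When verifying the system time across several hosts, it can help to compare each host's clock against the cluster median. The sketch below assumes you have already collected one epoch timestamp per host at roughly the same moment (for example, with date +%s on each host). The function name skewed_hosts and the one-second default tolerance are illustrative assumptions, not HBase settings.

```python
from statistics import median


def skewed_hosts(timestamps, tolerance_seconds=1.0):
    """Return {host: offset} for hosts whose clock is further than
    tolerance_seconds from the median of all reported clocks.

    timestamps maps a host name to its epoch time in seconds.
    """
    reference = median(timestamps.values())
    return {
        host: ts - reference
        for host, ts in timestamps.items()
        if abs(ts - reference) > tolerance_seconds
    }
```

Collection latency adds noise to the timestamps, so treat small offsets as inconclusive and rely on NTP itself for precise measurements.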

Setting User Limits for HBase

Because HBase is a database, it uses a lot of files at the same time. The default ulimit setting of 1024 for the maximum number of open files on Unix-like systems is insufficient. Any significant amount of loading will result in failures and cause the error message java.io.IOException...(Too many open files) to be logged in the HBase or HDFS log files. For more information about this issue, see the Apache HBase Book. You may also notice errors such as:

2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901

Configuring ulimit for HBase

Cloudera recommends increasing the maximum number of file handles to more than 10,000. Note that increasing the file handles for the user who runs the HBase process is an operating system configuration, not an HBase configuration. A common mistake is to increase the number of file handles for one user while HBase actually runs as a different user. HBase prints the ulimit it is using on the first line of its logs; make sure that it is correct.
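To see which open-file limit a given user's processes actually receive, you can query the limit from within a process started by that user. The following is a minimal sketch using Python's standard resource module (Unix only). The function names are illustrative, and the default threshold of 10,000 simply mirrors the recommendation above.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_recommendation(soft_limit, minimum=10000):
    """Return True if soft_limit is greater than minimum.

    An unlimited setting (resource.RLIM_INFINITY) also passes.
    """
    return soft_limit == resource.RLIM_INFINITY or soft_limit > minimum
```

The result is only meaningful when the script runs as the same user that runs HBase (or HDFS), because limits are applied per user session.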

If you are using ulimit, you must make the following configuration changes:

  1. In the /etc/security/limits.conf file, add the following lines:
hdfs  -       nofile  32768
hbase -       nofile  32768
  Note:
  • Only the root user can edit this file.
  • If this change does not take effect, check other configuration files in the /etc/security/limits.d directory for lines containing the hdfs or hbase user and the nofile value. Such entries may be overriding the entries in /etc/security/limits.conf.

To apply the changes in /etc/security/limits.conf on Ubuntu and Debian systems, add the following line in the /etc/pam.d/common-session file:

session required  pam_limits.so
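If the new limits do not take effect, a quick scan of the limits files can show which entries mention the hdfs or hbase user. The sketch below parses the four-column format used in the lines above (domain, type, item, value). The function names are illustrative, and the assumption that drop-in files in /etc/security/limits.d end in .conf should be checked against your system.

```python
import glob


def nofile_entries(text, users=("hdfs", "hbase")):
    """Return (user, type, value) tuples for nofile lines that name one of users."""
    found = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split columns
        if len(fields) == 4 and fields[0] in users and fields[2] == "nofile":
            found.append((fields[0], fields[1], fields[3]))
    return found


def scan_limit_files(paths=None):
    """Map each limits file to its nofile entries for the hdfs and hbase users."""
    if paths is None:
        # Assumption: drop-in files use the .conf suffix.
        paths = ["/etc/security/limits.conf"]
        paths += sorted(glob.glob("/etc/security/limits.d/*.conf"))
    results = {}
    for path in paths:
        with open(path) as handle:
            entries = nofile_entries(handle.read())
        if entries:
            results[path] = entries
    return results
```

Calling scan_limit_files() with no arguments reads the system files. An entry for hdfs or hbase in a limits.d file with a lower value is a likely candidate for the override described in the note above.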

Using dfs.datanode.max.xcievers with HBase

A Hadoop HDFS DataNode has an upper bound on the number of files that it can serve at any one time. The upper bound property is called dfs.datanode.max.xcievers (the property is spelled in the code exactly as shown here). Before loading data, make sure that dfs.datanode.max.xcievers is set to at least 4096 in the conf/hdfs-site.xml file (by default, /etc/hadoop/conf/hdfs-site.xml), as shown below:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

Be sure to restart HDFS after changing the value for dfs.datanode.max.xcievers. If you don't change that value as described, strange failures can occur and an error message about exceeding the number of xcievers will be added to the DataNode logs. Other error messages about missing blocks are also logged, such as:

10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: 
java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
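
To confirm the dfs.datanode.max.xcievers setting before loading data, you can parse hdfs-site.xml and compare the value against the 4096 minimum. This is a minimal sketch using Python's standard XML parser. The function names are illustrative, and it is a simplification: it reads only the text you pass in, returns the first matching property, and does not account for values supplied by other configuration files.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the <value> text of the first property called name, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def xcievers_ok(xml_text, minimum=4096):
    """Return True if dfs.datanode.max.xcievers is set to at least minimum."""
    value = get_property(xml_text, "dfs.datanode.max.xcievers")
    return value is not None and int(value) >= minimum
```

Pass the function the contents of /etc/hadoop/conf/hdfs-site.xml from each DataNode. A False result means the property is missing from that file or set below the minimum.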