Managing HBase

Cloudera Manager requires certain additional steps to set up and configure the HBase service.

Creating the HBase Root Directory

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

When adding the HBase service, the Add Service wizard automatically creates a root directory for HBase in HDFS. If you quit the Add Service wizard or it does not finish, you can create the root directory outside the wizard by doing these steps:

  1. Choose Create Root Directory from the Actions menu in the HBase > Status tab.
  2. Click Create Root Directory again to confirm.

Graceful Shutdown

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, Full Administrator)

A graceful shutdown of an HBase RegionServer allows the regions hosted by that RegionServer to be moved to other RegionServers before stopping the RegionServer. Cloudera Manager provides the following configuration options to perform a graceful shutdown of either an HBase RegionServer or the entire service.

To increase the speed of a rolling restart of the HBase service, set the Region Mover Threads property to a higher value. This increases the number of regions that can be moved in parallel, but places additional strain on the HMaster. In most cases, Region Mover Threads should be set to 5 or lower.

Gracefully Shutting Down an HBase RegionServer

  1. Go to the HBase service.
  2. Click the Instances tab.
  3. From the list of Role Instances, select the RegionServer you want to shut down gracefully.
  4. Select Actions for Selected > Decommission (Graceful Stop).
  5. Cloudera Manager attempts to gracefully shut down the RegionServer for the interval configured in the Graceful Shutdown Timeout configuration option, which defaults to 3 minutes. If the graceful shutdown fails, Cloudera Manager forcibly stops the process by sending a SIGKILL (kill -9) signal. HBase will perform recovery actions on regions that were on the forcibly stopped RegionServer.
  6. If you cancel the graceful shutdown before the Graceful Shutdown Timeout expires, you can still manually stop a RegionServer by selecting Actions for Selected > Stop, which sends a SIGTERM (kill -5) signal.

Gracefully Shutting Down the HBase Service

  1. Go to the HBase service.
  2. Select Actions > Stop. This tries to perform an HBase Master-driven graceful shutdown for the length of the configured Graceful Shutdown Timeout (three minutes by default), after which it abruptly shuts down the whole service.

Configuring the Graceful Shutdown Timeout Property

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

This timeout only affects a graceful shutdown of the entire HBase service, not individual RegionServers. Therefore, if you have a large cluster with many RegionServers, you should strongly consider increasing the timeout from its default of 180 seconds.

  1. Go to the HBase service.
  2. Click the Configuration tab.
  3. Select Scope > HBASE-1 (Service Wide)
  4. Use the Search box to search for the Graceful Shutdown Timeout property and edit the value.
  5. Click Save Changes to save this setting.

Configuring the HBase Thrift Server Role

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

The Thrift Server role is not added by default when you install HBase, but it is required before you can use certain other features such as the Hue HBase browser. To add the Thrift Server role:
  1. Go to the HBase service.
  2. Click the Instances tab.
  3. Click the Add Role Instances button.
  4. Select the host(s) where you want to add the Thrift Server role (you only need one for Hue) and click Continue. The Thrift Server role should appear in the instances list for the HBase server.
  5. Select the Thrift Server role instance.
  6. Select Actions for Selected > Start.

Enabling HBase Indexing

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

HBase indexing is dependent on the Key-Value Store Indexer service. The Key-Value Store Indexer service uses the Lily HBase Indexer Service to index the stream of records being added to HBase tables. Indexing allows you to query data stored in HBase with the Solr service.

  1. Go to the HBase service.
  2. Click the Configuration tab.
  3. Select Scope > HBASE-1 (Service Wide)
  4. Select Category > Backup.
  5. Select the Enable Replication and Enable Indexing properties.
  6. Click Save Changes.

Adding a Custom Coprocessor

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

The HBase coprocessor framework provides a way to extend HBase with custom functionality. To configure these properties in Cloudera Manager:
  1. Select the HBase service.
  2. Click the Configuration tab.
  3. Select Scope > All.
  4. Select Category > All.
  5. Type HBase Coprocessor in the Search box.
  6. You can configure the values of the following properties:
    • HBase Coprocessor Abort on Error (Service-Wide)
    • HBase Coprocessor Master Classes (Master Default Group)
    • HBase Coprocessor Region Classes (RegionServer Default Group)
  7. Enter a Reason for change, and then click Save Changes to commit the changes.

Disabling Loading of Coprocessors

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

In CDH 5.7 and higher, you can disable loading of system (HBase-wide) or user (table-wide) coprocessors. Cloudera recommends against disabling loading of system coprocessors, because HBase security functionality is implemented using system coprocessors. However, disabling loading of user coprocessors may be appropriate.
  1. Select the HBase service.
  2. Click the Configuration tab.
  3. Search for HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
  4. To disable loading of all coprocessors, add a new property with the name hbase.coprocessor.enabled and set its value to false. Cloudera does not recommend this setting.
  5. To disable loading of user coprocessors, add a new property with the name hbase.coprocessor.user.enabled and set its value to false.
  6. Enter a Reason for change, and then click Save Changes to commit the changes.

Enabling Hedged Reads on HBase

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the HBase service.
  2. Click the Configuration tab.
  3. Select Scope > HBASE-1 (Service-Wide).
  4. Select Category > Performance.
  5. Configure the HDFS Hedged Read Threadpool Size and HDFS Hedged Read Delay Threshold properties. The descriptions for each of these properties on the configuration pages provide more information.
  6. Enter a Reason for change, and then click Save Changes to commit the changes.

Advanced Configuration for Write-Heavy Workloads

HBase includes several advanced configuration parameters for adjusting the number of threads available to service flushes and compactions in the presence of write-heavy workloads. Tuning these parameters incorrectly can severely degrade performance and is not necessary for most HBase clusters. If you use Cloudera Manager, configure these options using the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
hbase.hstore.flusher.count
The number of threads available to flush writes from memory to disk. Never increase hbase.hstore.flusher.count to more of 50% of the number of disks available to HBase. For example, if you have 8 solid-state drives (SSDs), hbase.hstore.flusher.count should never exceed 4. This allows scanners and compactions to proceed even in the presence of very high writes.
hbase.regionserver.thread.compaction.large and hbase.regionserver.thread.compaction.small
The number of threads available to handle small and large compactions, respectively. Never increase either of these options to more than 50% of the number of disks available to HBase.

Ideally, hbase.regionserver.thread.compaction.small should be greater than or equal to hbase.regionserver.thread.compaction.large, since the large compaction threads do more intense work and will be in use longer for a given operation.

In addition to the above, if you use compression on some column families, more CPU will be used when flushing these column families to disk during flushes or compaction. The impact on CPU usage depends on the size of the flush or the amount of data to be decompressed and compressed during compactions.