Hedged Reads

Hadoop 2.4 introduced hedged reads in HDFS-5776. If a read from a block is slow, the HDFS client starts a second, parallel 'hedged' read against a different block replica. The result of whichever read returns first is used, and the other read, which is still outstanding, is cancelled. This feature helps when an individual read occasionally takes a long time; it does not help when there is a systemic problem that slows all reads. Hedged reads can be enabled for HBase when the HFiles are stored in HDFS. This feature is disabled by default.
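
The behavior described above can be illustrated with a small, self-contained Java sketch. This is not the HDFS client implementation; it is a conceptual model under stated assumptions. The class name HedgedReadSketch, the methods replicaRead and hedgedRead, the replica names, and the latencies are all hypothetical. The thread pool loosely plays the role of the hedged read thread pool, and thresholdMillis loosely plays the role of the delay threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HedgedReadSketch {

    /** Simulates a read from one replica: sleeps for the given latency, then returns the replica name. */
    public static Callable<String> replicaRead(String name, long latencyMillis) {
        return () -> {
            Thread.sleep(latencyMillis);
            return name;
        };
    }

    /**
     * Starts the primary read. If it has not completed within thresholdMillis,
     * starts a hedged read and returns the result of whichever read completes first.
     * Any read still outstanding when this method returns is cancelled.
     */
    public static String hedgedRead(ExecutorService pool,
                                    Callable<String> primary,
                                    Callable<String> hedge,
                                    long thresholdMillis)
            throws InterruptedException, ExecutionException {
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        List<Future<String>> inFlight = new ArrayList<>();
        try {
            inFlight.add(completion.submit(primary));
            // Wait up to the threshold for the primary read to complete.
            Future<String> first = completion.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (first == null) {
                // Primary read is slow: start the hedged read against another replica.
                inFlight.add(completion.submit(hedge));
                first = completion.take(); // whichever read completes first
            }
            return first.get();
        } finally {
            // Cancelling a completed future is a no-op; the slower read is interrupted.
            for (Future<String> f : inFlight) {
                f.cancel(true);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Slow primary (500 ms) against a 50 ms threshold: the hedged read is started.
            System.out.println(hedgedRead(pool,
                    replicaRead("replica-A", 500), replicaRead("replica-B", 20), 50));
            // Fast primary (5 ms): no hedged read is started.
            System.out.println(hedgedRead(pool,
                    replicaRead("replica-A", 5), replicaRead("replica-B", 20), 50));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch a hedged read is started only after the threshold has elapsed, so a primary read that is normally fast incurs no extra work. The extra threads and replica reads are spent only on the occasional slow read.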

Enabling Hedged Reads for HBase Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the HBase service.
  2. Click the Configuration tab.
  3. Expand the Service-Wide category.
  4. Select Performance.
  5. Configure the HDFS Hedged Read Threadpool Size and HDFS Hedged Read Delay Threshold properties. The description of each property on the configuration page provides more information.
  6. Click Save Changes to commit the changes.

Enabling Hedged Reads for HBase Using the Command Line

To enable hedged reads for HBase, edit the hbase-site.xml file on each server. Set dfs.client.hedged.read.threadpool.size to the number of threads to dedicate to running hedged reads, and set dfs.client.hedged.read.threshold.millis to the number of milliseconds to wait before starting a second read against a different block replica. To disable the feature, set dfs.client.hedged.read.threadpool.size to 0 or remove it from the configuration. After changing these properties, restart your cluster.

The following is an example configuration for hedged reads for HBase.

<property>
  <name>dfs.client.hedged.read.threadpool.size</name>
  <value>20</value>  <!-- 20 threads -->
</property>
<property>
  <name>dfs.client.hedged.read.threshold.millis</name>
  <value>10</value>  <!-- 10 milliseconds -->
</property>

Monitoring the Performance of Hedged Reads

You can monitor the performance of hedged reads using the following metrics emitted by Hadoop when hedged reads are enabled.
  • hedgedReadOps - the number of hedged reads that have occurred
  • hedgedReadOpsWin - the number of times the hedged read returned faster than the original read
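
One way to read the two counters above together is as a ratio: the fraction of hedged reads that returned before the original read. The following Java sketch shows the arithmetic only. The class name HedgedReadStats, the method winRatio, and the sample values are hypothetical, and how you obtain the counter values from your metrics system is not shown.

```java
public class HedgedReadStats {

    /**
     * Returns the fraction of hedged reads that returned before the original read.
     * Returns 0.0 when no hedged reads have occurred, to avoid dividing by zero.
     */
    public static double winRatio(long hedgedReads, long hedgedWins) {
        if (hedgedReads <= 0) {
            return 0.0;
        }
        return (double) hedgedWins / hedgedReads;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not measurements from a real cluster.
        long hedgedReads = 200;
        long hedgedWins = 50;
        System.out.println("hedged read win ratio: " + winRatio(hedgedReads, hedgedWins));
    }
}
```

As a rough interpretation, and an assumption rather than guidance from the metrics themselves, a ratio close to zero suggests that most hedged reads add load without returning data sooner, which may be a reason to revisit the delay threshold.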