Configuring Short-Circuit Reads

Short-circuit reads bypass the DataNode and let a client read a file directly from local disk, provided the client is co-located with the data. They provide a substantial performance boost to many applications and help improve HBase random-read performance and Impala performance.

Short-circuit reads require libhadoop.so (the Hadoop Native Library) to be accessible to both the server and the client. libhadoop.so is not available if you installed from a tarball. To use short-circuit local reads, you must install from an .rpm, .deb, or parcel.

Configuring Short-Circuit Reads Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the HDFS service.
  2. Click the Configuration tab.
  3. Type "shortcircuit" into the Search field to display the Enable HDFS Short Circuit Read property, and verify that this feature is enabled (set to True).
  4. Go to the HBase service.
  5. Click the Configuration tab.
  6. Search for "shortcircuit".
  7. Verify that the Enable HDFS Short Circuit Read property is enabled.

Configuring Short-Circuit Reads Using the Command Line

Configure the following properties in hdfs-site.xml to enable short-circuit reads in a cluster that is not managed by Cloudera Manager:

<property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
</property>

<property>
    <name>dfs.client.read.shortcircuit.streams.cache.size</name>
    <value>1000</value>
</property>

<property>
    <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
    <value>10000</value>
</property>

<property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
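
As a quick sanity check after editing hdfs-site.xml, the properties above can be read back with a short script. The following is a minimal sketch, not part of any Hadoop or Cloudera tooling: the function names `read_properties` and `short_circuit_problems` are hypothetical, and the sample text assumes the properties sit inside the usual `<configuration>` root element of hdfs-site.xml.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> element in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def short_circuit_problems(props):
    """List the short-circuit settings that are missing or disabled (hypothetical helper)."""
    problems = []
    if props.get("dfs.client.read.shortcircuit", "").lower() != "true":
        problems.append("dfs.client.read.shortcircuit is not set to true")
    if not props.get("dfs.domain.socket.path"):
        problems.append("dfs.domain.socket.path is not set")
    return problems


# Sample input mirroring the properties shown in this section.
SAMPLE = """<configuration>
<property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
</property>
<property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
</configuration>"""

if __name__ == "__main__":
    issues = short_circuit_problems(read_properties(SAMPLE))
    print("no problems found" if not issues else "\n".join(issues))
```

To check a real file, pass its contents to `read_properties` instead of `SAMPLE`. The script only inspects the configuration text. It does not confirm that libhadoop.so is loaded or that short-circuit reads are actually in use.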

If /var/run/hadoop-hdfs/ is group-writable, make sure its group is root.
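
The group requirement in the note above can be checked with a few lines of Python. This is a minimal sketch under two assumptions: the helper name `socket_dir_group_ok` is hypothetical, and the root group is taken to be GID 0, which is the usual convention on Linux but should be confirmed on your hosts.

```python
import os
import stat


def socket_dir_group_ok(path):
    """Return True if `path` is not group-writable, or if its group is root.

    Assumption: the root group has GID 0.
    """
    st = os.stat(path)
    group_writable = bool(st.st_mode & stat.S_IWGRP)
    return (not group_writable) or st.st_gid == 0


if __name__ == "__main__":
    # Parent directory of the dfs.domain.socket.path value used in this section.
    directory = "/var/run/hadoop-hdfs"
    if not os.path.isdir(directory):
        print(directory, "does not exist on this host")
    elif socket_dir_group_ok(directory):
        print(directory, "passes the group check")
    else:
        print(directory, "is group-writable and its group is not root")
```

The check covers only the condition stated in the note. It does not inspect ownership or permissions of parent directories.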