Configuring Storage-Balancing for DataNodes

You can configure HDFS to distribute writes on each DataNode in a manner that balances out available storage among that DataNode's disk volumes.

By default, a DataNode writes new block replicas to disk volumes solely on a round-robin basis. You can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.

You can configure:
  • how much DataNode volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced, and
  • what percentage of new block allocations will be sent to volumes with more available disk space than others.
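
The interaction of these two settings can be illustrated with a small simulation. This is a simplified sketch, not the Hadoop implementation: the function name choose_volume, its parameters, and the way volumes are split into "low" and "high" free-space groups are assumptions made for illustration only.

```python
import random
from itertools import count

GIB = 1024 ** 3  # bytes in one binary gigabyte


def choose_volume(free_bytes, threshold, preference_fraction, rr_counter, rng=random):
    """Return the index of the volume that receives the next replica.

    free_bytes          -- free space per volume, in bytes (hypothetical input)
    threshold           -- allowed difference in free bytes before volumes
                           count as imbalanced
    preference_fraction -- share of allocations sent to the emptier volumes
    rr_counter          -- shared iterator that drives round-robin selection
    """
    lowest, highest = min(free_bytes), max(free_bytes)
    if highest - lowest <= threshold:
        # Balanced: plain round-robin across all volumes.
        return next(rr_counter) % len(free_bytes)
    # Imbalanced: group volumes by how close they are to the fullest volume.
    low = [i for i, f in enumerate(free_bytes) if f <= lowest + threshold]
    high = [i for i, f in enumerate(free_bytes) if f > lowest + threshold]
    # Prefer the emptier group for the configured share of allocations.
    group = high if rng.random() < preference_fraction else low
    return group[next(rr_counter) % len(group)]


# One nearly empty volume and two fuller ones, with the default settings.
counter = count()
volumes = [500 * GIB, 40 * GIB, 35 * GIB]
picks = [choose_volume(volumes, 10 * GIB, 0.75, counter) for _ in range(1000)]
print(picks.count(0) / len(picks))  # share of replicas sent to the emptiest volume
```

With the default fraction of 0.75, the printed share should hover around three quarters. The sketch keeps free space fixed between picks; on a real DataNode the free space changes as blocks are written, so the volumes drift back toward the balanced, round-robin case.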

Configuring Storage-Balancing for DataNodes Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the HDFS service.
  2. Click the Configuration tab.
  3. Select Scope > DataNode.
  4. Select Category > Advanced.
  5. Configure the following properties (you can use the Search box to locate the properties):

    Property: dfs.datanode.fsdataset.volume.choosing.policy
    Value: org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy
    Description: Enables storage balancing among the DataNode's volumes.

    Property: dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
    Value: 10737418240 (default)
    Description: The amount by which volumes are allowed to differ from each other in terms of bytes of free disk space before they are considered imbalanced. The default is 10737418240 (10 GB).

    If the free space on each volume is within this range of the other volumes, the volumes will be considered balanced and block assignments will be done on a pure round-robin basis.

    Property: dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction
    Value: 0.75 (default)
    Description: What proportion of new block allocations will be sent to volumes with more available disk space than others. The allowable range is 0.0-1.0, but set it in the range 0.5-1.0 (that is, 50-100%), since there should be no reason to prefer that volumes with less available disk space receive more block allocations.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties.

  6. Click Save Changes to commit the changes.
  7. Restart the role.
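
Because the balanced-space threshold is entered in bytes, it can help to compute the value rather than type it by hand. The following is a minimal sketch; the helper names threshold_bytes and check_preference_fraction are hypothetical and are not part of Cloudera Manager or Hadoop.

```python
GIB = 1024 ** 3  # bytes in one binary gigabyte


def threshold_bytes(gigabytes):
    """Convert a threshold given in GB (1024**3 bytes each) to bytes."""
    return int(gigabytes * GIB)


def check_preference_fraction(value):
    """Reject values outside 0.0-1.0; return True if within the suggested 0.5-1.0."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("preference fraction must be between 0.0 and 1.0")
    return value >= 0.5


print(threshold_bytes(10))              # 10737418240, the documented default
print(check_preference_fraction(0.75))  # True
```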

Configuring Storage-Balancing for DataNodes Using the Command Line