Specifying Racks for Hosts

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

To get maximum performance, it is important to configure CDH so that it knows the topology of your network. Network locations such as hosts and racks are represented in a tree, which reflects the network “distance” between locations. HDFS will use the network location to be able to place block replicas more intelligently to trade off performance and resilience. When placing jobs on hosts, CDH will prefer within-rack transfers (where there is more bandwidth available) to off-rack transfers; the MapReduce and YARN schedulers use network location to determine where the closest replica is as input to a map task. These computations are performed with the assistance of rack awareness scripts.

Cloudera Manager includes internal rack awareness scripts, but you must specify the racks where the hosts in your cluster are located. If your cluster contains more than 10 hosts, Cloudera recommends that you specify the rack for each host. HDFS, MapReduce, and YARN will automatically use the racks you specify.

Cloudera Manager supports nested rack specifications. For example, you could specify the rack /rack3, or /group5/rack3 to indicate the third rack in the fifth group. All hosts in a cluster must have the same number of path components in their rack specifications.

To specify racks for hosts:

  1. Click the Hosts tab.
  2. Check the checkboxes next to the host(s) for a particular rack, such as all hosts for /rack123.
  3. Click Actions for Selected (n) > Assign Rack, where n is the number of selected hosts.
  4. Enter a rack name or ID that starts with a slash /, such as /rack123 or /aisle1/rack123, and then click Confirm.
  5. Optionally restart affected services. Rack assignments are not automatically updated for running services.