Adding an Additional DSSD D5 to a Cluster

You can add DSSD D5 appliances to a CDH cluster managed by Cloudera Manager, move hosts to a different appliance, and change the Block Replica Placement Policy.

Adding a New DSSD D5 with New Hosts

To increase capacity and performance, you can configure a cluster that already uses a single DSSD D5 appliance to use additional DSSD D5 appliances. You configure the cluster by assigning all hosts connected to a DSSD D5 appliance to a single "rack" and selecting one of three policies used by the NameNode to satisfy the configured replication factor. You can also move existing hosts to a new DSSD D5 appliance.

Data that was stored in the cluster before you added DSSD D5 appliances continues to satisfy the Block Replica Placement Policy that was configured at that time:
  • If the Block Replica Placement Policy was Maximize Capacity, existing data continues to be stored on the original DSSD D5 appliance.
  • If the Block Replica Placement Policy was Maximize Availability:
    • The NameNode does not report any under-replicated blocks.
    • The NameNode begins replicating blocks across multiple DSSD D5s in the background. This can cause a temporary drop in performance during the relocation process, but does not affect performance when reading existing blocks.
    • The cluster must have sufficient available capacity across the DSSD D5 appliances to complete the relocation of the blocks.
To add an additional DSSD D5 appliance:
  1. Perform the following tasks in the DSSD D5 environment:
    • Install and rack the DSSD D5 Storage Appliance.
    • Install the DSSD D5 PCI cards in the DataNode hosts.
    • Connect the DataNode hosts to the DSSD D5.
    • Install and configure the DSSD D5 drivers.
    • Install and configure the DSSD D5 client software.
    • Create a volume on the DSSD D5 for the DataNodes.
    • Identify CPUs and NUMA nodes. See the EMC document DSSD Hadoop Plugin Installation Guide for more information. You use the information from this task in a later step to configure the Libflood CPU ID parameter during the initial configuration of Cloudera Manager.

    See the EMC DSSD D5 document DSSD D5 Installation and Service Guide for more information about these tasks.

  2. Assign the hosts attached to each DSSD D5 to a single rack ID. All hosts attached to a D5 should have the same rack assignment and each DSSD D5 should have a unique rack ID. See Specifying Racks for Hosts.
  3. Add the new hosts to the cluster. See Adding a Host to the Cluster.
  4. Add the HDFS DSSD DataNode role to each new host. See Adding a Role Instance. You can also create a DSSD DataNode role group to help manage configurations and operations that are common to all DataNodes or only to DataNodes for a specific DSSD D5. See Role Groups.
  5. (Optional) Change the Block Replica Placement Policy:
    1. Go to the HDFS service, select the Configuration tab, and search for the Block Replica Placement Policy property.
    2. Set the value of the Block Replica Placement Policy property to one of the following values:
      HDFS Default
      Places the first replica on the node where the client process writing the block resides, the second replica on a randomly chosen remote rack, and the third replica on a randomly chosen host in the same remote rack (assuming a replication factor of 3). This ordering is fixed.
      Maximize Capacity
      Places all replicas on the same rack and uses all the capacity of the DSSD D5 for HDFS. If there are fewer DataNode hosts than the configured replication factor, blocks are under-replicated. To avoid under-replication, make sure that there are more DataNodes than the replication factor.
      Maximize Availability
      Places replicas in as many racks as needed to meet the configured replication factor. After replicas have been placed on all available racks, additional replicas are placed randomly across the available racks. If there are fewer DataNode hosts than the configured replication factor, blocks are under-replicated. To avoid under-replication, make sure that there are more DataNodes than the replication factor.
  6. Perform a Rolling Restart on the cluster. Select Clusters > Cluster Name > Actions > Rolling Restart.
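
The three policy descriptions above can be illustrated with a small simulation. The following Python sketch is a toy model of those descriptions only, not the NameNode implementation. The function name place_replicas, the rack IDs, and the host names are hypothetical. It also assumes that every policy places the first replica on the writer's host and that any replicas the descriptions do not cover are placed at random.

```python
import random

POLICIES = ("HDFS Default", "Maximize Capacity", "Maximize Availability")

def place_replicas(policy, racks, writer_host, replication=3, rng=None):
    """Return the (rack, host) pairs chosen for one block.

    racks maps a rack ID (one per DSSD D5) to the DataNode hosts on that rack.
    A result shorter than `replication` models an under-replicated block.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    rng = rng or random.Random()
    rack_of = {host: rack for rack, hosts in racks.items() for host in hosts}
    local = rack_of[writer_host]
    remote = [rack for rack in racks if rack != local]
    placed = [writer_host]  # assumption: the first replica lands on the writer

    def add_from(candidate_racks):
        # Add one unused host from the given racks; False if none is left.
        free = [h for r in candidate_racks for h in racks[r] if h not in placed]
        if not free:
            return False
        placed.append(rng.choice(free))
        return True

    if policy == "Maximize Capacity":
        pools = [local]  # all replicas stay on one rack
    else:
        if policy == "HDFS Default" and remote:
            target = rng.choice(remote)  # one remote rack for replicas 2 and 3
            for _ in range(2):
                if len(placed) < replication:
                    add_from([target])
        elif policy == "Maximize Availability":
            for rack in remote:  # one replica per rack first
                if len(placed) < replication:
                    add_from([rack])
        pools = list(racks)  # any remaining replicas go anywhere

    while len(placed) < replication and add_from(pools):
        pass
    return [(rack_of[host], host) for host in placed]

# Example: two appliances, three DataNode hosts each (hypothetical names).
racks = {"/d5-a": ["a1", "a2", "a3"], "/d5-b": ["b1", "b2", "b3"]}
for policy in POLICIES:
    print(policy, place_replicas(policy, racks, "a1"))
```

Running the example with fewer hosts than the replication factor returns a shorter list, which corresponds to the under-replication case described for Maximize Capacity and Maximize Availability.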

Moving Existing Hosts to a New DSSD D5

  1. Decommission the DataNodes you are moving. See Decommissioning Hosts.
  2. Shut down the hosts you are moving.
  3. Complete the instructions in the DSSD D5 documentation for cabling, configuring, and installing the DSSD hardware.
  4. Assign all hosts that you are moving to the new DSSD D5 to a single rack ID, and make sure that the hosts attached to each of the other DSSD D5 appliances also share a single rack ID. All hosts attached to a D5 should have the same rack assignment and each DSSD D5 should have a unique rack ID. See Specifying Racks for Hosts.
  5. (Optional) Change the Block Replica Placement Policy. Cloudera recommends that you do not change the policy when moving hosts. If you do change it:
    1. Go to the HDFS service, select the Configuration tab, and search for the Block Replica Placement Policy property.
    2. Set the value of the Block Replica Placement Policy property to one of the following values:
      HDFS Default
      Places the first replica on the node where the client process writing the block resides, the second replica on a randomly chosen remote rack, and the third replica on a randomly chosen host in the same remote rack (assuming a replication factor of 3). This ordering is fixed.
      Maximize Capacity
      Places all replicas on the same rack and uses all the capacity of the DSSD D5 for HDFS. If there are fewer DataNode hosts than the configured replication factor, blocks are under-replicated. To avoid under-replication, make sure that there are more DataNodes than the replication factor.
      Maximize Availability
      Places replicas in as many racks as needed to meet the configured replication factor. After replicas have been placed on all available racks, additional replicas are placed randomly across the available racks. If there are fewer DataNode hosts than the configured replication factor, blocks are under-replicated. To avoid under-replication, make sure that there are more DataNodes than the replication factor.
  6. Perform a Rolling Restart on the cluster. Select Clusters > Cluster Name > Actions > Rolling Restart.
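
Before restarting, it can help to check the planned rack assignments against the two rules in step 4: all hosts on one DSSD D5 share a rack ID, and no two appliances share a rack ID. The following Python sketch shows one way to do that from a hand-maintained mapping. The function name check_rack_assignments, the appliance labels, and the rack IDs are hypothetical, and the sketch does not read anything from Cloudera Manager.

```python
from collections import defaultdict

def check_rack_assignments(hosts):
    """Return a list of rule violations; an empty list means the plan is consistent.

    hosts maps a host name to an (appliance, rack_id) pair.
    """
    racks_by_appliance = defaultdict(set)
    appliances_by_rack = defaultdict(set)
    for appliance, rack in hosts.values():
        racks_by_appliance[appliance].add(rack)
        appliances_by_rack[rack].add(appliance)

    problems = []
    # Rule 1: all hosts attached to a D5 have the same rack assignment.
    for appliance, racks in sorted(racks_by_appliance.items()):
        if len(racks) > 1:
            problems.append(f"{appliance}: hosts split across racks {sorted(racks)}")
    # Rule 2: each D5 has a unique rack ID.
    for rack, appliances in sorted(appliances_by_rack.items()):
        if len(appliances) > 1:
            problems.append(f"{rack}: shared by appliances {sorted(appliances)}")
    return problems

# Example plan after moving host3 to a second appliance (hypothetical names).
plan = {
    "host1": ("d5-1", "/rack1"),
    "host2": ("d5-1", "/rack1"),
    "host3": ("d5-2", "/rack2"),
}
for problem in check_rack_assignments(plan):
    print(problem)
```

The check reports one line per violated rule, so a plan that moves a host without updating its rack ID shows up as a rack shared by two appliances.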

Changing the Block Replica Placement Policy or Rack Assignments

Cloudera recommends that you retain the Block Replica Placement Policy in production systems. If it becomes necessary to change the policy, note the following considerations:
  • Changing from Maximize Availability to Maximize Capacity is not supported.
  • Changing from Maximize Capacity to Maximize Availability:
    • The NameNode does not report any under-replicated blocks.
    • The NameNode begins replicating blocks across multiple DSSD D5s in the background. This can cause a temporary drop in performance during the relocation process, but does not affect performance when reading existing blocks.
    • The cluster must have sufficient available capacity across the DSSD D5 appliances to complete the relocation of the blocks.
To change the Block Replica Placement Policy or rack assignments:
  1. (If necessary) Change the rack assignments. See Specifying Racks for Hosts.
  2. (If necessary) Change the Block Replica Placement Policy. Go to the HDFS service, select the Configuration tab, search for the Block Replica Placement Policy property, and set the new value.
  3. Perform a Rolling Restart on the cluster. Select Clusters > Cluster Name > Actions > Rolling Restart.
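
The considerations in this section can be captured as a short pre-change checklist. The following Python sketch encodes only the rules stated in this document: the unsupported transition, the background re-replication after moving to Maximize Availability, and the under-replication condition. The function name review_policy_change and the message wording are hypothetical, and the sketch does not query or modify the cluster.

```python
MAX_CAPACITY = "Maximize Capacity"
MAX_AVAILABILITY = "Maximize Availability"

def review_policy_change(current, target, datanodes, replication=3):
    """Return notes to review before changing the Block Replica Placement Policy."""
    notes = []
    if current == MAX_AVAILABILITY and target == MAX_CAPACITY:
        notes.append("UNSUPPORTED: Maximize Availability to Maximize Capacity")
    if current == MAX_CAPACITY and target == MAX_AVAILABILITY:
        notes.append(
            "EXPECT: background re-replication across DSSD D5s; "
            "confirm free capacity on the appliances first"
        )
    # Both DSSD policies under-replicate when DataNodes < replication factor.
    if target in (MAX_CAPACITY, MAX_AVAILABILITY) and datanodes < replication:
        notes.append(
            f"WARNING: {datanodes} DataNodes for replication factor "
            f"{replication}; blocks will be under-replicated"
        )
    return notes

# Example: a two-DataNode cluster moving to Maximize Availability.
for note in review_policy_change(MAX_CAPACITY, MAX_AVAILABILITY, datanodes=2):
    print(note)
```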