Performing Maintenance on a Cluster Host

You can perform minor maintenance on cluster hosts by using Cloudera Manager to manage the host decommission and recommission process. In this process, you can specify whether to suppress alerts from the decommissioned host and, for hosts running the DataNode role, whether to replicate under-replicated data blocks to other DataNodes to maintain the cluster's replication factor. This feature is useful for minor maintenance, such as adding memory or changing network cards or cables, where the maintenance window is expected to be short and the extra cluster resources consumed by replicating missing blocks are undesirable.

You can also place hosts into Maintenance Mode, which suppresses unneeded alerts during a maintenance window but does not decommission the hosts.

To perform host maintenance on cluster hosts:
  1. Decommission the hosts.
  2. Perform the necessary maintenance on the hosts.
  3. Recommission the hosts.

Decommissioning Hosts

Minimum Required Role: Limited Operator (also provided by Operator, Configurator, Cluster Administrator, or Full Administrator)

Note that the Limited Operator and Operator roles do not allow you to suppress or enable alerts.

Cloudera Manager manages the host decommission and recommission process and lets you specify whether to replicate the data to other DataNodes and whether to suppress alerts.

Decommissioning a host decommissions and stops all roles on the host without requiring you to individually decommission the roles on each service. Decommissioning applies only to HDFS DataNode, MapReduce TaskTracker, YARN NodeManager, and HBase RegionServer roles. If the host has other roles running on it, those roles are stopped.
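
The distinction between roles that are decommissioned and roles that are only stopped can be sketched as follows. This is a minimal illustration, not Cloudera Manager code: the role type strings (DATANODE, IMPALAD, and so on) and the function name plan_host_decommission are hypothetical labels chosen for this example.

```python
# Role types that are decommissioned (then stopped), per the paragraph above.
# The string labels are illustrative assumptions, not an official list.
DECOMMISSIONABLE_ROLE_TYPES = {
    "DATANODE",      # HDFS DataNode
    "TASKTRACKER",   # MapReduce TaskTracker
    "NODEMANAGER",   # YARN NodeManager
    "REGIONSERVER",  # HBase RegionServer
}


def plan_host_decommission(role_types):
    """Split a host's roles into those decommissioned and those only stopped."""
    decommissioned = [r for r in role_types if r.upper() in DECOMMISSIONABLE_ROLE_TYPES]
    stopped_only = [r for r in role_types if r.upper() not in DECOMMISSIONABLE_ROLE_TYPES]
    return {"decommission_then_stop": decommissioned, "stop_only": stopped_only}


# Hypothetical host running four roles.
plan = plan_host_decommission(["DATANODE", "NODEMANAGER", "IMPALAD", "KUDU_TSERVER"])
print(plan)
```

Listing the roles this way before starting maintenance makes it easier to see which services lose a running role on the host without going through a decommission step.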

To decommission one or more hosts:

  1. If the host has a DataNode, and you are planning to replicate data to other hosts (for longer term maintenance operations or to permanently decommission or repurpose the host), perform the steps in Tuning HDFS Prior to Decommissioning DataNodes.
  2. In Cloudera Manager, select the cluster where you want to decommission hosts.
  3. Click Hosts > All Hosts.
  4. Select the hosts that you want to decommission.
  5. Select Actions for Selected > Begin Maintenance (Suppress Alerts/Decommission).

    (If you are logged in as a user with the Limited Operator or Operator role, the menu item is labeled Decommission Host(s) and you will not see the option to suppress alerts.)

    The Begin Maintenance (Suppress Alerts/Decommission) dialog box opens. The role instances running on the hosts display at the top.

  6. To decommission the hosts and suppress alerts, select Decommission Host(s). When you select this option for hosts running a DataNode role, choose one of the following (if the host is not running a DataNode role, you will see only the Decommission Host(s) option):
    • Decommission DataNodes

      This option re-replicates data to other DataNodes in the cluster according to the configured replication factor. Depending on the amount of data and other factors, this can take a significant amount of time and uses a great deal of network bandwidth. This option is appropriate when replacing disks, repurposing hosts for non-HDFS use, or permanently retiring hardware.

    • Take DataNode Offline
      This option does not re-replicate HDFS data to other DataNodes until the amount of time you specify has passed, making it less disruptive to active workloads. After this time has passed, the DataNode is automatically recommissioned, but the DataNode role is not started. This option is appropriate for short-term maintenance tasks that do not involve disks, such as rebooting, CPU/RAM upgrades, or switching network cables.
      CAUTION:
      Taking multiple DataNodes offline simultaneously increases the chances that some HDFS data may become unavailable during maintenance. Configuring the proper value for the Maintenance State Minimal Block Replication HDFS configuration property will avoid risking data availability. See Cloudera Manager Configuration Properties Reference.
  7. Click Begin Maintenance.

    The Host Decommission Command dialog box opens and displays the progress of the command.
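
The caution about taking multiple DataNodes offline at once can be illustrated with a small model. The sketch below is a simplification under stated assumptions: block_locations is a hypothetical mapping from block ID to the DataNodes holding a replica, and min_live_replicas stands in for the Maintenance State Minimal Block Replication property. It does not query HDFS and does not reproduce how the NameNode applies that property.

```python
def blocks_at_risk(block_locations, offline_hosts, min_live_replicas=1):
    """Return {block_id: live_replica_count} for blocks that would fall
    below min_live_replicas if offline_hosts were taken offline together."""
    offline = set(offline_hosts)
    at_risk = {}
    for block_id, hosts in block_locations.items():
        live = set(hosts) - offline  # replicas on hosts that stay online
        if len(live) < min_live_replicas:
            at_risk[block_id] = len(live)
    return at_risk


# Hypothetical layout: three blocks, replication factor 3, five DataNodes.
block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn2", "dn4"},
    "blk_3": {"dn3", "dn4", "dn5"},
}

# One host offline: every block still has at least two live replicas.
one_offline = blocks_at_risk(block_locations, {"dn1"}, min_live_replicas=2)

# Two hosts offline together: blk_1 and blk_2 drop to a single live replica.
two_offline = blocks_at_risk(block_locations, {"dn1", "dn2"}, min_live_replicas=2)

print(one_offline)
print(two_offline)
```

In this toy layout, taking dn1 offline alone leaves every block with two live replicas, while taking dn1 and dn2 offline together leaves two blocks with only one. This is the kind of exposure the caution describes.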

Recommissioning Hosts

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, or Full Administrator)

Only hosts that are decommissioned using Cloudera Manager can be recommissioned.

  1. In Cloudera Manager, select the cluster where you want to recommission hosts.
  2. Click Hosts > All Hosts.
  3. Select the hosts that you want to recommission.
  4. Select Actions for Selected > End Maintenance (Suppress Alerts/Decommission).

    The End Maintenance (Suppress Alerts/Decommission) dialog box opens. The role instances running on the hosts display at the top.

  5. To recommission the hosts, select Recommission Host(s).
  6. Choose one of the following:
    • Bring hosts online and start all roles

      All decommissioned roles will be recommissioned and started. HDFS DataNodes will be started first and brought online before recommissioning to avoid excess replication.

    • Bring hosts online

      All decommissioned roles will be recommissioned but remain stopped. You can restart the roles later.

  7. Click End Maintenance.

    The Recommission Hosts and Start Roles Command dialog box opens and displays the progress of recommissioning the hosts and restarting the roles.
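
If you script maintenance windows instead of using the Admin Console, the same actions are typically driven through the Cloudera Manager REST API. The sketch below only builds the HTTP request and does not send it. The command names hostsDecommission and hostsRecommission, the /cm/commands/ path, and the {"items": [...]} body are assumptions about that API and should be checked against the API reference for your Cloudera Manager version. The base URL, API version, and hostname are placeholders, and build_host_command is a hypothetical helper.

```python
import json
import urllib.request


def build_host_command(base_url, api_version, command, hostnames):
    """Build (but do not send) a POST request for a host-level command.

    The path layout and body shape are assumptions; verify them against
    the Cloudera Manager API documentation for your release.
    """
    url = f"{base_url.rstrip('/')}/api/{api_version}/cm/commands/{command}"
    body = json.dumps({"items": list(hostnames)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder server, API version, and hostname.
req = build_host_command(
    "https://cm.example.com:7183",
    "v19",
    "hostsRecommission",  # assumed command name
    ["worker01.example.com"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Authentication and sending the request (for example with urllib.request.urlopen) are deliberately left out. A real script would also need to poll the returned command until it completes, much as the command dialog box displays progress in the Admin Console.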

Stopping All the Roles on a Host

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, or Full Administrator)

  1. Click the Hosts tab.
  2. Select one or more hosts on which to stop all roles.
  3. Select Actions for Selected > Stop Roles on Hosts.

Starting All the Roles on a Host

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, or Full Administrator)

  1. Click the Hosts tab.
  2. Select one or more hosts on which to start all roles.
  3. Select Actions for Selected > Start Roles on Hosts.