Administering an HDFS High Availability Cluster
Manually Failing Over to the Standby NameNode
Manually Failing Over to the Standby NameNode Using Cloudera Manager
If you are running an HDFS service with HA enabled, you can manually cause the active NameNode to fail over to the standby NameNode. This is useful for planned downtime, such as hardware changes, configuration changes, or software upgrades of your primary host.
- Go to the HDFS service.
- Click the Instances tab.
- Select . (This option does not appear if HA is not enabled for the cluster.)
- From the pop-up, select the NameNode that should be made active, then click Manual Failover.
- When all the steps have been completed, click Finish.
Cloudera Manager transitions the NameNode you selected to be the active NameNode, and the other NameNode to be the standby NameNode. HDFS should never have two active NameNodes.
Manually Failing Over to the Standby NameNode Using the Command Line
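To initiate a failover between two NameNodes, run the command hdfs haadmin -failover, naming the currently active NameNode first and the NameNode that should become active second (for example, hdfs haadmin -failover nn1 nn2).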
Moving an HA NameNode to a New Host
Moving an HA NameNode to a New Host Using Cloudera Manager
Moving an HA NameNode to a New Host Using the Command Line
Use the following steps to move one of the NameNodes to a new host.
In this example, the current NameNodes are called nn1 and nn2, and the new NameNode is nn2-alt. The example assumes that nn2-alt is already a member of this CDH 5 HA cluster, that automatic failover is configured, and that a JournalNode on nn2 is to be moved to nn2-alt in addition to the NameNode service itself.
The procedure moves the NameNode and JournalNode services from nn2 to nn2-alt, reconfigures nn1 to recognize the new location of the JournalNode, and restarts nn1 and nn2-alt in the new HA configuration.
Step 1: Make sure that nn1 is the active NameNode
Make sure that the NameNode that is not going to be moved is active; in this example, nn1 must be active. You can use the NameNodes' web UIs to see which is active; see Start the NameNodes. If nn2 is currently active, fail over from nn2 to nn1:
hdfs haadmin -failover nn2 nn1
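The check in this step can also be scripted. The following is a minimal sketch, assuming that nn1 and nn2 are the NameNode IDs accepted by hdfs haadmin on your cluster; the function name ensure_active is hypothetical and not part of HDFS.

```shell
# ensure_active TARGET OTHER
# Make TARGET the active NameNode, failing over from OTHER only if needed.
ensure_active() {
    target="$1"
    other="$2"
    # getServiceState prints "active" or "standby" to STDOUT.
    state=$(hdfs haadmin -getServiceState "$target") || return 1
    if [ "$state" = "active" ]; then
        echo "$target is already active"
        return 0
    fi
    # TARGET is not active, so fail over from OTHER to TARGET.
    hdfs haadmin -failover "$other" "$target"
}
```

For the example in this procedure, the call would be ensure_active nn1 nn2.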
Step 2: Stop services on nn2
- Stop the NameNode daemon:
$ sudo service hadoop-hdfs-namenode stop
- Stop the ZKFC daemon if it is running:
$ sudo service hadoop-hdfs-zkfc stop
- Stop the JournalNode daemon if it is running:
$ sudo service hadoop-hdfs-journalnode stop
- Make sure these services are not set to restart on boot. If you are not planning to use nn2 as a NameNode again, you may want to remove the services.
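As an illustration of the last item, the following sketch turns off boot-time startup for the three daemons. It assumes a RHEL-compatible system that manages boot-time services with chkconfig, as in Step 8; the function name disable_on_boot is hypothetical. On other distributions, use the equivalent service management tool.

```shell
# Run on nn2 after the daemons have been stopped.
# Assumes chkconfig is available (RHEL-compatible systems).
disable_on_boot() {
    for svc in hadoop-hdfs-namenode hadoop-hdfs-zkfc hadoop-hdfs-journalnode; do
        # "off" removes the service from the boot-time runlevels.
        sudo chkconfig "$svc" off
    done
}
```

If the ZKFC or JournalNode package was never installed on nn2, drop that name from the list.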
Step 3: Install the NameNode daemon on nn2-alt
Step 4: Configure HA on nn2-alt
See Enabling HDFS HA for the properties to configure on nn2-alt in core-site.xml and hdfs-site.xml, along with explanations and instructions. You should copy the values that are already set in the corresponding files on nn2.
Step 5: Copy the contents of the dfs.name.dir and dfs.journalnode.edits.dir directories to nn2-alt
Use rsync or a similar tool to copy the contents of the dfs.name.dir directory, and the dfs.journalnode.edits.dir directory if you are moving the JournalNode, from nn2 to nn2-alt.
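A minimal rsync sketch for this step follows. It assumes the directories live at the same path on both hosts and that the daemons on nn2 are already stopped (Step 2), so the files are not changing during the copy. The function name copy_to_new_host and the example paths are hypothetical; substitute the values of dfs.name.dir and dfs.journalnode.edits.dir from hdfs-site.xml on nn2.

```shell
# copy_to_new_host SRC_DIR DEST_HOST
# Copy SRC_DIR from this host (nn2) to the same path on DEST_HOST.
# -a preserves permissions and timestamps, and ownership when run as root.
copy_to_new_host() {
    src_dir="${1%/}"      # strip a trailing slash, if any
    dest_host="$2"
    # The trailing slashes copy the directory contents, not the directory itself.
    rsync -a "$src_dir/" "$dest_host:$src_dir/"
}

# Hypothetical example paths:
# copy_to_new_host /data/1/dfs/nn nn2-alt
# copy_to_new_host /data/1/dfs/jn nn2-alt
```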
Step 6: If you are moving a JournalNode, update dfs.namenode.shared.edits.dir on nn1
If you are relocating a JournalNode from nn2 to nn2-alt, update dfs.namenode.shared.edits.dir in hdfs-site.xml on nn1 to reflect the new hostname. See this section for more information about dfs.namenode.shared.edits.dir.
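After editing hdfs-site.xml, you can confirm the value that the local configuration resolves. The sketch below uses hdfs getconf -confKey to read the property and does a simple substring check for the new hostname; the function name check_shared_edits is hypothetical, and the substring match is deliberately crude.

```shell
# check_shared_edits HOST
# Succeeds if dfs.namenode.shared.edits.dir, as read from the local
# configuration, mentions HOST.
check_shared_edits() {
    host="$1"
    value=$(hdfs getconf -confKey dfs.namenode.shared.edits.dir) || return 1
    case "$value" in
        *"$host"*) echo "ok: $value" ;;
        *) echo "missing $host: $value" >&2; return 1 ;;
    esac
}
```

On nn1 in this example, the call would be check_shared_edits nn2-alt.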
Step 7: If you are using automatic failover, install the ZKFC daemon on nn2-alt
For instructions, see Deploy Automatic Failover (if it is configured), but do not start the daemon yet.
Step 8: Start services on nn2-alt
Start the NameNode; start the ZKFC for automatic failover; and install and start a JournalNode if you want one to run on nn2-alt. Proceed as follows.
- Start the JournalNode daemon:
$ sudo service hadoop-hdfs-journalnode start
- Start the NameNode daemon:
$ sudo service hadoop-hdfs-namenode start
- Start the ZKFC daemon:
$ sudo service hadoop-hdfs-zkfc start
- Set these services to restart on boot; for example on a RHEL-compatible system:
$ sudo chkconfig hadoop-hdfs-namenode on
$ sudo chkconfig hadoop-hdfs-zkfc on
$ sudo chkconfig hadoop-hdfs-journalnode on
Step 9: If you are relocating a JournalNode, fail over to nn2-alt
hdfs haadmin -failover nn1 nn2-alt
Other HDFS haadmin Commands
After your HA NameNodes are configured and started, you will have access to some additional commands to administer your HA HDFS cluster. Specifically, you should familiarize yourself with the subcommands of the hdfs haadmin command.
This page describes high-level uses of some important subcommands. For specific usage information of each subcommand, you should run hdfs haadmin -help <command>.
getServiceState - determine whether the given NameNode is active or standby
Connect to the provided NameNode to determine its current state, printing either "standby" or "active" to STDOUT as appropriate. This subcommand might be used by cron jobs or monitoring scripts which need to behave differently based on whether the NameNode is currently active or standby.
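As a sketch of the cron-job use case, the following wrapper runs a command only when the given NameNode reports that it is active. The function name run_if_active is hypothetical, and nn1 in the usage note is the example NameNode ID from this page.

```shell
# run_if_active SERVICE_ID COMMAND [ARGS...]
# Run COMMAND only when the given NameNode reports "active".
run_if_active() {
    service_id="$1"
    shift
    state=$(hdfs haadmin -getServiceState "$service_id") || return 1
    case "$state" in
        active)  "$@" ;;
        standby) echo "$service_id is standby; skipping" ;;
        *)       echo "unexpected state: $state" >&2; return 1 ;;
    esac
}
```

For example, run_if_active nn1 /path/to/job.sh would run the (hypothetical) job script only while nn1 is active.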
checkHealth - check the health of the given NameNode
Connect to the provided NameNode to check its health. The NameNode is capable of performing some diagnostics on itself, including checking if internal services are running as expected. This command will return 0 if the NameNode is healthy, non-zero otherwise. One might use this command for monitoring purposes.
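Because checkHealth reports its result through the exit code, a monitoring check can branch on it directly. The sketch below is one possible wrapper; the function name report_health, the message text, and the choice of exit code 2 for failure are assumptions made for illustration, not part of HDFS.

```shell
# report_health SERVICE_ID
# Print a one-line status based on the exit code of checkHealth.
report_health() {
    if hdfs haadmin -checkHealth "$1"; then
        echo "OK: $1 is healthy"
    else
        echo "CRITICAL: $1 failed its health check"
        return 2
    fi
}
```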
Using the dfsadmin Command When HA is Enabled
By default, applicable dfsadmin command options are run against both active and standby NameNodes. To limit an option to a specific NameNode, use the -fs option. For example:
To turn safe mode on for both NameNodes, run:
hdfs dfsadmin -safemode enter
To turn safe mode on for a single NameNode, run:
hdfs dfsadmin -fs hdfs://<host>:<port> -safemode enter
For a full list of dfsadmin command options, run: hdfs dfsadmin -help.
Converting From an NFS-mounted Shared Edits Directory to Quorum-based Storage
Converting From an NFS-mounted Shared Edits Directory to Quorum-based Storage Using Cloudera Manager
Converting an HA configuration from an NFS-mounted shared edits directory to Quorum-based storage involves disabling the current HA configuration, then enabling HA using Quorum-based storage.