Enabling Navigator HSM KMS High Availability

CDH 5.12.0 and higher supports HSM KMS high availability. For new installations, you can use the Set up HDFS Data At Rest Encryption wizard to install and configure HSM KMS high availability. If you have an existing standalone HSM KMS service, use the following procedure to enable HSM KMS high availability:
  1. If you do not have a ZooKeeper service in your cluster, add one using the instructions in Adding a Service.
  2. Run the Add Role Instances wizard for the HSM KMS service (HSM KMS service > Actions > Add Role Instances).
  3. Click Select hosts and check the box for the host where you want to add the additional Key Management Server Proxy role. See Resource Planning for Data at Rest Encryption for considerations when selecting a host. Click OK and then Continue.
  4. On the Review Changes page of the wizard, confirm the authorization code, organization name, and HSM KMS settings, and then click Finish.
  5. Go to HSM KMS service > Configuration and make sure that the ZooKeeper Service dependency is set to the ZooKeeper service for your cluster.
  6. The Add Role Instances wizard does not run the initialize metastore action automatically, as the Add Service wizard does. Because this action must be run manually on a newly added metastore instance before that metastore starts, stop both role instances (metastore and proxy) and then run the initialize metastore action.
  7. Restart the HSM KMS service (HSM KMS service > Actions > Restart).
  8. Restart the cluster.
  9. Redeploy the client configuration (Home > Cluster-wide > cluster drop-down menu (arrow icon) > Deploy Client Configuration).
  10. Re-run the steps in Validating Hadoop Key Operations.
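Steps 7 through 9 are normally performed in the Cloudera Manager Admin Console. If you prefer to script them, the following minimal Python sketch builds (but does not send) the corresponding Cloudera Manager REST API requests. It assumes the API's service restart, cluster restart, and cluster deployClientConfig command endpoints; the host, port, API version (v19), cluster name, and service name are hypothetical placeholders, so verify them against the API documentation for your Cloudera Manager release. Step 6 (initialize metastore) is not covered and should still be run as described above.

```python
import base64
import urllib.request
from urllib.parse import quote

# Hypothetical placeholders: substitute your Cloudera Manager URL, API version,
# cluster name, and HSM KMS service name.
CM_API = "https://cm.example.com:7183/api/v19"
CLUSTER = "Cluster 1"
HSM_KMS_SERVICE = "hsmkms"


def cm_post(path: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request to a CM command endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        CM_API + path,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def ha_followup_requests(user: str, password: str) -> list:
    """Return requests for steps 7-9 in order: restart HSM KMS, restart cluster, deploy client config."""
    cluster = quote(CLUSTER)  # cluster names may contain spaces
    service = quote(HSM_KMS_SERVICE)
    return [
        cm_post(f"/clusters/{cluster}/services/{service}/commands/restart", user, password),
        cm_post(f"/clusters/{cluster}/commands/restart", user, password),
        cm_post(f"/clusters/{cluster}/commands/deployClientConfig", user, password),
    ]


if __name__ == "__main__":
    # Dry run: show what would be sent.
    for req in ha_followup_requests("admin", "admin"):
        print(req.get_method(), req.full_url)
```

Each request can be sent with urllib.request.urlopen(req). Cloudera Manager commands run asynchronously, so a real script should wait for each command to finish (for example, by polling the command ID returned in the response) before sending the next one; that polling is omitted here to keep the sketch short.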

HSM KMS High Availability Backup and Recovery

When HSM KMS runs in high availability mode and either of the two nodes fails, a role instance can be assigned to another node and federated into the service by the remaining active node. In other words, you can bring a node that is part of the cluster, but is not running HSM KMS role instances, into the service by assigning it HSM KMS role instances: specifically, an HSM KMS proxy role instance and an HSM KMS metastore role instance. Each node therefore acts as an online ("hot") backup of the other. In many cases, this is sufficient. However, if you also want a manual ("cold") backup of the files needed to restore the service from scratch, you can create one as well.

To create a backup, copy the /var/lib/hsmkp and /var/lib/hsmkp-meta directories from one or more of the nodes running HSM KMS role instances to a backup location.
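The backup step can be scripted. The following is a minimal Python sketch, not a Cloudera-provided tool: it copies both directories into a new timestamped folder under a backup location you choose (the function name backup_hsm_kms and the hsmkms-<timestamp> folder naming are hypothetical). The fs_root parameter exists only so the function can be exercised against a test directory; in practice leave it as /. shutil.copytree preserves file modes and timestamps but not ownership, so run it with sufficient privileges and consider a tool such as tar or rsync -a if owners must be preserved. The procedure does not say whether the role instances must be stopped first; copying while they are stopped avoids capturing files mid-write.

```python
import shutil
import time
from pathlib import Path

# The two HSM KMS directories named in the backup procedure, relative to the file system root.
HSM_KMS_DIRS = ("var/lib/hsmkp", "var/lib/hsmkp-meta")


def backup_hsm_kms(backup_root: Path, fs_root: Path = Path("/")) -> Path:
    """Copy the HSM KMS state directories into a new timestamped folder (hypothetical helper)."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"hsmkms-{stamp}"
    for rel in HSM_KMS_DIRS:
        src = Path(fs_root) / rel
        if not src.is_dir():
            raise FileNotFoundError(f"expected HSM KMS directory not found: {src}")
        # Keep the var/lib/... layout under dest so a restore can mirror it.
        # copytree uses copy2 (modes and timestamps kept; ownership is not).
        shutil.copytree(src, dest / rel, symlinks=True)
    return dest


if __name__ == "__main__":
    print("backup written to", backup_hsm_kms(Path("/backups/hsmkms")))
```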

To restore from a backup, bring up a completely new instance of the HSM KMS service and, before starting HSM KMS for the first time, copy the backed-up /var/lib/hsmkp and /var/lib/hsmkp-meta directories onto the file system of each new node.
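A matching restore step might look like the following minimal Python sketch, again hypothetical rather than a Cloudera tool. It assumes the backup folder mirrors the var/lib/hsmkp and var/lib/hsmkp-meta layout and copies both directories onto a new node; run it on each new node before HSM KMS starts for the first time. It passes dirs_exist_ok=True (Python 3.8 or later) in case the target directories already exist, in which case files with the same names are overwritten. Restored files are owned by the user running the script; changing them to the account HSM KMS runs as is left out because that account is not named here.

```python
import shutil
from pathlib import Path

# Same relative layout as the backup: var/lib/hsmkp and var/lib/hsmkp-meta.
HSM_KMS_DIRS = ("var/lib/hsmkp", "var/lib/hsmkp-meta")


def restore_hsm_kms(backup_dir: Path, fs_root: Path = Path("/")) -> None:
    """Copy backed-up HSM KMS directories onto a new node (hypothetical helper).

    Must run before HSM KMS is started for the first time on that node.
    """
    for rel in HSM_KMS_DIRS:
        src = Path(backup_dir) / rel
        dst = Path(fs_root) / rel
        if not src.is_dir():
            raise FileNotFoundError(f"backup is missing {src}")
        # dirs_exist_ok=True lets the copy fill a directory that already exists;
        # same-named files in it are overwritten.
        shutil.copytree(src, dst, symlinks=True, dirs_exist_ok=True)


if __name__ == "__main__":
    restore_hsm_kms(Path("/backups/hsmkms/hsmkms-20240101-000000"))
```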