HBase Health Tests

HBase Master Health

This is an HBase service-level health test that checks for an active, healthy HBase Master. The test returns "Bad" health if the service is running and an active Master cannot be found. If an active Master is found, the test checks the health of that Master as well as the health of any configured standby Masters. A "Good" health result is returned only if the active Master is healthy and all running standby Masters are healthy.

A failure of this health test may indicate stopped or unhealthy Master roles, or it may indicate a problem with communication between the Cloudera Manager Service Monitor and the HBase service. When this test fails, check the status of the HBase service's Master roles and look in the Cloudera Manager Service Monitor's log files for more information.

This test can be enabled or disabled using the Active Master Health Test HBase service-wide monitoring setting. The check for healthy standby Masters can be enabled or disabled with the Backup Masters Health Test setting. In addition, the HBase Active Master Detection Window setting adjusts the amount of time that the Cloudera Manager Service Monitor has to detect the active HBase Master before this health test fails.

Short Name: HBase Master Health

Property Name | Description | Template Name | Default Value | Unit
Active Master Health Test | When computing the overall HBase cluster health, consider the active HBase Master's health. | hbase_master_health_enabled | true | no unit
Backup Masters Health Test | When computing the overall HBase cluster health, consider the health of the backup HBase Masters. | hbase_backup_masters_health_enabled | true | no unit
HBase Active Master Detection Window | The tolerance window that will be used in HBase service tests that depend on detection of the active HBase Master. | hbase_active_master_detecton_window | 3 | MINUTES
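
The decision described for the HBase Master Health test can be sketched roughly as follows. This is an illustrative model only, not Cloudera Manager code: the function name `master_health` and its parameters are hypothetical, and the sketch returns `None` for cases whose result the description above does not state (for example, an unhealthy standby Master).

```python
from typing import Optional, Sequence


def master_health(
    active_healthy: Optional[bool],
    running_standbys_healthy: Sequence[bool],
    check_backups: bool = True,
) -> Optional[str]:
    """Hypothetical model of the HBase Master Health result.

    active_healthy: None if no active Master was detected within the
        detection window, otherwise whether the active Master is healthy.
    running_standbys_healthy: health of each running standby Master.
    check_backups: models the Backup Masters Health Test setting.
    """
    # No active Master found while the service is running: "Bad".
    if active_healthy is None:
        return "Bad"

    # Standby Masters only count when the backup check is enabled.
    standbys_ok = all(running_standbys_healthy) if check_backups else True

    # "Good" requires a healthy active Master and healthy running standbys.
    if active_healthy and standbys_ok:
        return "Good"

    # Assumption: the description does not say which non-"Good" result
    # applies here, so this sketch leaves it unspecified.
    return None
```

For instance, `master_health(True, [True, True])` yields "Good", while `master_health(None, [])` yields "Bad".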

HBase RegionServers Health

This is an HBase service-level health test that checks that enough of the RegionServers in the cluster are healthy. The test returns "Concerning" health if the number of healthy RegionServers falls below a warning threshold, expressed as a percentage of the total number of RegionServers. The test returns "Bad" health if the number of healthy and "Concerning" RegionServers falls below a critical threshold, expressed as a percentage of the total number of RegionServers.

For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 RegionServers, the test returns "Good" health if 95 or more RegionServers have "Good" health. The test returns "Concerning" health if fewer than 95 RegionServers have "Good" health but at least 90 RegionServers have either "Good" or "Concerning" health. If more than 10 RegionServers have "Bad" health, the test returns "Bad" health.

A failure of this health test indicates unhealthy RegionServers. Check the status of the individual RegionServers for more information. This test can be configured using the Healthy RegionServer Monitoring Thresholds HBase service-wide monitoring setting.

Short Name: RegionServers Health

Property Name | Description | Template Name | Default Value | Unit
Healthy RegionServer Monitoring Thresholds | The health test thresholds of the overall RegionServer health. The check returns "Concerning" health if the percentage of "Good" RegionServers falls below the warning threshold. The check returns "Bad" health if the total percentage of "Good" and "Concerning" RegionServers falls below the critical threshold. | hbase_regionservers_healthy_thresholds | critical:90.0, warning:95.0 | PERCENT
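
The threshold arithmetic for the RegionServers Health test can be illustrated with a small sketch. It is not Cloudera Manager code: the function `regionservers_health` and its parameter names are hypothetical, the defaults mirror the default values of `hbase_regionservers_healthy_thresholds`, and checking the critical threshold before the warning threshold is an assumption about precedence.

```python
def regionservers_health(
    good: int,
    concerning: int,
    bad: int,
    warning_pct: float = 95.0,
    critical_pct: float = 90.0,
) -> str:
    """Hypothetical model of the RegionServers Health result.

    good, concerning, bad: number of RegionServers in each health state.
    warning_pct, critical_pct: thresholds as percentages of all RegionServers.
    """
    total = good + concerning + bad
    if total == 0:
        raise ValueError("at least one RegionServer is required")

    good_pct = 100.0 * good / total
    good_or_concerning_pct = 100.0 * (good + concerning) / total

    # Assumption: the critical threshold takes precedence over the warning one.
    if good_or_concerning_pct < critical_pct:
        return "Bad"
    if good_pct < warning_pct:
        return "Concerning"
    return "Good"
```

With the 100-RegionServer example from the description, `regionservers_health(95, 0, 5)` gives "Good", `regionservers_health(92, 3, 5)` gives "Concerning", and `regionservers_health(85, 4, 11)` gives "Bad".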