Cloudera Data Science Workbench Health Tests

Cloudera Data Science Workbench Application Health

This Cloudera Data Science Workbench service-level health test checks for the presence of a running, healthy Application. The test returns "Bad" health if the service is running and the Application is not running. In all other cases, it returns the health of the Application. A failure of this health test indicates a stopped or unhealthy Application. Check the status of the Application for more information. This test can be enabled or disabled using the Application Role Health Test Cloudera Data Science Workbench service-wide monitoring setting.

Short Name: Application Health

Property Name: Application Role Health Test
Description: When computing the overall CDSW health, consider Application's health
Template Name: CDSW_CDSW_APPLICATION_health_enabled
Default Value: true
Unit: no unit
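The rule above can be sketched as a small function. This is a minimal illustration, not Cloudera Manager's implementation: the `Health` enum and `role_presence_health` function are hypothetical names, and treating a disabled test as "not evaluated" (returning `None`) is an assumption. The same rule is described for the Master Health test.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    """Health states used by the tests in this section (hypothetical enum)."""
    GOOD = "Good"
    CONCERNING = "Concerning"
    BAD = "Bad"


def role_presence_health(service_running: bool,
                         role_running: bool,
                         role_health: Optional[Health],
                         enabled: bool = True) -> Optional[Health]:
    """Hypothetical sketch of the Application (or Master) presence rule.

    Returns None when the test is disabled (assumption: a disabled test
    is simply not evaluated) or when no role health is available.
    """
    if not enabled:
        # Corresponds to, e.g., CDSW_CDSW_APPLICATION_health_enabled = false.
        return None
    if service_running and not role_running:
        # The service is up but the role is not running: report "Bad".
        return Health.BAD
    # In all other cases, report the role's own health unchanged.
    return role_health
```

For example, `role_presence_health(True, False, Health.GOOD)` returns `Health.BAD`, because the service is running without its role.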

Cloudera Data Science Workbench CDSW Status

This health test ensures Cloudera Data Science Workbench is ready to serve requests. If the test reports an unhealthy status, verify the service configuration and refer to Troubleshooting Cloudera Data Science Workbench.

Short Name: CDSW Status

Cloudera Data Science Workbench Docker Daemon Health

This is a Cloudera Data Science Workbench service-level health test that checks that enough of the Docker Daemons in the cluster are healthy. The test returns "Concerning" health if the number of healthy Docker Daemons falls below a warning threshold, expressed as a percentage of the total number of Docker Daemons. The test returns "Bad" health if the number of healthy and "Concerning" Docker Daemons falls below a critical threshold, expressed as a percentage of the total number of Docker Daemons.

For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Docker Daemons, the test returns "Good" health if 95 or more Docker Daemons have "Good" health. It returns "Concerning" health if fewer than 95 Docker Daemons have "Good" health but at least 90 have either "Good" or "Concerning" health. If more than 10 Docker Daemons have "Bad" health, it returns "Bad" health.

A failure of this health test indicates unhealthy Docker Daemons. Check the status of the individual Docker Daemons for more information. This test can be configured using the Healthy Docker Daemon Monitoring Thresholds Cloudera Data Science Workbench service-wide monitoring setting.

Short Name: Docker Daemon Health

Property Name: Healthy Docker Daemon Monitoring Thresholds
Description: The health test thresholds of the overall Docker Daemon health. The check returns "Concerning" health if the percentage of "Good" Docker Daemons falls below the warning threshold. The check returns "Bad" health if the total percentage of "Good" and "Concerning" Docker Daemons falls below the critical threshold.
Template Name: CDSW_CDSW_DOCKER_healthy_thresholds
Default Value: critical:70.0, warning:95.0
Unit: PERCENT
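The threshold logic can be sketched as follows, assuming the check compares percentages of the total role count and that the "Bad" condition takes precedence over "Concerning". The `aggregate_health` function and `Health` enum are hypothetical names used only for illustration. The defaults match the table (warning 95.0, critical 70.0); the example in the description above uses a 90% critical threshold instead.

```python
from enum import Enum


class Health(Enum):
    """Health states used by the tests in this section (hypothetical enum)."""
    GOOD = "Good"
    CONCERNING = "Concerning"
    BAD = "Bad"


def aggregate_health(good: int, concerning: int, total: int,
                     warning_pct: float = 95.0,
                     critical_pct: float = 70.0) -> Health:
    """Hypothetical sketch of the percentage-threshold check.

    good       -- number of roles (e.g. Docker Daemons) with "Good" health
    concerning -- number of roles with "Concerning" health
    total      -- total number of roles of this type in the cluster
    """
    if total <= 0 or good < 0 or concerning < 0 or good + concerning > total:
        raise ValueError("invalid role counts")
    good_pct = 100 * good / total
    good_or_concerning_pct = 100 * (good + concerning) / total
    # Critical threshold: too few Good-or-Concerning roles means "Bad".
    if good_or_concerning_pct < critical_pct:
        return Health.BAD
    # Warning threshold: too few Good roles means "Concerning".
    if good_pct < warning_pct:
        return Health.CONCERNING
    return Health.GOOD
```

With the example thresholds of 95% and 90% on 100 Docker Daemons, `aggregate_health(85, 5, 100, 95.0, 90.0)` returns `Health.CONCERNING`, while 11 daemons in "Bad" health (`aggregate_health(89, 0, 100, 95.0, 90.0)`) returns `Health.BAD`. The same sketch applies to the Worker Health test.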

Cloudera Data Science Workbench Master Health

This Cloudera Data Science Workbench service-level health test checks for the presence of a running, healthy Master. The test returns "Bad" health if the service is running and the Master is not running. In all other cases, it returns the health of the Master. A failure of this health test indicates a stopped or unhealthy Master. Check the status of the Master for more information. This test can be enabled or disabled using the Master Role Health Test Cloudera Data Science Workbench service-wide monitoring setting.

Short Name: Master Health

Property Name: Master Role Health Test
Description: When computing the overall CDSW health, consider Master's health
Template Name: CDSW_CDSW_MASTER_health_enabled
Default Value: true
Unit: no unit

Cloudera Data Science Workbench Worker Health

This is a Cloudera Data Science Workbench service-level health test that checks that enough of the Workers in the cluster are healthy. The test returns "Concerning" health if the number of healthy Workers falls below a warning threshold, expressed as a percentage of the total number of Workers. The test returns "Bad" health if the number of healthy and "Concerning" Workers falls below a critical threshold, expressed as a percentage of the total number of Workers.

For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 Workers, the test returns "Good" health if 95 or more Workers have "Good" health. It returns "Concerning" health if fewer than 95 Workers have "Good" health but at least 90 have either "Good" or "Concerning" health. If more than 10 Workers have "Bad" health, it returns "Bad" health.

A failure of this health test indicates unhealthy Workers. Check the status of the individual Workers for more information. This test can be configured using the Healthy Worker Monitoring Thresholds Cloudera Data Science Workbench service-wide monitoring setting.

Short Name: Worker Health

Property Name: Healthy Worker Monitoring Thresholds
Description: The health test thresholds of the overall Worker health. The check returns "Concerning" health if the percentage of "Good" Workers falls below the warning threshold. The check returns "Bad" health if the total percentage of "Good" and "Concerning" Workers falls below the critical threshold.
Template Name: CDSW_CDSW_WORKER_healthy_thresholds
Default Value: critical:70.0, warning:95.0
Unit: PERCENT
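Because the thresholds are percentages, it can help to translate them into role counts for a specific cluster size. The sketch below computes the smallest count that does not fall below a threshold; `min_roles_needed` is a hypothetical helper, and it uses `fractions.Fraction` so that exact boundaries such as 95% of 20 are not affected by floating-point rounding.

```python
import math
from fractions import Fraction


def min_roles_needed(total: int, threshold_pct: float) -> int:
    """Hypothetical helper: smallest count whose share of `total` is not
    below `threshold_pct` percent."""
    # Parse the threshold via its decimal string so 95.0 becomes exactly 95.
    threshold = Fraction(str(threshold_pct))
    return math.ceil(threshold * total / 100)
```

With the default thresholds, a 20-Worker cluster needs `min_roles_needed(20, 95.0)` (19) "Good" Workers to report "Good" health, and `min_roles_needed(20, 70.0)` (14) "Good" or "Concerning" Workers to avoid "Bad" health. On a 10-Worker cluster, a single Worker that is not "Good" is enough to make the test "Concerning".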