Managing the Cloudera Data Science Workbench Service in Cloudera Manager

This topic describes how to configure and manage Cloudera Data Science Workbench using Cloudera Manager. The contents of this topic apply only to CSD-based deployments. If you installed Cloudera Data Science Workbench using the RPM, the Cloudera Data Science Workbench service will not be available to you in Cloudera Manager.

Adding the Cloudera Data Science Workbench Service

Cloudera Data Science Workbench is available as an add-on service for Cloudera Manager. To install Cloudera Data Science Workbench, you require the following files: a CSD JAR file that contains all the configuration needed to describe and manage the new Cloudera Data Science Workbench service, and the Cloudera Data Science Workbench parcel.

To install this service, first download and copy the CSD file to the Cloudera Manager Server host. Then use Cloudera Manager to distribute the Cloudera Data Science Workbench parcel to the relevant gateway hosts. You can then use Cloudera Manager's Add Service wizard to add the Cloudera Data Science Workbench service to your cluster.

For the complete set of instructions, see Install Cloudera Data Science Workbench.

Roles Associated with the Cloudera Data Science Workbench Service

Master

Runs the Kubernetes master components on the CDSW master host.

The Master role must only be assigned to the Cloudera Data Science Workbench master host.

Worker

Runs the Kubernetes worker/host components on the CDSW worker hosts.

The Worker role must be assigned to all Cloudera Data Science Workbench worker hosts. Do not assign the Master and Worker roles to the same host. This holds even for a single-host proof-of-concept deployment, because the single Master host can run user workloads just as a worker host can.

Docker Daemon

Runs underlying Docker processes on all Cloudera Data Science Workbench hosts.

The Docker Daemon role must be assigned to every Cloudera Data Science Workbench gateway host.

Application

Runs the Cloudera Data Science Workbench application. The Application role must only be assigned to the Cloudera Data Science Workbench master host.

The Application role requires the underlying Docker Daemon and Master/Worker roles to be up and running before the Cloudera Data Science Workbench web application can be started. If you want to restart the CDSW application, you must restart the CDSW service.

Similarly, do not attempt to restart the underlying Docker Daemon role while the Master/Worker roles are still running on a host. This will result in the operation hanging indefinitely. To avoid this, always perform a full service restart.
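
The placement rules in this section can be checked mechanically before roles are assigned. The following is a minimal sketch, not part of Cloudera Manager: the function name, the host names, and the host-to-roles mapping are all hypothetical, and the rules encoded are only the ones stated above.

```python
from typing import Dict, List, Set

# Role names follow the section above; everything else here is illustrative.
MASTER, WORKER, DOCKER, APPLICATION = "Master", "Worker", "Docker Daemon", "Application"


def check_role_layout(master_host: str, layout: Dict[str, Set[str]]) -> List[str]:
    """Return a list of rule violations for a host -> roles mapping."""
    problems: List[str] = []
    if master_host not in layout:
        problems.append(f"{master_host}: master host is not in the layout")
    for host, roles in sorted(layout.items()):
        # Docker Daemon must run on every CDSW host.
        if DOCKER not in roles:
            problems.append(f"{host}: missing Docker Daemon role")
        # Master and Worker must never share a host.
        if MASTER in roles and WORKER in roles:
            problems.append(f"{host}: Master and Worker assigned to the same host")
        if host == master_host:
            for role in (MASTER, APPLICATION):
                if role not in roles:
                    problems.append(f"{host}: master host is missing the {role} role")
        else:
            # Master and Application belong only on the master host.
            for role in (MASTER, APPLICATION):
                if role in roles:
                    problems.append(f"{host}: {role} role is only allowed on the master host")
            if WORKER not in roles:
                problems.append(f"{host}: worker host is missing the Worker role")
    return problems
```

An empty list means the proposed layout satisfies the rules listed in this section; it does not replace the checks that Cloudera Manager itself performs.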

Accessing Cloudera Data Science Workbench from Cloudera Manager

  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the CDSW service.
  3. Click CDSW Web UI to visit the Cloudera Data Science Workbench web application.

Configuring Cloudera Data Science Workbench Properties

In a CSD-based deployment, Cloudera Manager allows you to configure Cloudera Data Science Workbench properties without having to directly edit any configuration file.

  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the CDSW service.
  3. Click the Configuration tab.
  4. Use the search bar to look for the property you want to configure. You can use Cloudera Manager to configure proxies, enable TLS, reserve the master host, and enable GPU support for Cloudera Data Science Workbench.

    If you have recently migrated from an RPM-based deployment to a CSD-based deployment, see the upgrade guide for a list of the properties in cdsw.conf along with their corresponding properties in Cloudera Manager.

  5. Click Save Changes.

Starting, Stopping, and Restarting the Service

To start, stop, or restart the Cloudera Data Science Workbench service:
  1. Log in to the Cloudera Manager Admin Console.
  2. On the Home > Status tab, click the dropdown to the right of the CDSW service and select the action you want to perform (Start, Stop, or Restart).
  3. Confirm your choice on the next screen. When you see a Finished status, the action is complete.

Points to Remember

  • After a restart, the Cloudera Data Science Workbench service in Cloudera Manager will display Good health even though the Cloudera Data Science Workbench web application might need a few more minutes to get ready to serve requests.

  • The CDSW service must be restarted every time client configuration is redeployed to the Cloudera Data Science Workbench hosts.
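
Because the service can report Good health before the web application is ready, a script that runs right after a restart may want to wait until the web UI responds. The following is a minimal sketch using only the Python standard library. It assumes that the web application answers a plain HTTP GET with status 200 once it is ready, which is an assumption rather than documented behavior, and the URL in the usage note is hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError: treat as "not ready".
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30,
                     delay_seconds: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

For example, wait_until_ready(lambda: http_ok("http://cdsw.example.com")) polls a hypothetical CDSW URL for up to about five minutes with the defaults shown. The probe is passed in as a function so that a different readiness check can be substituted if a status code alone is not a good signal in your environment.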

Reserving the Master Host for Internal CDSW Components

Starting with version 1.4.3, Cloudera Data Science Workbench allows you to reserve the master host for running internal application components and services such as Livelog, the PostgreSQL database, and so on, while user workloads run exclusively on worker hosts.

By default, the master host runs both user workloads and the application's internal services. However, depending on the size of your CDSW deployment and the number of workloads running at any given time, user workloads might dominate resources on the master host. Enabling this feature ensures that CDSW's application components always have access to the resources they need on the master host and are not adversely affected by user workloads.

Depending on your deployment type, use one of the following sets of instructions to enable this feature:

RPM Deployments

To enable this feature on RPM-based deployments, go to the /etc/cdsw/config/cdsw.conf file and set the RESERVE_MASTER property to true.
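
If you script this change, the edit amounts to replacing or appending one KEY=VALUE line. The following is a minimal sketch that works on the file contents as a string. The helper name set_property is hypothetical, the sample file contents in the usage note are illustrative, and the sketch assumes that the value is written unquoted, as in RESERVE_MASTER=true.

```python
import re


def set_property(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key` set to `value`, replacing or appending the line."""
    # Match an existing, uncommented "KEY=..." line; [ \t]* avoids eating blank lines.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(conf_text):
        # A function replacement keeps backslashes in the value literal.
        return pattern.sub(lambda _match: new_line, conf_text, count=1)
    separator = "" if conf_text == "" or conf_text.endswith("\n") else "\n"
    return f"{conf_text}{separator}{new_line}\n"
```

For example, set_property('DOMAIN="cdsw.example.com"\n', "RESERVE_MASTER", "true") returns the same text with a RESERVE_MASTER=true line appended. Reading and writing /etc/cdsw/config/cdsw.conf is left to the caller so that the file can be backed up first.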

CSD Deployments

On CSD-based deployments, this feature is not yet available as a dedicated configuration property in Cloudera Manager. However, you can enable it with an Advanced Configuration Snippet (Safety Valve) as follows:

  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the CDSW service.
  3. Click the Configuration tab.
  4. Use the search bar to look for the Master Advanced Configuration Snippet (Safety Valve) for cdsw.properties property. Add the following string to the value field:
    RESERVE_MASTER=true
  5. Click Save Changes.
  6. Restart the CDSW service for this change to take effect.

Managing Cloudera Data Science Workbench Worker Hosts

You can add or remove workers from Cloudera Data Science Workbench using Cloudera Manager. For instructions, see:

Health Tests

Cloudera Manager runs a few health tests to confirm whether Cloudera Data Science Workbench and its components (Master and Workers) are running and ready to serve requests.

You can choose to enable or disable individual or summary health tests, and in some cases specify what should be included in the calculation of overall health for the service, role instance, or host. See Configuring Monitoring Settings for more information.

Tracking Disk Usage on the Application Block Device

This section demonstrates how to use Cloudera Manager to chart disk usage on the Application block device over time, and to create a trigger to notify cluster administrators when free space on the block device falls below a certain threshold. The latter is particularly important because once the Application block device runs out of space, Cloudera Data Science Workbench will stop launching any new sessions or jobs. Advance notifications give administrators a chance to expand the block device or clean up existing data before Cloudera Data Science Workbench users run into any problems.

Create a Chart to Track Disk Usage on the Application Block Device

The following steps use Cloudera Manager's Chart Builder to track disk usage on the Application Block Device (mounted to /var/lib/cdsw on the CDSW master host) over time.
  1. Log in to the Cloudera Manager Admin Console.
  2. Click Charts > Chart Builder.
  3. Enter a tsquery that charts disk usage on the block device. For example, the following tsquery creates a chart to track free space on the Application block device.
    select capacity_free where mountpoint="/var/lib/cdsw" and category=FILESYSTEM and hostname="<CDSW_Master_hostname>"
    Alternatively, you could use the following tsquery to track the total capacity and the amount of space already in use on the block device.
    select capacity, capacity_used where mountpoint="/var/lib/cdsw" and category=FILESYSTEM and hostname="<CDSW_Master_hostname>"
    Make sure you insert the hostname for your master host as indicated in the queries.
  4. Click Build Chart. You should see a preview of the chart below.



  5. Click Save.
  6. Enter a name for the chart.
  7. Select Add chart to another dashboard. From the dropdown list of available System Dashboards, select CDH Cloudera Data Science Workbench Status Page.
  8. Click Save Chart. If you navigate back to the CDSW service page, you should now see the new chart on this page.

    For more details about Cloudera Manager's Chart Builder, see the following topic in the Cloudera Manager documentation: Charting Time Series Data.
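
If you maintain charts for more than one deployment, generating the tsquery from the master hostname avoids copy-and-paste mistakes. The following is a minimal sketch that reproduces the two queries from step 3. The function name filesystem_tsquery and the hostname in the usage note are hypothetical; the metric names and filter attributes are the ones used in the queries above.

```python
from typing import Sequence


def filesystem_tsquery(metrics: Sequence[str], hostname: str,
                       mountpoint: str = "/var/lib/cdsw") -> str:
    """Build a tsquery selecting filesystem metrics for one mount point on one host."""
    if not metrics:
        raise ValueError("at least one metric is required")
    for value in (hostname, mountpoint):
        # A double quote would end the quoted string inside the query early.
        if '"' in value:
            raise ValueError(f"value must not contain a double quote: {value!r}")
    metric_list = ", ".join(metrics)
    return (f'select {metric_list} where mountpoint="{mountpoint}" '
            f'and category=FILESYSTEM and hostname="{hostname}"')


if __name__ == "__main__":
    # Hypothetical master hostname; replace it with your own.
    print(filesystem_tsquery(["capacity_free"], "cdsw-master.example.com"))
    print(filesystem_tsquery(["capacity", "capacity_used"], "cdsw-master.example.com"))
```

The resulting strings can be pasted into the Chart Builder query field as described in step 3.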

Create a Trigger to Notify Cluster Administrators when Free Space Runs Low

The following steps create a trigger to alert Cloudera Manager cluster administrators when free space on the Application Block Device has fallen below a specific threshold.
  1. Log in to Cloudera Manager and go to the CDSW service page.
  2. Click Create Trigger.
  3. Give the trigger a name.
  4. Modify the Expression field to include a condition for the trigger to fire. For example, if the trigger should fire when free space on the Application Block Device falls below 250 GB, the expression should be:
    IF (select capacity_free where mountpoint="/var/lib/cdsw" and category=FILESYSTEM and hostname="<CDSW_Master_hostname>" and LAST (capacity_free) < 250GB) DO health:concerning
    On the right-hand side of the page, you should see a preview of the query you have entered and a chart that displays the result of the query, as in the following sample image. Note that if the query is incorrect or incomplete, you will not see the preview on the right.


  5. Click Create Trigger. If you navigate back to the CDSW service page, you should now see the new trigger in the list of Health Tests.

    For more details about Triggers, see the following topic in the Cloudera Manager documentation: Triggers.
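
The trigger expression from step 4 can be generated in the same way when the threshold or hostname varies between deployments. The following is a minimal sketch. The function name free_space_trigger and the hostname in the usage note are hypothetical, and the expression follows the form shown in step 4, including the GB suffix on the threshold.

```python
def free_space_trigger(hostname: str, threshold_gb: int,
                       mountpoint: str = "/var/lib/cdsw") -> str:
    """Build a trigger expression that marks health as concerning on low free space."""
    if threshold_gb <= 0:
        raise ValueError("threshold_gb must be a positive number of gigabytes")
    for value in (hostname, mountpoint):
        # A double quote would end the quoted string inside the expression early.
        if '"' in value:
            raise ValueError(f"value must not contain a double quote: {value!r}")
    return (f'IF (select capacity_free where mountpoint="{mountpoint}" '
            f'and category=FILESYSTEM and hostname="{hostname}" '
            f'and LAST (capacity_free) < {threshold_gb}GB) DO health:concerning')


if __name__ == "__main__":
    # Hypothetical master hostname and the 250 GB threshold from the example above.
    print(free_space_trigger("cdsw-master.example.com", 250))
```

Paste the printed expression into the Expression field and confirm that the preview appears before you create the trigger.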

Creating Diagnostic Bundles

Diagnostic data for Cloudera Data Science Workbench is now available as part of the Cloudera Manager diagnostic bundle. For details on usage and diagnostic data collection in Cloudera Data Science Workbench, see Data Collection in Cloudera Data Science Workbench.