Known Issues and Limitations in Cloudera Data Science Workbench 1.6.x

Installation

During the Cloudera Data Science Workbench startup process, you might see timeout errors such as the following:
Pods not ready in cluster default ['role/<pod_name>'].
This occurs because some pods take longer to start up, which causes dependent processes to time out. Restart the CDSW service to get past this issue.

Cloudera Bug: DSE-6845, DSE-6855

Upgrades

TSB-350: Permanent Fix for Data Loss Risk During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

TSB-346 was released in the time frame of CDSW 1.4.2 to fix this issue, but it turned out to be only a partial fix. CDSW 1.4.3 fixes the issue permanently, and TSB-350 was released to document that fix. The script that was provided with TSB-346 still prevents data loss and must be used to shut down or restart any of the affected CDSW releases listed below.

Affected Versions: Cloudera Data Science Workbench 1.0.x, 1.1.x, 1.2.x, 1.3.x, 1.4.0, 1.4.1, 1.4.2

Fixed Version: Cloudera Data Science Workbench 1.4.3 (and higher)

Cloudera Bug: DSE-5108

The complete text for TSB-350 is available in the 1.4.3 release notes and in the Cloudera Security Bulletins: TSB-350: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart.

TSB-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

Stopping Cloudera Data Science Workbench involves unmounting the NFS volumes that store CDSW project directories and then cleaning up a folder where the kubelet stores its temporary state. However, due to a race condition, this NFS unmount process can take too long or fail altogether. If this happens, CDSW projects that remain mounted will be deleted by the cleanup step.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench versions -
  • 1.0.x

  • 1.1.x

  • 1.2.x

  • 1.3.0, 1.3.1

  • 1.4.0, 1.4.1

Users affected: This potentially affects all CDSW users.

Detected by: Nehmé Tohmé (Cloudera)

Severity (Low/Medium/High): High

Impact: If the NFS unmount fails during shutdown, data loss can occur. All CDSW project files might be deleted.

CVE: N/A

Immediate action required: If you are running any of the affected Cloudera Data Science Workbench versions, you must run the following script on the CDSW master host every time before you stop or restart Cloudera Data Science Workbench. Failure to do so can result in data loss.

This script should also be run before initiating a Cloudera Data Science Workbench upgrade. As always, we recommend creating a full backup prior to beginning an upgrade.

cdsw_protect_stop_restart.sh - Available for download at: cdsw_protect_stop_restart.sh.

#!/bin/bash

set -e

cat << EXPLANATION


This script is a workaround for Cloudera TSB-346. It protects your
CDSW projects from a rare race condition that can result in data loss.
Run this script before stopping the CDSW service, irrespective of whether
the stop precedes a restart, upgrade, or any other task.

Run this script only on the master node of your CDSW cluster.

You will be asked to specify a target folder on the master node where the
script will save a backup of all your project files. Make sure the target
folder has enough free space to accommodate all of your project files. To
determine how much space is required, run 'du -hs /var/lib/cdsw/current/projects'
on the CDSW master node.

This script will first back up your project files to the specified target
folder. It will then temporarily move your project files aside to protect
against the data loss condition. At that point, it is safe to stop the CDSW
service. After CDSW has stopped, the script will move the project files back
into place.

Note: This workaround is not required for CDSW 1.4.2 and higher.



EXPLANATION

read -p "Enter target folder for backups: " backup_target

echo "Backing up to $backup_target..."
rsync -azp /var/lib/cdsw/current/projects "$backup_target"

read -n 1 -p "Backup complete. Press enter when you are ready to stop CDSW: "

echo "Deleting all Kubernetes resources..."
kubectl delete configmaps,deployments,daemonsets,replicasets,services,ingress,secrets,persistentvolumes,persistentvolumeclaims,jobs --all
kubectl delete pods --all

echo "Temporarily saving project files to /var/lib/cdsw/current/projects_tmp..."
mkdir /var/lib/cdsw/current/projects_tmp
mv /var/lib/cdsw/current/projects/* /var/lib/cdsw/current/projects_tmp

echo -e "Please stop the CDSW service."

read -n 1 -p "Press enter when CDSW has stopped: "

echo "Moving projects back into place..."
mv /var/lib/cdsw/current/projects_tmp/* /var/lib/cdsw/current/projects
rm -rf /var/lib/cdsw/current/projects_tmp

echo -e "Done. You may now upgrade or start the CDSW service."
echo -e "When CDSW is running, if desired, you may delete the backup data at $backup_target"

Addressed in release/refresh/patch: This issue is fixed in Cloudera Data Science Workbench 1.4.2.

Note that you are required to run the workaround script above when you upgrade from an affected version to a release with the fix. This helps guard against data loss when the affected version needs to be shut down during the upgrade process.
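
Before running the script, it can help to confirm that the backup target has room for the project files. The following is a minimal sketch and is not part of TSB-346: it compares the `du` figure that the script asks you to check against the free space reported by `df`. The function names (`kb_used`, `kb_free`, `has_room_for_backup`) and the `/backup` path in the example are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Cloudera script).

# Print the size of a directory tree in KiB.
kb_used() {
    du -sk "$1" | awk '{print $1}'
}

# Print the free space, in KiB, on the filesystem that holds a path.
kb_free() {
    df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# Succeed only if the target filesystem has more free space than the source uses.
has_room_for_backup() {
    src=$1
    dest=$2
    [ "$(kb_free "$dest")" -gt "$(kb_used "$src")" ]
}

# Example (run on the CDSW master host; /backup is an assumed target):
#   has_room_for_backup /var/lib/cdsw/current/projects /backup || echo "Not enough space"
```

The comparison is deliberately conservative: it treats the backup as needing the full on-disk size of the projects directory.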

For the latest update on this issue see the corresponding Knowledge article:

TSB 2018-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

(Red Hat Only) Host Reboot Required for Upgrades from Cloudera Data Science Workbench 1.4.0

Cloudera Data Science Workbench 1.4.2 includes a fix for a Red Hat kernel slab leak issue that was found in Cloudera Data Science Workbench 1.4.0. However, to have this fix go into effect, Red Hat users must reboot all Cloudera Data Science Workbench hosts before proceeding with an upgrade from CDSW 1.4.0 to CDSW 1.4.2 (or higher).

Therefore, if you are a Red Hat user upgrading from Cloudera Data Science Workbench 1.4.0, your upgrade path will require the following steps:
  1. Run the cdsw_protect_stop_restart.sh script to safely stop CDSW.
  2. Back up all your application data.
  3. Reboot all Cloudera Data Science Workbench hosts. As a precaution, you should consult your cluster/IT administrator before you start rebooting hosts.
  4. Proceed with the upgrade to Cloudera Data Science Workbench 1.4.2 (or higher).
These steps have also been added to the upgrade documentation.

Cloudera Bug: DSE-4098

On a TLS-enabled cluster Cloudera Manager points the Cloudera Data Science Workbench web UI to http:// instead of https://

After upgrading the Cloudera Data Science Workbench parcel and CSD from 1.5.x to 1.6.x, the link to the Cloudera Data Science Workbench web UI from Cloudera Manager redirects to http://cdsw.your-company.com instead of https://cdsw.your-company.com on a TLS-enabled cluster.

Workaround: You can manually enter the complete domain name with the https protocol in your web browser. Alternatively, contact Cloudera Support to obtain a hotfix and the instructions to apply the patch. Quote the following issue while raising the support request: ENGESC-199.

Cloudera Bug: ENGESC-199

CDH Integration

CDH client configuration changes require a full Cloudera Data Science Workbench reset

Cloudera Data Science Workbench does not automatically detect configuration changes on the CDH cluster. Therefore, any changes made to CDH services, ranging from updates to service configuration properties to complete CDH or CDS parcel upgrades, must be followed by a full reset of Cloudera Data Science Workbench.

Workaround: Depending on your deployment, use one of the following sets of steps to perform a full reset of Cloudera Data Science Workbench. Note that this reset does not impact your data in any way.
  • CSD Deployments - To reset Cloudera Data Science Workbench using Cloudera Manager:
    1. Log into the Cloudera Manager Admin Console.
    2. On the Cloudera Manager homepage, click the drop-down arrow to the right of the CDSW service and select Restart. Confirm your choice on the next screen and wait for the action to complete.
    OR
  • RPM Deployments - Run the following steps on the Cloudera Data Science Workbench master host:

    cdsw stop
    cdsw start

Cloudera Manager Integration

CSD distribution/activation fails on mixed-OS clusters when there are third-party parcels running on OSs that are not supported by Cloudera Data Science Workbench

For example, adding a new CDSW gateway host on a RHEL 6 cluster running RHEL-6 compatible parcels will fail. This is because Cloudera Manager will not allow distribution of the RHEL 6 parcels on the new host which will likely be running a CDSW-compatible operating system such as RHEL 7.

Workaround: To ensure adding a new CDSW gateway host is successful, you must create a copy of the 'incompatible' third-party parcel files and give them the corresponding RHEL 7 names so that Cloudera Manager allows them to be distributed on the new gateway host. Use the following sample instructions to do so:
  1. SSH to the Cloudera Manager Server host.
  2. Navigate to the directory that contains all the parcels. By default, this is /opt/cloudera/parcel-repo.
    cd /opt/cloudera/parcel-repo
  3. Make a copy of the incompatible third-party parcel with the new name. For example, if you have a RHEL 6 parcel that cannot be distributed on a RHEL 7 CDSW host:
    cp <PARCELNAME.cdh5.x.x.p0.123>-el6.parcel <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
  4. Repeat the previous step for the parcel's SHA file.
    cp <PARCELNAME.cdh5.x.x.p0.123>-el6.parcel.sha <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
  5. Update the new files' owner and permissions to match those of existing parcels in the /opt/cloudera/parcel-repo directory.
    chown cloudera-scm:cloudera-scm <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
    chown cloudera-scm:cloudera-scm <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
    chmod 640 <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel
    chmod 640 <PARCELNAME.cdh5.x.x.p0.123>-el7.parcel.sha
    
You should now be able to add new gateway hosts for Cloudera Data Science Workbench to your cluster.
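
The copy, ownership, and permission steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not a Cloudera-provided tool: `copy_parcel_as_el7` is a hypothetical name, and the optional second argument (pass an empty string to skip `chown`) is an addition made here so the function can be tried on a host without the cloudera-scm account.

```shell
#!/bin/sh
# Hypothetical helper: copy an -el6 parcel and its .sha file under the -el7 name.
# Usage: copy_parcel_as_el7 <name>-el6.parcel [owner]
# The owner defaults to cloudera-scm:cloudera-scm; pass "" to skip chown.
copy_parcel_as_el7() {
    src=$1
    owner=${2-cloudera-scm:cloudera-scm}
    base=${src%-el6.parcel}

    # Refuse anything that is not named *-el6.parcel.
    if [ "$base" = "$src" ]; then
        echo "not an el6 parcel: $src" >&2
        return 1
    fi

    # Handle the parcel and its SHA file the same way.
    for suffix in parcel parcel.sha; do
        cp "$base-el6.$suffix" "$base-el7.$suffix" || return 1
        if [ -n "$owner" ]; then
            chown "$owner" "$base-el7.$suffix" || return 1
        fi
        chmod 640 "$base-el7.$suffix" || return 1
    done
}

# Example (run in /opt/cloudera/parcel-repo on the Cloudera Manager Server host):
#   copy_parcel_as_el7 "<PARCELNAME.cdh5.x.x.p0.123>-el6.parcel"
```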

Cloudera Bug: OPSAPS-42130, OPSAPS-31880

CDSW Service health status after a restart does not match the actual state of the application

After a restart, the Cloudera Data Science Workbench service in Cloudera Manager will display Good health even though the Cloudera Data Science Workbench web application might need a few more minutes to get ready to serve requests.

Cloudera Data Science Workbench diagnostics data might be missing from Cloudera Manager diagnostic bundles.

This occurs because the default timeout for Cloudera Manager data collection is currently set to 3 minutes. However, in the case of Cloudera Data Science Workbench, collecting metrics and logs using the cdsw logs command can take longer than 3 minutes.

Workaround: Use the following steps to modify the default timeout for Cloudera Data Science Workbench data collection:
  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the CDSW service.
  3. Click Configuration.
  4. Search for the Docker Daemon Diagnostics Collection Timeout property and set it to 5 minutes.
  5. Click Save Changes.

Alternatively, you can generate a diagnostic bundle by running the cdsw logs command directly on the Master host.

Cloudera Bug: OPSAPS-44016, DSE-3160

CDS Powered By Apache Spark

Scala sessions can fail if dependencies take longer than 15 minutes

If the dependencies in spark-defaults.conf (spark.jars, spark.packages, etc.) take longer than 15 minutes to resolve, then Scala sessions will fail the first time.

Workaround: Use one of the following workarounds:
  • Restart the session.
  • Mount the Spark dependency directory from the CDSW host machines.

On TLS-enabled CDSW deployments, the embedded Spark UI does not work

If you have a TLS-enabled CDSW deployment, the embedded Spark UI tab does not render as expected.

Workaround: To work around this issue, launch the Spark UI in a separate tab and append '/jobs' after the URL. For example, if your engineID is tb0z9ydiua5q9v2d and the DOMAIN is example.com then view the Spark UI at: https://spark-tb0z9ydiua5q9v2d.example.com/jobs/
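
The URL pattern in this workaround can be sketched as a small helper. The function name `spark_ui_url` is hypothetical; the pattern itself (a `spark-` prefix on the engine ID, followed by the CDSW domain and `/jobs/`) is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: build the standalone Spark UI URL from an engine ID
# and the CDSW domain, following the pattern in the workaround above.
spark_ui_url() {
    engine_id=$1
    domain=$2
    printf 'https://spark-%s.%s/jobs/\n' "$engine_id" "$domain"
}

# Example:
#   spark_ui_url tb0z9ydiua5q9v2d example.com
#   prints https://spark-tb0z9ydiua5q9v2d.example.com/jobs/
```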

Alternative workaround: To view running Spark jobs, navigate to Spark History Server UI > Show Incomplete Applications > Application ID

Affected Versions: This issue affects CDSW 1.6.x and CDSW 1.7.x on the following platforms:
  • CDH 5: CDS 2.4 release 2 (and lower)
  • CDH 6: Versions of Spark that ship with CDH 6.0.x, CDH 6.1.x, CDH 6.2.1 (and lower), CDH 6.3.2 (and lower)
Solution: Upgrade to CDSW version 1.7.1 or higher, and either:
  • CDH version 6.4.0, 6.2.2, 6.3.3 or higher
  • CDH 5 with Spark 2.4 release 3

Spark lineage collection is not supported with Cloudera Data Science Workbench

Lineage collection is enabled by default in Spark 2.3. This feature does not work with Cloudera Data Science Workbench because the lineage log directory is not automatically mounted into CDSW engines when a session/job is started.

Affected Versions: CDS 2.3 release 2 (and higher) Powered By Apache Spark

With Spark 2.3 release 3, if Spark cannot find the lineage log directory, it will automatically disable lineage collection for that application. Spark jobs will continue to execute in Cloudera Data Science Workbench, but lineage information will not be collected.

With Spark 2.3 release 2, Spark jobs will fail in Cloudera Data Science Workbench. Either upgrade to Spark 2.3 release 3 which includes a partial fix (as described above) or use one of the following workarounds to disable Spark lineage:

Workaround 1: Disable Spark Lineage Per-Project in Cloudera Data Science Workbench

To do this, set spark.lineage.enabled to false in a spark-defaults.conf file in your Cloudera Data Science Workbench project. This will need to be done individually for each project as required.
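
A minimal sketch of this per-project change follows. It assumes the project's spark-defaults.conf uses key=value lines and that it is acceptable to replace any existing spark.lineage.enabled entry; the function name `disable_spark_lineage` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: make sure a spark-defaults.conf file sets
# spark.lineage.enabled to false, replacing any earlier setting.
# Usage: disable_spark_lineage [path-to-spark-defaults.conf]
disable_spark_lineage() {
    lineage_conf=${1:-spark-defaults.conf}
    lineage_tmp="$lineage_conf.tmp.$$"

    # Create the file if the project does not have one yet.
    [ -f "$lineage_conf" ] || : > "$lineage_conf"

    # Keep every line except an existing spark.lineage.enabled entry.
    # grep exits non-zero when it prints nothing, which is fine here.
    grep -v '^[[:space:]]*spark\.lineage\.enabled' "$lineage_conf" > "$lineage_tmp" || true
    echo 'spark.lineage.enabled=false' >> "$lineage_tmp"
    mv "$lineage_tmp" "$lineage_conf"
}

# Example (run from the project root in a CDSW terminal):
#   disable_spark_lineage spark-defaults.conf
```

Running the function more than once leaves a single spark.lineage.enabled line, so it is safe to repeat for each project.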

Workaround 2: Disable Spark Lineage for the Cluster

  1. Log in to Cloudera Manager and go to the Spark 2 service.
  2. Click Configuration.
  3. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection.
  4. Click Save Changes.
  5. Go back to the Cloudera Manager homepage and restart the CDSW service for this change to go into effect.

Cloudera Bug: DSE-3720, CDH-67643

Monitoring Spark Applications invoked from CDSW

To monitor spark_on_yarn applications invoked from CDSW, an embedded Spark UI is displayed right next to the session/job. This was achieved by disabling RM proxy. However, with this change, attempts to access the same Spark application using the RM UI will result in Error 500 (connection refused).

Affected Versions: CDSW 1.6 and higher.

Workaround: If an administrator wants to troubleshoot a running spark-on-yarn application invoked by an end user from the workbench, the user must share their session using the Share button on the right side of the console. An alternative workaround, which does not provide real-time updates, is to access the Spark Application UI from the Spark History Server UI > Incomplete Applications.

Cloudera Bug: DSE-4979

Crashes and Hangs

  • Third-party security and orchestration software (such as McAfee, Tanium, Symantec) can lead to CDSW crashing randomly

    Workaround: Disable all third-party security agents on CDSW hosts.

    Cloudera Bug: DSE-8550

  • High I/O utilization on the application block device can cause the application to stall or become unresponsive. Users should read and write data directly from HDFS rather than staging it in their project directories.

  • Installing ipywidgets or a Jupyter notebook into a project can cause Python engines to hang due to an unexpected configuration. The issue can be resolved by deleting the installed libraries from the R engine terminal.

Third-party Editors

  • Logs generated by a browser IDE do not appear within the IDE. They are displayed in the Logs tab for the session.

  • Sessions with Browser IDEs running do not adhere to the limit set in IDLE_MAXIMUM_MINUTES. Session logs show the warning message that states that the idle session will timeout, but the timeout does not occur. The session continues to run and consume resources until the timeout set in SESSION_MAXIMUM_MINUTES is reached. Ensure that you manually stop a session after you are finished, so that the resources are available to other users.

    Cloudera Bug: DSE-6651

  • Sessions with Browser IDEs running time out with no warning after the time limit set in SESSION_MAXIMUM_MINUTES is reached, regardless of whether or not the session is idle. Periodically stop the browser IDE and session manually to avoid reaching SESSION_MAXIMUM_MINUTES.

    Cloudera Bug: DSE-6652

  • The Windows version of the cdswctl utility has the filename 'cdswctl'. After downloading the client, you must rename this file to 'cdswctl.exe'. Additionally, all commands on Windows that use the name of the client must use this full name, with the extension.

    Fixed Version: Cloudera Data Science Workbench 1.6.1. In version 1.6.1, CDSW automatically adds the required .exe extension to the filename.

    Cloudera Bug: DSE-7035

Engines

  • Configuring duplicate mount points in the site admin panel (Admin > Engines > Mounts) results in sessions crashing in the workbench.

    Cloudera Bug: DSE-3308

  • Spawning remote workers fails in R when the env parameter is not set. For more details, see Distributed Computing with Workers.

    Cloudera Bug: DSE-3384

  • Autofs mounts are not supported with Cloudera Data Science Workbench.

    Cloudera Bug: DSE-2238

  • When using Conda to install Python packages, you must specify the Python version to match the Python versions shipped in the engine image (2.7.11 and 3.6.1). If not specified, the conda-installed Python version will not be used within a project. Pip (pip and pip3) does not face this issue.

  • When engine version 8 (or higher) is used, and the Allow containers to run as root property is disabled, the creation of containers that run with root privileges is prevented. Additionally, the elevation of privileges from the cdsw user to root (for example, using a setuid binary) is also prevented.

    As a result, running the ping command, which is actually a setuid binary, will fail in engine 8 (or higher) when Allow containers to run as root property is disabled.

    $ ping www.google.com
    Ping: icmp open socket: Operation not permitted.

Custom Engine Images

  • Cloudera Data Science Workbench only supports customized engines that are based on the Cloudera Data Science Workbench base image.

  • Cloudera Data Science Workbench does not support creation of custom engines larger than 10 GB.

    Cloudera Bug: DSE-4420

  • Cloudera Data Science Workbench does not support pulling images from registries that require Docker credentials.

    Cloudera Bug: DSE-1521

  • The contents of certain pre-existing standard directories such as /home/cdsw, /tmp, /opt/cloudera, and so on, cannot be modified while creating customized engines. This means any files saved in these directories will not be accessible from sessions that are running on customized engines.

    Workaround: Create a new custom directory in the Dockerfile used to create the customized engine, and save your files to that directory. Or, create a new custom directory on all the Cloudera Data Science Workbench gateway hosts and save your files to those directories. Then, mount this directory to the custom engine.

Experiments

  • Experiments do not store snapshots of project files. You cannot automatically restore code that was run as part of an experiment.

  • Experiments will fail if your project filesystem is too large for the Git snapshot process. As a general rule, any project files (code, generated model artifacts, dependencies, etc.) larger than 50 MB must be part of your project's .gitignore file so that they are not included in snapshots for experiment builds.

  • Experiments cannot be deleted. As a result, be conscious of how you use the track_metrics and track_file functions.
    • Do not track files larger than 50MB.
    • Do not track more than 100 metrics per experiment. Excessive metric calls from an experiment may cause Cloudera Data Science Workbench to hang.
  • The Experiments table will allow you to display only three metrics at a time. You can select which metrics are displayed from the metrics dropdown. If you are tracking a large number of metrics (100 or more), you might notice some performance lag in the UI.

  • Arguments are not supported with Scala experiments.

  • The track_metrics and track_file functions are not supported with Scala experiments.

  • The UI does not display a confirmation when you start an experiment or any alerts when experiments fail.
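
The 50 MB guidance above for experiment snapshots can be checked from a project terminal. The following is a minimal sketch, assuming GNU or BSD `find` (where `-size +50M` counts in units of 1,048,576 bytes) and file names without characters that are special in .gitignore patterns. The function names `list_large_files` and `ignore_large_files` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list project files larger than 50 MB, relative to the
# project root, skipping anything under a .git directory.
# Usage: list_large_files [project-dir]
list_large_files() {
    (
        cd "${1:-.}" || exit 1
        find . -type f -size +50M ! -path '*/.git/*' | sed 's|^\./||'
    )
}

# Append each large file that is not already listed to the project's .gitignore.
# A leading slash anchors the pattern to the project root.
# Usage: ignore_large_files [project-dir]
ignore_large_files() {
    project=${1:-.}
    list_large_files "$project" | while IFS= read -r large_path; do
        grep -qxF -- "/$large_path" "$project/.gitignore" 2>/dev/null ||
            printf '/%s\n' "$large_path" >> "$project/.gitignore"
    done
}

# Example (run from the project root):
#   list_large_files .
#   ignore_large_files .
```

Reviewing the output of `list_large_files` before calling `ignore_large_files` avoids excluding a file that the experiment actually needs.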

GPU Support

Only CUDA-enabled NVIDIA GPU hardware is supported

Cloudera Data Science Workbench only supports CUDA-enabled NVIDIA GPU cards.

Heterogeneous GPU hardware is not supported

You must use the same GPU hardware across a single Cloudera Data Science Workbench deployment.

Jobs

  • Emails triggered by jobs don't deliver attachments as expected. Attachments are delivered correctly to the owner of a job. However, any additional external recipients receive either blank attachments or no attachments at all.

    Affected Versions: Cloudera Data Science Workbench 1.6.x

    Cloudera Bug: DSE-8806

  • Cloudera Data Science Workbench does not support changing your API key, or having multiple API keys.

  • Currently, you cannot create a job, stop a job, or get the status of a job using the Jobs API.

Models

  • Known Issues with Model Builds and Deployed Models
    • Re-deploying or re-building models results in model downtime (usually brief).

    • Re-starting Cloudera Data Science Workbench does not automatically restart active models. These models must be manually restarted so they can serve requests again.

      Cloudera Bug: DSE-4950

    • Model deployment will fail if your project filesystem is too large for the Git snapshot process. As a general rule, any project files (code, generated model artifacts, dependencies, etc.) larger than 50 MB must be part of your project's .gitignore file so that they are not included in snapshots for model builds.

    • Model builds will fail if your project filesystem includes a .git directory (likely hidden or nested). Typical build stage errors include:
      Error: 2 UNKNOWN: Unable to schedule build: [Unable to create a checkpoint of current source: [Unable to push sources to git server: ...

      To work around this, rename the .git directory (for example, NO.git) and re-build the model.

      Cloudera Bug: DSE-4657

    • JSON requests made to active models should not be more than 5 MB in size. This is because JSON is not suitable for very large requests and has high overhead for binary objects such as images or video. Call the model with a reference to the image or video, such as a URL, instead of the object itself.

    • Any external connections, for example, a database connection or a Spark context, must be managed by the model's code. Models that require such connections are responsible for their own setup, teardown, and refresh.

    • Model logs and statistics are preserved only while the individual replica is active. Cloudera Data Science Workbench may restart a replica whenever it deems this necessary (for example, after bad input to the model).

  • Limitations
    • Scala models are not supported.

    • Spawning worker threads is not supported with models.

    • Models deployed using Cloudera Data Science Workbench are not highly-available.

    • Dynamic scaling and auto-scaling are not currently supported. To change the number of replicas in service, you will have to re-deploy the build.
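
For the model build failure caused by a .git directory, the hidden or nested directories can be located before rebuilding. This is a minimal sketch; `find_git_dirs` is a hypothetical name, and renaming the reported directories (for example, to NO.git, as described above) is left to the user.

```shell
#!/bin/sh
# Hypothetical helper: print every .git directory under a project, including
# hidden or nested ones, so they can be renamed before a model build.
# Usage: find_git_dirs [project-dir]
find_git_dirs() {
    # -prune stops find from descending into each .git directory it reports.
    find "${1:-.}" -type d -name .git -prune -print
}

# Example (run from the project root):
#   find_git_dirs .
```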

Networking

  • CDSW cannot launch sessions due to connection errors resulting from a segfault

    Sample error:
    transport: Error while dialing dial tcp 100.77.93.252:20051: connect: connection refused
    Workaround: Enable IPv6 on all CDSW hosts
    1. Double-check that IPv6 is currently disabled during boot time, i.e. ipv6.disable should be equal to 1.
      $ dmesg 
      [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.el7.x86_64 root=UUID=3e109aa3-f171-4614-ad07-c856f20f9d25 ro console=tty0 crashkernel=auto console=ttyS0,115200 ipv6.disable=1
      $ cat /proc/cmdline
      .....ipv6.disable=1
    2. Edit /etc/default/grub and delete the ipv6.disable=1 entry from GRUB_CMDLINE_LINUX. For example:
      GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root"
    3. Run the grub2-mkconfig command to regenerate the grub.cfg file:
      grub2-mkconfig -o /boot/grub2/grub.cfg
      Alternatively, on UEFI systems, you would run the following command:
      grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
    4. Follow the above steps for both CDSW Master and Worker nodes.
    5. Stop the Cloudera Data Science Workbench service.
    6. Reboot all the Cloudera Data Science Workbench hosts to enable IPv6 support.
    7. Start the Cloudera Data Science Workbench service. Run dmesg on the CDSW hosts to confirm that no segfault errors appear.

    Cloudera Bug: DSE-7238, DSE-7455

  • Custom /etc/hosts entries on Cloudera Data Science Workbench hosts do not propagate to sessions and jobs running in containers.

    Cloudera Bug: DSE-2598

  • Initialisation of Cloudera Data Science Workbench (cdsw init) will fail if localhost does not resolve to 127.0.0.1.

  • Cloudera Data Science Workbench does not support DNS servers running on 127.0.0.1:53. This IP address resolves to the container localhost within Cloudera Data Science Workbench containers. As a workaround, use either a non-loopback address or a remote DNS server.
  • Kubernetes throws the following error when /etc/resolv.conf lists more than three domains:
    Resolv.conf file '/etc/resolv.conf' contains search line consisting of more than 3 domains!
    Due to a limitation in the libc resolver, only two DNS servers are supported in /etc/resolv.conf. Kubernetes uses one additional entry for the cluster DNS.
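
Two of the networking checks above can be sketched as small helpers: whether IPv6 is disabled on the kernel command line, and how many nameserver and search entries /etc/resolv.conf contains. This is a minimal sketch; the function names are hypothetical, and each function accepts an optional file argument (an addition made here) so it can be tried against a copy of the file.

```shell
#!/bin/sh
# Hypothetical helpers for the networking checks above.

# Succeed if the kernel command line disables IPv6 at boot.
# Usage: ipv6_disabled_at_boot [cmdline-file]   (defaults to /proc/cmdline)
ipv6_disabled_at_boot() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'ipv6\.disable=1'
}

# Print the number of nameserver entries in a resolv.conf file.
# Usage: count_nameservers [resolv-conf-file]
count_nameservers() {
    awk '$1 == "nameserver" {n++} END {print n + 0}' "${1:-/etc/resolv.conf}"
}

# Print the number of domains on the last search line of a resolv.conf file.
# Usage: count_search_domains [resolv-conf-file]
count_search_domains() {
    awk '$1 == "search" {n = NF - 1} END {print n + 0}' "${1:-/etc/resolv.conf}"
}

# Example (run on each CDSW host):
#   ipv6_disabled_at_boot && echo "IPv6 is disabled at boot"
#   count_nameservers
#   count_search_domains
```

Per the limits described above, a host is within bounds when `count_nameservers` prints 2 or less and `count_search_domains` prints 3 or less.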

Security

Working in the terminal or an editor should not count as idle session

If a user opens a workbench and is either working exclusively in the terminal or just editing files, Cloudera Data Science Workbench counts that time as idle time and the user gets kicked out after the configured max idle timeout.

Workaround:
  • Increase the idle session timeout by adding a new environmental variable IDLE_MAXIMUM_MINUTES. Click CDSW > Project > Settings > Environmental variables.

    You can set the value of the variables IDLE_MAXIMUM_MINUTES or SESSION_MAXIMUM_MINUTES to their maximum allowed value, which is 35000 (~3 weeks).

  • Alternatively, run a simple script inside the CDSW session to keep the session alive. Open the Workbench, create a file as shown here (assuming a Python project), and then run it.
    import time
    time.sleep(10000)

Cloudera Bug: DSE-3080

SSH access to Cloudera Data Science Workbench hosts must be disabled

The container runtime and application data storage is not fully secure from untrusted users who have SSH access to the gateway hosts. Therefore, SSH access to the gateway hosts for untrusted users should be disabled for security and resource utilization reasons.

TLS/SSL

  • Deployments using a custom Certificate Authority (signed by either their organisation's internal CA or a non-default CA) see HTTP Error 500 when attempting to launch the Terminal or Jupyter Notebook sessions from the Workbench

    This occurs because the engine is not aware of the custom/intermediate CA, which means a few engine operations such as workers, terminal access, and notebooks can fail.

    Affected Versions: Cloudera Data Science Workbench 1.6.0. Use one of the workarounds listed below.

    Fixed Version: Cloudera Data Science Workbench 1.6.1.

    If you are using version 1.6.1:
    1. Site administrators can go to the Admin > Security page and paste the internal CA's root certificate file contents directly into the Root CA configuration field.
    2. Configure the REQUESTS_CA_BUNDLE environment variable, either globally in the Site Admin panel (Admin > Engines), or per-project. Set the variable as follows:
      REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
    (For v1.6.0 only) Workaround 1: Add your trusted certificates as a host mount
    1. Download /etc/ssl/certs/ca-certificates.crt: ca-certificates.crt

      The file provided here is from CDSW's base engine v8 image which uses Ubuntu 16.04 and contains 148 certificates from Authorized Root/Intermediate CAs (a root certificate is a special kind of X.509 digital certificate that can be used to issue other certificates).

      To see the list of all issuers included in the file, run the following command:

      awk -v cmd='openssl x509 -noout -issuer' '
       /BEGIN/{close(cmd)};{print | cmd}' < /etc/ssl/certs/ca-certificates.crt
      # Sample output list of issuers
      ......
      issuer= /CN=ACCVRAIZ1/OU=PKIACCV/O=ACCV/C=ES
      issuer= /CN=ACEDICOM Root/OU=PKI/O=EDICOM/C=ES
      issuer= /C=US/O=Amazon/CN=Amazon Root CA 1
      ......
    2. Append every internal Root CA and Intermediate CA certificate in your trust chain to the end of the file that you just downloaded: /etc/ssl/certs/ca-certificates.crt.
      cat ca.crt >> /etc/ssl/certs/ca-certificates.crt
      The format of the certificates should remain the same.
      -----BEGIN CERTIFICATE-----
      xxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxx
      -----END CERTIFICATE-----
    3. Use either scp or rsync to copy the modified /etc/ssl/certs/ca-certificates.crt file to all the CDSW nodes (workers, master).
    4. Mount this modified ca-certificates.crt file to the base engine so that it is automatically loaded into the engine every time a session/job/terminal ... starts. To do this:
      1. Log in to CDSW as a Site Administrator.
      2. Navigate to Admin > Engines.
      3. Scroll to the Mounts section and add /etc/ssl/certs/ca-certificates.crt to the list of host mounts. You do not need to grant write access.
    5. Configure the REQUESTS_CA_BUNDLE environment variable. This is especially required for Python workloads.
      1. Log in to CDSW as a Site Administrator.
      2. Click Admin > Engines.
      3. Under the Environmental Variables section, enter the following name and value and click Add:
        • Name: REQUESTS_CA_BUNDLE
        • Value: /etc/ssl/certs/ca-certificates.crt
    6. For a quick test run, start a new session and run a curl command on your CDSW URL:
      !curl -v https://<your_cdsw_domain>.com
      If it works, the output should say "Cloudera Data Science Workbench". Terminal Access and Jupyter Notebooks should also work now.
    (For v1.6.0 only) Workaround 2: Create a custom Docker image with your trusted internal CA certificates copied into it
    1. Create a directory on your CDSW master node and copy all the required internal CA certificates to it. Make sure that all the PEM certificate files end with the .crt extension.
    2. Create a Dockerfile with instructions to copy all the internal CA certificate files (from the previous step, ending with .crt) into /usr/local/share/ca-certificates/. Then run the update-ca-certificates command so that local internal CAs will implicitly be trusted by the engine.
      # Dockerfile
      FROM docker.repository.cloudera.com/cdsw/engine:8
      COPY *.crt /usr/local/share/ca-certificates/
      RUN update-ca-certificates
      
    3. Build the custom image.
      docker build --network=host -t customengineimage:8 . -f Dockerfile
      Run the following command to verify that the image was created.
      docker images
    4. Once the custom image is created, follow the steps to distribute and whitelist the new image so that projects can start using the new image.
    5. For a quick test run, start a new session with the new custom engine and run a curl command on your CDSW URL:
      !curl -v https://<your_cdsw_domain>.com
      If it works, the output should say "Cloudera Data Science Workbench". Terminal Access and Jupyter Notebooks should also work now.

    Cloudera Bug: DSE-7237, DSE-7173

  • Self-signed certificates where the Certificate Authority is not part of the user's trust store are not supported for TLS termination. For more details, see Enabling TLS/SSL - Limitations.

  • Cloudera Data Science Workbench does not support the use of encrypted private keys for TLS.

    Cloudera Bug: DSE-1708

  • A "certificate has expired" error displays when you log in to the Cloudera Data Science Workbench web UI. This issue can occur if Cloudera Data Science Workbench exceeds 365 days of continuous uptime because the internal certificate for Kubernetes expires after 1 year.

    Workaround: Restart the Cloudera Data Science Workbench deployment.
    • For CSD installations, restart the Cloudera Data Science Workbench service in Cloudera Manager.
    • For RPM installations, run the following command on the Master host:
      cdsw restart
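
For the host-mount workaround (Workaround 1) above, the append step can be made a little safer by checking that the file being appended is a PEM certificate and that the bundle ends with a newline first. This is a minimal sketch under those assumptions. The function names `append_ca_cert` and `count_certs` are hypothetical, and the check looks only at the PEM markers, not at the certificate contents.

```shell
#!/bin/sh
# Hypothetical helper: append a PEM certificate to a CA bundle file.
# Usage: append_ca_cert <cert.crt> <bundle.crt>
append_ca_cert() {
    cert=$1
    bundle=$2

    # Require the standard PEM markers (five dashes on each side).
    grep -q -e '-----BEGIN CERTIFICATE-----' "$cert" || return 1
    grep -q -e '-----END CERTIFICATE-----' "$cert" || return 1

    # Make sure the bundle ends with a newline so the markers stay on their own lines.
    if [ -s "$bundle" ] && [ -n "$(tail -c 1 "$bundle")" ]; then
        echo >> "$bundle"
    fi
    cat "$cert" >> "$bundle"
}

# Print the number of certificates in a bundle.
# Usage: count_certs <bundle.crt>
count_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Example (ca.crt is your internal CA certificate):
#   count_certs /etc/ssl/certs/ca-certificates.crt
#   append_ca_cert ca.crt /etc/ssl/certs/ca-certificates.crt
```

Comparing the `count_certs` output before and after the append is a quick way to confirm that exactly the expected number of certificates was added.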

Kerberos

  • Using Kerberos plugin modules in krb5.conf is not supported.

  • Modifying the default_ccache_name parameter in krb5.conf does not work in Cloudera Data Science Workbench. Only the default path for this parameter, /tmp/krb5cc_${uid}, is supported.

  • PowerBroker-equipped Active Directory is not supported.

    Cloudera Bug: DSE-1838

  • When you upload a Kerberos keytab to authenticate yourself to the CDH cluster, Cloudera Data Science Workbench might display a fleeting error message ('cancelled') in the bottom right corner of the screen, even if authentication was successful. This error message can be ignored.

    Cloudera Bug: DSE-2344

Usability

  • In some cases, the application switcher (grid icon) does not show any other applications, such as Hue or Ranger.

    Cloudera Bug: DSE-865

  • Scala sessions hang when running large scripts (longer than 100 lines) in the Workbench editor.

    Workaround 1:

    Execute the script in manually-selected chunks. For example, highlight the first 50 lines and select Run > Run Line(s).

    Workaround 2:

    Restructure your code by moving content into imported functions so as to bring the size down to under 100 lines.

  • The R engine is unable to display multi-byte characters in plots. Examples of multi-byte characters include languages such as Korean, Japanese, and Chinese.

    Workaround: Use the showtext R package to support more fonts and characters. For example, to display Korean characters:
    install.packages('showtext')
    library(showtext)
    font_add_google("Noto Sans KR", "noto")
    showtext_auto()

    Cloudera Bug: DSE-7308

  • In a scenario where hundreds of users are logged in and creating processes, the nproc and nofile limits of the system may be reached. Use ulimit or other methods to increase the maximum number of processes and open files that can be created by a user on the system.

  • When rebooting, Cloudera Data Science Workbench hosts can take a significant amount of time (about 30 minutes) to become ready.

  • Long-running operations such as fork and clone can time out when projects are large or connections outlast the HTTP timeouts of reverse proxies.

  • The Scala kernel does not support auto-complete features in the editor.

  • Scala and R code can sometimes indent incorrectly in the workbench editor.

    Cloudera Bug: DSE-1218