Cloudera Data Science Workbench Release Notes

Cloudera Data Science Workbench 1.6.0

This section lists the release notes for Cloudera Data Science Workbench 1.6.0.

New Features and Changes in Cloudera Data Science Workbench 1.6.0

  • Bring Your Own Editor

    You can now take advantage of all the benefits of Cloudera Data Science Workbench while using an editor you are familiar with. This feature supports third-party IDEs that run on your local machine, such as PyCharm, and browser-based IDEs, such as Jupyter. Base Image v8 ships with Jupyter preconfigured, and Jupyter can be selected from the Start Session menu.

    For details, see Editors.

  • Multiple Cloudera Data Science Workbench Deployments

    You can now have multiple Cloudera Data Science Workbench CSD deployments associated with one instance of Cloudera Manager.

    For details, see Multiple Cloudera Data Science Workbench Deployments.

  • Audits

    Cloudera Data Science Workbench logs specific events, such as user logins and sharing, that you can view by querying a database. For more information, see Monitoring User Events and Tracked User Events.

  • Expanded Support for Distributed Machine Learning

    Cloudera Data Science Workbench 1.6 (and higher) allows you to run distributed workloads with frameworks such as TensorFlowOnSpark, H2O, XGBoost, and so on. This is similar to what you can already do with Spark workloads that run on the attached CDH/HDP cluster. For details, see Running Distributed ML Workloads on YARN.

  • cdswctl CLI Client

    The cdswctl client provides an additional way to interact with your Cloudera Data Science Workbench deployment to perform certain actions. For example, you can use the cdswctl client to start an SSH endpoint on your local machine and then connect a local IDE, such as PyCharm, to Cloudera Data Science Workbench.

    You can download cdswctl from the Cloudera Data Science Workbench web UI and use it from your local machine. Note that this client differs from the cdsw CLI tool used to run commands such as cdsw status, which exists within the Cloudera Data Science Workbench deployment.

    For details, see cdswctl Command Line Interface Client.

  • Status and Validate Commands

    The CDSW service in Cloudera Manager now includes two new commands that can be used to assess the status of your Cloudera Data Science Workbench deployment: Status and Validate. They are the equivalent of the cdsw status and cdsw validate commands that are available via the CLI.

    For details, see Checking the Status of the CDSW Service.

  • Experiments

    • If your cluster has been equipped with GPUs, you can now use GPUs to run experiments on Cloudera Data Science Workbench.
    • Tracked experiment files now refresh and appear automatically on the Overview page for a run of an experiment. Previously, you had to manually refresh the page after an experiment completed.
  • Command Line Interface (CLI) Changes - RPM Deployments only

    • The cdsw reset command has been removed and replaced by the cdsw stop command.
    • The cdsw init command has been removed and replaced by the cdsw start command.

    For details on how these commands behave on the master and worker hosts, refer to the Cloudera Data Science Workbench Command Line Reference.

  • Kubernetes and Weave

    Kubernetes has been upgraded to version 1.11.7. Weave Net has been upgraded to version 2.5.1. This upgrade resolves Weave issue #2934.

  • Logs

    • Staging Directory

      You can now configure the temporary directory that Cloudera Data Science Workbench uses to stage logs when collecting a diagnostic bundle. Old logs in the directory are deleted when a new diagnostic bundle is collected or when the size grows larger than 10 MB.

    • Logs tab

      Running sessions now display a Logs tab. This tab displays engine logs and, if applicable, Spark logs for the running session. Previously, accessing these logs required logging in to the Cloudera Data Science Workbench host(s) and the Spark server.

      For details, see Diagnostic Bundles.

  • Operating System

    Cloudera Data Science Workbench 1.6 supports RHEL and CentOS 7.6.

  • Workload Scheduling Changes

    • Starting with version 1.6, Cloudera Data Science Workbench allows you to specify a list of CDSW gateway hosts that are labeled as Auxiliary Nodes. These hosts will be deprioritized during workload scheduling. That is, they will be chosen only to run workloads that can’t be scheduled on any other hosts, for example, sessions with very large resource requests, or workloads submitted when the other hosts are fully utilized.

      For details, see Customize Workload Scheduling.

    • Reserve Master Host

      Cloudera Data Science Workbench 1.4.3 introduced a new feature that allowed you to reserve the CDSW Master host for running internal application components. Starting with version 1.6, this feature can be enabled on CSD-based deployments using the Reserve Master Host property in Cloudera Manager. Safety valves are no longer needed.

      For details, see Reserving the Master Host for Internal CDSW Components.

  • Security
    • FreeIPA Support

      In addition to MIT Kerberos and Active Directory, Cloudera Data Science Workbench now also supports FreeIPA as an identity management system. For details, see Configure FreeIPA.

    • New User Role - Operator

      Version 1.6 includes a new access role called Operator. When a user is assigned the Operator role on a project, they will be able to start and stop pre-existing jobs and will have view-only access to project code, data, and results.

    • Restricting User-Controlled Kubernetes Pods

      Cloudera Data Science Workbench 1.6 includes three new properties that allow you to control the permissions granted to user-controlled Kubernetes pods. An example of a user-controlled pod is the engine pod, which provides the environment for sessions, jobs, etc. These pods are launched in a per-user Kubernetes namespace. Since the user has the ability to launch arbitrary pods, these settings restrict what those pods can do.

      For details, see Restricting User-Controlled Kubernetes Pods.

    • LDAP/SAML Configuration Changes

      Previously, if you wanted to grant the site administrator role to users of an LDAP/SAML group, that group had to be listed under two properties: LDAP/SAML Full Administrator Groups and LDAP/SAML User Groups. If a group was listed only under LDAP/SAML Full Administrator Groups, and not under LDAP/SAML User Groups, users of that group would not be able to log in to CDSW.

      With version 1.6, you do not need to list the admin groups under both properties. Users belonging to groups listed under LDAP/SAML Full Administrator Groups will be able to log in and have site administrator access to Cloudera Data Science Workbench as expected.

    • Project and Team Creation

      Site administrators can now control whether users can create projects or teams with the following properties on the Settings page:
      • Allow users to create projects
      • Allow users to create teams
      For details, see User Access to Features.
    • Session Tokens

      The method by which the Cloudera Data Science Workbench web UI session tokens are stored has been hardened. Users must log out of the Cloudera Data Science Workbench web UI and back in after upgrading to version 1.6.0.

    • Sharing

      Site administrators can now control whether consoles can be shared with the Allow console output sharing property on the Admin > Security page. Disable this property to remove the Share button from the project workspace and workbench UI as well as disable access to all shared console outputs across the deployment. Note that re-enabling this property does not automatically grant access to previously shared consoles. You will need to manually share each console again.

    • TLS/SSL

      Cloudera Data Science Workbench now defaults to using TLS 1.2. The default cipher suites have also been upgraded to Mozilla's Modern cipher suites.

  • Spark UI

    The Spark UI is now available as a tab within running sessions that use Spark.

Engine Upgrade

Cloudera Data Science Workbench 1.6.0 (and later) ships version 8 of the base engine image which includes the following versions of R and Python:
  • R - 3.5.1
  • Python - 2.7.11, 3.6.1

Pre-installed Packages in Engine 8

For details about the packages included in the base engine, see Cloudera Data Science Workbench Engine Versions and Packaging.

(For Upgrades Only) Move Existing Projects to the Latest Base Engine Images

Make sure you test and upgrade existing projects to Base Image v8 (Project Settings > Engine) to take advantage of the latest fixes. There are two reasons to do this:
  • Container Security

    Security best practices dictate that engine containers should not run as the root user. Engines (v7 and lower) briefly initialize as the root user and then run as the cdsw user. Engines v8 (and higher) now follow the best practice and run only as the cdsw user. For more details, see Restricting User-Created Pods.

  • CDH 6 Compatibility

    The base engine image you use must be compatible with the version of CDH you are running. This is especially important if you are running workloads on Spark. Older base engines (v6 and lower) cannot support the latest versions of CDH 6. If you want to run Spark workloads on CDH 6, you must upgrade your projects to base engine 7 (or higher).

Incompatible Changes in Cloudera Data Science Workbench 1.6.0

  • SLES 12 SP2, SP3 are not supported with Cloudera Data Science Workbench 1.6.0

    SLES 12 SP2 and SP3 have reached the end of general support with SUSE and will not be supported with Cloudera Data Science Workbench 1.6.0 (and higher).

  • GPU Setup Changes
    • nvidia-docker1 is no longer supported.
    • The NVIDIA Library Path property is no longer available.

      Cloudera Data Science Workbench 1.6 ships with nvidia-docker2 installed by default. The path to the NVIDIA library volumes is also set automatically when GPUs are enabled. Review the revised GPU setup steps here: Enabling Cloudera Data Science Workbench to use GPUs.

  • The CDSW_PUBLIC_PORT environment variable has been deprecated and will be removed in a future release. Use the CDSW_APP_PORT or CDSW_READONLY_PORT environment variable instead.

    For details, see Engine Environment Variables.
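
As a rough illustration of the migration away from CDSW_PUBLIC_PORT, a startup script for a web application running inside an engine could prefer CDSW_APP_PORT and fall back to the deprecated variable only where the new one is not set. This is a minimal sketch, not a Cloudera-provided helper: the function name resolve_app_port is hypothetical, and only the environment variable names come from the notes above.

```shell
# Print the port an in-session web application should bind to.
# Prefers CDSW_APP_PORT; falls back to the deprecated CDSW_PUBLIC_PORT
# with a warning on stderr. Returns non-zero when neither is set.
# (resolve_app_port is a hypothetical helper name.)
resolve_app_port() {
  if [ -n "${CDSW_APP_PORT:-}" ]; then
    printf '%s\n' "$CDSW_APP_PORT"
  elif [ -n "${CDSW_PUBLIC_PORT:-}" ]; then
    echo "warning: CDSW_PUBLIC_PORT is deprecated; use CDSW_APP_PORT" >&2
    printf '%s\n' "$CDSW_PUBLIC_PORT"
  else
    echo "error: no CDSW port variable is set" >&2
    return 1
  fi
}
```

A script could then call port=$(resolve_app_port) before starting its server, which keeps existing projects working while making the deprecated path visible in the logs.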

Issues Fixed in Cloudera Data Science Workbench 1.6.0

  • Fixed an issue where you had to include pd.options.display.html.table_schema = True to show a horizontal scroll bar for a pandas DataFrame that had too many columns. You no longer have to set this option.

    Cloudera Issue: DSE-3562

  • Fixed an issue where the built-in Workbench editor did not properly recognize imported code that uses tabs instead of spaces. This also resolves navigation issues that occurred within the editor when working with imported code that uses tabs.

    Cloudera Issue: DSE-2976, DSE-3221

  • Fixed an issue where an email triggered by a job failed to send if its attachment was over 4 MB.

    Cloudera Issue: DSE-5980, DSE-6003

  • Fixed an issue where large R scripts would hang when run in the built-in Workbench editor.

    Cloudera Issue: DSE-2817

  • Fixed an issue where .md files were not rendered in Markdown. Previously, only README.md was rendered correctly.

    Cloudera Issue: DSE-3315

  • Fixed an issue with predict.py, the model training script in the Python template project.

    Cloudera Issue: DSE-5314

  • Fixed an issue where logs generated by the Cloudera Data Science Workbench diagnostic bundle were occupying too much space in the /var/log/cdsw directory. The size of the generated bundle has been reduced and you can now configure a temporary staging directory to be used when a diagnostic bundle is generated.

    Cloudera Issue: DSE-5921

  • The cdsw-build.sh script used with models and experiments now runs as the cdsw user.

    Cloudera Issue: DSE-4340

  • The changes to GPU support in version 1.6 have also fixed an issue where GPUs were not automatically detected after a machine reboot.

    Cloudera Issue: DSE-2847

  • Fixed an issue where iFrame visualizations would not render in the Workbench due to the new HTTP security headers added in version 1.4.x.

    Cloudera Issue: DSE-5274

Known Issues and Limitations in Cloudera Data Science Workbench 1.6.0

Cloudera Data Science Workbench 1.5.0

This section lists the release notes for Cloudera Data Science Workbench 1.5.0.

New Features and Changes in Cloudera Data Science Workbench 1.5.0

  • Cloudera Enterprise 6.1 Support

    Cloudera Data Science Workbench is now supported with Cloudera Manager 6.1.x (and higher) and CDH 6.1.x (and higher). For details, see Cloudera Manager and CDH Requirements.

  • Cloudera Data Science Workbench on Hortonworks Data Platform (HDP)

    Cloudera Data Science Workbench can now be deployed on HDP 2.6.5 and HDP 3.1.0. For an architecture overview and installation instructions, see Deploying Cloudera Data Science Workbench 1.6.x on Hortonworks Data Platform.

  • Security Enhancements
    • Allow Site Administrators to Enable/Disable Project Uploads and Downloads - By default, all Cloudera Data Science Workbench users are allowed to upload and download files to/from a project. Version 1.5 introduces a new feature flag that allows site administrators to hide the UI features that let users upload and download project files.

      Note that this feature flag only removes the relevant features from the Cloudera Data Science Workbench UI. It does not disable the ability to upload and download files through the backend web API.

      For details on how to enable this feature, see Disabling Project File Uploads and Downloads.

  • OpenJDK Support

    Cloudera Data Science Workbench now supports OpenJDK 8 on Cloudera Enterprise 5.16.1 (and higher). For details, see Product Compatibility Matrix - Supported JDK Versions.

  • Engines

    • Base engine upgraded with a new version of R - 3.5.1 (Base Image v7)
    • Debugging Improvements - Previously, engines and their associated logs were deleted immediately after an exit or a crash. With version 1.5, engines are now available for about 5 minutes after they have ended to allow you to collect the relevant logs.

      Additionally, when an engine exits with a non-zero status code, the last 50 lines from the engine's logs are now printed to the Workbench console. Note that a non-zero exit code and the presence of engine logs in the Workbench does not always imply a problem with the code. Events such as session timeouts and out-of-memory issues are also assigned non-zero exit codes and will display engine logs.

  • Installation and Upgrade
    • New Configuration Parameters - Version 1.5 includes three new configuration parameters that can be used to specify the type of distribution you are running, the directory for the installed packages/parcels, and the path where Anaconda is installed (for HDP only).
      • DISTRO
      • DISTRO_DIR
      • ANACONDA_DIR
      Details and sample values for these properties have been added to the relevant installation topics for CDH and HDP.
    • DOCKER_TMPDIR changed to /var/lib/cdsw/tmp/docker - Previously the Cloudera Data Science Workbench installer would temporarily decompress the base engine image file to the /var/lib/docker/tmp directory. Starting with version 1.5, the installer will use the /var/lib/cdsw/tmp/docker directory instead. Make sure you have an Application block device mounted to /var/lib/cdsw as recommended so that installation/upgrade can proceed without issues.
    • Improved Validation Checks - Improved the validation checks run by the installer and the error messages that are displayed during the installation process. Cloudera Data Science Workbench now:
      • Checks that space is available on the root directory, the Application Block Device and the Docker Block Device(s).
      • Checks that DNS forward and reverse lookup works for the Cloudera Data Science Workbench Domain and Master IP address provided.
      • Displays better error messages for the cdsw status and cdsw validate commands for easier debugging.
  • Command Line

    • cdsw logs - Previously, the cdsw logs command generated two log bundles - one in plaintext and one with sensitive information redacted. With version 1.5, the command now generates only a single bundle that has all the sensitive information redacted by default.

      To turn off redaction of log files for internal use, you can use the new --skip-redaction option as follows:
      cdsw logs --skip-redaction
  • Networking
    • Cloudera Data Science Workbench now uses DNS hostnames (not IP addresses) for internal communication between components. As a result, the wildcard DNS hostname configured for Cloudera Data Science Workbench must now be resolvable from both the CDSW cluster and your browser.

    • Cloudera Data Science Workbench now enables IPv4 forwarding (net.ipv4.conf.default.forwarding) during the installation process.
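
To confirm the forwarding setting on a host after installation, the sysctl key mentioned above can be read through its procfs file. The following is a minimal sketch under that assumption; the function name ipv4_forwarding_enabled is hypothetical, and the optional argument exists only so the check can be pointed at a different file.

```shell
# Succeeds when the forwarding flag file contains "1".
# Defaults to the procfs file behind net.ipv4.conf.default.forwarding.
# (ipv4_forwarding_enabled is a hypothetical helper name.)
ipv4_forwarding_enabled() {
  flag_file="${1:-/proc/sys/net/ipv4/conf/default/forwarding}"
  [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Example use on a CDSW host:
#   ipv4_forwarding_enabled || echo "IPv4 forwarding is disabled" >&2
```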

Engine Upgrade

Cloudera Data Science Workbench 1.5.0 (and later) ships version 7 of the base engine image which includes the following versions of R and Python:
  • R - 3.5.1
  • Python - 2.7.11, 3.6.1

Pre-installed Packages in Engine 7 - For details about the packages included in the base engine, see Cloudera Data Science Workbench Engine Versions and Packaging.

Upgrade Projects to Use the Latest Base Engine Images - Make sure you test and upgrade existing projects to Base Image v7 (Project Settings > Engine) to take advantage of the latest fixes.

Note that this is a required step if you are upgrading to using Cloudera Data Science Workbench on CDH 6.

The base engine image you use must be compatible with the version of CDH you are running. This is especially important if you are running workloads on Spark. Older base engines (v6 and lower) cannot support the latest versions of CDH 6. That is because these engines were configured to point to the Spark 2 parcel. However, on CDH 6 clusters, Spark is now packaged as a part of CDH 6 and the separate add-on Spark 2 parcel is no longer supported. If you want to run Spark workloads on CDH 6, you must upgrade your projects to base engine 7 (or higher).

CDSW Base Engine Compatibility for Spark Workloads on CDH 5 and CDH 6

Base Engine Versions           CDH 5   CDH 6
Base engines 6 (and lower)     Yes     No
Base engines 7 (and higher)    Yes     Yes
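
The compatibility table can also be expressed as a small lookup, which may be handy when auditing many projects before a CDH 6 upgrade. This is only a sketch of the rule stated in the table; the function name spark_engine_supported is hypothetical and nothing here queries a real deployment.

```shell
# Print "Yes" or "No" for whether a base engine version can run Spark
# workloads on the given CDH major version (5 or 6), per the table above.
# (spark_engine_supported is a hypothetical helper name.)
spark_engine_supported() {
  engine_version="$1"
  cdh_major="$2"
  case "$cdh_major" in
    5) echo "Yes" ;;
    6) if [ "$engine_version" -ge 7 ]; then echo "Yes"; else echo "No"; fi ;;
    *) echo "unknown CDH major version: $cdh_major" >&2; return 2 ;;
  esac
}
```

For example, spark_engine_supported 6 6 reports that a project still on base engine 6 needs to be moved to engine 7 or higher before running Spark workloads on CDH 6.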

Incompatible Changes in Cloudera Data Science Workbench 1.5.0

Deprecated Property - CDH Parcel Directory

The CDH parcel directory property is no longer available in the Site Administration panel at Admin > Engines. Depending on your deployment, use one of the following ways to configure this property:
  • CSD deployments: If you are using the default parcel directory, /opt/cloudera/parcels, no action is required. If you want to use a custom location for the parcel directory, configure this in Cloudera Manager as documented here.
  • RPM deployments: If you are using the default parcel directory, /opt/cloudera/parcels, no action is required. If you want to specify a custom location for the parcel directory, configure the DISTRO_DIR property in the cdsw.conf file on both master and worker hosts. Run cdsw restart after you make this change.
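
For RPM deployments, a quick way to see which parcel directory is in effect is to read DISTRO_DIR from cdsw.conf and fall back to the default when it is not set. The sketch below assumes that cdsw.conf uses shell-style KEY="value" lines and takes the file path as an argument rather than assuming where the file lives; the function name parcel_dir_from_conf is hypothetical.

```shell
# Print the parcel directory configured in a cdsw.conf-style file.
# Falls back to the default /opt/cloudera/parcels when DISTRO_DIR is
# absent or empty. Assumes shell-style KEY="value" lines (an assumption
# about the file format); the last DISTRO_DIR entry wins.
# (parcel_dir_from_conf is a hypothetical helper name.)
parcel_dir_from_conf() {
  conf_file="$1"
  value=""
  if [ -r "$conf_file" ]; then
    value=$(sed -n 's/^DISTRO_DIR=//p' "$conf_file" | tail -n 1 | tr -d "\"'")
  fi
  printf '%s\n' "${value:-/opt/cloudera/parcels}"
}
```

Running the same check on the master and on each worker host is one way to confirm that the property was changed consistently before running cdsw restart.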

Issues Fixed in Cloudera Data Science Workbench 1.5.0

  • Fixed an issue with RPM installations where NO_PROXY settings were being ignored.

    Cloudera Bug: DSE-4444

  • Fixed an issue where CDSW would not start because of IP issues with web pods. Version 1.5 fixes this by enabling IPv4 forwarding at startup.

    Cloudera Bug: DSE-4609

  • Fixed an issue where engines would get deleted immediately after an exit/crash and engine logs did not persist, which made it difficult to debug issues with crashes or auto-restarts.

    Cloudera Bug: DSE-4008, DSE-4417

  • Fixed intermittent issues with starting and stopping Cloudera Data Science Workbench on CSD deployments.

    Cloudera Bug: DSE-4426, DSE-4829

  • Fixed an issue where Cloudera Data Science Workbench was reporting incorrect file sizes for files larger than 2 MB.

    Cloudera Bug: DSE-4531, DSE-4532

  • Fixed an issue where the Run New Experiment dialog box did not include the file selector and the Script name had to be typed in manually.

    Cloudera Bug: DSE-3650

  • Fixed an issue where underlying Kubernetes processes were running out of resources leading to Out of Memory (OOM) errors. Cloudera Data Science Workbench now reserves compute resources for Kubernetes components.

    Cloudera Bug: DSE-4896, DSE-5001

  • Fixed an issue where the PYSPARK3_PYTHON environment variable was not working as expected for Python 3 workloads.

    Cloudera Bug: DSE-4329

  • Fixed an issue where Docker commands would fail on Cloudera Data Science Workbench engines that are not available locally (such as custom engine images) when an HTTP/HTTPS proxy was in use.

    Cloudera Bug: DSE-4427

  • Fixed an issue where installation of the XML package would fail in the R kernel.

    Cloudera Bug: DSE-2201

Known Issues and Limitations in Cloudera Data Science Workbench 1.5.0

For a complete list of the current known issues and limitations in Cloudera Data Science Workbench 1.5.x, see Known Issues in Cloudera Data Science Workbench 1.5.x.

Cloudera Data Science Workbench 1.4.3

This section lists the release notes for Cloudera Data Science Workbench 1.4.3.

New Features and Changes in Cloudera Data Science Workbench 1.4.3

  • Reserve Master Host for Internal Application Components

    Cloudera Data Science Workbench now allows you to reserve the master host for running internal application components and services such as Livelog, the PostgreSQL database, and so on, while user workloads run exclusively on worker hosts.

    By default, the master host runs both user workloads and the application's internal services. However, depending on the size of your CDSW deployment and the number of workloads running at any given time, it's possible that user workloads might dominate resources on the master host. Enabling this feature will ensure that CDSW's application components always have access to the resources they need on the master host and are not adversely affected by user workloads.

    For details on how to enable this feature, see Reserving the Master Host for Internal CDSW Components.

  • Allow Only Session Creators to Execute Commands in Active Sessions

    By default, project contributors, project administrators, and site administrators have the ability to execute commands within your actively running sessions in the Workbench. Cloudera Data Science Workbench 1.4.3 introduces a new feature that allows site administrators to restrict this ability. When this feature is enabled, only the user that creates the session will be able to execute commands in that session. No other users, regardless of their permissions in the team or as project collaborators/administrators, will be able to execute commands on active sessions that were not created by them.

    For details on how to enable this feature, see Restricting Collaborator and Administrator Access to Active Sessions.

Issues Fixed in Cloudera Data Science Workbench 1.4.3

TSB-349: SQL Injection Vulnerability in Cloudera Data Science Workbench

An SQL injection vulnerability was found in Cloudera Data Science Workbench. This would allow any authenticated user to run arbitrary queries against CDSW’s internal database. The database contains user contact information, bcrypt-hashed CDSW passwords (in the case of local authentication), API keys, and stored Kerberos keytabs.

Products affected: Cloudera Data Science Workbench (CDSW)

Releases affected: CDSW 1.4.0, 1.4.1, 1.4.2

Users affected: All

Date/time of detection: 2018-10-18

Detected by: Milan Magyar (Cloudera)

Severity (Low/Medium/High): Critical (9.9): CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H

Impact: An authenticated CDSW user can arbitrarily access and modify the CDSW internal database. This allows privilege escalation in CDSW, Kubernetes, and the Linux host; creation, deletion, modification, and exfiltration of data, code, and credentials; denial of service; and data loss.

CVE: CVE-2018-20091

Immediate action required:

  1. Strongly consider performing a backup before beginning. We advise you to have a backup before performing any upgrade and before beginning this remediation work.

  2. Upgrade to Cloudera Data Science Workbench 1.4.3 (or higher).

  3. In an abundance of caution, Cloudera recommends that you revoke credentials and secrets stored by CDSW. To revoke these credentials:

    1. Change the password for any account with a keytab or Kerberos credential that has been stored in CDSW. This includes the Kerberos principals for the associated CDH cluster if entered on the CDSW “Hadoop Authentication” user settings page.

    2. With Cloudera Data Science Workbench 1.4.3 running, run the following remediation script on each CDSW host, including the master and all workers: Remediation Script for TSB-349

      Note: Cloudera Data Science Workbench will become unavailable during this time.

    3. The script performs the following actions:
      1. If using local user authentication, logs out every user and resets their CDSW password.

      2. Regenerates or deletes various keys for every user.

      3. Resets secrets used for internal communications.

    4. Fully stop and start Cloudera Data Science Workbench (a restart is not sufficient).

      • For CSD-based deployments, restart the CDSW service in Cloudera Manager.

        OR

      • For RPM-based deployments, run cdsw stop followed by cdsw start on the CDSW master host.

    5. If using internal TLS termination: revoke and regenerate the CDSW TLS certificate and key.

    6. For each user, revoke the previous CDSW-generated SSH public key for git integration on the git side (the private key in CDSW has already been deleted). A new SSH key pair has already been generated and should be installed in the old key’s place.

    7. Revoke and regenerate any credential stored within a CDSW project, including any passwords stored in projects’ environment variables.

  4. Verify all CDSW settings to ensure they are unchanged (e.g. SMTP server, authentication settings, custom docker images, host mounts, etc).

  5. Treat all CDSW hosts as potentially compromised with root access. Remediate per your policy.

Addressed in release/refresh/patch: Cloudera Data Science Workbench 1.4.3

For the latest update on this issue see the corresponding Knowledge article:

TSB 2019-349: CDSW SQL Injection Vulnerability

TSB-350: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

Stopping Cloudera Data Science Workbench involves unmounting the NFS volumes that store CDSW project directories and then cleaning up a folder where CDSW stores its temporary state. However, due to a race condition, this NFS unmount process can take too long or fail altogether. If this happens, any CDSW projects that remain mounted will be deleted.

TSB-2018-346 was released in the time-frame of CDSW 1.4.2 to fix this issue, but it turned out to be only a partial fix. With CDSW 1.4.3, we have fixed the issue permanently. However, the script that was provided with TSB-2018-346 still ensures that data loss is prevented and must be used to shut down or restart all the affected CDSW releases listed below. The same script is also available under the Immediate Action Required section below.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench versions
  • 1.0.x

  • 1.1.x

  • 1.2.x

  • 1.3.0, 1.3.1

  • 1.4.0, 1.4.1, 1.4.2

Users affected: This potentially affects all CDSW users.

Detected by: Nehmé Tohmé (Cloudera)

Severity (Low/Medium/High): High

Impact: Potential data loss.

CVE: N/A

Immediate action required: If you are running any of the affected Cloudera Data Science Workbench versions, you must run the following script on the CDSW master host every time before you stop or restart Cloudera Data Science Workbench. Failure to do so can result in data loss.

This script should also be run before initiating a Cloudera Data Science Workbench upgrade. As always, we recommend creating a full backup prior to beginning an upgrade.

cdsw_protect_stop_restart.sh - Available for download at: cdsw_protect_stop_restart.sh.

#!/bin/bash

set -e

cat << EXPLANATION


This script is a workaround for Cloudera TSB-346 and TSB-350. It protects your
CDSW projects from a rare race condition that can result in data loss.
Run this script before stopping the CDSW service, irrespective of whether
the stop precedes a restart, upgrade, or any other task.

Run this script only on the master node of your CDSW cluster.

You will be asked to specify a target folder on the master node where the
script will save a backup of all your project files. Make sure the target
folder has enough free space to accommodate all of your project files. To
determine how much space is required, run 'du -hs /var/lib/cdsw/current/projects'
on the CDSW master node.

This script will first back up your project files to the specified target
folder. It will then temporarily move your project files aside to protect
against the data loss condition. At that point, it is safe to stop the CDSW
service. After CDSW has stopped, the script will move the project files back
into place.

Note: This workaround is not required for CDSW 1.4.3 and higher.



EXPLANATION

read -p "Enter target folder for backups: " backup_target

echo "Backing up to $backup_target..."
rsync -azp /var/lib/cdsw/current/projects "$backup_target"

read -n 1 -p "Backup complete. Press enter when you are ready to stop CDSW: "

echo "Deleting all Kubernetes resources..."
kubectl delete configmaps,deployments,daemonsets,replicasets,services,ingress,secrets,persistentvolumes,persistentvolumeclaims,jobs --all
kubectl delete pods --all

echo "Temporarily saving project files to /var/lib/cdsw/current/projects_tmp..."
mkdir /var/lib/cdsw/current/projects_tmp
mv /var/lib/cdsw/current/projects/* /var/lib/cdsw/current/projects_tmp

echo -e "Please stop the CDSW service."

read -n 1 -p "Press enter when CDSW has stopped: "

echo "Moving projects back into place..."
mv /var/lib/cdsw/current/projects_tmp/* /var/lib/cdsw/current/projects
rm -rf /var/lib/cdsw/current/projects_tmp

echo -e "Done. You may now upgrade or start the CDSW service."
echo -e "When CDSW is running, if desired, you may delete the backup data at $backup_target"

Addressed in release/refresh/patch: This issue is fixed in Cloudera Data Science Workbench 1.4.3.

Note that you are required to run the workaround script above when you upgrade from an affected version to a release with the fix. This helps guard against data loss when the affected version needs to be shut down during the upgrade process.
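
The workaround script above asks for a backup target with enough free space and suggests checking the project directory size with du. That comparison can be automated before the script is started. The following is a minimal sketch, assuming du -sk and df -Pk are available on the master host; the function name backup_target_has_space is hypothetical, and the check ignores any space saved or lost by the copy itself.

```shell
# Succeeds when the filesystem holding the target directory ($2) has at
# least as much free space, in KiB, as the source tree ($1) currently uses.
# Fails if either size cannot be determined.
# (backup_target_has_space is a hypothetical helper name.)
backup_target_has_space() {
  src_dir="$1"
  target_dir="$2"
  needed_kb=$(du -sk "$src_dir" | awk '{print $1}')
  free_kb=$(df -Pk "$target_dir" | awk 'NR==2 {print $4}')
  [ -n "$needed_kb" ] && [ -n "$free_kb" ] && [ "$free_kb" -ge "$needed_kb" ]
}

# Example use on the CDSW master host (the backup path is a placeholder):
#   backup_target_has_space /var/lib/cdsw/current/projects /path/to/backup \
#     || echo "Not enough free space for the backup" >&2
```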

TSB-351: Unauthorized Project Access in Cloudera Data Science Workbench

Malicious CDSW users can bypass project permission checks and gain read-write access to any project folder in CDSW.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench 1.4.0, 1.4.1, 1.4.2

Users affected: All CDSW Users

Date/time of detection: 2018-10-29

Detected by: Che-Yuan Liang (Cloudera)

Severity (Low/Medium/High): High (8.3: CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:L)

Impact: Project data can be read or written (changed, destroyed) by any Cloudera Data Science Workbench user.

CVE: CVE-2018-20090

Immediate action required:

Upgrade to a version of Cloudera Data Science Workbench with the fix (version 1.4.3, 1.5.0, or higher).

Addressed in release/refresh/patch: Cloudera Data Science Workbench 1.4.3 (and higher)

For the latest update on this issue see the corresponding Knowledge article:

TSB 2019-351: Unauthorized Project Access in Cloudera Data Science Workbench

Other Notable Fixed Issues in Cloudera Data Science Workbench 1.4.3

  • Fixed an issue where malicious Cloudera Data Science Workbench users were able to bypass project permission checks and gain read-write access to any project folder in Cloudera Data Science Workbench.

    Cloudera Bug: DSE-5138

  • Fixed an issue where Cloudera Data Science Workbench would become unresponsive because the web application was making too many simultaneous requests to the Kubernetes API server. CDSW now caches calls to the API and refreshes the cache periodically.

    Cloudera Bug: DSE-5265, DSE-5269

  • Fixed an issue where Cloudera Data Science Workbench workloads would intermittently crash with Exit Code 2: Misuse of Shell builtins.

    Cloudera Bug: DSE-4709

  • Fixed an issue where Cloudera Data Science Workbench would not start when internal TLS termination was enabled and the TLS private key/certificate pair in use did not include a trailing newline character.

    Cloudera Bug: DSE-4853
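
On versions affected by the trailing-newline issue above, one way to check a TLS key or certificate file is sketched below. The function name ends_with_newline and the file path are illustrative assumptions, not part of Cloudera Data Science Workbench.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given file ends with a newline character.
ends_with_newline() {
  local file="$1"
  # An empty or missing file has no trailing newline.
  [ -s "$file" ] || return 1
  # Command substitution strips a trailing newline, so the result is
  # empty exactly when the last byte of the file is a newline.
  [ -z "$(tail -c 1 "$file")" ]
}

# Example usage (placeholder path; substitute your own key or certificate):
# ends_with_newline /path/to/cdsw.key || echo >> /path/to/cdsw.key
```

The commented usage line appends a newline only when one is missing, so running it repeatedly leaves the file unchanged.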

Known Issues and Limitations in Cloudera Data Science Workbench 1.4.3

For a complete list of the current known issues and limitations in Cloudera Data Science Workbench 1.4.x, see Known Issues and Limitations in Cloudera Data Science Workbench 1.6.x.

Cloudera Data Science Workbench 1.4.2

This section lists the release notes for Cloudera Data Science Workbench 1.4.2.

New Features and Changes in Cloudera Data Science Workbench 1.4.2

  • Operating System: Added support for RHEL / CentOS / Oracle Linux RHCK 7.5.

  • Engines
    • Mounts - By default, host mounts (specified at Admin > Engines > Mounts) are loaded into engine containers with read-only permissions. With version 1.4.2, a new checkbox allows you to make these mounted directories available in engine containers with read-write permissions instead.
    • Engine upgrade (Base Image v6)
  • Models
    • In Cloudera Data Science Workbench 1.4.0, model request sizes were limited to 100 KB. In version 1.4.2, this limit has now been increased to 5 MB. To take advantage of this higher threshold, you will need to upgrade to Cloudera Data Science Workbench 1.4.2 and rebuild your existing models.
  • Security
    Added three new properties to the Admin > Security page that allow you to customize HTTP headers accepted by Cloudera Data Science Workbench.
    • Enable HTTP security headers
    • Enable cross-origin resource sharing (CORS)
    • Enable HTTP Strict Transport Security (HSTS)
    For details, see Configuring HTTP Headers for Cloudera Data Science Workbench.
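
After enabling HSTS, you might confirm that the header is actually returned by the web application. The following is a minimal sketch: the helper name has_hsts and the hostname in the usage comment are assumptions for illustration, and only the standard Strict-Transport-Security header name is relied on.

```shell
#!/bin/bash
# Hypothetical helper: reads HTTP response headers on stdin and succeeds if
# a Strict-Transport-Security header is present. Header names are
# case-insensitive, hence grep -i.
has_hsts() {
  grep -qi '^strict-transport-security:'
}

# Example usage (placeholder hostname; substitute your CDSW domain):
# curl -sI https://cdsw.example.com | has_hsts && echo "HSTS header present"
```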

Engine Upgrade

Cloudera Data Science Workbench 1.4.2 ships version 6 of the base engine image which includes the following versions of R and Python:
  • R - 3.4.1
  • Python - 2.7.11, 3.6.1

Pre-installed Packages in Engine 6 - For details about the packages included in the base engine, see Cloudera Data Science Workbench Engine Versions and Packaging.

Additionally, Cloudera Data Science Workbench will now alert you when a new engine version is available. Make sure you test and upgrade existing projects to Base Image v6 (Project Settings > Engine) to take advantage of the latest fixes.

Issues Fixed in Cloudera Data Science Workbench 1.4.2

TSB-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

Stopping Cloudera Data Science Workbench involves unmounting the NFS volumes that store CDSW project directories and then cleaning up a folder where the kubelet stores its temporary state. However, due to a race condition, this NFS unmount process can take too long or fail altogether. If this happens, CDSW projects that remain mounted will be deleted by the cleanup step.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench versions:
  • 1.0.x

  • 1.1.x

  • 1.2.x

  • 1.3.0, 1.3.1

  • 1.4.0, 1.4.1

Users affected: This potentially affects all CDSW users.

Detected by: Nehmé Tohmé (Cloudera)

Severity (Low/Medium/High): High

Impact: If the NFS unmount fails during shutdown, data loss can occur. All CDSW project files might be deleted.

CVE: N/A

Immediate action required: If you are running any of the affected Cloudera Data Science Workbench versions, you must run the following script on the CDSW master host every time before you stop or restart Cloudera Data Science Workbench. Failure to do so can result in data loss.

This script should also be run before initiating a Cloudera Data Science Workbench upgrade. As always, we recommend creating a full backup prior to beginning an upgrade.

cdsw_protect_stop_restart.sh - Available for download at: cdsw_protect_stop_restart.sh.

#!/bin/bash

set -e

cat << EXPLANATION


This script is a workaround for Cloudera TSB-346. It protects your
CDSW projects from a rare race condition that can result in data loss.
Run this script before stopping the CDSW service, irrespective of whether
the stop precedes a restart, upgrade, or any other task.

Run this script only on the master node of your CDSW cluster.

You will be asked to specify a target folder on the master node where the
script will save a backup of all your project files. Make sure the target
folder has enough free space to accommodate all of your project files. To
determine how much space is required, run 'du -hs /var/lib/cdsw/current/projects'
on the CDSW master node.

This script will first back up your project files to the specified target
folder. It will then temporarily move your project files aside to protect
against the data loss condition. At that point, it is safe to stop the CDSW
service. After CDSW has stopped, the script will move the project files back
into place.

Note: This workaround is not required for CDSW 1.4.2 and higher.



EXPLANATION

read -p "Enter target folder for backups: " backup_target

echo "Backing up to $backup_target..."
rsync -azp /var/lib/cdsw/current/projects "$backup_target"

read -n 1 -p "Backup complete. Press enter when you are ready to stop CDSW: "

echo "Deleting all Kubernetes resources..."
kubectl delete configmaps,deployments,daemonsets,replicasets,services,ingress,secrets,persistentvolumes,persistentvolumeclaims,jobs --all
kubectl delete pods --all

echo "Temporarily saving project files to /var/lib/cdsw/current/projects_tmp..."
mkdir /var/lib/cdsw/current/projects_tmp
mv /var/lib/cdsw/current/projects/* /var/lib/cdsw/current/projects_tmp

echo -e "Please stop the CDSW service."

read -n 1 -p "Press enter when CDSW has stopped: "

echo "Moving projects back into place..."
mv /var/lib/cdsw/current/projects_tmp/* /var/lib/cdsw/current/projects
rm -rf /var/lib/cdsw/current/projects_tmp

echo -e "Done. You may now upgrade or start the CDSW service."
echo -e "When CDSW is running, if desired, you may delete the backup data at $backup_target"

Addressed in release/refresh/patch: This issue is fixed in Cloudera Data Science Workbench 1.4.2.

Note that you are required to run the workaround script above when you upgrade from an affected version to a release with the fix. This helps guard against data loss when the affected version needs to be shut down during the upgrade process.
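
The workaround script asks for a backup target with enough free space for all project files. As a rough pre-check before running it, you could compare the size of the projects directory with the free space on the target. This is only a sketch: the helper name has_enough_space and the backup path in the usage comment are assumptions, and the comparison ignores any space savings or overhead on the target filesystem.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the filesystem holding target_dir has at
# least as much free space (in KB) as source_dir currently occupies.
has_enough_space() {
  local source_dir="$1" target_dir="$2"
  local needed_kb available_kb
  # du -sk prints "<kilobytes><TAB><path>"; keep only the size.
  needed_kb=$(du -sk "$source_dir" | awk '{print $1}')
  # df -Pk prints a header line followed by one line for the filesystem;
  # column 4 is the available space in KB.
  available_kb=$(df -Pk "$target_dir" | awk 'NR==2 {print $4}')
  [ "$available_kb" -ge "$needed_kb" ]
}

# Example usage (the backup path is a placeholder):
# has_enough_space /var/lib/cdsw/current/projects /mnt/backup \
#   || echo "Not enough free space on the backup target."
```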

For the latest update on this issue see the corresponding Knowledge article:

TSB 2018-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart

TSB-328: Unauthenticated User Enumeration in Cloudera Data Science Workbench

Unauthenticated users can get a list of user accounts of Cloudera Data Science Workbench.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench 1.4.0 (and lower)

Users affected: All users of Cloudera Data Science Workbench 1.4.0 (and lower)

Date/time of detection: June 11, 2018

Severity (Low/Medium/High): 5.3 (Medium) CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N

Impact: Unauthenticated user enumeration in Cloudera Data Science Workbench.

CVE: CVE-2018-15665

Immediate action required: Upgrade to the latest version of Cloudera Data Science Workbench (1.4.2 or higher).

Note that Cloudera Data Science Workbench 1.4.1 is no longer publicly available due to TSB 2018-346: Risk of Data Loss During Cloudera Data Science Workbench (CDSW) Shutdown and Restart.

Addressed in release/refresh/patch: Cloudera Data Science Workbench 1.4.2 (and higher)

For the latest update on this issue see the corresponding Knowledge article:

TSB 2018-318: Unauthenticated User Enumeration in Cloudera Data Science Workbench

Other Notable Fixed Issues in Cloudera Data Science Workbench 1.4.2

  • Fixed an issue where attempting to fork a large project would result in unexpected 'out of memory' errors.

    Cloudera Bug: DSE-4464

  • Fixed an issue in version 1.4.0 where Cloudera Data Science Workbench workloads would intermittently get stuck in the Scheduling state due to a Red Hat kernel slab leak.

    Cloudera Bug: DSE-4098

  • Fixed an issue in version 1.4.0 where the Hadoop username on non-kerberized clusters defaulted to cdsw. This was a known issue and has been fixed in version 1.4.2. The Hadoop username will now once again default to your Cloudera Data Science Workbench username.

    Cloudera Bug: DSE-4240

  • Fixed an issue in version 1.4.0 where creating a project using Git via SSH did not work.

    Cloudera Bug: DSE-4278

  • Fixed an issue in version 1.4.0 where environment variables set in the Admin panel were not being propagated to projects (experiments, sessions, jobs) as expected.

    Cloudera Bug: DSE-4422

  • Fixed an issue in version 1.4.0 where Cloudera Data Science Workbench would not start when external TLS termination was enabled.

    Cloudera Bug: DSE-4640

  • Fixed an issue in version 1.4.0 where HTTP/HTTPS proxy settings in Cloudera Manager were erroneously escaped when propagated to Cloudera Data Science Workbench engines.

    Cloudera Bug: DSE-4421

  • Fixed an issue in version 1.4.0 where SSH tunnels did not work as expected.

    Cloudera Bug: DSE-4741

  • Fixed an issue in version 1.4.0 where copying multiple files into a folder resulted in unexpected behavior such as overwritten files and incorrect UI messages.

    Cloudera Bug: DSE-4831

  • Fixed an issue in version 1.4.0 where workers (in engines) and collection of usage metrics failed on TLS-enabled clusters.

    Cloudera Bug: DSE-4293, DSE-4572

  • Fixed an issue in version 1.4.0 where the Files > New Folder dialog box did not work.

    Cloudera Bug: DSE-4807

  • Fixed an issue in version 1.4.0 where deleting an experiment did not work from certain dashboards. Consequently, deleting the parent project would also fail in such cases.

    Cloudera Bug: DSE-4227

Known Issues and Limitations in Cloudera Data Science Workbench 1.4.2

For a complete list of the current known issues and limitations in Cloudera Data Science Workbench 1.4.x, see Known Issues and Limitations in Cloudera Data Science Workbench 1.6.x.

Cloudera Data Science Workbench 1.4.0

This section lists the release notes for Cloudera Data Science Workbench 1.4.0.

New Features in Cloudera Data Science Workbench 1.4.0

  • Models and Experiments - Cloudera Data Science Workbench 1.4 extends the machine learning platform experience from research to production. Now you can use Cloudera Data Science Workbench to build, train, and deploy models in a unified workflow.
    • Experiments - Train and compare versioned, reproducible models

    • Models - Deploy and manage models as REST APIs to serve predictions

  • External Authentication
    • LDAP/SAML users can now restrict access to Cloudera Data Science Workbench to specific LDAP/SAML groups. Additionally, you can now specify groups that should automatically be granted site administrator privileges when they log in to Cloudera Data Science Workbench. For details, see Configuring External Authentication with LDAP and SAML.

    • Cloudera Data Science Workbench now supports multiple identity provider signing certificates for SAML 2.0 authentication.

    • Cloudera Data Science Workbench now supports SAML 2.0 Errata 05 E43 for SAML 2.0 authentication.

  • Projects and Workbench
    • Site administrators can now disable individual built-in template projects by using a checkbox in the Project Templates table at Admin > Settings. Only enabled project templates will be displayed in the dropdown menu when creating a new project.

    • The default .gitignore file that is created with each new project has been updated to:

      R
      node_modules
      *.pyc
      .*
      !.gitignore
    • Added support for multiple Terminal windows within a single session.

  • Networking
    • Cloudera Data Science Workbench now supports DNS resolution of localhost to a non-local IP address (not 127.0.0.1).

    • Cloudera Data Science Workbench now appends the following default values to the NO_PROXY parameter if any of the following properties are configured: HTTP_PROXY, HTTPS_PROXY, or ALL_PROXY.
      "127.0.0.1,localhost,100.66.0.1,100.66.0.2,100.66.0.3,
      100.66.0.4,100.66.0.5,100.66.0.6,100.66.0.7,100.66.0.8,100.66.0.9,
      100.66.0.10,100.66.0.11,100.66.0.12,100.66.0.13,100.66.0.14,
      100.66.0.15,100.66.0.16,100.66.0.17,100.66.0.18,100.66.0.19,
      100.66.0.20,100.66.0.21,100.66.0.22,100.66.0.23,100.66.0.24,
      100.66.0.25,100.66.0.26,100.66.0.27,100.66.0.28,100.66.0.29,
      100.66.0.30,100.66.0.31,100.66.0.32,100.66.0.33,100.66.0.34,
      100.66.0.35,100.66.0.36,100.66.0.37,100.66.0.38,100.66.0.39,
      100.66.0.40,100.66.0.41,100.66.0.42,100.66.0.43,100.66.0.44,
      100.66.0.45,100.66.0.46,100.66.0.47,100.66.0.48,100.66.0.49,
      100.66.0.50,100.77.0.10,100.77.0.128,100.77.0.129,100.77.0.130,
      100.77.0.131,100.77.0.132,100.77.0.133,100.77.0.134,100.77.0.135,
      100.77.0.136,100.77.0.137,100.77.0.138,100.77.0.139"
  • Installation Validation Checks - Improved validation checks run during the installation process. Cloudera Data Science Workbench now:
    • Verifies that the wildcard DNS subdomain has been configured.
    • Verifies that resolv.conf is not pointing to 127.0.0.1.
    • Validates iptables chains to ensure there are no custom rules being set.
    • Throws a warning if you are using a self-signed TLS certificate, an expired certificate, or if the certificate is not valid for the wildcard domain used for Cloudera Data Science Workbench.
  • Command Line - Added a verbose option to the cdsw status command.
    cdsw status [-v|--verbose]
  • Kubernetes has been upgraded to version 1.8.12.
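
The default NO_PROXY value listed under Networking above follows a regular pattern: 100.66.0.1 through 100.66.0.50, then 100.77.0.10, then 100.77.0.128 through 100.77.0.139. If you need to reproduce or compare that value, a sketch such as the following can generate it. The function name default_no_proxy is an assumption for illustration.

```shell
#!/bin/bash
# Hypothetical helper: rebuilds the default NO_PROXY value listed above.
default_no_proxy() {
  local list="127.0.0.1,localhost"
  local i
  # 100.66.0.1 through 100.66.0.50
  for i in $(seq 1 50); do
    list="$list,100.66.0.$i"
  done
  # 100.77.0.10, then 100.77.0.128 through 100.77.0.139
  list="$list,100.77.0.10"
  for i in $(seq 128 139); do
    list="$list,100.77.0.$i"
  done
  printf '%s\n' "$list"
}

# Example usage:
# default_no_proxy
```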

Engine Upgrade

Cloudera Data Science Workbench 1.4.0 (and later) ships version 5 of the base engine image which includes the following versions of R and Python:
  • R - 3.4.1
  • Python - 2.7.11, 3.6.1

Pre-installed Packages in Engine 5 - For details about the packages included in the base engine, see Cloudera Data Science Workbench Engine Versions and Packaging.

Additionally, Cloudera Data Science Workbench will now alert you when a new engine version is available. Make sure you test and upgrade existing projects to Base Image v5 (Project Settings > Engine) to take advantage of the latest fixes.

Incompatible Changes in Cloudera Data Science Workbench 1.4.0

Host Mounts are now Read-Only in Engines - Previously, mounts (specified at Admin > Engines > Mounts) were loaded into engine containers with read-write permissions.

Starting with version 1.4.0, mount points are now loaded into engines with read-only permissions.

Issues Fixed in Cloudera Data Science Workbench 1.4.0

  • Fixed an issue where Git would time out when cloning a project took too long. The timeout has now been increased to 60 seconds when creating a new project from Git.

    Cloudera Bug: DSE-3363

  • Fixed an issue where manual parcel deployments could not detect parcel hash files with a .sha1 extension.

    Cloudera Bug: DSE-3301

  • Fixed several usability issues (file create, save, and so on) with Internet Explorer 11.

    Cloudera Bug: DSE-3426, DSE-3434

  • Fixed an issue where CSD installations would fail to recognize Oracle Linux 7.3 as a supported operating system.

    Cloudera Bug: DSE-3257

  • Fixed an issue where Cloudera Data Science Workbench would hang with 100 percent CPU utilization.

    Cloudera Bug: DSE-3450

  • Fixed a SAML 2.0 configuration issue where uploading the identity provider metadata XML file did not correctly update the identity provider signing certificate and/or SSO URL on Cloudera Data Science Workbench.

    Cloudera Bug: DSE-3076

  • Fixed an issue with SAML 2.0 authentication where the identity provider’s signature was not being validated correctly.

    Cloudera Bug: DSE-3694

  • Fixed the Save As functionality in the project Workbench.

    Cloudera Bug: DSE-3870

  • Fixed an issue where if a user had some files opened in the Workbench in a previous session, and those files no longer existed in the project filesystem, a File Not Found error would occur when opening the Workbench.

    Cloudera Bug: DSE-3835

Known Issues and Limitations in Cloudera Data Science Workbench 1.4.0

For a complete list of the current known issues and limitations in Cloudera Data Science Workbench 1.4.x, see Known Issues and Limitations in Cloudera Data Science Workbench 1.6.x.

Cloudera Data Science Workbench 1.3.1

This section lists the release notes for Cloudera Data Science Workbench 1.3.1.

New Features in Cloudera Data Science Workbench 1.3.1

  • Operating System: Added support for RHEL / CentOS / Oracle Linux RHCK 7.5.
  • SAML
    • Cloudera Data Science Workbench now supports multiple identity provider signing certificates for SAML 2.0 authentication.
    • Cloudera Data Science Workbench now supports SAML 2.0 Errata 05 E43 for SAML 2.0 authentication.

Issues Fixed in Cloudera Data Science Workbench 1.3.1

Remote Command Execution and Information Disclosure in Cloudera Data Science Workbench

A configuration issue in Kubernetes used by Cloudera Data Science Workbench can allow remote command execution and privilege escalation in CDSW. A separate information permissions issue can cause the LDAP bind password to be exposed to authenticated CDSW users when LDAP bind search is enabled.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench 1.3.0 (and lower)

Users affected: All users of Cloudera Data Science Workbench 1.3.0 (and lower)

Date/time of detection: May 16, 2018

Severity (Low/Medium/High): High

Impact: Remote command execution and information disclosure

CVE: CVE-2018-11215

Immediate action required: Upgrade to the latest version of Cloudera Data Science Workbench (1.3.1 or higher) and change the LDAP bind password if previously configured in Cloudera Data Science Workbench.

Addressed in release/refresh/patch: Cloudera Data Science Workbench 1.3.1 (and higher)

For the latest update on this issue see the corresponding Knowledge Base article:

TSB 2018-313: Remote Command Execution and Information

Other Notable Fixed Issues in Cloudera Data Science Workbench 1.3.1

  • Fixed an issue where CSD installations would fail to recognize Oracle Linux 7.3 as a supported operating system.

    Cloudera Bug: DSE-3257

  • Fixed several usability issues (file create, save, and so on) with Internet Explorer 11.

    Cloudera Bug: DSE-3426, DSE-3434

  • Fixed a SAML 2.0 configuration issue where uploading the identity provider metadata XML file did not correctly update the identity provider signing certificate and/or SSO URL on Cloudera Data Science Workbench.

    Cloudera Bug: DSE-3265

  • Fixed an issue where the owner of a console output could not view their own shared consoles from sessions/job runs when sharing with Specific user/team.

    Cloudera Bug: DSE-3143

  • Fixed an issue with missing connectors in the Jobs dependency chart.

    Cloudera Bug: DSE-3185

Known Issues and Limitations in Cloudera Data Science Workbench 1.3.1

For a list of the current known issues and limitations, refer to the documentation for version 1.3.x at Cloudera Data Science Workbench 1.3.x.

Cloudera Data Science Workbench 1.3.0

This section lists the release notes for Cloudera Data Science Workbench 1.3.0.

New Features and Changes in Cloudera Data Science Workbench 1.3.0

  • Added support for SUSE Linux Enterprise Server 12 SP3.

  • Site administrators can now add template projects that are customized for their organization's use-cases.

  • Version 1.3 introduces a new environment variable for Python 3 sessions called PYSPARK3_PYTHON. Python 2 sessions will continue to use the default PYSPARK_PYTHON variable. This will allow you to configure distinct variables for Python 2 and Python 3 applications.

  • In the Cloudera Manager CDSW service, the Wildcard DNS Domain property has been renamed to Cloudera Data Science Workbench Domain.

  • Output for the cdsw version command now includes the type of deployment you are running – RPM or CSD.

  • Added log4j and spark-defaults sample configuration to the PySpark and Scala template projects.
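
As an illustration of the PYSPARK3_PYTHON variable described above, the two variables can point at different interpreters. The interpreter paths below are placeholders, not defaults shipped with the product, and where you set these variables depends on how environment variables are managed for your deployment.

```shell
#!/bin/bash
# Placeholder interpreter paths; substitute the locations used on your cluster.
export PYSPARK_PYTHON="/usr/bin/python2"    # read by Python 2 sessions
export PYSPARK3_PYTHON="/usr/bin/python3"   # read by Python 3 sessions (new in 1.3)
```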

Issues Fixed in Cloudera Data Science Workbench 1.3.0

  • Fixed an issue where the cdsw status command failed to run all the required system checks.

    Cloudera Bug: DSE-3070

  • Session lists now include additional metadata to help distinguish between different sessions.

    Cloudera Bug: DSE-2814

  • Pre-install validation checks have been improved to detect issues with iptables modules and Java settings.

    Cloudera Bug: DSE-2293

  • Fixed an issue with the cdsw status command output when TLS is enabled.

    Cloudera Bug: DSE-3182

  • CDS 2.2 Release 2 fixes the issue where a PySpark application could only be run once per active Workbench session.

    Cloudera Bug: CDH-58475

  • Fixed an issue that prevented Bokeh plots from rendering.

    Cloudera Bug: DSE-3134

  • Fixed an issue in Cloudera Data Science Workbench 1.2.2 that prevented WebSocket re-connections and caused console hangs.

    Cloudera Bug: DSE-3085

  • Improved CDSW service restart performance for CSD deployments.

    Cloudera Bug: DSE-2937

Incompatible Changes in Cloudera Data Science Workbench 1.3.0

Deploying Cloudera Data Science Workbench with Cloudera Director 2.7

While this is not a Cloudera Data Science Workbench change, you should note that Cloudera Director 2.7 includes a new instance-level setting that sets the mountAllUnmountedDisks property to false:
normalizationConfig {
 mountAllUnmountedDisks: false
}

This means Cloudera Director 2.7 (and higher) users no longer need to set lp.normalization.mountAllUnmountedDisksRequired to false in the Cloudera Director server's application.properties file. Note that Cloudera Director 2.6 still requires this setting.
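
For Cloudera Director 2.6, where the server-level setting is still needed, the property can be added by hand or with a small idempotent helper like the sketch below. The helper name ensure_property and the file path in the usage comment are assumptions. The helper writes the entry as "key: value", which is one of the separators accepted by Java properties files.

```shell
#!/bin/bash
# Hypothetical helper: appends "key: value" to a properties file unless an
# uncommented line already mentions that key. It does not change the value
# of an entry that is already present.
ensure_property() {
  local file="$1" key="$2" value="$3"
  # Skip comment lines, then look for the key as a literal string (-F).
  if grep -v '^[[:space:]]*#' "$file" 2>/dev/null | grep -qF "$key"; then
    return 0
  fi
  printf '%s: %s\n' "$key" "$value" >> "$file"
}

# Example usage (placeholder path to the Director server's configuration):
# ensure_property /path/to/application.properties \
#   lp.normalization.mountAllUnmountedDisksRequired false
```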

Known Issues and Limitations in Cloudera Data Science Workbench 1.3.0

For a list of the current known issues and limitations, refer to the documentation for version 1.3.x at Cloudera Data Science Workbench 1.3.x.

Cloudera Data Science Workbench 1.2.2

This section lists the release notes for Cloudera Data Science Workbench 1.2.2. The documentation for version 1.2.x can be found at Cloudera Data Science Workbench 1.2.x.

New Features and Changes in Cloudera Data Science Workbench 1.2.2

  • Added support for SUSE Linux Enterprise Server 12 SP2.
  • Added support for multi-homed networks.
  • Cloudera Director now allows you to deploy CSD-based Cloudera Data Science Workbench 1.2.x deployments on AWS. For more specifics on supported platforms, see Cloudera Altus Director Support (AWS and Azure Only).
  • Added a new environment variable called MAX_TEXT_LENGTH that allows you to set the maximum number of characters that can be displayed in a single text cell. By default, this value is set to 800,000 and any more characters will be truncated.

Engine Upgrade

Cloudera Data Science Workbench 1.2.2 (and later) ships version 4 of the base engine image which includes bug fixes related to Python development and Kerberos authentication. Engine 4 ships the following versions of R and Python:
  • R - 3.4.1
  • Python - 2.7.11, 3.6.1
For details about the packages included in the base engine, see Cloudera Data Science Workbench Engine Versions and Packaging.

Make sure you upgrade existing projects to Base Image v4 (Project Settings > Engine) to take advantage of these fixes.

The new engine also changes how you configure and use Conda in Python sessions and extended engines. For more details, see Using Conda with Cloudera Data Science Workbench.

Issues Fixed In Cloudera Data Science Workbench 1.2.2

  • Fixed an issue where Conda environment variables were not being propagated to the Terminal correctly.

    Cloudera Bug: DSE-2256

  • Fixed an issue where GPUs were not being detected by Cloudera Data Science Workbench due to incorrect mount settings.

    Cloudera Bug: DSE-2957

  • Fixed an issue where jobs were failing due to Kerberos TGT renewal issues.

    Cloudera Bug: DSE-1007

  • Fixed an issue on Internet Explorer 10 and 11 where the browser would fail to render console output after launching too many interactive sessions.

    Cloudera Bug: DSE-2998, DSE-2979

  • Cloudera Data Science Workbench now correctly renders HTML that contains iFrames with the srcdoc attribute.

    Cloudera Bug: DSE-2034

  • Fixed an issue where logging in using LDAP/Active Directory would sometimes crash the Cloudera Data Science Workbench web application.

    Cloudera Bug: DSE-2672

  • The file tree in the Workbench now refreshes correctly when switching between sessions or launching a new session.

    Cloudera Bug: DSE-2829

  • Fixed a file descriptor leak that would cause the "Failed to get Kubernetes client configuration" error in Cloudera Manager.

    Cloudera Bug: DSE-2910

  • Fixed an issue where the host-controller process was consuming too much CPU. This was occurring due to a bug in the Kubernetes client-go library.

    Cloudera Bug: DSE-2993

Known Issues and Limitations in Cloudera Data Science Workbench 1.2.2

For a list of known issues and limitations, refer to the documentation for version 1.2.x at Cloudera Data Science Workbench 1.2.x.

Cloudera Data Science Workbench 1.2.1

This section lists the release notes for Cloudera Data Science Workbench 1.2.1. The documentation for version 1.2.x can be found at Cloudera Data Science Workbench 1.2.x.

Issues Fixed In Cloudera Data Science Workbench 1.2.1

  • The Master Node IPv4 Address parameter has been added to Cloudera Manager's Add Service wizard and is now a required parameter for installation on AWS. This should fix any related installation issues for deployments on AWS.

    Cloudera Bug: DSE-2879

  • Fixed an issue with CSD-based deployments where certain operations would fail because the Prepare Node command was not installing all the required packages during First Run of the service. To see the updated list of packages that are now being installed by the Prepare Node command, refer to the CSD install guide.

    Cloudera Bug: DSE-2869

  • Fixed an issue where the LD_LIBRARY_PATH environment variable was not getting propagated to CUDA engines.

    Cloudera Bug: DSE-2828

  • Fixed an issue where stopping Cloudera Data Science Workbench on worker hosts resulted in the application hanging indefinitely.

    Cloudera Bug: DSE-2880

Incompatible Changes in Cloudera Data Science Workbench 1.2.1

Upgrading from Cloudera Data Science Workbench 1.2.0 to 1.2.1 on CSD-based deployments

After upgrading from Cloudera Data Science Workbench 1.2.0 to 1.2.1 on a CSD-based deployment, CLI commands might not work as expected due to missing binaries in the environment. Note that this issue does not affect fresh installs.

Known Issues and Limitations in Cloudera Data Science Workbench 1.2.1

For a list of known issues and limitations, refer to the documentation for version 1.2.x at Cloudera Data Science Workbench 1.2.x.

Cloudera Data Science Workbench 1.2.0

This section lists the release notes for Cloudera Data Science Workbench 1.2.0. The documentation for version 1.2.x can be found at Cloudera Data Science Workbench 1.2.x.

New Features and Changes in Cloudera Data Science Workbench 1.2.0

  • Cloudera Data Science Workbench is now available as an add-on service for Cloudera Manager. To this end, Cloudera Data Science Workbench is now distributed in a parcel that integrates with Cloudera Manager using a Custom Service Descriptor (CSD). You can use Cloudera Manager to install, upgrade, and monitor Cloudera Data Science Workbench. Diagnostic data bundles can be generated and submitted to Cloudera through Cloudera Manager.
  • Cloudera Data Science Workbench now enables secure sharing of job and session consoles. Additionally, site administrators can disable anonymous sharing from the Site Administrator dashboard (Admin > Security). See Sharing Job and Session Console Outputs.
  • The Admin > Usage page now includes graphs for monitoring usage activity such as number of CPUs or GPUs used, memory usage, and total session runs, over customizable periods of time.
  • Cloudera Data Science Workbench now lets you configure session, job, and idle timeouts. These can be configured using environment variables either for the entire deployment or per-project.
  • The cdsw enable and disable commands are no longer needed. The master host will now automatically detect the IP addresses of worker hosts joining or leaving Cloudera Data Science Workbench. See the revised Cloudera Data Science Workbench Command Line Reference.
  • The Kudu Python client is now included in the Cloudera Data Science Workbench base engine image.
  • Interactive session names can now be modified by project contributors and admins. By default, session names are set to 'Untitled Session'.
  • All-numeric usernames are now accepted.
  • Kubernetes has been upgraded to version 1.6.11.

Engine Upgrade

  • Cloudera Data Science Workbench 1.2.0 ships version 3 of the base engine image which includes matplotlib improvements and the Kudu client libraries. Engine 3 ships the following versions of R and Python:

    • R - 3.4.1
    • Python - 2.7.11, 3.6.1

    Make sure you upgrade existing projects to Base Image v3 (Project Settings > Engine) to take advantage of the new features and bug fixes included in the new engine.

Issues Fixed in Cloudera Data Science Workbench 1.2.0

Privilege Escalation and Database Exposure in Cloudera Data Science Workbench

Several web application vulnerabilities allowed malicious authenticated Cloudera Data Science Workbench (CDSW) users to escalate privileges in CDSW. In combination, such users could exploit these vulnerabilities to gain root access to CDSW hosts, gain access to the CDSW database which includes Kerberos keytabs of CDSW users and bcrypt-hashed passwords, and obtain other privileged information such as session tokens, invitation tokens, and environment variables.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench 1.0.0, 1.0.1, 1.1.0, 1.1.1

Users affected: All users of Cloudera Data Science Workbench 1.0.0, 1.0.1, 1.1.0, 1.1.1

Date/time of detection: September 1, 2017

Detected by: NCC Group

Severity (Low/Medium/High): High

Impact: Privilege escalation and database exposure.

CVE: CVE-2017-15536

Addressed in release/refresh/patch: Cloudera Data Science Workbench 1.2.0 or higher.

Immediate action required: Upgrade to the latest version of Cloudera Data Science Workbench.

Other Notable Fixed Issues in Cloudera Data Science Workbench 1.2.0

  • Fixed an issue where the Workbench editor screen jumps unexpectedly when typing or scrolling.
  • Fixed auto-scroll behavior in the Workbench console. This was a browser compatibility issue that affected Chrome and Firefox, but not Safari.
  • Fixed an issue where if a user logged out of Cloudera Data Science Workbench, and logged back in as a different user, they may see a SecurityError message in the Workbench.
  • Fixed an issue that was preventing site administrators from uploading the SAML metadata file.
  • Fixed several issues related to plotting with matplotlib. If you have previously used any workarounds for plotting, you might consider removing them now.
  • Engines now use the same build of Kerberos utilities (ktutil, kinit, and klist) as the rest of Cloudera Data Science Workbench. This will improve logs obtained from kinit and make debugging Kerberos issues easier.
  • KRB5_TRACE output is now included in the error logs obtained when you run kinit.
  • Fixed an issue that was affecting health checks in deployments using AWS elastic load balancing.
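
To capture a comparable trace by hand when debugging Kerberos issues, the sketch below sets KRB5_TRACE, the MIT Kerberos environment variable that directs trace output to a file, before invoking kinit. The helper names build_trace_env and run_kinit_with_trace, the trace paths, and the sample PATH value are illustrative assumptions and are not part of Cloudera Data Science Workbench.

```python
import os
import subprocess


def build_trace_env(trace_path, base_env=None):
    """Return a copy of the environment with KRB5_TRACE pointing at trace_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["KRB5_TRACE"] = trace_path
    return env


def run_kinit_with_trace(principal, trace_path="/dev/stderr"):
    """Run kinit for the given principal with tracing enabled.

    kinit prompts for a password unless a keytab or cached credentials apply.
    """
    return subprocess.run(["kinit", principal], env=build_trace_env(trace_path))


# Only the environment is built here; kinit is not executed until
# run_kinit_with_trace is called explicitly.
example_env = build_trace_env("/tmp/krb5_trace.log", base_env={"PATH": "/usr/bin"})
```

Building the environment separately from running the command keeps the caller's own environment unchanged and makes the setting easy to inspect.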

Incompatible Changes in Cloudera Data Science Workbench 1.2.0

Proxy Configuration Change: If you are using a proxy server, you must ensure that the IP addresses for the web and Livelog services are excluded from the proxy.

Depending on your deployment (parcel or package), append the following IP addresses to either the No Proxy property in the Cloudera Manager CDSW service, or to the NO_PROXY parameter in cdsw.conf:
100.77.0.129
100.77.0.130

These have also been added to the installation instructions.
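
As a rough illustration of the append step, the sketch below adds the two addresses to an existing NO_PROXY value without duplicating entries. It assumes the value is a comma-separated list; the function name append_no_proxy and the sample starting value are hypothetical.

```python
# The two addresses listed in the proxy configuration change above.
CDSW_INTERNAL_IPS = ("100.77.0.129", "100.77.0.130")


def append_no_proxy(current_value, extra_hosts=CDSW_INTERNAL_IPS):
    """Append hosts to a comma-separated NO_PROXY value, skipping duplicates."""
    entries = [e.strip() for e in current_value.split(",") if e.strip()]
    for host in extra_hosts:
        if host not in entries:
            entries.append(host)
    return ",".join(entries)


# Hypothetical existing value that already contains one of the two addresses.
updated = append_no_proxy("127.0.0.1,localhost,100.77.0.129")
print(updated)  # 127.0.0.1,localhost,100.77.0.129,100.77.0.130
```

The resulting string is what you would paste into the No Proxy property or the NO_PROXY parameter, depending on the deployment type.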

Known Issues and Limitations in Cloudera Data Science Workbench 1.2.0

For a list of known issues and limitations, refer to the documentation for version 1.2.x at Cloudera Data Science Workbench 1.2.x.

Cloudera Data Science Workbench 1.1.1

This section lists the release notes for Cloudera Data Science Workbench 1.1.1. The documentation for version 1.1.x can be found at Cloudera Data Science Workbench 1.1.x.

New Features in Cloudera Data Science Workbench 1.1.1

  • Keytab Authentication - With version 1.1.1, you can now authenticate yourself to the CDH cluster by uploading your Kerberos keytab to Cloudera Data Science Workbench. To use this feature, go to the top-right dropdown menu, click Account settings > Hadoop Authentication, enter your Kerberos principal and click Upload Keytab.

Issues Fixed in Cloudera Data Science Workbench 1.1.1

  • Fixed an issue with airgapped installations where the installer could not pull the alpine 3.4 image into the airgapped environment.
  • Fixed an issue where Cloudera Data Science Workbench would fail to log a command trace when the Kerberos process exits.
  • Fixed authentication issues with older versions of MIT KDC.

Known Issues and Limitations in Cloudera Data Science Workbench 1.1.1

For a list of known issues and limitations, refer to the documentation for version 1.1.x at Cloudera Data Science Workbench 1.1.x.

Cloudera Data Science Workbench 1.1.0

This section lists the release notes for Cloudera Data Science Workbench 1.1.0. The documentation for version 1.1.x can be found at Cloudera Data Science Workbench 1.1.x.

New Features and Changes in Cloudera Data Science Workbench 1.1.0

  • Added support for RHEL/CentOS 7.3 and Oracle Linux 7.3.

  • Cloudera Data Science Workbench now allows you to run GPU-based workloads. For more details, see Using NVIDIA GPUs for Cloudera Data Science Workbench Projects.

  • For Cloudera Manager and CDH clusters that are not connected to the Internet, Cloudera Data Science Workbench now supports fully offline installations. See the installation guide for more details.

  • Web UIs for processing frameworks such as Spark 2, TensorFlow, and Shiny are now embedded in Cloudera Data Science Workbench and can be accessed directly from active sessions and jobs. For more details, see Accessing Web User Interfaces from Cloudera Data Science Workbench.

  • Added support for a Jobs REST API that lets you orchestrate jobs from third-party workflow tools. See Cloudera Data Science Workbench Jobs API.

  • DataFrames are now scrollable in the workbench session output pane. For examples, see the section on Grid Displays.

  • Added support for rich visualizations in the Scala engine using Jupyter jvm-repr. For an example, see HTML Visualizations - Scala.

  • JAVA_HOME is now set in cdsw.conf, and not from the Site Administrator dashboard (Admin > Engines).
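
Because JAVA_HOME now lives in cdsw.conf, it can help to check the file before starting the service. The sketch below is a minimal pre-flight check, assuming cdsw.conf uses shell-style KEY="value" lines. It also checks that MASTER_IP is an IP address, which the incompatible changes for this release call for. The function names and the sample file contents are hypothetical, and this is not the validation that Cloudera Data Science Workbench itself runs.

```python
import ipaddress


def parse_conf(text):
    """Parse KEY="value" lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def check_conf(settings):
    """Return a list of problems for the JAVA_HOME and MASTER_IP requirements."""
    problems = []
    if not settings.get("JAVA_HOME"):
        problems.append("JAVA_HOME is missing or empty")
    try:
        ipaddress.ip_address(settings.get("MASTER_IP", ""))
    except ValueError:
        problems.append("MASTER_IP is not an IP address")
    return problems


# Hypothetical cdsw.conf excerpt carried over from an older installation.
sample = '''
# cdsw.conf excerpt
MASTER_IP="cdsw-master.example.com"
JAVA_HOME=""
'''
problems = check_conf(parse_conf(sample))
print(problems)
```

With the sample above, both checks report a problem: JAVA_HOME is empty and MASTER_IP is a DNS hostname rather than an IP address.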

Engine Upgrade

Cloudera Data Science Workbench 1.1.0 ships version 2 of the base engine image, which includes new versions of pandas and seaborn, and assorted bug fixes. Engine 2 ships the following versions of R and Python:

  • R - 3.3.0
  • Python - 2.7.11, 3.6.1

Make sure you upgrade existing projects to Base Image v2 (Project Settings > Engine) to take advantage of the new features and bug fixes included in the new engine.

Issues Fixed in Cloudera Data Science Workbench 1.1.0

  • Improved support for dynamic data visualizations in Python, including Bokeh.

  • Fixed issues with the Python template project. The project now supports offline mode and will therefore work on airgapped clusters.

  • Fixed issues related to cached responses in Internet Explorer 11.

  • Fixed issues with Java symlinks outside of JAVA_HOME.

  • The cdsw status command can now be run on worker hosts.

  • Removed unauthenticated localhost access to Kubernetes.

  • Fixed Kerberos authentication issues with specific enc-types and Active Directory.

  • Removed restrictions on usernames with special characters for better compatibility with external authentication systems such as Active Directory.

  • Fixed issues with LDAP configuration validation that caused application crashes.

  • Improved LDAP test configuration form to avoid confusion on parameters being sent.

Incompatible Changes in Cloudera Data Science Workbench 1.1.0

  • Upgrading from version 1.0.x to 1.1.x

    During the upgrade process, you will encounter incompatibilities between the two versions of cdsw.conf. This is because even though you are installing the latest RPM, your previous configuration settings in cdsw.conf will remain unchanged. Depending on the release you are upgrading from, you will need to modify cdsw.conf to ensure it passes the validation checks run by the 1.1.x release.

    Key changes to note:
    • JAVA_HOME is now a required parameter. Make sure you add JAVA_HOME to cdsw.conf before you start Cloudera Data Science Workbench.
    • Previous versions allowed MASTER_IP to be set to a DNS hostname. If you are still using a DNS hostname, switch to an IP address.

  • Python engine updated in version 1.1.x

    Version 1.1.x includes an updated base engine image for Python, which no longer uses the deprecated pylab mode in Jupyter to import the numpy and matplotlib functions into the global scope. With version 1.1.x, engines will now use built-in functions like any rather than the pylab counterpart, numpy.any. As a result of this change, you might see certain behavioral changes and differences in results between the two versions.

    Also note that Python projects originally created with engine 1 will be running pandas version 0.19, and will not auto-upgrade to version 0.20 by simply selecting engine 2. You will also need to manually install version 0.20.1 of pandas when you launch a project session.
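
The difference between the built-in any and numpy.any can be illustrated without NumPy installed. In the sketch below, elementwise_any is a hypothetical pure-Python stand-in for the element-wise behavior of numpy.any; it is not the NumPy implementation. On a nested list of zeros, the built-in any tests each inner list for truthiness, and a non-empty list is truthy, so the two calls disagree.

```python
def elementwise_any(nested):
    """Stand-in for numpy.any: True if any scalar element is truthy."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            if elementwise_any(item):
                return True
        elif item:
            return True
    return False


grid = [[0, 0], [0, 0]]

# Built-in any: each inner list is non-empty and therefore truthy.
builtin_result = any(grid)

# Element-wise check: every scalar is 0, so nothing is truthy.
elementwise_result = elementwise_any(grid)

print(builtin_result, elementwise_result)  # True False
```

Code written under pylab mode that relied on the element-wise result may need an explicit numpy.any call after moving to the new engine.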

Known Issues and Limitations in Cloudera Data Science Workbench 1.1.0

For a list of known issues and limitations, refer to the documentation for version 1.1.x at Cloudera Data Science Workbench 1.1.x.

Cloudera Data Science Workbench 1.0.1

This section lists the release notes for Cloudera Data Science Workbench 1.0.1. The documentation for version 1.0.x can be found at Cloudera Data Science Workbench 1.0.x.

Issues Fixed in Cloudera Data Science Workbench 1.0.1

  • Fixed a random port conflict that could prevent Scala engines from running.

  • Improved the formatting of validation and the visibility of some errors.

  • Fixed an issue with Firefox that was resulting in duplicate jobs on job creation.

  • Removed the external CDN dependency for MathJax.

  • Improved PATH and JAVA_HOME handling that previously broke Hadoop CLIs.

  • Fixed an issue with Java security policy files that caused Kerberos issues.

  • Fixed an issue that caused git clone to fail on some repositories.

  • Fixed an issue where updating LDAP admin settings deactivated the local fallback login.

  • Fixed an issue where bad LDAP configuration crashed the application.

  • Fixed an issue where job environment variable settings did not persist.

Known Issues and Limitations in Cloudera Data Science Workbench 1.0.x

For a list of known issues and limitations, refer to the documentation for version 1.0.x at Cloudera Data Science Workbench 1.0.x.

Cloudera Data Science Workbench 1.0.0

Version 1.0 represents the first generally available (GA) release of Cloudera Data Science Workbench. For information about the main features and benefits of Cloudera Data Science Workbench, as well as an architectural overview of the product, see Cloudera Data Science Workbench Overview.