Cloudera Data Science Workbench Release Notes

Cloudera Data Science Workbench 1.3.0

This section lists the release notes for Cloudera Data Science Workbench 1.3.0.

New Features and Changes in Cloudera Data Science Workbench 1.3.0

  • Added support for SUSE Linux Enterprise Server 12 SP3.

  • Site administrators can now add template projects that are customized for their organization's use cases.

  • Version 1.3 introduces a new environment variable for Python 3 sessions called PYSPARK3_PYTHON. Python 2 sessions will continue to use the default PYSPARK_PYTHON variable. This will allow you to configure distinct variables for Python 2 and Python 3 applications.

  • In the Cloudera Manager CDSW service, the Wildcard DNS Domain property has been renamed to Cloudera Data Science Workbench Domain.

  • Output for the cdsw version command now includes the type of deployment you are running – RPM or CSD.

  • Added log4j and spark-defaults sample configuration to the PySpark and Scala template projects.
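
The PYSPARK3_PYTHON change described above can be pictured with a minimal sketch. The helper name pyspark_python_variable is hypothetical and is not part of the product; it only illustrates which variable applies to which type of session, as stated in the note.

```python
import sys


def pyspark_python_variable(major_version):
    """Return the name of the environment variable that, per the release
    note, configures the interpreter for the given Python major version.

    Hypothetical helper for illustration only.
    """
    if major_version == 3:
        return "PYSPARK3_PYTHON"  # new in version 1.3, Python 3 sessions
    if major_version == 2:
        return "PYSPARK_PYTHON"   # default variable, still used by Python 2
    raise ValueError("unsupported Python major version: %r" % (major_version,))


if __name__ == "__main__":
    name = pyspark_python_variable(sys.version_info[0])
    print("This session would read its interpreter path from", name)
```

Keeping the two variables separate means a Python 2 path and a Python 3 path can be configured side by side without one overriding the other.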

Issues Fixed in Cloudera Data Science Workbench 1.3.0

  • Fixed an issue where the cdsw status command failed to run all the required system checks.

    Cloudera Bug: DSE-3070

  • Session lists now include additional metadata to help distinguish between different sessions.

    Cloudera Bug: DSE-2814

  • Pre-install validation checks have been improved to detect issues with iptables modules and Java settings.

    Cloudera Bug: DSE-2293

  • Fixed an issue with the cdsw status command output when TLS is enabled.

    Cloudera Bug: DSE-3182

  • Cloudera Distribution of Spark 2.2 Release 2 fixes the issue where a PySpark application could only be run once per active Workbench session.

    Cloudera Bug: CDH-58475

  • Fixed an issue that prevented Bokeh plots from rendering.

    Cloudera Bug: DSE-3134

  • Fixed an issue in Cloudera Data Science Workbench 1.2.2 that prevented WebSocket re-connections and caused console hangs.

    Cloudera Bug: DSE-3085

  • Improved CDSW service restart performance for CSD deployments.

    Cloudera Bug: DSE-2937

Incompatible Changes in Cloudera Data Science Workbench 1.3.0

Deploying Cloudera Data Science Workbench with Cloudera Director 2.7

While this is not a Cloudera Data Science Workbench change, you should note that Cloudera Director 2.7 includes a new instance-level setting that sets the mountAllUnmountedDisks property to false:
normalizationConfig {
 mountAllUnmountedDisks: false
}

This means Cloudera Director 2.7 (and higher) users no longer need to set lp.normalization.mountAllUnmountedDisksRequired to false in the Cloudera Director server's application.properties file. Note that Cloudera Director 2.6 still requires this setting.
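
As a rough illustration of the version rule above, the following sketch decides whether the server-side property is still needed. The function name is hypothetical, and because these notes only mention versions 2.6 and 2.7 (and higher), the sketch deliberately refuses to answer for anything older.

```python
def needs_server_side_override(director_version):
    """Return True if lp.normalization.mountAllUnmountedDisksRequired must
    still be set to false in application.properties for this version.

    Hypothetical helper; expects a version string such as "2.6.1".
    """
    major_minor = tuple(int(p) for p in director_version.split(".")[:2])
    if major_minor < (2, 6):
        # The release notes say nothing about versions before 2.6.
        raise ValueError("version not covered by these notes: %s" % director_version)
    # 2.6 still requires the setting; 2.7 and higher do not.
    return major_minor < (2, 7)


if __name__ == "__main__":
    for version in ("2.6.1", "2.7.0"):
        print(version, needs_server_side_override(version))
```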

Known Issues and Limitations in Cloudera Data Science Workbench 1.3.0

For a list of the current known issues and limitations in Cloudera Data Science Workbench 1.3.x, see Known Issues and Limitations in Cloudera Data Science Workbench 1.3.x.

Cloudera Data Science Workbench 1.2.2

This section lists the release notes for Cloudera Data Science Workbench 1.2.2. The documentation for version 1.2.x can be found at Cloudera Data Science Workbench 1.2.x.

New Features and Changes in Cloudera Data Science Workbench 1.2.2

  • Added support for SUSE Linux Enterprise Server 12 SP2.
  • Added support for multi-homed networks.
  • Cloudera Director now allows you to deploy CSD-based Cloudera Data Science Workbench 1.2.x deployments on AWS. For more specifics on supported platforms, see Cloudera Director Support (AWS Only).
  • Added a new environment variable called MAX_TEXT_LENGTH that allows you to set the maximum number of characters that can be displayed in a single text cell. By default, this value is set to 800,000 and any more characters will be truncated.
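
The effect of MAX_TEXT_LENGTH can be sketched as follows. This is not the product's implementation; the function name truncate_cell_text is hypothetical and the code only illustrates the documented behavior (a default of 800,000 characters, with anything beyond the limit truncated).

```python
import os

DEFAULT_MAX_TEXT_LENGTH = 800000  # default stated in the release note


def truncate_cell_text(text, environ=None):
    """Cut text down to the limit given by MAX_TEXT_LENGTH.

    Hypothetical helper: 'environ' defaults to the process environment
    and can be replaced with a plain dict for experimentation.
    """
    environ = os.environ if environ is None else environ
    limit = int(environ.get("MAX_TEXT_LENGTH", DEFAULT_MAX_TEXT_LENGTH))
    return text[:limit]


if __name__ == "__main__":
    long_output = "x" * 900000
    print(len(truncate_cell_text(long_output, {})))
```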

Engine Upgrade

Cloudera Data Science Workbench 1.2.2 (and later) ships version 4 of the base engine image which includes bug fixes related to Python development and Kerberos authentication. Engine 4 ships the following versions of R and Python:
  • R - 3.4.1
  • Python - 2.7.11, 3.6.1

Make sure you upgrade existing projects to Base Image v4 (Project Settings > Engine) to take advantage of these fixes.

The new engine also changes how you configure and use Conda in Python sessions and extended engines. For more details, see Using Conda with Cloudera Data Science Workbench.

Issues Fixed in Cloudera Data Science Workbench 1.2.2

  • Fixed an issue where Conda environment variables were not being propagated to the Terminal correctly.

    Cloudera Bug: DSE-2256

  • Fixed an issue where GPUs were not being detected by Cloudera Data Science Workbench due to incorrect mount settings.

    Cloudera Bug: DSE-2957

  • Fixed an issue where jobs were failing due to Kerberos TGT renewal issues.

    Cloudera Bug: DSE-1007

  • Fixed an issue on Internet Explorer 10 and 11 where the browser would fail to render console output after launching too many interactive sessions.

    Cloudera Bug: DSE-2998, DSE-2979

  • Cloudera Data Science Workbench now correctly renders HTML that contains iFrames with the srcdoc attribute.

    Cloudera Bug: DSE-2034

  • Fixed an issue where logging in using LDAP/Active Directory would sometimes crash the Cloudera Data Science Workbench web application.

    Cloudera Bug: DSE-2672

  • The file tree in the Workbench now refreshes correctly when switching between sessions or launching a new session.

    Cloudera Bug: DSE-2829

  • Fixed a file descriptor leak that would cause the "Failed to get Kubernetes client configuration" error in Cloudera Manager.

    Cloudera Bug: DSE-2910

  • Fixed an issue where the host-controller process was consuming too much CPU. This was occurring due to a bug in the Kubernetes client-go library.

    Cloudera Bug: DSE-2993

Known Issues and Limitations in Cloudera Data Science Workbench 1.2.2

For a list of known issues and limitations, refer to the documentation for version 1.2.x at Cloudera Data Science Workbench 1.2.x.

Cloudera Data Science Workbench 1.2.1

This section lists the release notes for Cloudera Data Science Workbench 1.2.1. The documentation for version 1.2.x can be found at Cloudera Data Science Workbench 1.2.x.

Issues Fixed in Cloudera Data Science Workbench 1.2.1

  • The Master Node IPv4 Address parameter has been added to Cloudera Manager's Add Service wizard and is now a required parameter for installation on AWS. This should fix any related installation issues for deployments on AWS.

    Cloudera Bug: DSE-2879

  • Fixed an issue with CSD-based deployments where certain operations would fail because the Prepare Node command was not installing all the required packages during First Run of the service. To see the updated list of packages that are now installed by the Prepare Node command, refer to the CSD install guide.

    Cloudera Bug: DSE-2869

  • Fixed an issue where the LD_LIBRARY_PATH environment variable was not getting propagated to CUDA engines.

    Cloudera Bug: DSE-2828

  • Fixed an issue where stopping Cloudera Data Science Workbench on worker nodes resulted in the application hanging indefinitely.

    Cloudera Bug: DSE-2880

Incompatible Changes in Cloudera Data Science Workbench 1.2.1

Upgrading from Cloudera Data Science Workbench 1.2.0 to 1.2.1 on CSD-based deployments

After upgrading from Cloudera Data Science Workbench 1.2.0 to 1.2.1 on a CSD-based deployment, CLI commands might not work as expected due to missing binaries in the environment. Note that this issue does not affect fresh installs.

Known Issues and Limitations in Cloudera Data Science Workbench 1.2.1

For a list of known issues and limitations, refer to the documentation for version 1.2.x at Cloudera Data Science Workbench 1.2.x.

Cloudera Data Science Workbench 1.2.0

This section lists the release notes for Cloudera Data Science Workbench 1.2.0. The documentation for version 1.2.x can be found at Cloudera Data Science Workbench 1.2.x.

New Features and Changes in Cloudera Data Science Workbench 1.2.0

  • Cloudera Data Science Workbench is now available as an add-on service for Cloudera Manager. To this end, Cloudera Data Science Workbench is now distributed in a parcel that integrates with Cloudera Manager using a Custom Service Descriptor (CSD). You can use Cloudera Manager to install, upgrade, and monitor Cloudera Data Science Workbench. Diagnostic data bundles can be generated and submitted to Cloudera through Cloudera Manager.
  • Cloudera Data Science Workbench now enables secure sharing of job and session consoles. Additionally, site administrators can disable anonymous sharing from the Site Administrator dashboard (Admin > Security). See Sharing Job and Session Console Outputs.
  • The Admin > Usage page now includes graphs for monitoring usage activity such as number of CPUs or GPUs used, memory usage, and total session runs, over customizable periods of time. See Monitoring Site Activity.
  • Cloudera Data Science Workbench now lets you configure session, job, and idle timeouts. These can be configured using environment variables, either for the entire deployment or per project.
  • The cdsw enable and disable commands are no longer needed. The master node will now automatically detect the IP addresses of worker nodes joining or leaving Cloudera Data Science Workbench. See the revised Cloudera Data Science Workbench Command Line Reference.
  • The Kudu Python client is now included in the Cloudera Data Science Workbench base engine image.
  • Interactive session names can now be modified by project contributors and admins. By default, session names are set to 'Untitled Session'.
  • All-numeric usernames are now accepted.
  • Kubernetes has been upgraded to version 1.6.11.

Engine Upgrade

  • Cloudera Data Science Workbench 1.2.0 ships version 3 of the base engine image which includes matplotlib improvements and the Kudu client libraries. Engine 3 ships the following versions of R and Python:

    • R - 3.4.1
    • Python - 2.7.11, 3.6.1

    Make sure you upgrade existing projects to Base Image v3 (Project Settings > Engine) to take advantage of the new features and bug fixes included in the new engine.

Issues Fixed in Cloudera Data Science Workbench 1.2.0

Privilege Escalation and Database Exposure in Cloudera Data Science Workbench

Several web application vulnerabilities allowed malicious authenticated Cloudera Data Science Workbench (CDSW) users to escalate privileges in CDSW. In combination, such users could exploit these vulnerabilities to gain root access to CDSW nodes, gain access to the CDSW database, which includes Kerberos keytabs of CDSW users and bcrypt-hashed passwords, and obtain other privileged information such as session tokens, invitation tokens, and environment variables.

Products affected: Cloudera Data Science Workbench

Releases affected: Cloudera Data Science Workbench 1.0.0, 1.0.1, 1.1.0, 1.1.1

Users affected: All users of Cloudera Data Science Workbench 1.0.0, 1.0.1, 1.1.0, 1.1.1

Date/time of detection: September 1, 2017

Detected by: NCC Group

Severity (Low/Medium/High): High

Impact: Privilege escalation and database exposure.

CVE: CVE-2017-15536

Addressed in release/refresh/patch: Cloudera Data Science Workbench 1.2.0 or higher.

Immediate action required: Upgrade to the latest version of Cloudera Data Science Workbench.
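
The affected and fixed releases listed in this advisory can be checked mechanically. The sketch below is illustrative only; the function names are hypothetical, and the version data is copied from the advisory fields above.

```python
# Releases named as affected in the advisory.
AFFECTED_RELEASES = {"1.0.0", "1.0.1", "1.1.0", "1.1.1"}

# First release that addresses the issue, per the advisory.
FIXED_IN = (1, 2, 0)


def is_affected(version):
    """Return True if the given release is listed as affected."""
    return version in AFFECTED_RELEASES


def has_fix(version):
    """Return True if the given release is 1.2.0 or higher."""
    return tuple(int(part) for part in version.split(".")) >= FIXED_IN


if __name__ == "__main__":
    for version in ("1.1.1", "1.2.0", "1.3.0"):
        print(version, "affected:", is_affected(version), "fixed:", has_fix(version))
```

Comparing versions as integer tuples rather than strings keeps a release such as 1.10.0 ordered correctly after 1.2.0.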

Other Notable Fixed Issues in Cloudera Data Science Workbench 1.2.0

  • Fixed an issue where the Workbench editor screen jumped unexpectedly when typing or scrolling.
  • Fixed auto-scroll behavior in the Workbench console. This was a browser compatibility issue that affected Chrome and Firefox, but not Safari.
  • Fixed an issue where a user who logged out of Cloudera Data Science Workbench and logged back in as a different user might see a SecurityError message in the Workbench.
  • Fixed an issue that was preventing site administrators from uploading the SAML metadata file.
  • Fixed several issues related to plotting with matplotlib. If you have previously used any workarounds for plotting, you might consider removing them now.
  • Engines now use the same build of Kerberos utilities (ktutil, kinit, and klist) as the rest of Cloudera Data Science Workbench. This will improve logs obtained from kinit and make debugging Kerberos issues easier.
  • KRB5_TRACE is now included in the error logs obtained when you run kinit.
  • Fixed an issue that was affecting health checks in deployments using AWS elastic load balancing.

Incompatible Changes in Cloudera Data Science Workbench 1.2.0

Proxy Configuration Change: If you are using a proxy server, you must ensure that the IP addresses for the web and Livelog services are skipped from the proxy.

Depending on your deployment (parcel or package), append the following IP addresses to either the No Proxy property in the Cloudera Manager CDSW service, or to the NO_PROXY parameter in cdsw.conf.
100.77.0.129
100.77.0.130

These have also been added to the installation instructions.
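
Appending the two addresses above to an existing exclusion list can be sketched as below. The helper name is hypothetical, and the sketch assumes the NO_PROXY value is a comma-separated list, which is the usual convention but is not stated in these notes.

```python
# Addresses of the web and Livelog services, taken from the note above.
REQUIRED_NO_PROXY = ["100.77.0.129", "100.77.0.130"]


def append_no_proxy(current_value):
    """Return the NO_PROXY value with the required addresses appended.

    Assumes a comma-separated list; entries already present are not
    duplicated, so the function is safe to run more than once.
    """
    entries = [e.strip() for e in current_value.split(",") if e.strip()]
    for ip in REQUIRED_NO_PROXY:
        if ip not in entries:
            entries.append(ip)
    return ",".join(entries)


if __name__ == "__main__":
    print(append_no_proxy("127.0.0.1,localhost"))
```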

Known Issues and Limitations in Cloudera Data Science Workbench 1.2.0

For a list of known issues and limitations, refer to the documentation for version 1.2.x at Cloudera Data Science Workbench 1.2.x.

Cloudera Data Science Workbench 1.1.1

This section lists the release notes for Cloudera Data Science Workbench 1.1.1. The documentation for version 1.1.x can be found at Cloudera Data Science Workbench 1.1.x.

New Features in Cloudera Data Science Workbench 1.1.1

  • Keytab Authentication - With version 1.1.1, you can now authenticate yourself to the CDH cluster by uploading your Kerberos keytab to Cloudera Data Science Workbench. To use this feature, go to the top-right dropdown menu, click Account settings > Hadoop Authentication, enter your Kerberos principal and click Upload Keytab.

Issues Fixed in Cloudera Data Science Workbench 1.1.1

  • Fixed an issue with airgapped installations where the installer could not pull the alpine 3.4 image into the airgapped environment.
  • Fixed an issue where Cloudera Data Science Workbench would fail to log a command trace when the Kerberos process exits.
  • Fixed authentication issues with older versions of MIT KDC.

Known Issues and Limitations in Cloudera Data Science Workbench 1.1.1

For a list of known issues and limitations, refer to the documentation for version 1.1.x at Cloudera Data Science Workbench 1.1.x.

Cloudera Data Science Workbench 1.1.0

This section lists the release notes for Cloudera Data Science Workbench 1.1.0. The documentation for version 1.1.x can be found at Cloudera Data Science Workbench 1.1.x.

New Features and Changes in Cloudera Data Science Workbench 1.1.0

  • Added support for RHEL/CentOS 7.3 and Oracle Linux 7.3.

  • Cloudera Data Science Workbench now allows you to run GPU-based workloads. For more details, see Using GPUs for Cloudera Data Science Workbench Workloads.

  • For Cloudera Manager and CDH clusters that are not connected to the Internet, Cloudera Data Science Workbench now supports fully offline installations. See the installation guide for more details.

  • Web UIs for processing frameworks such as Spark 2, TensorFlow, and Shiny are now embedded in Cloudera Data Science Workbench and can be accessed directly from active sessions and jobs. For more details, see Accessing Web User Interfaces from Cloudera Data Science Workbench.

  • Added support for a Jobs REST API that lets you orchestrate jobs from 3rd party workflow tools. See Cloudera Data Science Workbench Jobs API.

  • DataFrames are now scrollable in the workbench session output pane. For examples, see the section on Grid Displays.

  • Added support for rich visualizations in Scala engine using Jupyter jvm-repr. For an example, see HTML Visualizations - Scala.

  • JAVA_HOME is now set in cdsw.conf, and not from the Site Administrator dashboard (Admin > Engines).
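
Because JAVA_HOME now lives in cdsw.conf, a quick pre-start check of that file can catch a missing value. The sketch below is not the validation that the product runs. It assumes cdsw.conf uses simple KEY="value" lines, and the function names are hypothetical. It also checks that MASTER_IP is an IP address rather than a DNS hostname, which the incompatible changes for this release require.

```python
import ipaddress


def parse_conf(text):
    """Parse KEY="value" lines into a dict (assumed cdsw.conf format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def check_conf(settings):
    """Return a list of problems found; an empty list means none."""
    problems = []
    if not settings.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")
    master_ip = settings.get("MASTER_IP", "")
    try:
        ipaddress.ip_address(master_ip)  # raises ValueError for hostnames
    except ValueError:
        problems.append("MASTER_IP is not an IP address: %r" % master_ip)
    return problems


if __name__ == "__main__":
    sample = 'JAVA_HOME="/usr/java/default"\nMASTER_IP="master.example.com"\n'
    for problem in check_conf(parse_conf(sample)):
        print(problem)
```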

Engine Upgrade

Cloudera Data Science Workbench 1.1.0 ships version 2 of the base engine image that includes new versions of Pandas, seaborn, and assorted bug fixes. Engine 2 ships the following versions of R and Python:

  • R - 3.3.0
  • Python - 2.7.11, 3.6.1

Make sure you upgrade existing projects to Base Image v2 (Project Settings > Engine) to take advantage of the new features and bug fixes included in the new engine.

Issues Fixed in Cloudera Data Science Workbench 1.1.0

  • Improved support for dynamic data visualizations in Python, including Bokeh.

  • Fixed issues with the Python template project. The project now supports offline mode and will therefore work on airgapped clusters.

  • Fixed issues related to cached responses in Internet Explorer 11.

  • Fixed issues with Java symlinks outside of JAVA_HOME.

  • The cdsw status command can now be run on worker nodes.

  • Removed unauthenticated localhost access to Kubernetes.

  • Fixed Kerberos authentication issues with specific enc-types and Active Directory.

  • Removed restrictions on usernames with special characters for better compatibility with external authentication systems such as Active Directory.

  • Fixed issues with LDAP configuration validation that caused application crashes.

  • Improved LDAP test configuration form to avoid confusion on parameters being sent.

Incompatible Changes in Cloudera Data Science Workbench 1.1.0

  • Upgrading from version 1.0.x to 1.1.x

    During the upgrade process, you will encounter incompatibilities between the two versions of cdsw.conf. This is because even though you are installing the latest RPM, your previous configuration settings in cdsw.conf will remain unchanged. Depending on the release you are upgrading from, you will need to modify cdsw.conf to ensure it passes the validation checks run by the 1.1.x release.

    Key changes to note:
    • JAVA_HOME is now a required parameter. Make sure you add JAVA_HOME to cdsw.conf before you start Cloudera Data Science Workbench.
    • Previous versions allowed MASTER_IP to be set to a DNS hostname. If you are still using a DNS hostname, switch to an IP address.
  • Python engine updated in version 1.1.x

    Version 1.1.x includes an updated base engine image for Python which no longer uses the deprecated pylab mode in Jupyter to import the numpy and matplotlib functions into the global scope. With version 1.1.x, engines will now use built-in functions like any rather than the pylab counterpart, numpy.any. As a result of this change, you might see certain behavioral changes and differences in results between the two versions.

    Also note that Python projects originally created with engine 1 will be running pandas version 0.19, and will not auto-upgrade to version 0.20 by simply selecting engine 2. You will also need to manually install version 0.20.1 of pandas when you launch a project session.
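
One concrete case where the built-in any and numpy.any disagree is nested data. The sketch below is an illustration chosen for these notes rather than an example from the product. NumPy is imported only if it is available, so the built-in result is shown either way.

```python
values = [[0, 0], [0, 0]]

# Built-in any() checks the truthiness of each top-level element.
# Each inner list is non-empty and therefore truthy, so the result is True.
builtin_result = any(values)

try:
    import numpy
    # numpy.any() inspects every individual element; all are zero, so False.
    numpy_result = bool(numpy.any(values))
except ImportError:
    numpy_result = None  # NumPy not installed; only the built-in result applies

print("built-in any:", builtin_result)
print("numpy.any:   ", numpy_result)
```

Code written under pylab mode that relied on the NumPy behavior can call numpy.any explicitly to keep its earlier results.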

Known Issues and Limitations in Cloudera Data Science Workbench 1.1.0

For a list of known issues and limitations, refer to the documentation for version 1.1.x at Cloudera Data Science Workbench 1.1.x.

Cloudera Data Science Workbench 1.0.1

This section lists the release notes for Cloudera Data Science Workbench 1.0.1. The documentation for version 1.0.x can be found at Cloudera Data Science Workbench 1.0.x.

Issues Fixed in Cloudera Data Science Workbench 1.0.1

  • Fixed a random port conflict that could prevent Scala engines from running.

  • Improved the formatting of validation output and the visibility of some errors.

  • Fixed an issue with Firefox that was resulting in duplicate jobs on job creation.

  • Removed the external dependency on the MathJax CDN.

  • Improved PATH and JAVA_HOME handling that previously broke Hadoop CLIs.

  • Fixed an issue with Java security policy files that caused Kerberos issues.

  • Fixed an issue that caused git clone to fail on some repositories.

  • Fixed an issue where updating LDAP admin settings deactivated the local fallback login.

  • Fixed an issue where bad LDAP configuration crashed the application.

  • Fixed an issue where job environment variable settings did not persist.

Known Issues and Limitations in Cloudera Data Science Workbench 1.0.x

For a list of known issues and limitations, refer to the documentation for version 1.0.x at Cloudera Data Science Workbench 1.0.x.

Cloudera Data Science Workbench 1.0.0

Version 1.0 represents the first generally available (GA) release of Cloudera Data Science Workbench. For information about the main features and benefits of Cloudera Data Science Workbench, as well as an architectural overview of the product, see Cloudera Data Science Workbench Overview.