Cloudera Data Science Workbench Release Notes

These release notes provide information on fixed issues, known issues, and limitations for all generally available (GA) versions of Cloudera Data Science Workbench.

Cloudera Data Science Workbench 1.0.1

This section lists the issues fixed in Cloudera Data Science Workbench 1.0.1. For a list of known issues and limitations, see Known Issues and Limitations in Cloudera Data Science Workbench 1.0.x.

Issues Fixed in Cloudera Data Science Workbench 1.0.1

  • Fixed a random port conflict that could prevent Scala engines from running.

  • Improved the formatting of validation messages and the visibility of some errors.

  • Fixed an issue with Firefox that resulted in duplicate jobs being created.

  • Removed the external CDN dependency for MathJax.

  • Improved PATH and JAVA_HOME handling that previously broke Hadoop CLIs.

  • Fixed an issue with Java security policy files that caused Kerberos issues.

  • Fixed an issue that caused git clone to fail on some repositories.

  • Fixed an issue where updating LDAP admin settings deactivated the local fallback login.

  • Fixed an issue where bad LDAP configuration crashed the application.

  • Fixed an issue where job environment variable settings did not persist.

Cloudera Data Science Workbench 1.0.0

Version 1.0 represents the first generally available (GA) release of Cloudera Data Science Workbench. For information about the main features and benefits of Cloudera Data Science Workbench, as well as an architectural overview of the product, see About Cloudera Data Science Workbench.

Known Issues and Limitations in Cloudera Data Science Workbench 1.0.x

This section lists the currently known issues and limitations in Cloudera Data Science Workbench.

  • Upgrading to Version 1.0.1
    • If JAVA_HOME is in a non-standard location that cannot automatically be detected by Cloudera Data Science Workbench, sessions will fail to launch after the upgrade is complete.

      Workaround: Go to the Site Administration panel and click the Engines tab. Under the Environment Variables section, delete the custom value you previously set for JAVA_HOME. Then install Java in a location that Cloudera Data Science Workbench can detect automatically with the bigtop-detect-javahome utility.

  • Crashes and Hangs
    • High I/O utilization on the application block device can cause the application to stall or become unresponsive. Users should read and write data directly from HDFS rather than staging it in their project directories.

    • Installing ipywidgets or a Jupyter notebook into a project can cause Python engines to hang due to an unexpected configuration. The issue can be resolved by deleting the installed libraries from the R engine terminal.

  • Security
    • The container runtime and application data storage are not fully secure from untrusted users who have SSH access to the gateway nodes. For both security and resource utilization reasons, disable SSH access to the gateway nodes for untrusted users.

    • Self-signed certificates are not supported for TLS termination.

    • The TLS_KEY is not password protected.

    • PowerBroker-equipped Active Directory is not supported.

    • Using Kerberos plugin modules in krb5.conf is not supported.

    • LDAP group search filters are currently not supported. To limit access to Cloudera Data Science Workbench to certain groups, use "memberOf" or the equivalent user attribute in the LDAP User Filter.

  • Usability
    • When using conda to install Python packages, you must specify a Python version that matches one of the Python versions shipped in the engine image (2.7.11 and 3.6.1). If the version is not specified, the conda-installed Python version will not be used within a project. Pip (pip and pip3) is not affected by this issue.

    • When hundreds of users are logged in and creating processes, the nproc and nofile limits of the system may be reached. Use ulimit or other methods to increase the maximum number of processes and open files that a user can create on the system.

    • When rebooting, Cloudera Data Science Workbench nodes can take a significant amount of time (about 30 minutes) to become ready.

    • Long-running operations such as fork and clone can time out when projects are large or connections outlast the HTTP timeouts of reverse proxies.

    • The Scala kernel does not support autocomplete features in the editor.

    • Scala and R code can sometimes indent incorrectly in the workbench editor.

    • Some Jupyter visualization libraries such as Bokeh do not always render as expected. Generating a static HTML visualization and using the engine's /cdn feature can help work around the issue.

    • Spark SQL syntax does not work correctly in the Scala Spark shell. Using import spark.implicits._ to enable the use of Spark SQL syntax in the shell fails with the following error:
      import spark.implicits._
      Name: Compile Error
      Message: <console>:19: error: stable identifier required, but this.$line7$read.spark.implicits found.
      This is because spark is defined as a var, which is not a stable identifier. You can work around this issue by overwriting the shell's spark definition and defining it as a val. You should then be able to import the implicits object:
      // Import SparkSession explicitly in case the shell has not already done so
      import org.apache.spark.sql.SparkSession
      val spark = SparkSession.builder().getOrCreate()
      import spark.implicits._
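
For the nproc and nofile limits mentioned under Usability, it can help to see what limits a process is actually running with before changing anything. The following is a minimal sketch using Python's standard resource module on a Linux or other Unix host. The helper names limits and fmt are hypothetical and are not part of Cloudera Data Science Workbench. The code avoids newer syntax so that it should run on either Python version shipped in the engine image.

```python
import resource

def limits():
    """Return {name: (soft, hard)} for the nproc and nofile limits
    of the current process. (Hypothetical helper, for illustration.)"""
    names = {
        "nproc": resource.RLIMIT_NPROC,    # max processes for the user
        "nofile": resource.RLIMIT_NOFILE,  # max open file descriptors
    }
    return dict((name, resource.getrlimit(res)) for name, res in names.items())

def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

if __name__ == "__main__":
    for name, (soft, hard) in sorted(limits().items()):
        print("{0}: soft={1} hard={2}".format(name, fmt(soft), fmt(hard)))
```

The values reported are those of the process that runs the script, so they may differ between a login shell on the host and an engine session. On many Linux systems, persistent per-user limits are configured in /etc/security/limits.conf. Whether that is the right place for your deployment is an assumption to verify with your system administrator.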