Upgrading Cloudera Data Science Workbench 1.5.x Using Packages

Before you start upgrading Cloudera Data Science Workbench, read the Cloudera Data Science Workbench Release Notes relevant to the version you are upgrading to.

  1. Depending on the version you are upgrading from, perform one of the following steps to stop Cloudera Data Science Workbench:
    • (Required for Upgrades from CDSW 1.4.2 or lower) Safely stop Cloudera Data Science Workbench. To avoid running into the data loss issue described in TSB-346, run the cdsw_protect_stop_restart.sh script on the master host and follow the sequence of steps as instructed by the script.

      The script will first back up your project files to the specified target folder. It will then temporarily move your project files aside to protect against the data loss condition. At that point, it is safe to stop Cloudera Data Science Workbench. To stop Cloudera Data Science Workbench, run the following command on all Cloudera Data Science Workbench hosts (master and workers):
      cdsw reset

      After Cloudera Data Science Workbench has stopped, press enter to continue running the script as instructed. It will then move your project files back into place.

      OR

    • (Upgrading from CDSW 1.4.3 or higher) Run the following command on all Cloudera Data Science Workbench hosts (master and workers) to stop Cloudera Data Science Workbench.
      cdsw reset
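The choice between the two stop procedures above depends only on the release you are upgrading from. The following is a minimal sketch of that decision, not part of the product: the function name `needs_protected_stop` is hypothetical, and it assumes GNU `sort` with the `-V` (version sort) option is available.

```shell
# Hypothetical helper: succeeds (exit 0) when the given CDSW version is
# 1.4.2 or lower, i.e. when the cdsw_protect_stop_restart.sh procedure applies.
needs_protected_stop() {
  current="$1"
  # sort -V compares version strings component by component (GNU coreutils),
  # so 1.4.2 sorts before 1.10.0.
  lowest="$(printf '%s\n%s\n' "$current" "1.4.2" | sort -V | head -n 1)"
  # If the current version sorts first (or equals 1.4.2), it is 1.4.2 or lower.
  [ "$lowest" = "$current" ]
}
```

For example, `needs_protected_stop 1.4.0` succeeds, while `needs_protected_stop 1.4.3` fails, which corresponds to stopping with cdsw reset alone.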
  2. (Strongly Recommended) On the master host, back up all your application data that is stored in the /var/lib/cdsw directory.
    To create the backup, run the following command on the master host:
    tar -cvzf cdsw.tar.gz -C /var/lib/cdsw/ .
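Because the upgrade removes and reinstalls the package, it can be worth confirming that the archive is readable before you continue. The sketch below wraps the same tar invocation in a function and then lists the archive as a basic integrity check. The function name `backup_cdsw_data` and its arguments are hypothetical, introduced here only for illustration.

```shell
# Hypothetical helper: archive a data directory and confirm the archive is readable.
# $1 = directory to back up (defaults to /var/lib/cdsw)
# $2 = archive file to create (defaults to cdsw.tar.gz in the current directory)
backup_cdsw_data() {
  src_dir="${1:-/var/lib/cdsw}"
  archive="${2:-cdsw.tar.gz}"
  # -C changes into the directory first, so paths in the archive are relative.
  tar -czf "$archive" -C "$src_dir" . || return 1
  # Listing the archive makes tar read it end to end; a truncated or
  # corrupt file causes this step to fail.
  tar -tzf "$archive" > /dev/null || return 1
  echo "Backup written to $archive"
}
```

Write the archive to a location outside the directory being backed up, so that tar does not try to include the archive in itself.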
  3. Save a backup of the Cloudera Data Science Workbench configuration file at /etc/cdsw/config/cdsw.conf.
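One way to save the configuration backup is to keep a timestamped copy, so that repeated upgrades do not overwrite earlier backups. This is a sketch under stated assumptions: the function name `backup_cdsw_conf`, the `.bak` naming scheme, and the destination directory are all hypothetical choices, not something the product prescribes.

```shell
# Hypothetical helper: copy cdsw.conf to a timestamped backup file.
# $1 = configuration file (defaults to /etc/cdsw/config/cdsw.conf)
# $2 = destination directory (defaults to the current directory)
backup_cdsw_conf() {
  conf="${1:-/etc/cdsw/config/cdsw.conf}"
  dest_dir="${2:-.}"
  stamp="$(date +%Y%m%d-%H%M%S)"
  dest="$dest_dir/cdsw.conf.$stamp.bak"
  # -p preserves mode, ownership (where permitted), and timestamps.
  cp -p "$conf" "$dest" || return 1
  # Confirm the copy is byte-for-byte identical to the original.
  cmp -s "$conf" "$dest" || return 1
  # Print the backup path so the caller can record it.
  echo "$dest"
}
```

Keeping the copy makes it easier to compare your previous settings against the file after installation, when cdsw.conf may need changes to pass the validation checks.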
  4. (Required for Upgrades from CDSW 1.4.0 - Red Hat only) Cloudera Data Science Workbench 1.4.2 (and higher) includes a fix for a slab leak issue found in Red Hat kernels. For this fix to take effect, Red Hat users must reboot all Cloudera Data Science Workbench hosts before proceeding with an upgrade from CDSW 1.4.0.

    As a precaution, consult your cluster/IT administrator before you start rebooting hosts.

  5. Uninstall the previous release of Cloudera Data Science Workbench. Perform this step on the master host, as well as all the worker hosts.
    yum remove cloudera-data-science-workbench 
  6. Install the latest version of Cloudera Data Science Workbench on the master host and on all the worker hosts. During the installation process, you might need to resolve certain incompatibilities in cdsw.conf. Even though you will be installing the latest RPM, your previous configuration settings in cdsw.conf will remain unchanged. Depending on the release you are upgrading from, you will need to modify cdsw.conf to ensure it passes the validation checks run by the 1.5.x release.

    To install the latest version of Cloudera Data Science Workbench, follow the same process to install the package as you would for a fresh installation.

    1. Install Cloudera Data Science Workbench on the Master Host
    2. (Optional) Install Cloudera Data Science Workbench on Worker Hosts
  7. Upgrade Projects to Use the Latest Base Engine Images

    If the release you have just upgraded to includes a new version of the base engine image (see release notes), you will need to manually configure existing projects to use the new engine. Cloudera recommends you do so to take advantage of any new features and bug fixes included in the newly released engine.

    To upgrade a project to the new engine, go to the project's Settings > Engine page and select the new engine from the dropdown. If any of your projects are using custom extended engines, you will need to modify them to use the new base engine image.

    Note that this step is required if you are now running Cloudera Data Science Workbench on CDH 6.

    The base engine image you use must be compatible with the version of CDH you are running. This is especially important if you are running workloads on Spark. Older base engines (v6 and lower) cannot support the latest versions of CDH 6. That is because these engines were configured to point to the Spark 2 parcel. However, on CDH 6 clusters, Spark is now packaged as part of CDH 6 and the separate add-on Spark 2 parcel is no longer supported. If you want to use Spark on CDH 6, you must upgrade your projects to base engine 7 (or higher).

    CDSW Base Engine Compatibility for Spark Workloads on CDH 5 and CDH 6

    Base Engine Versions          CDH 5   CDH 6
    Base engines 6 (and lower)    Yes     No
    Base engines 7 (and higher)   Yes     Yes
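
The compatibility rule for Spark workloads can be expressed as a small check, which may be useful when auditing many projects. The following is only a sketch of the rule as stated in this section. The function name `engine_supports_cdh` is hypothetical, and it assumes you already know each project's base engine number and your cluster's CDH major version.

```shell
# Hypothetical helper: succeeds (exit 0) when a base engine version can run
# Spark workloads on the given CDH major version.
# $1 = base engine version as an integer, e.g. 7
# $2 = CDH major version, 5 or 6
engine_supports_cdh() {
  engine="$1"
  cdh_major="$2"
  case "$cdh_major" in
    5) return 0 ;;               # all base engines work with CDH 5
    6) [ "$engine" -ge 7 ] ;;    # CDH 6 needs base engine 7 or higher
    *) return 2 ;;               # version not covered by this section
  esac
}
```

For example, `engine_supports_cdh 6 6` fails, which indicates a project that must be moved to base engine 7 (or higher) before running Spark on CDH 6.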