Upgrading Cloudera Data Science Workbench 1.5.x Using Cloudera Manager

  1. Before you begin the upgrade, read the Cloudera Data Science Workbench Release Notes for both the version you are upgrading from and the version you are upgrading to.

  2. Depending on the version you are upgrading from, perform one of the following steps to stop Cloudera Data Science Workbench:
    • (Required for Upgrades from CDSW 1.4.2 or lower) Safely stop Cloudera Data Science Workbench. To avoid running into the data loss issue described in TSB-346, run the cdsw_protect_stop_restart.sh script on the master host and follow the sequence of steps as instructed by the script.

      The script will first back up your project files to the specified target folder. It will then temporarily move your project files aside to protect against the data loss condition. At that point, it is safe to stop the CDSW service in Cloudera Manager.

      After Cloudera Data Science Workbench has stopped, press Enter to continue running the script as instructed. It will then move your project files back into place.

      OR

    • (Upgrading from CDSW 1.4.3 or higher) Stop the Cloudera Data Science Workbench service in Cloudera Manager.
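
The choice between the two stop procedures above depends only on the version you are upgrading from. As a minimal sketch, assuming GNU sort (for its -V version ordering) and that you supply the installed version string yourself, a helper such as the hypothetical stop_path function below could make that decision explicit in a runbook. It is not part of Cloudera Data Science Workbench.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CDSW): report which stop procedure applies.
# Assumes GNU sort, whose -V flag orders version strings numerically.
stop_path() {
  local current="$1"
  local threshold="1.4.2"   # versions at or below this need the protected stop (TSB-346)
  local highest
  # The larger of the two versions ends up on the last line of the sorted output.
  highest="$(printf '%s\n%s\n' "$current" "$threshold" | sort -V | tail -n 1)"
  if [ "$highest" = "$threshold" ]; then
    echo "protected"   # current version is 1.4.2 or lower
  else
    echo "standard"    # current version is 1.4.3 or higher
  fi
}
```

For example, stop_path 1.4.0 prints protected (run cdsw_protect_stop_restart.sh first), while stop_path 1.4.3 prints standard (stop the service directly in Cloudera Manager).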

  3. (Strongly Recommended) On the master host, back up all the application data stored in the /var/lib/cdsw directory.
    To create the backup, run the following command on the master host:
    tar -cvzf cdsw.tar.gz -C /var/lib/cdsw/ .
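
It can be worth confirming that the archive is readable before continuing with the upgrade. The following is a minimal sketch, not an official Cloudera procedure: the backup_cdsw function name and its two parameters are assumptions introduced for illustration. It wraps the same tar invocation and then lists the archive to check that it can be read end to end. Writing the archive outside the directory being backed up avoids tar trying to include its own output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the backup command above.
#   $1: directory to back up (defaults to /var/lib/cdsw)
#   $2: archive path (defaults to cdsw.tar.gz in the current directory)
backup_cdsw() {
  local src="${1:-/var/lib/cdsw}"
  local archive="${2:-cdsw.tar.gz}"
  # Create the compressed archive relative to the source directory.
  tar -czf "$archive" -C "$src" . || return 1
  # Listing every entry forces tar to read the whole archive,
  # so a truncated or corrupt file makes this step fail.
  tar -tzf "$archive" > /dev/null
}
```

A call such as backup_cdsw /var/lib/cdsw /backup/cdsw.tar.gz (where /backup is a placeholder for a location with enough free space) returns a non-zero status if either step fails.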
  4. (Required for Upgrades from CDSW 1.4.0 - RedHat only) Cloudera Data Science Workbench 1.4.2 (and higher) includes a fix for a slab leak issue found in RedHat kernels. For this fix to take effect, RedHat users must reboot all Cloudera Data Science Workbench hosts before proceeding with an upgrade from CDSW 1.4.0.

    As a precaution, consult your cluster/IT administrator before you start rebooting hosts.

  5. Deactivate the existing Cloudera Data Science Workbench parcel. Go to the Cloudera Manager Admin Console. In the top navigation bar, click Hosts > Parcels.

    Locate the currently active CDSW parcel and click Deactivate. On the confirmation pop-up, select Deactivate Only and click OK.

  6. Download and save the latest Cloudera Data Science Workbench CSD to the Cloudera Manager Server host.
    1. Download the latest Cloudera Data Science Workbench CSD.
      Version                                 Link to CSD
      Cloudera Data Science Workbench 1.5.0   CDH 6 - CLOUDERA_DATA_SCIENCE_WORKBENCH_CDH6_1.5.0.jar
                                              CDH 5 - CLOUDERA_DATA_SCIENCE_WORKBENCH_CDH5_1.5.0.jar

    2. Log on to the Cloudera Manager Server host, and place the CSD file under /opt/cloudera/csd, which is the default location for CSD files.
    3. Delete any CSD files belonging to older versions of Cloudera Data Science Workbench from /opt/cloudera/csd.

      This is required because older versions of the CSD will not work with the latest Cloudera Data Science Workbench 1.5 parcel. Make sure your CSD and parcel are always the same version.

    4. Set the CSD file ownership to cloudera-scm:cloudera-scm with permission 644.
    5. Restart the Cloudera Manager Server:
      service cloudera-scm-server restart
    6. Log into the Cloudera Manager Admin Console and restart the Cloudera Management Service.
      1. Select Clusters > Cloudera Management Service.
      2. Select Actions > Restart.
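
The file-handling parts of this step (placing the new CSD, deleting older CSDs, and setting ownership and permissions) could be scripted roughly as follows. This is a sketch under stated assumptions rather than a Cloudera-provided tool: the install_csd function is hypothetical, it assumes older CSD files share the CLOUDERA_DATA_SCIENCE_WORKBENCH file name prefix, and it assumes the new jar was downloaded to a location other than the CSD directory. The chown is attempted only when the cloudera-scm user exists on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a new CDSW CSD and remove older ones.
#   $1: path to the downloaded CSD jar (outside the CSD directory)
#   $2: CSD directory (defaults to /opt/cloudera/csd)
install_csd() {
  local new_jar="$1"
  local csd_dir="${2:-/opt/cloudera/csd}"
  local name old
  name="$(basename "$new_jar")"
  # Delete CSDs from older CDSW versions (assumed to share this name prefix).
  for old in "$csd_dir"/CLOUDERA_DATA_SCIENCE_WORKBENCH*.jar; do
    [ -e "$old" ] || continue                      # glob matched nothing
    [ "$(basename "$old")" = "$name" ] || rm -f -- "$old"
  done
  cp -- "$new_jar" "$csd_dir/$name" || return 1
  chmod 644 "$csd_dir/$name" || return 1
  # Ownership change needs root and an existing cloudera-scm account.
  if id cloudera-scm > /dev/null 2>&1; then
    chown cloudera-scm:cloudera-scm "$csd_dir/$name"
  fi
}
```

After a call such as install_csd /tmp/CLOUDERA_DATA_SCIENCE_WORKBENCH_CDH6_1.5.0.jar (the /tmp download location is a placeholder), the Cloudera Manager Server and Cloudera Management Service restarts described above are still required.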
  7. Distribute and activate the new parcel on your cluster.
    1. Log into the Cloudera Manager Admin Console.
    2. Click Hosts > Parcels in the main navigation bar.
    3. If the latest CDSW parcel is already available in this list, you can skip this step.
      Add the Cloudera Data Science Workbench parcel repository URL to Cloudera Manager.
      1. On the Parcels page, click Configuration.
      2. In the Remote Parcel Repository URLs list, click the addition symbol to create a new row.
      3. Enter the path to the repository.
        Version                                 Remote Parcel Repository URL
        Cloudera Data Science Workbench 1.5.0   https://archive.cloudera.com/p/cdsw1/1.5.0/parcels/
      4. Click Save Changes.
    4. Go back to the Hosts > Parcels page. The latest parcel should now appear in the set of parcels available for download. Click Download. Once the download is complete, click Distribute to distribute the parcel to all the CDH hosts in your cluster. Then click Activate. On the pop-up screen, select Activate Only and click OK. For more detailed information on each of these tasks, see Managing Parcels.
  8. Run the Prepare Node command on all Cloudera Data Science Workbench hosts.
    1. Before you run Prepare Node, you must make sure that the command is allowed to install all the required packages on your cluster. This is controlled by the Install Required Packages property.

      1. Navigate to the CDSW service.
      2. Click Configuration.
      3. Search for the Install Required Packages property. If this property is enabled, you can move on to the next step and run Prepare Node.
        However, if the property has been disabled, you must either enable it or manually install the following packages on all Cloudera Data Science Workbench gateway hosts.
        nfs-utils
        libseccomp
        lvm2
        bridge-utils
        libtool-ltdl
        iptables
        rsync
        policycoreutils-python
        selinux-policy-base
        selinux-policy-targeted
        ntp
        ebtables
        bind-utils
        nmap-ncat
        openssl
        e2fsprogs
        redhat-lsb-core
        socat
        socat
    2. Run the Prepare Node command.
      1. In Cloudera Manager, navigate to the Cloudera Data Science Workbench service.
      2. Click the Instances tab.
      3. Use the checkboxes to select all host instances and click Actions for Selected (x).
      4. Click Prepare Node. Once again, click Prepare Node to confirm the action.
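
If the Install Required Packages property is disabled, you may want to check which of the listed packages are missing on a host before running Prepare Node. The sketch below assumes an RPM-based host where rpm -q reports whether a package is installed. The REQUIRED_PACKAGES variable and the is_installed and missing_packages functions are names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Package list copied from the Install Required Packages step above.
REQUIRED_PACKAGES="nfs-utils libseccomp lvm2 bridge-utils libtool-ltdl iptables rsync policycoreutils-python selinux-policy-base selinux-policy-targeted ntp ebtables bind-utils nmap-ncat openssl e2fsprogs redhat-lsb-core socat"

# Assumption: an RPM-based host. rpm -q exits non-zero for a package that is not installed.
is_installed() {
  rpm -q "$1" > /dev/null 2>&1
}

# Print the name of every required package that is not installed, one per line.
missing_packages() {
  local pkg
  for pkg in $REQUIRED_PACKAGES; do
    is_installed "$pkg" || echo "$pkg"
  done
}
```

Running missing_packages on each gateway host prints nothing when every package is present. Any names it prints would need to be installed with the host's package manager.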
  9. Log into the Cloudera Manager Admin Console and restart the Cloudera Data Science Workbench service.
    1. On the Home > Status tab, click the drop-down arrow to the right of the CDSW service and select Restart from the dropdown.
    2. Confirm your choice on the next screen. Note that a complete restart of the service will take time. Even after the CDSW service status shows Good Health, the application itself needs additional time before it is ready.
  10. Upgrade Projects to Use the Latest Base Engine Images

    If the release you have just upgraded to includes a new version of the base engine image (see release notes), you will need to manually configure existing projects to use the new engine. Cloudera recommends you do so to take advantage of any new features and bug fixes included in the newly released engine.

    To upgrade a project to the new engine, go to the project's Settings > Engine page and select the new engine from the dropdown. If any of your projects are using custom extended engines, you will need to modify them to use the new base engine image.

    Note that this step is required if you have upgraded to running Cloudera Data Science Workbench on CDH 6.

    The base engine image you use must be compatible with the version of CDH you are running. This is especially important if you are running workloads on Spark. Older base engines (v6 and lower) cannot support the latest versions of CDH 6 because they were configured to point to the Spark 2 parcel. On CDH 6 clusters, however, Spark is packaged as part of CDH 6 and the separate add-on Spark 2 parcel is no longer supported. If you want to use Spark on CDH 6, you must upgrade your projects to base engine 7 (or higher).

    CDSW Base Engine Compatibility for Spark Workloads on CDH 5 and CDH 6
    Base Engine Versions          CDH 5   CDH 6
    Base engines 6 (and lower)    Yes     No
    Base engines 7 (and higher)   Yes     Yes
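
The compatibility table can be expressed as a small check, for example when auditing which projects still need an engine upgrade. This is only a sketch of the table's logic for Spark workloads. The engine_supports_cdh function is hypothetical, and the engine number and CDH major version must be supplied by you.

```shell
#!/usr/bin/env bash
# Hypothetical check encoding the compatibility table above.
#   $1: base engine version number (for example, 6 or 7)
#   $2: CDH major version (5 or 6)
# Returns 0 if the combination is listed as compatible, 1 if not,
# and 2 for a CDH major version the table does not cover.
engine_supports_cdh() {
  local engine="$1"
  local cdh_major="$2"
  case "$cdh_major" in
    5) return 0 ;;               # every base engine is listed as "Yes" for CDH 5
    6) [ "$engine" -ge 7 ] ;;    # CDH 6 needs base engine 7 or higher
    *) return 2 ;;
  esac
}
```

For example, engine_supports_cdh 6 6 returns a non-zero status, which indicates that a project on base engine 6 must be moved to engine 7 or higher before running Spark on CDH 6.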