Installing or Upgrading CDS 2 Powered by Apache Spark

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

CDS 2 Powered by Apache Spark is distributed as two files: a CSD file and a parcel, both of which need to be installed on the cluster.

Install CDS 2 Powered by Apache Spark

Follow these steps to install Spark 2:

  1. Check that all the software prerequisites are satisfied. If not, you might need to upgrade or install other software components first. See Spark 2 Requirements for details.
  2. Install the Spark 2 CSD into Cloudera Manager.
    1. To download the Spark 2 CSD file, go to the Version Information table in Spark 2 CSD and click the CSD link for the version of Spark 2 you want to install.
    2. Log on to the Cloudera Manager Server host and place the Spark 2 CSD file in the location configured for CSD files (by default, /opt/cloudera/csd).
    3. Set the ownership of the CSD file to cloudera-scm:cloudera-scm and its permissions to 644.
    4. Restart the Cloudera Manager Server with the following command:
      service cloudera-scm-server restart
  3. In the Cloudera Manager Admin Console, add the Spark 2 parcel repository to the Remote Parcel Repository URLs in Parcel Settings, as described in remote repository URLs.
  4. Download the Spark 2 parcel, distribute the parcel to the hosts in your cluster, and activate the parcel. See Managing Parcels.
  5. Add the Spark 2 service to your cluster.
    1. In step 1, select a dependency option:
      • HDFS, YARN, ZooKeeper: Choose this option if you do not need access to a Hive service.
      • HDFS, Hive, YARN, ZooKeeper: Hive is an optional dependency for the Spark service. If you have a Hive service and want to access Hive tables from your Spark applications, choose this option to include Hive as a dependency and have the Hive client configurations always available to Spark applications.
    2. In step 2, when customizing the role assignments for Spark 2, add a gateway role to every host.
    3. Note that the History Server port is 18089 instead of the usual 18088.
    4. Complete the steps to add the Spark 2 service.
  6. Return to the Home page by clicking the Cloudera Manager logo.
  7. Restart the cluster.
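
The CSD part of step 2 (placing the file, setting ownership and permissions, and restarting the Cloudera Manager Server) can be scripted on the Cloudera Manager Server host. The following is a minimal bash sketch, not an official Cloudera tool. It assumes the CSD JAR has already been downloaded to the host and that the configured CSD location is the default /opt/cloudera/csd. The script, the install_csd and restart_cm_server functions, and the CSD_DIR and CSD_OWNER variables are hypothetical names used for illustration. Run it as root so that the ownership change and service restart succeed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a Spark 2 CSD JAR into Cloudera Manager.
# Assumes the default CSD location; override CSD_DIR if Cloudera Manager
# is configured to use a different path for CSD files.
CSD_DIR="${CSD_DIR:-/opt/cloudera/csd}"
# Owner and group that the CSD file must have.
CSD_OWNER="${CSD_OWNER:-cloudera-scm:cloudera-scm}"

# Copy the CSD file into CSD_DIR, then set its ownership and mode 644.
install_csd() {
    local jar="$1"
    local dest
    if [ ! -f "$jar" ]; then
        echo "CSD file not found: $jar" >&2
        return 1
    fi
    dest="$CSD_DIR/$(basename "$jar")"
    cp "$jar" "$dest" || return 1
    chown "$CSD_OWNER" "$dest" || return 1
    chmod 644 "$dest" || return 1
    echo "Installed $dest"
}

# Restart Cloudera Manager Server so that it picks up the new CSD.
restart_cm_server() {
    service cloudera-scm-server restart
}

# Install and restart only when a CSD path is passed on the command line.
if [ "$#" -gt 0 ]; then
    install_csd "$1" && restart_cm_server
fi
```

For example, with a hypothetical script name: `sudo bash install_spark2_csd.sh /path/to/downloaded-csd.jar`. The parcel and service steps (3 through 7) are still performed in the Cloudera Manager Admin Console.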

Upgrading to CDS 2.1 Powered by Apache Spark

If you are already using CDS 2.0 Powered by Apache Spark, follow these steps to upgrade to CDS 2.1 Powered by Apache Spark while keeping any non-default Spark 2 configurations you have already applied:

  1. Remove the CSD JAR for CDS 2.0 Powered by Apache Spark from /opt/cloudera/csd. Refer to CDS Powered by Apache Spark Version and Packaging Information for the names of the JAR files corresponding to each version.
  2. Add the CSD JAR for CDS 2.1 Powered by Apache Spark to /opt/cloudera/csd, and set its ownership to cloudera-scm:cloudera-scm and its permissions to 644.
  3. Restart the cloudera-scm-server service.
  4. In Cloudera Manager, deactivate the parcel corresponding to CDS 2.0.
  5. In Cloudera Manager, activate the parcel corresponding to CDS 2.1.
  6. Restart the affected services and deploy the client configurations.
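
The CSD swap at the start of the upgrade can be scripted in the same way. The following bash sketch is illustrative only and assumes the default /opt/cloudera/csd location. Because the actual JAR names for each version must be looked up in the version and packaging information, they are passed in as arguments rather than hard-coded. The swap_csd function and the CSD_DIR and CSD_OWNER variables are hypothetical. Deactivating and activating the parcels is left to Cloudera Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the CDS 2.0 CSD JAR with the CDS 2.1 CSD JAR.
# Assumes the default CSD location; override CSD_DIR if needed.
CSD_DIR="${CSD_DIR:-/opt/cloudera/csd}"
CSD_OWNER="${CSD_OWNER:-cloudera-scm:cloudera-scm}"

# $1: file name of the CDS 2.0 CSD JAR currently in CSD_DIR
# $2: path to the downloaded CDS 2.1 CSD JAR
swap_csd() {
    local old_name="$1"
    local new_jar="$2"
    local dest
    if [ -z "$old_name" ] || [ ! -f "$new_jar" ]; then
        echo "usage: swap_csd OLD_JAR_NAME NEW_JAR_PATH" >&2
        return 1
    fi
    # Remove the CDS 2.0 CSD first, as the upgrade steps require.
    rm -f "$CSD_DIR/$old_name" || return 1
    # Add the CDS 2.1 CSD with the required ownership and permissions.
    dest="$CSD_DIR/$(basename "$new_jar")"
    cp "$new_jar" "$dest" || return 1
    chown "$CSD_OWNER" "$dest" || return 1
    chmod 644 "$dest" || return 1
}

# Swap and restart only when both arguments are given on the command line.
if [ "$#" -eq 2 ]; then
    swap_csd "$1" "$2" && service cloudera-scm-server restart
fi
```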