CDS Powered by Apache Spark Requirements

The following sections describe software requirements for CDS Powered by Apache Spark.

CDH Versions

Supported versions of CDH are described below.

A Hive compatibility issue in CDS 2.0 Release 1 affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the CDS 2.0 Release 2 parcel or higher to avoid Spark 2 job failures when using Hive functionality.

CDS Powered by Apache Spark Version    CDH Version
2.2 Release 4                          CDH 5.8 and any higher CDH 5.x versions
2.2 Release 3                          CDH 5.8 and any higher CDH 5.x versions
2.2 Release 2                          CDH 5.8 and any higher CDH 5.x versions
2.2 Release 1                          CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12, CDH 5.13
2.1 Release 4                          CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12, and any higher CDH 5.x versions
2.1 Release 3                          CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12, and any higher CDH 5.x versions
2.1 Release 2                          CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12, and any higher CDH 5.x versions
2.1 Release 1                          CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11, CDH 5.12
2.0 Release 2                          CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10, CDH 5.11
2.0 Release 1                          CDH 5.7 up to 5.7.5, CDH 5.8 up to 5.8.4, CDH 5.9 up to 5.9.1, CDH 5.10.0. Spark 2.0 Release 2 is required for any higher maintenance releases in any of these CDH versions.
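
The CDS 2.0 Release 1 row amounts to a cap on the CDH maintenance release within each minor line. A minimal Python sketch of that check follows. The function name `cds20_r1_supports` and the lookup dictionary are illustrative and not part of any Cloudera tool; the limits are transcribed from the table above.

```python
# Highest CDH maintenance release, per minor line, that the table above
# lists for CDS 2.0 Release 1. Illustrative transcription only.
CDS20_R1_MAX_MAINTENANCE = {
    (5, 7): 5,   # CDH 5.7 up to 5.7.5
    (5, 8): 4,   # CDH 5.8 up to 5.8.4
    (5, 9): 1,   # CDH 5.9 up to 5.9.1
    (5, 10): 0,  # CDH 5.10.0 only
}


def cds20_r1_supports(cdh_version):
    """Return True if CDS 2.0 Release 1 is listed for this CDH version.

    cdh_version is a string of the form "major.minor.maintenance",
    for example "5.8.4".
    """
    parts = [int(p) for p in cdh_version.split(".")]
    if len(parts) != 3:
        raise ValueError("expected major.minor.maintenance, got %r" % cdh_version)
    major, minor, maintenance = parts
    limit = CDS20_R1_MAX_MAINTENANCE.get((major, minor))
    # Minor lines absent from the table (for example CDH 5.11) are not supported.
    return limit is not None and maintenance <= limit
```

Under these assumptions, a cluster on CDH 5.8.5 fails the check, which matches the requirement to move to CDS 2.0 Release 2 or higher on that version.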

A Spark 1.6 service (included in CDH 5.7 and higher) can coexist on the same cluster as Spark 2 (installed as a separate parcel). The two services are configured so that they do not conflict, and both run on the same YARN service. Spark 2 uses the external shuffle service from the CDH installation if Spark 1 is already installed, or installs the shuffle service itself if necessary. Only the external shuffle service classes from the CDH installation can be used.

Although Spark 1 and Spark 2 can coexist in the same CDH cluster, you cannot use multiple Spark 2 versions simultaneously in the same Cloudera Manager instance. All CDH clusters managed by the same Cloudera Manager Server must use exactly the same version of CDS Powered by Apache Spark. For example, you cannot use the built-in CDH Spark service, a CDS 2.1 service, and a CDS 2.2 service. You must choose only one CDS 2 Powered by Apache Spark release. Make sure to install or upgrade the CDS 2 service descriptor and parcels across all machines of all clusters at the same time.

Cloudera Manager Versions

Applicable versions of Cloudera Manager for Spark 2 are described below.

CDS Powered by Apache Spark Version    Cloudera Manager Version
2.2 Release 4                          Cloudera Manager 5.8.3, 5.9 and higher
2.2 Release 3                          Cloudera Manager 5.8.3, 5.9 and higher
2.2 Release 2                          Cloudera Manager 5.8.3, 5.9 and higher
2.2 Release 1                          Cloudera Manager 5.8.3, 5.9 and higher
2.1 Release 4                          Cloudera Manager 5.8.3, 5.9 and higher
2.1 Release 3                          Cloudera Manager 5.8.3, 5.9 and higher
2.1 Release 2                          Cloudera Manager 5.8.3, 5.9 and higher
2.1 Release 1                          Cloudera Manager 5.8.3, 5.9 and higher
2.0 Release 2                          Cloudera Manager 5.8.3, 5.9 and higher
2.0 Release 1                          Cloudera Manager 5.8.3, 5.9 and higher

Scala 2.11 Requirement

Spark 2 does not work with Scala 2.10. Use Scala 2.11 only.

Python Requirement

CDS Powered by Apache Spark requires one of the following Python versions:

  • Python 2.7 or higher, when using Python 2.
  • Python 3.4 or higher, when using Python 3. (CDS 2.0 only supports Python 3.4 and 3.5; CDS 2.1 and 2.2 include support for Python 3.6 and higher.)
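
The Python rules above can be expressed as a small check. This is a sketch only: the function name `python_supported` and its tuple arguments are hypothetical, and treating any CDS release other than 2.0 as having no upper bound on Python 3 is an assumption that goes beyond the 2.1 and 2.2 releases named here.

```python
def python_supported(python_version, cds_version):
    """Check a Python (major, minor) pair against a CDS (major, minor) pair.

    Both arguments are sequences whose first two items are integers,
    for example (3, 5) and (2, 0).
    """
    py_major, py_minor = python_version[:2]
    if py_major == 2:
        # Python 2.7 or higher, when using Python 2.
        return py_minor >= 7
    if py_major == 3:
        if py_minor < 4:
            return False
        # CDS 2.0 supports only Python 3.4 and 3.5.
        if tuple(cds_version[:2]) == (2, 0):
            return py_minor <= 5
        # Assumption: CDS releases other than 2.0 accept Python 3.4 and higher.
        return True
    return False
```

To check the running interpreter, `sys.version_info` can be passed as the first argument, for example `python_supported(sys.version_info, (2, 2))`.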

JDK 8 Requirement

CDS 2.2 and higher require JDK 8 only. If you are using CDS 2.2 or higher, you must remove JDK 7 from all cluster and gateway hosts to ensure proper operation.

Check the supported JDK versions and see Java Development Kit Installation for the installation steps.
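
One way to confirm which Java version a host reports is to parse the text printed by `java -version`. The sketch below assumes the common output shape containing a quoted version string such as `"1.8.0_144"`; the helper names `java_major_version` and `is_jdk8` are illustrative, and the check does not detect additional JDKs installed elsewhere on the host.

```python
import re


def java_major_version(version_output):
    """Extract the Java major version from `java -version` style text.

    Handles the legacy "1.8.0_144" scheme (major version 8) and the newer
    "11.0.2" scheme. Returns None if no quoted version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.N" means major version N.
    if first == 1 and second is not None:
        return int(second)
    return first


def is_jdk8(version_output):
    """Return True if the reported Java major version is 8."""
    return java_major_version(version_output) == 8
```

Note that `java -version` typically writes to standard error rather than standard output, so a caller capturing the text from a subprocess would need to read that stream.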