CDS Powered By Apache Spark Version and Packaging Information

CDS Powered By Apache Spark Version Information

Version Information
Version | CSD | Parcel directory
2.3 Release 3 | SPARK2_ON_YARN-2.3.0.cloudera3.jar | http://archive.cloudera.com/spark2/parcels/2.3.0.cloudera3/
2.3 Release 2 | SPARK2_ON_YARN-2.3.0.cloudera2.jar | http://archive.cloudera.com/spark2/parcels/2.3.0.cloudera2/
2.3 Release 1 | Never officially released; if downloaded, do not use | Never officially released; if downloaded, do not use
2.2 Release 2 | SPARK2_ON_YARN-2.2.0.cloudera2.jar | http://archive.cloudera.com/spark2/parcels/2.2.0.cloudera2/
2.2 Release 1 | SPARK2_ON_YARN-2.2.0.cloudera1.jar | http://archive.cloudera.com/spark2/parcels/2.2.0.cloudera1/
2.1 Release 2 | SPARK2_ON_YARN-2.1.0.cloudera2.jar | http://archive.cloudera.com/spark2/parcels/2.1.0.cloudera2/
2.1 Release 1 | SPARK2_ON_YARN-2.1.0.cloudera1.jar | http://archive.cloudera.com/spark2/parcels/2.1.0.cloudera1/
2.0 Release 2 | SPARK2_ON_YARN-2.0.0.cloudera2.jar | http://archive.cloudera.com/spark2/parcels/2.0.0.cloudera2/
2.0 Release 1 | SPARK2_ON_YARN-2.0.0.cloudera1.jar | http://archive.cloudera.com/spark2/parcels/2.0.0.cloudera1/

The exact parcel name depends on the OS. Each parcel directory lists all the parcels for that release.
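
The CSD file names and parcel locations listed above follow a regular pattern based on the version string. As a minimal sketch, the following Java derives both from a version such as 2.3.0.cloudera3. The class and method names are hypothetical and are not part of any Cloudera tooling, and the pattern is inferred only from the releases listed here (it does not apply to 2.3 Release 1, which was never officially released).

```java
// Hypothetical helper for illustration; not part of CDS or Cloudera Manager.
final class CdsArtifactNames {

    // Base location of the parcel directories, as listed in the version information.
    private static final String PARCEL_BASE = "http://archive.cloudera.com/spark2/parcels/";

    /** Returns the CSD jar name for a version string such as "2.3.0.cloudera3". */
    static String csdJar(String version) {
        return "SPARK2_ON_YARN-" + version + ".jar";
    }

    /** Returns the directory that lists the OS-specific parcels for the version. */
    static String parcelDirectory(String version) {
        return PARCEL_BASE + version + "/";
    }

    public static void main(String[] args) {
        String version = "2.3.0.cloudera3";
        System.out.println(csdJar(version));
        System.out.println(parcelDirectory(version));
    }
}
```

The parcel file itself still has to be chosen from that directory according to the OS of the cluster hosts.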

CDS Powered By Apache Spark Maven Artifacts

The following POM fragment shows how to reference a CDS Powered By Apache Spark artifact in a Maven project. For information on how to use Spark Maven artifacts, see Using the CDH 5 Maven Repository.

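<dependency>
  <groupId>org.apache.spark</groupId>
  <!-- The _2.11 suffix is the Scala version the artifact is built for -->
  <artifactId>spark-core_2.11</artifactId>
  <!-- Set this to the version of the CDS parcel installed on the cluster -->
  <version>2.3.0.cloudera2</version>
  <!-- provided: available at compile time, not bundled into the application jar -->
  <scope>provided</scope>
</dependency>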

When building Spark 2 applications, always change the Scala version suffix of the artifactId from 2.10 to 2.11 (for example, spark-core_2.10 becomes spark-core_2.11), and set the version to the Spark 2 version being used (for example, 2.0.0.cloudera1, 2.0.0.cloudera2, 2.2.0.cloudera2, or 2.3.0.cloudera2). Each version string matches the version of the corresponding CDS parcel installed on the cluster.

Use this dependency definition to update pom.xml of the example described in Developing and Running a Spark WordCount Application. To account for changes in the Spark 2 API, before building the example, make the following updates to com.cloudera.sparkwordcount.JavaWordCount:
  • Add import java.util.Iterator;
  • Replace all instances of Iterable with Iterator.
  • Perform the following replacements:
    • return Arrays.asList(s.split(" ")); to return Arrays.asList(s.split(" ")).iterator();
    • return chars; to return chars.iterator();
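
The replacements above can be sketched in isolation. The following Java is a minimal illustration, not the actual JavaWordCount source: it declares a local FlatMapFunction interface as a stand-in with the same shape as the Spark 2 interface (call returns an Iterator rather than an Iterable) so that it runs without Spark on the classpath. The class names Tokenize and ToChars and the drain helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

final class IteratorFlatMapSketch {

    // Local stand-in (an assumption for this sketch) mirroring the Spark 2
    // FlatMapFunction shape: call() returns Iterator<R>, not Iterable<R>.
    interface FlatMapFunction<T, R> {
        Iterator<R> call(T t) throws Exception;
    }

    // Spark 1.x style ended with: return Arrays.asList(s.split(" "));
    static final class Tokenize implements FlatMapFunction<String, String> {
        @Override
        public Iterator<String> call(String s) {
            return Arrays.asList(s.split(" ")).iterator();
        }
    }

    // Spark 1.x style ended with: return chars;
    static final class ToChars implements FlatMapFunction<String, Character> {
        @Override
        public Iterator<Character> call(String word) {
            Collection<Character> chars = new ArrayList<Character>(word.length());
            for (char c : word.toCharArray()) {
                chars.add(c);
            }
            return chars.iterator();
        }
    }

    // Collects an iterator into a list so the result is easy to inspect.
    static <R> List<R> drain(Iterator<R> it) {
        List<R> out = new ArrayList<R>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(new Tokenize().call("to be or not")));
        System.out.println(drain(new ToChars().call("be")));
    }
}
```

In the real application the same bodies would sit inside functions passed to flatMap, with the import of java.util.Iterator added as described in the first bullet.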