Apache Spark Incompatible Changes

  • As of CDH 5.1, before you can run Spark in standalone mode, you must set the spark.master property in /etc/spark/conf/spark-defaults.conf, as follows:
    spark.master=spark://MASTER_IP:MASTER_PORT
    where MASTER_IP is the IP address of the host running the Spark master and MASTER_PORT is the port on which the master listens (7077 by default).

    This setting means that all jobs will run in standalone mode by default; you can override the default on the command line.
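    For example, a single application could be submitted against a different master without changing the default; the host name, port, class, and JAR names below are illustrative:
    spark-submit --master spark://other-master.example.com:7077 --class com.example.MyApp myapp.jar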

  • This release of Spark includes changes that will enable Spark to avoid breaking compatibility in the future. As a result, most applications will require a recompile to run against Spark 1.0, and some will require changes in source code. The details are as follows:
    • There are two changes in the core Scala API; both are illustrated in the sketch after this list:
      • The cogroup and groupByKey operators now return Iterables over their values instead of Seqs. This change means that the set of values corresponding to a particular key need not all reside in memory at the same time.
      • SparkContext.jarOfClass now returns Option[String] instead of Seq[String].
    • Spark’s Java APIs have been updated to accommodate Java 8 lambdas. See Migrating from pre-1.0 Versions of Spark for more information.
      Note: CDH 5.1 does not support Java 8.
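
The following minimal Scala sketch shows the kind of adjustment these two API changes typically require; the object, application, and variable names are illustrative and not part of the Spark API:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    object MigrationSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("MigrationSketch"))

        // groupByKey (and cogroup) now yield Iterable[V] rather than Seq[V];
        // convert with .toSeq where downstream code still expects a Seq.
        val pairs: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
        val grouped: RDD[(String, Iterable[Int])] = pairs.groupByKey()
        val asSeqs: RDD[(String, Seq[Int])] = grouped.mapValues(_.toSeq)

        // SparkContext.jarOfClass now returns Option[String] rather than Seq[String];
        // convert with .toSeq (or a pattern match) where a collection was expected.
        val jars: Seq[String] = SparkContext.jarOfClass(this.getClass).toSeq

        asSeqs.collect().foreach(println)
        sc.stop()
      }
    }

Code that relied on Seq-only behavior of the grouped values (for example, indexing or length) needs either the .toSeq conversion shown above or a rewrite in terms of iteration.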
