This is the documentation for Cloudera 5.4.x. Documentation for other versions is available at Cloudera Documentation.

Managing Spark

The Spark service is available in two versions: Spark and Spark (Standalone). The previously available Spark service, which runs Spark in standalone mode, has been renamed Spark (Standalone). The Spark (Standalone) service has its own runtime roles: Master and Worker. The current Spark service runs Spark as a YARN application. Both services have a History Server role. In secure clusters, Spark applications can only run on YARN. Cloudera recommends that you use the Spark service.

You can install Spark through the Cloudera Manager Installation wizard using parcels; the wizard can add and start the Spark service for you. See Installing Spark.

If you elect not to add the Spark service using the Installation wizard, you can use the Add Service wizard to create the service. The wizard automatically configures dependent services and the Spark service. See Adding a Service for instructions. See also Running Spark Applications on YARN.

When you upgrade from Cloudera Manager 5.1 or lower to Cloudera Manager 5.2 or higher, Cloudera Manager does not migrate an existing Spark service, which runs Spark in standalone mode, to a Spark on YARN service.

Testing the Spark Service

To test the Spark service, start the Spark shell, spark-shell, on one of the hosts. Within the Spark shell, you can run a word count application. For example:
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://namenode:8020/output")

To submit Spark applications to YARN, use the --master yarn flag when you start spark-shell. To see information about the running Spark shell application, go to the Spark History Server UI at http://spark_history_server:18088 or to the YARN applications page in the Cloudera Manager Admin Console.
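For example, launching the shell on YARN might look like the following; the host and port in the History Server URL are the defaults mentioned above, so substitute your cluster's values:

```shell
# Start the Spark shell as a YARN application.
spark-shell --master yarn

# While the shell is running, its application appears in the
# Spark History Server UI (default port 18088):
#   http://spark_history_server:18088
```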

If you are running the Spark (Standalone) service, you can see the Spark shell application, its executors, and logs in the Spark Master UI, by default at http://spark_master:18080.

For more information on running Spark applications, see Running Spark Applications.
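Beyond the interactive shell, packaged applications are submitted with spark-submit. A minimal sketch follows; the jar name, class name, and input path are placeholders, not values from this guide:

```shell
# Submit a packaged Spark application to YARN.
# --class names the application's main class; the jar and the
# trailing argument (an HDFS input path) are hypothetical examples.
spark-submit \
  --class com.example.WordCount \
  --master yarn \
  myapp.jar \
  hdfs://namenode:8020/path/to/input
```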

Adding the Spark History Server Role

By default, the Spark (Standalone) service is not created with a History Server. To add the History Server:
  1. Go to the Spark service.
  2. Click the Instances tab.
  3. Click the Add Role Instances button.
  4. Select a host in the column under History Server, then click OK.
  5. Click Continue.
  6. Check the checkbox next to the History Server role.
  7. Select Actions for Selected > Start and click Start.
  8. Click Close when the action completes.