Configuring and Running Spark (Standalone Mode)
- Edit the following portion of /etc/spark/conf/spark-env.sh to point to the host where the spark-master service runs:
###
### === IMPORTANT ===
### Change the following to specify a real cluster's Master host
###
export STANDALONE_SPARK_MASTER_HOST=`hostname`
Change `hostname` in the last line to the actual hostname of the host where the Spark master will run.
You can change other elements of the default configuration by modifying /etc/spark/conf/spark-env.sh, including the following:
- SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
- SPARK_WORKER_CORES, to set the number of cores to use on this machine
- SPARK_WORKER_MEMORY, to set how much memory to use (for example 1000m, 2g)
- SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
- SPARK_WORKER_INSTANCES, to set the number of worker processes per node
- SPARK_WORKER_DIR, to set the working directory of worker processes
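As a rough illustration of the settings listed above, the fragment below shows how such overrides might look in /etc/spark/conf/spark-env.sh. The hostname, the sizing values, and the working directory are assumptions chosen for the example, not recommended defaults. 7077 is the upstream default master port, and 18080 is the master UI port used elsewhere in this section.

```shell
# Sketch of overrides for /etc/spark/conf/spark-env.sh.
# Every value here is an illustrative assumption; adjust to your cluster.

# Host where the Spark master runs (hypothetical hostname).
export STANDALONE_SPARK_MASTER_HOST=spark-master.example.com

# Ports for the master service and its web UI.
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=18080

# Resources each worker offers on this machine.
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=2g

# Number of worker processes per node.
export SPARK_WORKER_INSTANCES=1

# Working directory of worker processes (hypothetical path).
export SPARK_WORKER_DIR=/var/run/spark/work
```

Because spark-env.sh is sourced as a shell script, the values follow ordinary shell assignment rules: no spaces around the equals sign, and quotes around any value that contains spaces.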
Configuring the Spark History Server
- Create the application history directory in HDFS and set its ownership and permissions:
$ sudo -u hdfs hadoop fs -mkdir /user/spark
$ sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory
$ sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
$ sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
- Create /etc/spark/conf/spark-defaults.conf by copying the template:
$ cp /etc/spark/conf/spark-defaults.conf.template /etc/spark/conf/spark-defaults.conf
- Add the following to /etc/spark/conf/spark-defaults.conf:
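A minimal sketch of the event-log settings this step typically involves, assuming the /user/spark/applicationHistory directory created earlier and assuming HDFS is the default file system, so that the bare path resolves there. spark.eventLog.enabled and spark.eventLog.dir are standard Spark properties; the helper function append_event_log_settings and its file argument are hypothetical and exist only to make the example reusable.

```shell
# Hypothetical helper: append event-log settings to a spark-defaults.conf file.
# Usage: append_event_log_settings /path/to/spark-defaults.conf
append_event_log_settings() {
  conf_file="$1"
  # Quoted heredoc delimiter: the lines are written exactly as shown.
  cat >> "$conf_file" <<'EOF'
spark.eventLog.enabled  true
spark.eventLog.dir      /user/spark/applicationHistory
EOF
}

# Example (requires write access to the file):
# append_event_log_settings /etc/spark/conf/spark-defaults.conf
```

The same two lines can also be added with a text editor. The directory should match the one the history server is configured to read.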
In addition, if you want the YARN ResourceManager to link directly to the Spark History Server, set the spark.yarn.historyServer.address property in /etc/spark/conf/spark-defaults.conf to the address of the history server.
Starting, Stopping, and Running Spark
- To start Spark, proceed as follows:
- On one node in the cluster, start the master:
$ sudo service spark-master start
- On all the other nodes, start the workers:
$ sudo service spark-worker start
- On one node, start the history server:
$ sudo service spark-history-server start
- To stop Spark, use the following commands on the appropriate hosts:
$ sudo service spark-worker stop
$ sudo service spark-master stop
$ sudo service spark-history-server stop
Service logs are stored in /var/log/spark.
You can use the GUI for the Spark master at <master_host>:18080.
Testing the Spark Service
To test the Spark service, start spark-shell on one of the nodes. You can, for example, run a word count application:
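// Read the input, split each line into words, and count each word.
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Write the (word, count) pairs back to HDFS.
counts.saveAsTextFile("hdfs://namenode:8020/output")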
To view the Spark Shell application, its executors, and its logs, go to the Spark Master UI, by default at http://spark-master:18080.
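If spark-shell does not pick up the standalone master from its configuration, the master can be named explicitly with the --master option, which takes a URL of the form spark://HOST:PORT. The sketch below builds that URL; the function name spark_master_url and the example hostname are hypothetical, and 7077 is assumed as the port because it is the upstream default for SPARK_MASTER_PORT.

```shell
# Hypothetical helper: print a standalone master URL (spark://HOST:PORT).
# The port defaults to 7077, the upstream default for SPARK_MASTER_PORT.
spark_master_url() {
  printf 'spark://%s:%s\n' "$1" "${2:-7077}"
}

# Example, with a hypothetical hostname:
# spark-shell --master "$(spark_master_url spark-master.example.com)"
```

If SPARK_MASTER_PORT was changed in spark-env.sh, pass the new port as the second argument.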
Running Spark Applications
For details on running Spark applications in the YARN Client and Cluster modes, see Running Spark Applications.