Managing Spark Standalone Using the Command Line
This section describes how to configure and start Spark Standalone services.
For information on installing Spark using the command line, see Setting Up Apache Spark Using the Command Line. For information on configuring and starting the Spark History Server, see Configuring and Running the Spark History Server Using the Command Line.
For information on Spark applications, see Spark Application Overview.
Configuring Spark Standalone
Before running Spark Standalone, do the following on every host in the cluster:
- Edit /etc/spark/conf/spark-env.sh and change hostname in the last line to the name of the host where the Spark Master will run:
###
### === IMPORTANT ===
### Change the following to specify the Master host
###
export STANDALONE_SPARK_MASTER_HOST=`hostname`
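For example, assuming the Master will run on a host named master.example.com (a hypothetical name for illustration), the edited line would read:

```shell
# Hypothetical result of the edit above; master.example.com is a placeholder
# for the actual host name of your Spark Master.
export STANDALONE_SPARK_MASTER_HOST=master.example.com
```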
- Optionally, edit other configuration options:
- SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT and SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports
- SPARK_WORKER_CORES, to set the number of cores to use on this machine
- SPARK_WORKER_MEMORY, to set how much memory to use (for example: 1000m, 2g)
- SPARK_WORKER_INSTANCES, to set the number of worker processes per node
- SPARK_WORKER_DIR, to set the working directory of worker processes
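A spark-env.sh excerpt combining these optional settings might look like the following sketch; all values are illustrative, not defaults:

```shell
# Illustrative spark-env.sh settings -- adjust for your hardware and ports.
export SPARK_MASTER_PORT=7077                # Master RPC port
export SPARK_MASTER_WEBUI_PORT=18080         # Master web UI port
export SPARK_WORKER_PORT=7078                # Worker RPC port
export SPARK_WORKER_WEBUI_PORT=18081         # Worker web UI port
export SPARK_WORKER_CORES=4                  # cores this worker may allocate
export SPARK_WORKER_MEMORY=2g                # memory this worker may allocate
export SPARK_WORKER_INSTANCES=1              # worker processes on this host
export SPARK_WORKER_DIR=/var/run/spark/work  # working directory for worker processes
```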
Starting and Stopping Spark Standalone Clusters
- On one host in the cluster, start the Spark Master:
$ sudo service spark-master start
You can access the Spark Master web UI at http://spark_master:18080, where spark_master is the name of the host running the Master.
- On all the other hosts, start the workers:
$ sudo service spark-worker start
To stop Spark Standalone, stop the workers on each host, then stop the Master:
$ sudo service spark-worker stop
$ sudo service spark-master stop
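On clusters with more than a few hosts, the start and stop steps above are often scripted. A minimal sketch, assuming passwordless SSH and hypothetical host names (master.example.com, worker1.example.com, and so on):

```shell
# Sketch only: run the service commands above on every host over SSH.
# MASTER_HOST and WORKER_HOSTS are placeholders for your own host names.
MASTER_HOST="master.example.com"
WORKER_HOSTS="worker1.example.com worker2.example.com"

start_cluster() {
  # Start the Master first, then the workers.
  ssh "$MASTER_HOST" sudo service spark-master start
  for h in $WORKER_HOSTS; do
    ssh "$h" sudo service spark-worker start
  done
}

stop_cluster() {
  # Stop in reverse order: workers first, then the Master.
  for h in $WORKER_HOSTS; do
    ssh "$h" sudo service spark-worker stop
  done
  ssh "$MASTER_HOST" sudo service spark-master stop
}
```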
Service logs are stored in /var/log/spark.