Spark Authentication

Minimum Required Role: Security Administrator (also provided by Full Administrator)

Spark currently supports two methods of authentication: Kerberos and a shared secret. When using Spark on YARN, Cloudera recommends Kerberos authentication because it is the stronger security measure.

Configuring Kerberos Authentication for Spark

Create the Spark Principal and Keytab File

  1. Create the spark principal and spark.keytab file:
    kadmin: addprinc -randkey spark/fully.qualified.domain.name@YOUR-REALM.COM
    kadmin: xst -k spark.keytab spark/fully.qualified.domain.name
  2. Move the file into the Spark configuration directory and restrict its access exclusively to the spark user:
    $ mv spark.keytab /etc/spark/conf/
    $ chown spark /etc/spark/conf/spark.keytab
    $ chmod 400 /etc/spark/conf/spark.keytab
    For more details on creating Kerberos principals and keytabs, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files.
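As a quick sanity check after step 2, the following sketch confirms that the keytab has mode 400 and the expected owner. The function name check_keytab is hypothetical, and the `stat -c` option assumes GNU coreutils (it is not available on BSD or macOS).

```shell
# Hypothetical helper: succeeds only if the keytab exists, has mode 400,
# and is owned by the expected user. Assumes GNU coreutils `stat`.
check_keytab() {
  local keytab="$1" owner="$2"
  local mode actual_owner
  [ -f "$keytab" ] || { echo "missing: $keytab" >&2; return 1; }
  mode=$(stat -c '%a' "$keytab")
  actual_owner=$(stat -c '%U' "$keytab")
  [ "$mode" = "400" ] || { echo "mode is $mode, expected 400" >&2; return 1; }
  [ "$actual_owner" = "$owner" ] || { echo "owner is $actual_owner, expected $owner" >&2; return 1; }
}
```

For example, `check_keytab /etc/spark/conf/spark.keytab spark` returns 0 when the file matches the settings above. On systems with the MIT Kerberos client tools, `klist -kt /etc/spark/conf/spark.keytab` additionally lists the principals stored in the keytab.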

Configure the Spark History Server to Use Kerberos

Using Cloudera Manager

If you are using Cloudera Manager, use the following steps to edit the spark-env.sh file.
  1. Open the Cloudera Manager Administration Console and navigate to the Spark service.
  2. Click the Configuration tab.
  3. Select Scope > History Server.
  4. Select Category > Advanced.
  5. Edit the History Server Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh property to add the following properties:
    SPARK_HISTORY_OPTS="-Dspark.history.kerberos.enabled=true \
    -Dspark.history.kerberos.principal=spark/fully.qualified.domain.name@YOUR-REALM.COM \
    -Dspark.history.kerberos.keytab=/etc/spark/conf/spark.keytab"
  6. Click Save Changes to commit the changes.

Using the Command Line

If you are using the command line, open the Spark configuration file /etc/spark/conf/spark-env.sh and add the following properties:
    SPARK_HISTORY_OPTS="-Dspark.history.kerberos.enabled=true \
    -Dspark.history.kerberos.principal=spark/fully.qualified.domain.name@YOUR-REALM.COM \
    -Dspark.history.kerberos.keytab=/etc/spark/conf/spark.keytab"

Running Spark Applications on a Secure Cluster

You can submit compiled Spark applications with the spark-submit script. Specify the following additional command-line options when running the spark-submit script on a secure cluster using the form: --option value.

Option Description
--keytab The full path to the file that contains the keytab for the principal. This keytab is copied to the node running the ApplicationMaster using the Secure Distributed Cache, for periodically renewing the login tickets and the delegation tokens. For information on setting up the principal and keytab, see Configuring a Cluster with Custom Kerberos Principals and Spark Authentication.
--principal The principal used to log in to the KDC when running on secure HDFS.
--proxy-user Allows the spark-submit script to impersonate client users when submitting jobs.
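To illustrate how the --principal and --keytab options fit into a command line, here is a minimal dry-run sketch. The helper name secure_submit_cmd is hypothetical, the class name and JAR path are placeholders that do not come from this guide, and the sketch assumes a Spark version that accepts --master yarn together with --deploy-mode cluster. It only prints the command so that you can review it before running it.

```shell
# Hypothetical dry-run helper: prints (does not execute) a spark-submit
# command line for a secure YARN cluster.
secure_submit_cmd() {
  local principal="$1" keytab="$2"
  shift 2
  # Remaining arguments (application class, JAR, application options) pass through unchanged.
  echo spark-submit --master yarn --deploy-mode cluster \
    --principal "$principal" --keytab "$keytab" "$@"
}

# Placeholder application class and JAR path, for illustration only.
secure_submit_cmd spark/fully.qualified.domain.name@YOUR-REALM.COM \
  /etc/spark/conf/spark.keytab \
  --class com.example.MyApp /path/to/my-app.jar
```

Substitute the principal and keytab of the identity that should own the application, then run the printed command once it looks correct.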

Configuring Spark Authentication Using a Shared Secret

Authentication using a shared secret can be configured using the spark.authenticate configuration parameter. The authentication process is essentially a handshake between Spark and the other party to ensure they have the same shared secret and can be allowed to communicate. If the shared secret does not match, they will not be allowed to communicate.

If you are using Spark on YARN, setting the spark.authenticate parameter to true generates and distributes the shared secret to all applications communicating with Spark. For Cloudera Manager deployments, use the following instructions:
  1. Go to the Spark Service > Configuration tab.
  2. In the Search field, type spark authenticate to find the Spark Authentication settings.
  3. Check the checkbox for the Spark Authentication property.
  4. Click Save Changes.
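For deployments that are not managed by Cloudera Manager, one possible equivalent is to set spark.authenticate directly in spark-defaults.conf. The sketch below is an assumption rather than a documented procedure: the helper name enable_spark_authenticate is hypothetical, and the location of spark-defaults.conf depends on your installation.

```shell
# Hypothetical helper: ensure "spark.authenticate true" is present in the
# given spark-defaults.conf file, replacing any existing value.
enable_spark_authenticate() {
  local conf="$1"
  # Match the exact key followed by whitespace or '=', so that other keys
  # starting with "spark.authenticate." are left untouched.
  if grep -Eq '^[[:space:]]*spark\.authenticate[[:space:]=]' "$conf" 2>/dev/null; then
    sed -E 's/^[[:space:]]*spark\.authenticate[[:space:]=].*/spark.authenticate true/' "$conf" > "$conf.tmp" \
      && mv "$conf.tmp" "$conf"
  else
    echo 'spark.authenticate true' >> "$conf"
  fi
}
```

For example, `enable_spark_authenticate /etc/spark/conf/spark-defaults.conf` (path assumed) adds the setting if it is missing. Because the rewrite goes through a temporary file, check the ownership and permissions of the configuration file afterward.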