Spark Encryption

Spark supports the following means of encrypting Spark data at rest and data in transit.

Enabling Spark Encryption Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Open the Cloudera Manager Admin Console and go to the Spark service.
  2. Click the Configuration tab.
  3. (Prerequisite) Search for the Spark Authentication property and make sure it has been enabled. If this property is not set, the following settings to enable encryption will not work.
  4. Search for the Enable Network Encryption property. Use the checkbox to enable encrypted communication between Spark processes belonging to the same application.
  5. Search for the Enable I/O Encryption property. Use the checkbox to enable encryption for temporary shuffle and cache files stored by Spark on local disks.
  6. Click Save Changes to commit the changes.
  7. Redeploy client configuration.
  8. Restart stale services (if indicated by Cloudera Manager).

Enabling Spark Encryption on an Unmanaged Cluster

Prerequisite - Before enabling encryption, make sure spark.authenticate is set to true. Without authentication enabled, the following settings to enable encryption will not work.
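
The prerequisite can be checked mechanically before a configuration is deployed. The following is a minimal sketch, assuming the settings are kept as whitespace-separated "key value" lines (the layout used by spark-defaults.conf). The function names and the sample text are hypothetical; only the property names come from this page.

```python
# Sketch: flag encryption properties that are enabled without spark.authenticate.
# Function names and SAMPLE are illustrative assumptions, not part of Spark.

ENCRYPTION_PROPERTIES = (
    "spark.shuffle.encryption.enabled",
    "spark.authenticate.enableSaslEncryption",
    "spark.network.sasl.serverAlwaysEncrypt",
)


def parse_properties(text):
    """Parse 'key value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def missing_prerequisite(props):
    """Return enabled encryption properties that lack spark.authenticate=true."""
    if props.get("spark.authenticate", "false").lower() == "true":
        return []
    return [
        name
        for name in ENCRYPTION_PROPERTIES
        if props.get(name, "false").lower() == "true"
    ]


SAMPLE = """
# Illustrative configuration text
spark.authenticate false
spark.shuffle.encryption.enabled true
"""

if __name__ == "__main__":
    print(missing_prerequisite(parse_properties(SAMPLE)))
```

An empty list means that either authentication is enabled or no encryption property is turned on in the text that was checked.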

Enabling Encryption for Shuffle and Cache Files

Configure the following properties to enable encrypted shuffle for Spark on YARN.

Property                                Description
spark.shuffle.encryption.enabled        Enable encryption of temporary shuffle and cache files.
spark.shuffle.encryption.keySizeBits    Shuffle file encryption key size in bits. The valid numbers include 128, 192, and 256.
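
As an illustration of how these properties fit together, the sketch below generates the corresponding "key value" lines for a spark-defaults.conf file and rejects key sizes outside the documented set. The helper name is hypothetical, and the default of 128 bits in the function signature is an arbitrary choice for the example, not a documented default.

```python
# Sketch: build spark-defaults.conf lines that enable shuffle/cache file encryption.
# The helper name and its default argument are assumptions made for this example.

VALID_KEY_SIZES = (128, 192, 256)


def shuffle_encryption_lines(key_size_bits=128):
    """Return 'key value' lines enabling shuffle file encryption."""
    if key_size_bits not in VALID_KEY_SIZES:
        raise ValueError(
            f"key size must be one of {VALID_KEY_SIZES}, got {key_size_bits}"
        )
    settings = {
        # Prerequisite: encryption settings do not work without authentication.
        "spark.authenticate": "true",
        "spark.shuffle.encryption.enabled": "true",
        "spark.shuffle.encryption.keySizeBits": str(key_size_bits),
    }
    # One whitespace-separated key/value pair per line.
    return [f"{key} {value}" for key, value in settings.items()]


if __name__ == "__main__":
    print("\n".join(shuffle_encryption_lines(256)))
```

Validating the key size before the file is written catches a typo such as 265 early, instead of leaving it to surface when an application starts.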

Enabling Encryption for Spark RPCs

Configure the following property to enable encryption for Spark RPCs.

Property                                   Default Value    Description
spark.authenticate.enableSaslEncryption    false            Enable encryption for Spark RPCs.

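
For a single application, the same setting can be passed on the command line instead of a configuration file. The sketch below assembles a spark-submit invocation that uses the --conf option; the helper name and the application path my_app.py are placeholders, and any additional authentication settings your deployment requires are not shown.

```python
# Sketch: assemble a spark-submit command line with RPC encryption enabled.
# rpc_encryption_args and "my_app.py" are placeholders for illustration.
import shlex


def rpc_encryption_args(app_path):
    """Build a spark-submit argument list with RPC encryption enabled."""
    conf = {
        # Prerequisite: encryption settings do not work without authentication.
        "spark.authenticate": "true",
        "spark.authenticate.enableSaslEncryption": "true",
    }
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(rpc_encryption_args("my_app.py")))
```
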
If you are using an external shuffle service, configure the following property in the shuffle service configuration to disable unencrypted connections. Note that the external shuffle service is enabled by default in CDH 5.5 and higher.

Property                                  Default Value    Description
spark.network.sasl.serverAlwaysEncrypt    false            Disable unencrypted connections for the external shuffle service.