Configuring Spark on YARN for Long-Running Applications

To allow long-running applications, such as Spark Streaming jobs, to write to HDFS, you must configure Kerberos authentication for Spark and pass the Spark principal and keytab to the spark-submit script using the --principal and --keytab parameters. The keytab is copied to the host running the ApplicationMaster, which uses the principal and keytab to renew the Kerberos login periodically and to obtain the delegation tokens required to communicate with HDFS.
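As a minimal sketch, the following Python snippet assembles a spark-submit command line that passes --principal and --keytab for a long-running job on YARN. The helper name build_spark_submit_cmd, the choice of cluster deploy mode, and the principal, keytab path, and application name are all hypothetical placeholders; substitute the values for your own cluster. The snippet only builds and prints the command; it does not launch anything.

```python
import shlex
from typing import List, Optional


def build_spark_submit_cmd(
    principal: str,
    keytab: str,
    app: str,
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a spark-submit command line for a long-running job on YARN.

    Hypothetical helper. Passing --principal and --keytab lets Spark renew
    the Kerberos login and obtain HDFS delegation tokens for the life of
    the job, instead of relying on a ticket that eventually expires.
    """
    if not principal or not keytab:
        raise ValueError("both principal and keytab are required")
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",  # assumption: cluster mode
        "--principal", principal,
        "--keytab", keytab,
        app,
    ]
    # Application arguments must come after the application itself.
    cmd.extend(app_args or [])
    return cmd


if __name__ == "__main__":
    # Hypothetical principal, keytab path, and application name.
    cmd = build_spark_submit_cmd(
        "spark/host.example.com@EXAMPLE.COM",
        "/etc/security/keytabs/spark.keytab",
        "streaming_app.py",
    )
    print(shlex.join(cmd))
    # To launch the job instead of printing it:
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with principals or paths that contain special characters; shlex.join is used only to display it.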

To ensure that the Spark keytab is delivered to the ApplicationMaster host securely, configure TLS/SSL communication for YARN and enable HDFS encryption on your cluster.