Configuring YARN for Long-running Applications

Long-running applications, such as Spark Streaming jobs, need additional configuration because the default settings give the hdfs user's delegation tokens a maximum lifetime of only 7 days, which is not always sufficient.
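The 7-day figure corresponds to the HDFS setting `dfs.namenode.delegation.token.max-lifetime`, whose default is 604800000 ms. As a quick sanity check, the Python sketch below converts that setting to a duration and tests whether a job started alongside a freshly issued token would outlive it. It assumes the default value, and `max_lifetime` and `outlives_token` are hypothetical helpers, not part of any Hadoop API.

```python
from datetime import timedelta

# Default of dfs.namenode.delegation.token.max-lifetime in milliseconds
# (assumed here; a cluster may override it in hdfs-site.xml).
DEFAULT_MAX_LIFETIME_MS = 604_800_000


def max_lifetime(ms: int = DEFAULT_MAX_LIFETIME_MS) -> timedelta:
    """Convert a max-lifetime setting in milliseconds to a timedelta."""
    return timedelta(milliseconds=ms)


def outlives_token(planned_runtime: timedelta,
                   ms: int = DEFAULT_MAX_LIFETIME_MS) -> bool:
    """Hypothetical helper: True if a job that starts when its token is
    issued would still be running after the token's maximum lifetime."""
    return planned_runtime > max_lifetime(ms)


if __name__ == "__main__":
    print(max_lifetime())                      # 7 days, 0:00:00
    print(outlives_token(timedelta(days=3)))   # False
    print(outlives_token(timedelta(days=30)))  # True
```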

You can work around this limit by configuring the ResourceManager as a proxy user for the corresponding HDFS NameNode, so that the ResourceManager can request new tokens once the existing ones reach their maximum lifetime. YARN can then continue to perform localization and log aggregation on behalf of the hdfs user.
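It helps to separate renewing a token from obtaining a new one. Renewal only pushes the expiry forward, and never past the token's maximum date; after that date, only a party entitled to act for the user can get a fresh token. The following is a simplified Python model of that distinction, not the ResourceManager's actual implementation: the `Token` class and the `issue`, `renew`, and `refresh` functions are hypothetical, and the one-day renewal interval assumes the default `dfs.namenode.delegation.token.renew-interval` of 86400000 ms.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed defaults: dfs.namenode.delegation.token.renew-interval (1 day)
# and dfs.namenode.delegation.token.max-lifetime (7 days).
RENEW_INTERVAL = timedelta(days=1)
MAX_LIFETIME = timedelta(days=7)


@dataclass
class Token:
    """Hypothetical, simplified delegation token."""
    issued: datetime
    expires: datetime

    @property
    def max_date(self) -> datetime:
        # A token can never be renewed past issue time + max lifetime.
        return self.issued + MAX_LIFETIME


def issue(now: datetime) -> Token:
    """Obtain a brand-new token valid for one renewal interval."""
    return Token(issued=now, expires=now + RENEW_INTERVAL)


def renew(token: Token, now: datetime) -> Token:
    """Extend expiry by one interval, capped at the token's max date."""
    if now > token.expires or now > token.max_date:
        raise ValueError("token expired or past its maximum lifetime")
    token.expires = min(now + RENEW_INTERVAL, token.max_date)
    return token


def refresh(token: Token, now: datetime, proxy_user_enabled: bool) -> Token:
    """Renew while the token is still valid; otherwise only a proxy user
    (the ResourceManager, in this model) may obtain a replacement."""
    if now <= token.expires and now <= token.max_date:
        return renew(token, now)
    if proxy_user_enabled:
        return issue(now)
    raise PermissionError("token cannot be renewed and no proxy-user privileges")
```

Renewing daily keeps a token alive through day 7; on day 8, `refresh` succeeds only when `proxy_user_enabled` is true, mirroring the behavior this section enables.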

Configure the proxy user in Cloudera Manager as follows:
  1. Go to the Cloudera Manager Admin Console.
  2. Using the Clusters tab, go to the YARN service.
  3. Click the Configuration tab.
  4. Select Scope > Resource Manager.
  5. Select Category > Advanced.
  6. Select the Enable ResourceManager Proxy User Privileges checkbox to grant the ResourceManager proxy-user privileges.
  7. Click Save Changes.
  8. Using the Clusters tab, go to the HDFS service.
  9. Click the Configuration tab.
  10. Select Scope > HDFS (Service-Wide).
  11. Select Category > Advanced.
  12. Add the following XML to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml property:
    <property>
      <name>hadoop.proxyuser.yarn.hosts</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.yarn.groups</name>
      <value>*</value>
    </property>
  13. Click Save Changes.
  14. Restart the YARN and HDFS services.
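Because the safety-valve text in step 12 is a fragment with two sibling `<property>` elements rather than a complete XML document, a typo in it is easy to miss. The minimal Python sketch below, using only the standard library, wraps the fragment in a `<configuration>` root (the root element Hadoop's *-site.xml files use) and checks that both proxy-user properties are present. `parse_snippet` and `check_proxyuser` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The core-site.xml safety-valve fragment from step 12.
SNIPPET = """
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
"""


def parse_snippet(fragment: str) -> dict[str, str]:
    """Wrap a safety-valve fragment in <configuration> and map name -> value.

    Raises xml.etree.ElementTree.ParseError if the fragment is malformed.
    """
    root = ET.fromstring(f"<configuration>{fragment}</configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        props[name] = value
    return props


def check_proxyuser(props: dict[str, str], user: str = "yarn") -> list[str]:
    """Return the proxy-user keys for `user` that are missing or empty."""
    required = [f"hadoop.proxyuser.{user}.hosts",
                f"hadoop.proxyuser.{user}.groups"]
    return [key for key in required if not props.get(key)]


if __name__ == "__main__":
    props = parse_snippet(SNIPPET)
    print(props)
    print(check_proxyuser(props))  # [] when both properties are set
```

A value of `*` lets the yarn user impersonate members of any group from any host; a more restrictive deployment could list specific hosts or groups instead, and the same check would still accept it.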