Configuring YARN for Long-running Applications

Long-running applications, such as Spark Streaming jobs, need additional configuration because the default settings allow the hdfs user's delegation tokens a maximum lifetime of only 7 days, which is not always sufficient.
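
To show where the 7-day figure comes from, here is a minimal sketch that converts the relevant millisecond settings into days. It assumes the cluster still uses the stock HDFS defaults for dfs.namenode.delegation.token.max-lifetime and dfs.namenode.delegation.token.renew-interval; a cluster that overrides them will give different numbers.

```python
from datetime import timedelta

# Stock HDFS defaults in milliseconds (assumed; your cluster may override them).
MAX_LIFETIME_MS = 604_800_000   # dfs.namenode.delegation.token.max-lifetime
RENEW_INTERVAL_MS = 86_400_000  # dfs.namenode.delegation.token.renew-interval

max_lifetime = timedelta(milliseconds=MAX_LIFETIME_MS)
renew_interval = timedelta(milliseconds=RENEW_INTERVAL_MS)

# A token can be renewed repeatedly, but never beyond its maximum lifetime.
print(max_lifetime.days)    # 7
print(renew_interval.days)  # 1
```

With these values, a token must be renewed daily and cannot be extended past day 7, which is why jobs that run longer need a way to obtain new tokens.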

You can work around this by configuring the ResourceManager as a proxy user for the corresponding HDFS NameNode, so that the ResourceManager can request new tokens when the existing ones pass their maximum lifetime. YARN can then continue performing localization and log aggregation on behalf of the hdfs user.

Configure the proxy user in Cloudera Manager as follows:
  1. Go to the Cloudera Manager Admin Console.
  2. Using the Clusters tab, navigate to the YARN service.
  3. Click Configuration.
  4. Under the ResourceManager Default Group > Advanced category, add the following snippet to the ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml property:
    <property>
    <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
    <value>true</value>
    </property>
  5. Click Save Changes.
  6. Using the Clusters tab, navigate to the HDFS service.
  7. Click Configuration.
  8. Under the Service-Wide > Advanced category, add the following snippet to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml property:
    <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
    </property>
    
    <property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>*</value>
    </property>
  9. Click Save Changes.
  10. Restart the YARN and HDFS services.
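
A typo in a safety-valve snippet is easy to miss, so it can help to sanity-check the XML before pasting it into Cloudera Manager. The following is a minimal sketch that parses the snippets from steps 4 and 8 and reports any expected property that is absent or has a different value. The helper names parse_properties and missing_properties are hypothetical, chosen for illustration; the script only inspects the text you give it and does not talk to Cloudera Manager or the cluster.

```python
import xml.etree.ElementTree as ET

# The snippets from steps 4 and 8, as pasted into the safety valves.
YARN_SITE_SNIPPET = """
<property>
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>
"""

CORE_SITE_SNIPPET = """
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
"""


def parse_properties(snippet):
    """Return {name: value} for every <property> element in a snippet."""
    # A snippet has no single root element, so wrap it to get well-formed XML.
    root = ET.fromstring("<configuration>" + snippet + "</configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        props[name] = value
    return props


def missing_properties(snippet, expected):
    """Return the expected names that are absent or set to another value."""
    props = parse_properties(snippet)
    return [name for name, value in expected.items() if props.get(name) != value]


yarn_missing = missing_properties(
    YARN_SITE_SNIPPET,
    {"yarn.resourcemanager.proxy-user-privileges.enabled": "true"},
)
core_missing = missing_properties(
    CORE_SITE_SNIPPET,
    {"hadoop.proxyuser.yarn.hosts": "*", "hadoop.proxyuser.yarn.groups": "*"},
)

print("yarn-site.xml problems:", yarn_missing)
print("core-site.xml problems:", core_missing)
```

An empty list for both files means the snippets are well formed and contain the properties listed in the steps. Malformed XML raises xml.etree.ElementTree.ParseError instead, which also flags the problem before the services are restarted.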