Flume Account Requirements

This section provides an overview of the account and credential requirements for Flume to write to a Kerberized HDFS. Note the distinctions between the Flume agent machine, DataNode machine, and NameNode machine, as well as the flume Unix user account versus the flume Hadoop/Kerberos user account.

  • Each Flume agent machine that writes to HDFS (using a configured HDFS sink) needs a Kerberos principal of the form:

    flume/fully.qualified.domain.name@YOUR-REALM.COM

    where fully.qualified.domain.name is the fully qualified domain name of the given Flume agent host machine, and YOUR-REALM.COM is the Kerberos realm.

  • Each Flume agent machine that writes to HDFS does not need to have a flume Unix user account to write files owned by the flume Hadoop/Kerberos user. Only the keytab for the flume Hadoop/Kerberos user is required on the Flume agent machine.
  • DataNode machines do not need Flume Kerberos keytabs and also do not need the flume Unix user account.
  • TaskTracker (MRv1) or NodeManager (YARN) machines need a flume Unix user account if and only if MapReduce jobs are being run as the flume Hadoop/Kerberos user.
  • The NameNode machine needs to be able to resolve the groups of the flume user. The groups of the flume user on the NameNode machine are mapped to the Hadoop groups used for authorizing access.
  • The NameNode machine does not need a Flume Kerberos keytab.
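
As a rough illustration of the agent-side requirement above, the fragment below sketches how an HDFS sink might reference the flume principal and its keytab. The agent name (agent1), sink name (hdfs-sink), NameNode address, and keytab path are hypothetical placeholders, not values taken from this document. Sources, channels, and the sink's channel binding are omitted, so this is not a complete agent configuration.

```properties
# Hypothetical agent and sink names; substitute your own.
agent1.sinks.hdfs-sink.type = hdfs

# Hypothetical NameNode address and target directory.
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode.example.com:8020/flume/events

# Principal for this agent host; use the host's own fully qualified
# domain name and your Kerberos realm.
agent1.sinks.hdfs-sink.hdfs.kerberosPrincipal = flume/fully.qualified.domain.name@YOUR-REALM.COM

# Assumed keytab location; the file must be readable by the Unix user
# that runs the Flume agent process.
agent1.sinks.hdfs-sink.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
```

Only the Flume agent machine carries these two settings and the keytab file. As the list above notes, DataNode and NameNode machines need neither.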