Hadoop Users (user:group) and Kerberos Principals

During the Cloudera Manager/CDH installation process, several Linux user accounts and groups are created by default. These are listed in the table below. Integrating the cluster to use Kerberos for authentication requires creating Kerberos principals and keytabs for these user accounts.

Users and Groups

| Component (Version) | Unix User ID | Groups | Functionality |
|---|---|---|---|
| Cloudera Manager (all versions) | cloudera-scm | cloudera-scm | Clusters managed by Cloudera Manager run Cloudera Manager Server, monitoring roles, and other Cloudera Server processes as cloudera-scm. Requires a keytab file named cmf.keytab because the name is hard-coded in Cloudera Manager. |
| Apache Accumulo (Accumulo 1.4.3 and higher) | accumulo | accumulo | Accumulo processes run as this user. |
| Apache Avro | ~ | ~ | No special user:group. |
| Apache Flume | flume | flume | The sink that writes to HDFS as this user must have write privileges. |
| Apache HBase | hbase | hbase | The Master and the RegionServer processes run as this user. |
| HDFS | hdfs | hdfs, hadoop | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. |
| Apache Hive | hive | hive | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml. |
| Apache HCatalog | hive | hive | The WebHCat service (for REST access to Hive functionality) runs as the hive user. |
| HttpFS | httpfs | httpfs | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file. |
| Hue | hue | hue | Hue services run as this user. |
| Hue Load Balancer (Cloudera Manager 5.5 and higher) | apache | apache | The Hue Load Balancer has a dependency on the apache2 package that uses the apache user name. Cloudera Manager does not run processes using this user ID. |
| Impala | impala | impala, hive | Impala services run as this user. |
| Apache Kafka (CDK 1.2.0 Powered By Apache Kafka) | kafka | kafka | Kafka brokers and mirror makers run as this user. |
| Java KeyStore KMS (CDH 5.2.1 and higher) | kms | kms | The Java KeyStore KMS service runs as this user. |
| Key Trustee KMS (CDH 5.3 and higher) | kms | kms | The Key Trustee KMS service runs as this user. |
| Key Trustee Server (CDH 5.4 and higher) | keytrustee | keytrustee | The Key Trustee Server service runs as this user. |
| Kudu | kudu | kudu | Kudu services run as this user. |
| Llama | llama | llama | Llama runs as this user. |
| Apache Mahout | ~ | ~ | No special users. |
| MapReduce | mapred | mapred, hadoop | Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos. |
| Apache Oozie | oozie | oozie | The Oozie service runs as this user. |
| Parquet | ~ | ~ | No special users. |
| Apache Pig | ~ | ~ | No special users. |
| Cloudera Search | solr | solr | The Solr processes run as this user. |
| Apache Spark | spark | spark | The Spark History Server process runs as this user. |
| Apache Sentry | sentry | sentry | The Sentry service runs as this user. |
| Apache Sqoop | sqoop | sqoop | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended. |
| Apache Sqoop2 | sqoop2 | sqoop, sqoop2 | The Sqoop2 service runs as this user. |
| Apache Whirr | ~ | ~ | No special users. |
| YARN | yarn | yarn, hadoop | Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos. |
| Apache ZooKeeper | zookeeper | zookeeper | The ZooKeeper processes run as this user. It is not configurable. |

Keytabs and Keytab File Permissions

Linux user accounts, such as hdfs, flume, or mapred, are mapped to the username portion of the Kerberos principal names, as follows:
username/fqdn.example.com@YOUR-REALM.COM
For example, the Kerberos principal for Apache Flume would be:
flume/fqdn.example.com@YOUR-REALM.COM
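
The mapping above can be sketched as a small helper. This is a minimal illustration, not part of any Cloudera tooling: the function name `service_principal` is hypothetical, and the host and realm values are the placeholders used in the example above.

```python
def service_principal(username: str, fqdn: str, realm: str) -> str:
    """Build a principal name of the form username/fqdn@REALM."""
    if not username or not fqdn or not realm:
        raise ValueError("username, fqdn, and realm must all be non-empty")
    # The Linux account name becomes the first component of the principal.
    return f"{username}/{fqdn}@{realm}"


if __name__ == "__main__":
    # Prints the Flume principal from the example above.
    print(service_principal("flume", "fqdn.example.com", "YOUR-REALM.COM"))
```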

Keytabs that contain multiple principals are merged automatically from individual keytabs by Cloudera Manager. If you do not use Cloudera Manager, you must merge the keytabs manually.
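
For clusters not managed by Cloudera Manager, one common way to merge keytabs is the MIT Kerberos `ktutil` utility, which reads keytabs with `rkt` and writes the combined key list with `wkt`. The sketch below only generates the command sequence; the helper name `ktutil_merge_script` and the source file names are assumptions for illustration, and the exact procedure for your distribution may differ.

```python
from typing import Iterable


def ktutil_merge_script(source_keytabs: Iterable[str], merged_keytab: str) -> str:
    """Return ktutil commands that read each source keytab and write one merged file."""
    sources = list(source_keytabs)
    if len(sources) < 2:
        raise ValueError("need at least two keytabs to merge")
    # rkt: read a keytab into the current key list.
    lines = [f"rkt {path}" for path in sources]
    # wkt: write the current key list to the merged keytab.
    lines.append(f"wkt {merged_keytab}")
    lines.append("quit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical input names; the merged name matches the HttpFS keytab mentioned earlier.
    print(ktutil_merge_script(["httpfs.keytab", "http.keytab"], "httpfs-http.keytab"))
```

The generated text can be fed to `ktutil` on standard input, for example with `subprocess.run(["ktutil"], input=script, text=True, check=True)`, assuming `ktutil` is installed on the host.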

The tables below list the usernames to use for Kerberos principal names, along with the keytab file names, owners, groups, and permissions.

Clusters Managed by Cloudera Manager

| Component (Unix User ID) | Service | Kerberos Principals | Filename (*.keytab) | Keytab File Owner | Keytab File Group | File Permission (octal) |
|---|---|---|---|---|---|---|
| Cloudera Manager (cloudera-scm) | N/A | cloudera-scm | cmf | cloudera-scm | cloudera-scm | 600 |
| Cloudera Management Service (cloudera-scm) | cloudera-mgmt-REPORTSMANAGER | hdfs | headlamp | cloudera-scm | cloudera-scm | 600 |
| Cloudera Management Service (cloudera-scm) | cloudera-mgmt-SERVICEMONITOR, cloudera-mgmt-ACTIVITYMONITOR | hue | cmon | cloudera-scm | cloudera-scm | 600 |
| Cloudera Management Service (cloudera-scm) | cloudera-mgmt-HOSTMONITOR | N/A | N/A | N/A | N/A | N/A |
| Apache Accumulo (accumulo) | accumulo16-ACCUMULO16_MASTER | accumulo | accumulo16 | cloudera-scm | cloudera-scm | 600 |
| | accumulo16-ACCUMULO16_TRACER | | | | | |
| | accumulo16-ACCUMULO16_MONITOR | | | | | |
| | accumulo16-ACCUMULO16_GC | | | | | |
| | accumulo16-ACCUMULO16_TSERVER | | | | | |
| Flume (flume) | flume-AGENT | flume | flume | cloudera-scm | cloudera-scm | 600 |
| HBase (hbase) | hbase-HBASETHRIFTSERVER | HTTP | HTTP | cloudera-scm | cloudera-scm | 600 |
| | hbase-REGIONSERVER | hbase | hbase | | | |
| | hbase-HBASERESTSERVER | | | | | |
| | hbase-MASTER | | | | | |
| HDFS (hdfs) | hdfs-NAMENODE | hdfs, HTTP | hdfs | cloudera-scm | cloudera-scm | 600 |
| | hdfs-DATANODE | | | | | |
| | hdfs-SECONDARYNAMENODE | | | | | |
| Hive (hive) | hive-HIVESERVER2 | hive | hive | cloudera-scm | cloudera-scm | 600 |
| | hive-WEBHCAT | HTTP | HTTP | | | |
| | hive-HIVEMETASTORE | hive | hive | | | |
| HttpFS (httpfs) | hdfs-HTTPFS | httpfs | httpfs | cloudera-scm | cloudera-scm | 600 |
| Hue (hue) | hue-KT_RENEWER | hue | hue | cloudera-scm | cloudera-scm | 600 |
| Impala (impala) | impala-STATESTORE | impala | impala | cloudera-scm | cloudera-scm | 600 |
| | impala-CATALOGSERVER | | | | | |
| | impala-IMPALAD | | | | | |
| Java KeyStore KMS (kms) | kms-KMS | HTTP | kms | cloudera-scm | cloudera-scm | 600 |
| Apache Kafka (kafka) | kafka-KAFKA_BROKER | kafka | kafka | kafka | kafka | 600 |
| Apache Kafka (kafka) | kafka-KAFKA_MIRROR_MAKER | kafka_mirror_maker | kafka | kafka | kafka | 600 |
| Key Trustee KMS (kms) | keytrustee-KMS_KEYTRUSTEE | HTTP | keytrustee | cloudera-scm | cloudera-scm | 600 |
| Llama (llama) | impala-LLAMA | llama, HTTP | llama | cloudera-scm | cloudera-scm | 600 |
| MapReduce (mapred) | mapreduce-JOBTRACKER | mapred, HTTP | mapred | cloudera-scm | cloudera-scm | 600 |
| | mapreduce-TASKTRACKER | | | | | |
| Oozie (oozie) | oozie-OOZIE_SERVER | oozie, HTTP | oozie | cloudera-scm | cloudera-scm | 600 |
| Search (solr) | solr-SOLR_SERVER | solr, HTTP | solr | cloudera-scm | cloudera-scm | 600 |
| Sentry (sentry) | sentry-SENTRY_SERVER | sentry | sentry | cloudera-scm | cloudera-scm | 600 |
| Spark (spark) | spark_on_yarn-SPARK_YARN_HISTORY_SERVER | spark | spark | cloudera-scm | cloudera-scm | 600 |
| YARN (yarn) | yarn-NODEMANAGER | yarn, HTTP | yarn | cloudera-scm | cloudera-scm | 644 |
| | yarn-RESOURCEMANAGER | | | | | 600 |
| | yarn-JOBHISTORY | | | | | 600 |
| ZooKeeper (zookeeper) | zookeeper-server | zookeeper | zookeeper | cloudera-scm | cloudera-scm | 600 |

CDH Clusters Not Managed by Cloudera Manager

| Component (Unix User ID) | Service | Kerberos Principals | Filename (*.keytab) | Keytab File Owner | Keytab File Group | File Permission (octal) |
|---|---|---|---|---|---|---|
| Apache Accumulo (accumulo) | accumulo16-ACCUMULO16_MASTER | accumulo | accumulo16 | accumulo | accumulo | 600 |
| | accumulo16-ACCUMULO16_TRACER | | | | | |
| | accumulo16-ACCUMULO16_MONITOR | | | | | |
| | accumulo16-ACCUMULO16_GC | | | | | |
| | accumulo16-ACCUMULO16_TSERVER | | | | | |
| Flume (flume) | flume-AGENT | flume | flume | flume | flume | 600 |
| HBase (hbase) | hbase-HBASETHRIFTSERVER | HTTP | HTTP | hbase | hbase | 600 |
| | hbase-REGIONSERVER | hbase | hbase | | | |
| | hbase-HBASERESTSERVER | | | | | |
| | hbase-MASTER | | | | | |
| HDFS (hdfs) | hdfs-NAMENODE | hdfs, HTTP | hdfs | hdfs | hdfs | 600 |
| | hdfs-DATANODE | | | | | |
| | hdfs-SECONDARYNAMENODE | | | | | |
| Hive (hive) | hive-HIVESERVER2 | hive | hive | hive | hive | 600 |
| | hive-WEBHCAT | HTTP | HTTP | | | |
| | hive-HIVEMETASTORE | hive | hive | | | |
| HttpFS (httpfs) | hdfs-HTTPFS | httpfs | httpfs | httpfs | httpfs | 600 |
| Hue (hue) | hue-KT_RENEWER | hue | hue | hue | hue | 600 |
| Impala (impala) | impala-STATESTORE | impala | impala | impala | impala | 600 |
| | impala-CATALOGSERVER | | | | | |
| | impala-IMPALAD | | | | | |
| Llama (llama) | impala-LLAMA | llama, HTTP | llama | llama | llama | 600 |
| Java KeyStore KMS (kms) | kms-KMS | HTTP | kms | kms | kms | 600 |
| Apache Kafka (kafka) | kafka-KAFKA_BROKER | kafka | kafka | kafka | kafka | 600 |
| Apache Kafka (kafka) | kafka-MIRROR_MAKER | kafka_mirror_maker | kafka | kafka | kafka | 600 |
| Key Trustee KMS (kms) | kms-KEYTRUSTEE | HTTP | kms | kms | kms | 600 |
| MapReduce (mapred) | mapreduce-JOBTRACKER | mapred, HTTP | mapred | mapred | hadoop | 600 |
| | mapreduce-TASKTRACKER | | | | | |
| Oozie (oozie) | oozie-OOZIE_SERVER | oozie, HTTP | oozie | oozie | oozie | 600 |
| Search (solr) | solr-SOLR_SERVER | solr, HTTP | solr | solr | solr | 600 |
| Sentry (sentry) | sentry-SENTRY_SERVER | sentry | sentry | sentry | sentry | 600 |
| Spark (spark) | spark_on_yarn-SPARK_YARN_HISTORY_SERVER | spark | spark | spark | spark | 600 |
| YARN (yarn) | yarn-NODEMANAGER | yarn, HTTP | yarn | yarn | hadoop | 644 |
| | yarn-RESOURCEMANAGER | | | | | 600 |
| | yarn-JOBHISTORY | | | | | 600 |
| ZooKeeper (zookeeper) | zookeeper-server | zookeeper | zookeeper | zookeeper | zookeeper | 600 |
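
The octal permissions listed in the tables can be checked on a host with a short script. The following is a minimal sketch using only the Python standard library; the function name `keytab_mode_matches` and the example path are hypothetical, and ownership would need a separate check (for example, by comparing `st_uid` and `st_gid` from the same `os.stat` result).

```python
import os
import stat


def keytab_mode_matches(path: str, expected_mode: int = 0o600) -> bool:
    """Return True if the permission bits of the file equal expected_mode."""
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    actual_mode = stat.S_IMODE(os.stat(path).st_mode)
    return actual_mode == expected_mode


if __name__ == "__main__":
    # Hypothetical location; adjust to where the keytab is actually deployed.
    keytab_path = "/etc/hadoop/conf/hdfs.keytab"
    if os.path.exists(keytab_path):
        print(keytab_path, "mode ok:", keytab_mode_matches(keytab_path, 0o600))
```

For the yarn-NODEMANAGER row, pass `0o644` as `expected_mode` instead of the default.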