This is the documentation for Cloudera Manager 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Hadoop Users in Cloudera Manager

Installing and using CDH and Cloudera Manager creates a number of special users by default. Table 1 lists these users and groups as of the latest Cloudera Manager 5.0.x release. Table 2 lists the corresponding Kerberos principals and keytab files that should be created when you configure Kerberos security on your cluster.

Table 1. Cloudera Manager Users & Groups

| Project | Unix User ID | Group | Group Members | Notes |
|---|---|---|---|---|
| Cloudera Manager | cloudera-scm | cloudera-scm | | Cloudera Manager processes such as the CM Server and the monitoring daemons run as this user. It is not configurable. The Cloudera Manager keytab file must be named cmf.keytab since that name has been hard-coded in Cloudera Manager. Note: Applicable to clusters managed by Cloudera Manager only. |
| Apache Avro | | | | No special users. |
| Apache Flume | flume | flume | | The sink that writes to HDFS as this user must have write privileges. |
| Apache HBase | hbase | hbase | | The Master and the RegionServer processes run as this user. |
| HDFS | hdfs | hdfs | impala | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. The hdfs user is also part of the hadoop group. |
| Apache Hive | hive | hive | impala | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (e.g. MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml. |
| Apache HCatalog | hive | hive | | The WebHCat service (for REST access to Hive functionality) runs as the hive user. It is not configurable. |
| HttpFS | httpfs | httpfs | | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file. |
| Hue | hue | hue | | Hue runs as this user. It is not configurable. |
| Cloudera Impala | impala | impala | | An interactive query tool. The impala user also belongs to the hive and hdfs groups. |
| Llama | llama | llama | | Llama runs as this user. |
| Apache Mahout | | | | No special users. |
| MapReduce | mapred | mapred | | Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos. It would be complicated to use a different user ID. |
| Apache Oozie | oozie | oozie | | The Oozie service runs as this user. |
| Parquet | | | | No special users. |
| Apache Pig | | | | No special users. |
| Cloudera Search | solr | solr | | The Solr process runs as this user. It is not configurable. |
| Apache Spark | spark | spark | | The Spark process runs as this user. It is not configurable. |
| Apache Sentry (incubating) | | | | No special users. |
| Apache Sqoop | sqoop | sqoop | | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended. |
| Apache Sqoop2 | sqoop2 | sqoop | | The Sqoop2 service runs as this user. |
| Apache Whirr | | | | No special users. |
| YARN | yarn | yarn | | Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos. It would be complicated to use a different user ID. The yarn user also belongs to the hadoop group. |
| Apache ZooKeeper | zookeeper | zookeeper | | The ZooKeeper process runs as this user. It is not configurable. |
| Other | | hadoop | yarn, hdfs, mapred | This is a group with no associated Unix user ID or keytab. |
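
The group memberships listed in Table 1 can be spot-checked on a host with a short script. The following is a minimal sketch, not part of Cloudera Manager: the function names and the `EXPECTED_GROUP_MEMBERS` mapping are illustrative assumptions, and the mapping covers only the rows of Table 1 that list group members. It relies on Python's standard `grp` module, which reports supplementary members of a group only (users whose primary group is that group are not listed).

```python
# Expected supplementary group members, transcribed from Table 1.
# (Illustrative mapping; only rows with a "Group Members" entry are included.)
EXPECTED_GROUP_MEMBERS = {
    "hdfs": {"impala"},
    "hive": {"impala"},
    "hadoop": {"yarn", "hdfs", "mapred"},
}


def missing_members(expected, actual):
    """Return {group: sorted list of expected members that are absent}."""
    problems = {}
    for group, members in expected.items():
        absent = set(members) - set(actual.get(group, ()))
        if absent:
            problems[group] = sorted(absent)
    return problems


def read_system_groups(names):
    """Look up the supplementary members of each named group on this host."""
    import grp  # Unix-only standard library module

    found = {}
    for name in names:
        try:
            found[name] = set(grp.getgrnam(name).gr_mem)
        except KeyError:
            # The group does not exist on this host.
            found[name] = set()
    return found
```

On a cluster host, `missing_members(EXPECTED_GROUP_MEMBERS, read_system_groups(EXPECTED_GROUP_MEMBERS))` would return an empty dictionary when every expected member is present.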

  Note:

Kerberos principal names should have the format username/fully.qualified.domain.name@YOUR-REALM.COM, where username is the name of an existing UNIX account, such as hdfs or mapred. Table 2 lists the usernames to use in the Kerberos principal names. For example, the Kerberos principal for Apache Flume would be flume/fully.qualified.domain.name@YOUR-REALM.COM.
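
The principal format described in this note can be illustrated with a small helper. This is a sketch only: `kerberos_principal` and `split_principal` are hypothetical names introduced here, not functions provided by Cloudera Manager or by any Kerberos library.

```python
def kerberos_principal(username, fqdn, realm):
    """Build a principal of the form username/fqdn@REALM."""
    for part in (username, fqdn, realm):
        if not part or "/" in part or "@" in part:
            raise ValueError("each component must be non-empty and free of '/' and '@'")
    return f"{username}/{fqdn}@{realm}"


def split_principal(principal):
    """Split username/fqdn@REALM back into (username, fqdn, realm)."""
    name_and_host, realm = principal.rsplit("@", 1)
    username, fqdn = name_and_host.split("/", 1)
    return username, fqdn, realm
```

For the Apache Flume example from the note, `kerberos_principal("flume", "fully.qualified.domain.name", "YOUR-REALM.COM")` yields the principal string given there.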

Table 2. Cloudera Manager Keytabs & Keytab File Permissions
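| Project (UNIX ID) | Service | Kerberos Principal Primary | Filename (.keytab) | Keytab File Owner | Keytab File Group | File Permission (octal) |
|---|---|---|---|---|---|---|
| Cloudera Manager (cloudera-scm) | NA | cloudera-scm | cmf | cloudera-scm | cloudera-scm | 600 |
| Cloudera Management Service (cloudera-scm) | cloudera-mgmt-REPORTSMANAGER | cloudera-scm | hdfs | cloudera-scm | cloudera-scm | 600 |
| Cloudera Management Service (cloudera-scm) | cloudera-mgmt-ACTIVITYMONITOR | cloudera-scm | hdfs | cloudera-scm | cloudera-scm | 600 |
| Cloudera Management Service (cloudera-scm) | cloudera-mgmt-SERVICEMONITOR | cloudera-scm | hdfs | cloudera-scm | cloudera-scm | 600 |
| Cloudera Management Service (cloudera-scm) | cloudera-mgmt-HOSTMONITOR | cloudera-scm | hdfs | cloudera-scm | cloudera-scm | 600 |
| Flume (flume) | flume-AGENT | flume | flume | cloudera-scm | cloudera-scm | 600 |
| HBase (hbase) | hbase-REGIONSERVER | hbase | hbase | cloudera-scm | cloudera-scm | 600 |
| HBase (hbase) | hbase-HBASETHRIFTSERVER | hbase | hbase | cloudera-scm | cloudera-scm | 600 |
| HBase (hbase) | hbase-HBASERESTSERVER | hbase | hbase | cloudera-scm | cloudera-scm | 600 |
| HBase (hbase) | hbase-MASTER | hbase | hbase | cloudera-scm | cloudera-scm | 600 |
| HDFS (hdfs) | hdfs-NAMENODE | hdfs | hdfs (Secondary: Merge hdfs and HTTP) | cloudera-scm | cloudera-scm | 600 |
| HDFS (hdfs) | hdfs-DATANODE | hdfs | hdfs (Secondary: Merge hdfs and HTTP) | cloudera-scm | cloudera-scm | 600 |
| HDFS (hdfs) | hdfs-SECONDARYNAMENODE | hdfs | hdfs (Secondary: Merge hdfs and HTTP) | cloudera-scm | cloudera-scm | 600 |
| Hive (hive) | hive-HIVESERVER2 | hive | hive | cloudera-scm | cloudera-scm | 600 |
| Hive (hive) | hive-WEBHCAT | HTTP | HTTP | cloudera-scm | cloudera-scm | 600 |
| Hive (hive) | hive-HIVEMETASTORE | hive | hive | cloudera-scm | cloudera-scm | 600 |
| HttpFS (httpfs) | hdfs-HTTPFS | httpfs | httpfs | cloudera-scm | cloudera-scm | 600 |
| Hue (hue) | hue-KT_RENEWER | hue | hue | cloudera-scm | cloudera-scm | 600 |
| Impala (impala) | impala-STATESTORE | impala | impala | cloudera-scm | cloudera-scm | 600 |
| Impala (impala) | impala-CATALOGSERVER | impala | impala | cloudera-scm | cloudera-scm | 600 |
| Impala (impala) | impala-IMPALAD | impala | impala | cloudera-scm | cloudera-scm | 600 |
| Llama (llama) | | | | | | |
| MapReduce (mapred) | mapreduce-JOBTRACKER | mapred | mapred (Secondary: Merge mapred and HTTP) | cloudera-scm | cloudera-scm | 600 |
| MapReduce (mapred) | mapreduce-TASKTRACKER | mapred | mapred (Secondary: Merge mapred and HTTP) | cloudera-scm | cloudera-scm | 600 |
| Oozie (oozie) | oozie-OOZIE_SERVER | oozie | oozie (Secondary: Merge oozie and HTTP) | cloudera-scm | cloudera-scm | 600 |
| Search (solr) | solr-SOLR_SERVER | solr | solr (Secondary: Merge solr and HTTP) | cloudera-scm | cloudera-scm | 600 |
| Sentry (sentry) | | | | | | |
| Spark (spark) | spark_on_yarn-SPARK_YARN_HISTORY_SERVER | spark | spark | cloudera-scm | cloudera-scm | 600 |
| Sqoop (sqoop) | | | | | | |
| Sqoop2 (sqoop2) | | | | | | |
| YARN (yarn) | yarn-NODEMANAGER | yarn | yarn (Secondary: Merge yarn and HTTP) | cloudera-scm | cloudera-scm | 644 |
| YARN (yarn) | yarn-RESOURCEMANAGER | yarn | yarn (Secondary: Merge yarn and HTTP) | cloudera-scm | cloudera-scm | 600 |
| YARN (yarn) | yarn-JOBHISTORY | yarn | yarn (Secondary: Merge yarn and HTTP) | cloudera-scm | cloudera-scm | 600 |
| ZooKeeper (zookeeper) | zookeeper-server | zookeeper | zookeeper | cloudera-scm | cloudera-scm | 600 |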
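
The owner, group, and octal permission columns of Table 2 can be checked against a keytab file on disk. The sketch below is an illustration under assumptions: `describe_keytab` and `keytab_matches` are hypothetical helper names, and the defaults reflect the most common row values in Table 2 (cloudera-scm, cloudera-scm, 600). The location of a keytab file is not stated in this document, so the path must be supplied by the caller.

```python
import os
import stat


def describe_keytab(path):
    """Return (owner, group, octal permission string) for the file at path."""
    import grp  # Unix-only standard library modules
    import pwd

    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no passwd entry; fall back to the number
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no group entry; fall back to the number
    # S_IMODE keeps only the permission bits, e.g. 0o600 -> "600".
    return owner, group, format(stat.S_IMODE(st.st_mode), "o")


def keytab_matches(path, owner="cloudera-scm", group="cloudera-scm", mode="600"):
    """True if the file's owner, group, and mode equal the expected values."""
    return describe_keytab(path) == (owner, group, mode)
```

For the yarn-NODEMANAGER row, which Table 2 lists with permission 644, the call would pass `mode="644"`.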
Page generated September 3, 2015.