Recommended Cluster Hosts and Role Distribution

When you install CDH using the Cloudera Manager installation wizard, Cloudera Manager attempts to spread the roles among cluster hosts (except for roles assigned to gateway hosts) based on the resources available on each host. You can change these assignments on the Customize Role Assignments page of the wizard. You can also change and add roles later using Cloudera Manager. See Role Instances.

If your cluster uses data-at-rest encryption, see Allocating Hosts for Key Trustee Server and Key Trustee KMS.

For information about where to locate various databases that are required for Cloudera Manager and other services, see Step 4: Install and Configure Databases.

CDH Cluster Hosts and Role Assignments

Cluster hosts can be broadly described as the following types:
  • Master hosts run Hadoop master processes such as the HDFS NameNode and YARN ResourceManager.
  • Utility hosts run other cluster processes that are not master processes, such as Cloudera Manager and the Hive Metastore.
  • Gateway hosts are client access points for launching jobs in the cluster. The number of gateway hosts required varies depending on the type and size of the workloads.
  • Worker hosts primarily run DataNodes and other distributed processes such as Impalad.

3 - 10 Worker Hosts without High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • YARN ResourceManager
  • JobHistory Server
  • ZooKeeper
  • Kudu master
  • Spark History Server
One host for all Utility and Gateway roles:
  • Secondary NameNode
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • HiveServer2
  • Impala Catalog Server
  • Impala StateStore
  • Hue
  • Oozie
  • Flume
  • Gateway configuration
3 - 10 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server

3 - 20 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • JobHistory Server
  • Spark History Server
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 3:
  • Kudu master (Kudu requires an odd number of masters for HA.)
Utility Host 1:
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
  • ZooKeeper (requires dedicated disk)
  • JournalNode (requires dedicated disk)
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
3 - 20 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
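
As a quick sanity check on a layout like the one above, the role lists can be written down as data and the quorum-based roles counted. The sketch below is illustrative only: the host labels (`master-1`, `utility-1`, and so on) and the helper names are made up for this example, and the odd-count check for ZooKeeper and JournalNode reflects the usual majority-quorum convention rather than a rule stated on this page (the page states it only for Kudu masters).

```python
from collections import Counter

# Master and utility roles transcribed from the
# "3 - 20 Worker Hosts with High Availability" layout.
# Gateway and worker hosts are omitted because they carry no quorum roles.
# Host labels are hypothetical.
LAYOUT_3_20_HA = {
    "master-1": ["NameNode", "JournalNode", "FailoverController",
                 "YARN ResourceManager", "ZooKeeper", "JobHistory Server",
                 "Spark History Server", "Kudu master"],
    "master-2": ["NameNode", "JournalNode", "FailoverController",
                 "YARN ResourceManager", "ZooKeeper", "Kudu master"],
    "master-3": ["Kudu master"],
    "utility-1": ["Cloudera Manager", "Cloudera Manager Management Service",
                  "Hive Metastore", "Impala Catalog Server",
                  "Impala StateStore", "Oozie", "ZooKeeper", "JournalNode"],
}

# Roles that are checked for an odd instance count (assumption, see lead-in).
QUORUM_ROLES = ("ZooKeeper", "JournalNode", "Kudu master")


def role_counts(layout):
    """Count how many hosts each role is assigned to."""
    return Counter(role for roles in layout.values() for role in roles)


def even_quorum_roles(layout):
    """Return the quorum roles that are present with an even instance count."""
    counts = role_counts(layout)
    return [role for role in QUORUM_ROLES
            if counts[role] > 0 and counts[role] % 2 == 0]
```

For the layout above, each of the three roles is placed on three hosts, so `even_quorum_roles(LAYOUT_3_20_HA)` returns an empty list. Removing the Kudu master from `master-3` would leave two masters and cause `"Kudu master"` to be reported.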

20 - 80 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 3:
  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Spark History Server
  • Kudu master
Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Oozie
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
20 - 80 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server

80 - 200 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 3:
  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Spark History Server
  • Kudu master
Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
80 - 200 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)

200 - 500 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 5:
  • JobHistory Server
  • Spark History Server
  • ZooKeeper
  • JournalNode

We recommend no more than three Kudu masters.

Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
200 - 500 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)

500 - 1000 Worker Hosts with High Availability

Master Hosts Utility Hosts Gateway Hosts Worker Hosts
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 5:
  • JobHistory Server
  • Spark History Server
  • ZooKeeper
  • JournalNode

We recommend no more than three Kudu masters.

Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
500 - 1000 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)
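
The size ranges on this page can be treated as a lookup from worker-host count to a recommended layout. The following is a minimal sketch, not a Cloudera tool: `pick_layout` is a hypothetical helper, and because adjacent ranges share their boundary values (20, 80, 200, 500), it assumes that a boundary count maps to the smaller layout.

```python
# (low, high, layout name) for the high-availability layouts on this page.
HA_LAYOUTS = [
    (3, 20, "3 - 20 Worker Hosts with High Availability"),
    (20, 80, "20 - 80 Worker Hosts with High Availability"),
    (80, 200, "80 - 200 Worker Hosts with High Availability"),
    (200, 500, "200 - 500 Worker Hosts with High Availability"),
    (500, 1000, "500 - 1000 Worker Hosts with High Availability"),
]

# The single layout described without high availability.
NON_HA_LAYOUT = (3, 10, "3 - 10 Worker Hosts without High Availability")


def pick_layout(worker_hosts, high_availability=True):
    """Return the name of the layout whose range contains worker_hosts.

    Ranges are checked from smallest to largest, so a shared boundary
    value (for example 20) resolves to the smaller layout. This tie-break
    is an assumption of the sketch, not guidance from the page.
    """
    candidates = HA_LAYOUTS if high_availability else [NON_HA_LAYOUT]
    for low, high, name in candidates:
        if low <= worker_hosts <= high:
            return name
    raise ValueError(
        f"no layout on this page covers {worker_hosts} worker hosts "
        f"(high_availability={high_availability})"
    )
```

For example, `pick_layout(150)` returns the 80 - 200 layout, while `pick_layout(15, high_availability=False)` raises `ValueError` because the only layout without high availability stops at 10 worker hosts.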

Allocating Hosts for Key Trustee Server and Key Trustee KMS

If you are enabling data-at-rest encryption for a CDH cluster, Cloudera recommends that you isolate the Key Trustee Server from other enterprise data hub (EDH) services by deploying the Key Trustee Server on dedicated hosts in a separate cluster managed by Cloudera Manager. Cloudera also recommends deploying Key Trustee KMS on dedicated hosts in the same cluster as the EDH services that require access to Key Trustee Server. This architecture enables multiple clusters to share the same Key Trustee Server and avoids having to restart the Key Trustee Server when restarting a cluster.
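
The placement recommendations in the preceding paragraph can be expressed as a small check over a host inventory. This is a sketch under stated assumptions: the inventory format (cluster name mapped to host name mapped to a list of roles), the cluster and host names, and the function `placement_problems` are all hypothetical, and "dedicated" is interpreted here as a host that runs only that one role.

```python
KTS = "Key Trustee Server"
KMS = "Key Trustee KMS"


def placement_problems(inventory):
    """Return human-readable violations of the Key Trustee recommendations.

    inventory: {cluster_name: {host_name: [role, ...]}} (hypothetical format).
    """
    problems = []
    for cluster, hosts in inventory.items():
        cluster_roles = {role for roles in hosts.values() for role in roles}
        # Key Trustee Server should sit in its own cluster, apart from EDH services.
        if KTS in cluster_roles and cluster_roles != {KTS}:
            problems.append(f"{cluster}: {KTS} shares a cluster with other services")
        for host, roles in hosts.items():
            for dedicated in (KTS, KMS):
                # A dedicated host runs only the one role (assumed interpretation).
                if dedicated in roles and len(set(roles)) > 1:
                    problems.append(f"{cluster}/{host}: {dedicated} host is not dedicated")
    return problems


# Example inventory that follows the recommendations (all names are made up).
EXAMPLE_INVENTORY = {
    "kts-cluster": {"kts-1": [KTS], "kts-2": [KTS]},
    "edh-cluster": {
        "kms-1": [KMS],
        "kms-2": [KMS],
        "worker-1": ["DataNode", "NodeManager"],
    },
}
```

Calling `placement_problems(EXAMPLE_INVENTORY)` returns an empty list. Co-locating Key Trustee KMS with a DataNode on the same host would produce one "host is not dedicated" entry.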

For more information about encrypting data at rest in an EDH, see Encrypting Data at Rest.

For production environments in general, or if you have enabled high availability for HDFS and are using data-at-rest encryption, Cloudera recommends that you enable high availability for Key Trustee Server and Key Trustee KMS.