Hardware Requirements Guide

To assess the hardware and resource allocations for your cluster, you need to analyze the types of workloads you want to run on your cluster, and the CDH components you will be using to run these workloads. You should also consider the size of data to be stored and processed, the frequency of workloads, the number of concurrent jobs that need to run, and the speed required for your applications.

As you create the architecture of your cluster, you will need to allocate Cloudera Manager and CDH roles among the hosts in the cluster to maximize your use of resources. Cloudera provides some guidelines about how to assign roles to cluster hosts. See Recommended Cluster Hosts and Role Distribution. When multiple roles are assigned to hosts, add together the total resource requirements (memory, CPUs, disk) for each role on a host to determine the required hardware.

For information about how workloads affect sizing decisions, see the following blog post: How-to: Select the Right Hardware for Your New Hadoop Cluster.

For more information about sizing for a particular component, see the following minimum requirements:

Cloudera Manager

Cloudera Manager Server Storage Requirements

Component	Storage	Notes
Partition hosting /usr	1 GB
Cloudera Manager Database	5 GB	If the Cloudera Manager Database shares a host with the Service Monitor and Host Monitor, more storage space is required to meet the requirements for those components.

Host Based Cloudera Manager Server Requirements

Number of Cluster Hosts	Database Host Configuration	Heap Size	Logical Processors	Cloudera Manager Server Storage Local Directory
Very small (≤10)	Shared	2 GB	4	5 GB minimum
Small (≤20)	Shared	4 GB	6	20 GB minimum
Medium (≤200)	Dedicated	8 GB	6	200 GB minimum
Large (≤500)	Dedicated	10 GB	8	500 GB minimum
Extra Large (>500)	Dedicated	16 GB	16	1 TB minimum

Service Monitor Requirements

The requirements for the Service Monitor are based on the number of monitored entities. To see the number of monitored entities, perform the following steps:

Open the Cloudera Manager Admin Console and click Clusters > Cloudera Management Service.
Find the Cloudera Management Service Monitored Entities chart. If the chart does not exist, add it from the Chart Library.

For more information about Cloudera Manager entities, see Cloudera Manager Entity Types.

Clusters with HDFS, YARN, or Impala

Use the recommendations in this table for clusters where the only services with worker roles are HDFS, YARN, or Impala.

Number of Monitored Entities	Number of Hosts	Required Java Heap Size	Recommended Non-Java Heap Size
0-2,000	0-100	1 GB	6 GB
2,000-4,000	100-200	1.5 GB	6 GB
4,000-8,000	200-400	1.5 GB	12 GB
8,000-16,000	400-800	2.5 GB	12 GB
16,000-20,000	800-1,000	3.5 GB	12 GB

Clusters with HBase, Solr, Kafka, or Kudu

Use these recommendations when services such as HBase, Solr, Kafka, or Kudu are deployed in the cluster. These services typically have larger quantities of monitored entities.

Number of Monitored Entities	Number of Hosts	Required Java Heap Size	Recommended Non-Java Heap Size
0-30,000	0-100	2 GB	12 GB
30,000-60,000	100-200	3 GB	12 GB
60,000-120,000	200-400	3.5 GB	12 GB
120,000-240,000	400-800	8 GB	20 GB

Host Monitor

The requirements for the Host Monitor are based on the number of monitored entities.

To see the number of monitored entities, perform the following steps:

Open the Cloudera Manager Admin Console and click Clusters > Cloudera Management Service.
Find the Cloudera Management Service Monitored Entities chart. If the chart does not exist, add it from the Chart Library.

For more information about Cloudera Manager entities, see Cloudera Manager Entity Types.

Number of Hosts	Number of Monitored Entities	Heap Size	Non-Java Heap Size
0-200	<6k	1 GB	2 GB
200-800	6k-24k	2 GB	6 GB
800-1000	24k-30k	3 GB	6 GB

Ensure that you have at least 25 GB of disk space available for the Host Monitor, Service Monitor, Reports Manager, and Events Server databases.

For more information refer to Host Monitor and Service Monitor Memory Configuration.

Reports Manager

The Reports Manager fetches the fsimage from the NameNode at regular intervals. It reads the fsimage and creates a Lucene index for it. To improve the indexing performance, Cloudera recommends provisioning a host as powerful as possible and dedicating an SSD disk to the Reports Manager.

Reports Manager
Component	Java Heap	CPU	Disk
Reports Manager	3-4 times the size of the fsimage.	Minimum: 8 cores Recommended: 16 cores (32 cores, with hyperthreading enabled.)	1 dedicated disk that is at least 20 times the size of the fsimage. Cloudera strongly recommends using SSD disks.

Agent Hosts

An unpacked parcel requires approximately three times the space of the packed parcel that is stored on the Cloudera Manager Server.

The /var/log partition on each host should have a minimum of 2GB of disk space allocated per role running on the host.

Event Server

The following table lists the minimum requirements for the Event Server:

CPU	RAM	Storage
1 core	256 MB	5 GB for the Event Database 20 GB for the Event Server Index Directory. The location of this directory is set by the Event Server Index Directory Event Server configuration property.

Alert Publisher

The following table lists the minimum requirements for the Alert Publisher:

CPU	RAM	Storage
1 core	1 GB	Minimum of 1 disk for log files

Cloudera Navigator

The sizing of Navigator components varies heavily depending on the size of the cluster and the number of audit events generated. Refer to Minimum Recommended Memory and Disk Space for more information.

Cloudera Navigator
Component	Java Heap / Memory	CPU	Disk
Navigator Audit Server	Minimum: 2-3 GB of Java Heap Configure this value using the Java Heap Size of Auditing Server in Bytes configuration property.	Minimum: 1 core	The database used by the Navigator Audit Server must be able to accommodate hundreds of gigabytes (or tens of millions of rows per day). The database size may reach a terabyte. Ideally, the database should not be shared with other services because the audit insertion rate can overwhelm the database server making other services using same database less responsive. See Storage Space Planning for Cloudera Navigator.
Navigator Metadata Server	Minimum: 10 GB of Java Heap Recommended: 20 GB Java Heap Add 20 GB for operating system buffer cache, however memory requirements can be much higher on a busy cluster and could require provisioning a dedicated host. Navigator logs include estimates based on the number of objects it is tracking. Configure this value using the Java Heap Size of Navigator Metadata Server in Bytes configuration property.	Minimum: 1 core	Minimum: 50 GB Recommended: 100-200 GB Note: Data stored by the Metadata server grows indefinitely unless you run the purge function. If you have not set up the purge function to run at a scheduled interval, Cloudera recommends that you run the purge function to reclaim disk space and keep data growth (and the corresponding memory requirement) in check. See Administration (Navigator Console).

Cloudera Data Science Workbench

Hardware Component	Requirement	Notes
CPU	16+ CPU (vCPU) cores	Allocate at least 1 CPU core per session. 1 CPU core is often adequate for light workloads.
Memory	32 GB RAM	As a general guideline, Cloudera recommends nodes with RAM between 60GB and 256GB Allocating less than 2 GB of RAM can lead to out-of-memory errors for many applications.
Disk	Root Volume: 100 GB Application Block Device or Mount Point (Master Host Only): 1 TB Docker Image Block Device: 1 TB	SSDs are strongly recommended for application data storage.

For more information on scaling guidelines and storage requirements for cloud providers such as AWS and Azure, see Requirements and Supported Platforms in the Cloudera Data Science Workbench documentation.

CDH

Accumulo

Component	Java Heap	CPU	Disk
Master	Minimum: 1 GiB 100-1000 tablets: 4 GB 10,000 or more tablets with 200 or more Tablet Servers: 8 GB 10,000 or more tablets with 300 or more Tablet Servers: 12 GB Set this value using the Tablet Server Max Heapsize Accumulo configuration property. See Flume Memory Consumption	2 Cores. Add more for large clusters or bulk load.	1 disk for local logs
Tablet Server	Minimum: 512 MB Java Heap Recommended: 8 GB Medium scale production (200 or more tablets per Tablet Server): 16 GB Set this value using the Tablet Server Max Heapsize Accumulo configuration property.	Minimum 4 dedicated cores. You can add more cores for larger clusters, when using replication, or for bulk loads.	4 or more spindles for each HDFS DataNode 1 disk for local logs (this disk can be shared with the operating system and/or other Hadoop logs
Tracer	1 - 2 GB, depending on cluster workloads Set this value using the Tracer Max Heapsize Accumulo configuration property.	2 or more dedicated cores, depending on cluster size and workloads	1 disk for local logs, which can be shared with the operating system and/or other Hadoop logs
GC Role	1-2 GB Set this value using the Garbage Collector Max Heapsize Accumulo configuration property.	2 or more dedicated cores, depending on cluster size	1 disk for local logs, which can be shared with the operating system and/or other Hadoop logs
Monitor Role	1-2 GB Set this value using the Monitor Max Heapsize Accumulo configuration property.	2 or more dedicated cores, depending on cluster size and workloads	1 disk for local logs, which can be shared with the operating system and/or other Hadoop logs

For additional information, see Apache Accumulo on CDH Installation Guide.

Flume

Component	Java Heap	CPU	Disk
Flume	Minimum: 1 GB Maximum 4 GB Java Heap size should be greater than the maximum channel capacity Set this value using the Java Heap Size of Agent in Bytes Flume configuration property. See Flume Memory Consumption	Calculate the number of cores using the following formula: (Number of sources + Number of sinks ) / 2	Multiple disks are recommended for file channels, either a JBOD setup or RAID10 (preferred due to increased reliability).

Component

Java Heap

CPU

Disk

Flume

Minimum: 1 GB
Maximum 4 GB
Java Heap size should be greater than the maximum channel capacity

Set this value using the Java Heap Size of Agent in Bytes Flume configuration property.

See Flume Memory Consumption

Calculate the number of cores using the following formula:

(Number of sources + Number of sinks ) / 2

Multiple disks are recommended for file channels, either a JBOD setup or RAID10 (preferred due to increased reliability).

HDFS

HDFS
Component	Memory	CPU	Disk
JournalNode	1 GB (default) Set this value using the Java Heap Size of JournalNode in Bytes HDFS configuration property.	1 core minimum	1 dedicated disk
NameNode	Minimum: 1 GB (for proof-of-concept deployments) Add an additional 1 GB for each additional 1,000,000 blocks Snapshots and encryption can increase the required heap memory. See Sizing NameNode Heap Memory Set this value using the Java Heap Size of NameNode in Bytes HDFS configuration property.	Minimum of 4 dedicated cores; more may be required for larger clusters	Minimum of 2 dedicated disks for metadata 1 dedicated disk for log files (This disk may be shared with the operating system.) Maximum disks: 4
DataNode	Minimum: 4 GB Increase the memory for higher replica counts or a higher number of blocks per DataNode. When increasing the memory, Cloudera recommends an additional 1 GB of memory for every 1 million replicas above 4 million on the DataNodes. For example, 5 million replicas require 5 GB of memory. Set this value using the Java Heap Size of DataNode in Bytes HDFS configuration property.	Minimum: 4 cores. Add more cores for highly active clusters.	Minimum: 4 Maximum: 24 The maximum acceptable size will vary depending upon how large average block size is. The DN’s scalability limits are mostly a function of the number of replicas per DN, not the overall number of bytes stored. That said, having ultra-dense DNs will affect recovery times in the event of machine or rack failure. Cloudera does not support exceeding 100 TB per data node. You could use 12 x 8 TB spindles or 24 x 4TB spindles. Cloudera does not support drives larger than 8 TB.

HBase

Component	Java Heap	CPU	Disk
Master	100-10,000 regions: 4 GB 10,000 or more regions with 200 or more Region Servers: 8 GB 10,000 or more regions with 300 or more Region Servers: 12 GB Set this value using the Java Heap Size of HBase Master in Bytes HBase configuration property.	Minimum 4 dedicated cores. You can add more cores for larger clusters, when using replication, or for bulk loads.	1 disk for local logs, which can be shared with the operating system and/or other Hadoop logs
Region Server	Minimum: 8 GB Medium-scale production: 16 GB Heap larger than 16 GB requires special Garbage Collection tuning. See Configuring the HBase BlockCache Set this value using the Java Heap Size of HBase RegionServer in Bytes HBase configuration property.	Minimum: 4 dedicated cores	4 disks for each DataNode
Thrift Server	1 GB - 4 GB Set this value using the Java Heap Size of HBase Thrift Server in Bytes HBase configuration property.	Minimum 2 dedicated cores.	1 disk for local logs, which can be shared with the operating system and other Hadoop logs.

Component

Java Heap

CPU

Disk

Master

100-10,000 regions: 4 GB
10,000 or more regions with 200 or more Region Servers: 8 GB
10,000 or more regions with 300 or more Region Servers: 12 GB

Set this value using the Java Heap Size of HBase Master in Bytes HBase configuration property.

Minimum 4 dedicated cores. You can add more cores for larger clusters, when using replication, or for bulk loads.

1 disk for local logs, which can be shared with the operating system and/or other Hadoop logs

Region Server

Minimum: 8 GB
Medium-scale production: 16 GB
Heap larger than 16 GB requires special Garbage Collection tuning. See Configuring the HBase BlockCache

Set this value using the Java Heap Size of HBase RegionServer in Bytes HBase configuration property.

Minimum: 4 dedicated cores

4 disks for each DataNode

Thrift Server

1 GB - 4 GB

Set this value using the Java Heap Size of HBase Thrift Server in Bytes HBase configuration property.

Minimum 2 dedicated cores.

1 disk for local logs, which can be shared with the operating system and other Hadoop logs.

Hive

Component	Java Heap		CPU	Disk
HiveServer 2	Single Connection	4 GB	Minimum 4 dedicated cores	Minimum 1 disk This disk is required for the following: HiveServer2 log files `stdout` and `stderr` output files Configuration files Operation logs stored in the `operation_logs_dir` directory, which is configurable Any temporary files that might be created by local map tasks under the `/tmp` directory
	2-10 connections	4-6 GB
	11-20 connections	6-12 GB
	21-40 connections	12-16 GB
	41 to 80 connections	16-24 GB
	Cloudera recommends splitting HiveServer2 into multiple instances and load balancing them once you start allocating more than 12 GB to HiveServer2. The objective is to adjust the size to reduce the impact of Java garbage collection on active processing by the service.
	Set this value using the Java Heap Size of HiveServer2 in Bytes Hive configuration property. For more information, see Tuning Hive in CDH.
Hive Metastore	Single Connection	4 GB	Minimum 4 dedicated cores	Minimum 1 disk This disk is required so that the Hive metastore can store the following artifacts: Logs Configuration files Backend database that is used to store metadata if the database server is also hosted on the same node
	2-10 connections	4-10 GB
	11-20 connections	10-12 GB
	21-40 connections	12-16 GB
	41 to 80 connections	16-24 GB
	Set this value using the Java Heap Size of Hive Metastore Server in Bytes Hive configuration property. For more information, see Tuning Hive in CDH.
Beeline CLI	Minimum: 2 GB		N/A	N/A

Hive on Spark Executor Nodes

Component	Memory	CPU	Disk
Hive-on-Spark	Minimum: 16 GB Recommended: 32 GB for larger data sizes Individual executor heaps should be no larger than 16 GB so machines with more RAM can use multiple executors.	Minimum: 4 cores Recommended: 8 cores for larger data sizes	Disk space requirements are driven by scratch space requirements for Spark spill.
For more information on how to reserve YARN cores and memory that will be used by Spark executors, refer to Tuning Apache Hive on Spark in CDH.

Component

Memory

CPU

Disk

Hive-on-Spark

Minimum: 16 GB
Recommended: 32 GB for larger data sizes

Individual executor heaps should be no larger than 16 GB so machines with more RAM can use multiple executors.

Minimum: 4 cores
Recommended: 8 cores for larger data sizes

Disk space requirements are driven by scratch space requirements for Spark spill.

For more information on how to reserve YARN cores and memory that will be used by Spark executors, refer to Tuning Apache Hive on Spark in CDH.

HSM KMS

Component	Memory	CPU	Disk
Navigator HSM KMS	16 GB RAM	Minimum: 2 GHz 64-bit quad core	40 GB, using moderate to high-performance drives.

Hue

Component	Memory	CPU	Disk
Hue Server	Minimum: 4 GB Maximum 10 GB If the cluster uses the Hue load balancer, add additional memory	Minimum: 1 Core to run Django When Hue is configured for high availability, add additional cores	Minimum: 10 GB for the database, which grows proportionally according to the cluster size and workloads. When Hue is configured for high availability, add temp space is required

For more information about Hue high availability, see How to Add a Hue Load Balancer.

Impala

Sizing requirements for Impala can vary significantly depending on the size and types of workloads using Impala.

Component	Native Memory	JVM Heap	CPU	Disk
Impala Daemon	Set this value using the Impala Daemon Memory Limit configuration property. Minimum: 32 GB Recommended: 128 GB	Use the `JAVA_TOOL_OPTIONS` environment variable to set the maximum heap size for the Coordinator Impala Daemons. Minimum: 4 GB Recommended: 8 GB	Minimum: 4 Recommended: 16 or more CPU instruction set: AVX2	Minimum: 1 disk Recommended: 8 or more
Catalog Server	Set the maximum heap size for the Catalog Server: Minimum: 4 GB Recommended: 8 GB In Cloudera Manager, use the Java Heap Size of Catalog Server in Bytes configuration property (Cloudera Manager 5.7 and higher), or Impala Catalog Server Environment Advanced Configuration Snippet (Safety Valve) (Cloudera Manager 5.6 and lower). If not using Cloudera Manager, use the `JAVA_TOOL_OPTIONS` environment variable. For example, to set it to 8 GB: `JAVA_TOOL_OPTIONS= "-Xmx8g"`	Minimum: 4 Recommended: 16 or more CPU instruction set: AVX2	Minimum and Recommended: 1 disk

Component

Native Memory

JVM Heap

CPU

Disk

Impala Daemon

Set this value using the Impala Daemon Memory Limit configuration property.

Minimum: 32 GB
Recommended: 128 GB

Use the JAVA_TOOL_OPTIONS environment variable to set the maximum heap size for the Coordinator Impala Daemons.

Minimum: 4 GB
Recommended: 8 GB

Minimum: 4
Recommended: 16 or more

CPU instruction set: AVX2

Minimum: 1 disk
Recommended: 8 or more

Catalog Server

Set the maximum heap size for the Catalog Server:

Minimum: 4 GB
Recommended: 8 GB

In Cloudera Manager, use the Java Heap Size of Catalog Server in Bytes configuration property (Cloudera Manager 5.7 and higher), or Impala Catalog Server Environment Advanced Configuration Snippet (Safety Valve) (Cloudera Manager 5.6 and lower).

If not using Cloudera Manager, use the JAVA_TOOL_OPTIONS environment variable. For example, to set it to 8 GB: JAVA_TOOL_OPTIONS= "-Xmx8g"

Minimum: 4
Recommended: 16 or more

CPU instruction set: AVX2

Minimum and Recommended: 1 disk

For the networking topology for multi-rack cluster, Leaf-Spine is recommended for the optimal performance.

Kafka

Kafka requires a fairly small amount of resources, especially with some configuration tuning. By default, Kafka, can run on as little as 1 core and 1GB memory with storage scaled based on requirements for data retention.

CPU is rarely a bottleneck because Kafka is I/O heavy, but a moderately-sized CPU with enough threads is still important to handle concurrent connections and background tasks.

Kafka brokers tend to have a similar hardware profile to HDFS data nodes. How you build them depends on what is important for your Kafka use cases. Use the following guidelines:

To affect performance of these features:	Adjust these parameters:
Message Retention	Disk size
Client Throughput (Producer & Consumer)	Network capacity
Producer throughput	Disk I/O
Consumer throughput	Memory

A common choice for a Kafka node is as follows:

Component	Memory/Java Heap	CPU	Disk
Broker	RAM: 64 GB Recommended Java heap: 4 GB Set this value using the Java Heap Size of Broker Kafka configuration property. See	12- 24 cores	1 HDD For operating system 1 HDD for Zookeeper dataLogDir 10- HDDs, using Raid 10, for Kafka data
MirrorMaker	1 GB heap Set this value using the Java Heap Size of MirrorMaker Kafka configuration property.	1 core per 3-4 streams	No disk space needed on MirrorMaker instance. Destination brokers should have sufficient disk space to store the topics being copied over.

Component

Memory/Java Heap

CPU

Disk

Broker

RAM: 64 GB
Recommended Java heap: 4 GB

Set this value using the Java Heap Size of Broker Kafka configuration property.

See

12- 24 cores

1 HDD For operating system
1 HDD for Zookeeper dataLogDir
10- HDDs, using Raid 10, for Kafka data

MirrorMaker

1 GB heap

Set this value using the Java Heap Size of MirrorMaker Kafka configuration property.

1 core per 3-4 streams

No disk space needed on MirrorMaker instance. Destination brokers should have sufficient disk space to store the topics being copied over.

Networking requirements: Gigabit Ethernet or 10 Gigabit Ethernet

Key Trustee Server

Component	Memory	CPU	Disk
Key Trustee Server Note: KTS requires a additional dedicated resources. For more information, refer to Data at Rest Encryption Requirements.	8 GB	1 GHz 64-bit quad core	20 GB, using moderate to high-performance drives

Key Trustee KMS

Component	Memory	CPU	Disk
Key Trustee KMS Note: Cloudera recommends using machines with capabilities equivalent to your NameNode hosts, with Intel CPUs that support AES-NI for optimum performance.	16 GB	2 GHz 64-bit quad core	40 GB, using moderate to high-performance drives

Kudu

Component	Memory	CPU	Disk
Tablet Server	Minimum: 4 GB Recommended: 10 GB Additional hardware may be required, depending on the workloads running in the cluster. If you are using Impala, see the Impala sizing guidelines.		1 disk for write-ahead log (WAL). Using an SSD drive may improve performance.
Master	Minimum: 256 MB Recommended: 1 GB		1 disk

Component

Memory

CPU

Disk

Tablet Server

Minimum: 4 GB
Recommended: 10 GB

Additional hardware may be required, depending on the workloads running in the cluster. If you are using Impala, see the Impala sizing guidelines.

1 disk for write-ahead log (WAL). Using an SSD drive may improve performance.

Master

Minimum: 256 MB
Recommended: 1 GB

1 disk

For more information, see Kudu Server Management.

Oozie

Component	Java Heap	CPU	Disk
Oozie	Minimum: 1 GB (this is the default set by Cloudera Manager). This is sufficient for less than 10 simultaneous workflows, without forking. If you notice excessive garbage collection, or out-of-memory errors, increase the heap size to 4 GB for medium-size production clusters or to 8 GB for large-size production clusters. Set this value using the Java Heap Size of Oozie Server in Bytes Oozie configuration property.	No resources required	No resources required

Additional tuning:

For workloads with many coordinators that run with complex workflows (a max concurrency reached! warning appears in the log and the Oozie admin -queuedump command shows a large queue):

Increase the value of the oozie.service.CallableQueueService.callable.concurrency property to 50.
Increase the value of the oozie.service.CallableQueueService.threads property to 200.

Do not use a Derby database as a backend database for Oozie.

Search

Component	Java Heap	CPU	Disk
Solr	Small workloads, or evaluations: 16 GB Smaller production environments: 32 GB Larger production environments: 96 GB is sufficient for most clusters. Set this value using the Java Heap Size of Solr Server in Bytes Solr configuration property. See	Minimum: 4 Recommended: 16 for production workloads	No requirement. Solr uses HDFS for storage.

Component

Java Heap

CPU

Disk

Solr

Small workloads, or evaluations: 16 GB
Smaller production environments: 32 GB
Larger production environments: 96 GB is sufficient for most clusters.

Set this value using the Java Heap Size of Solr Server in Bytes Solr configuration property.

See

Minimum: 4
Recommended: 16 for production workloads

No requirement. Solr uses HDFS for storage.

Note the following considerations for determining the optimal amount of heap memory:

Size of searchable material: The more searchable material you have, the more memory you need. All things being equal, 10 TB of searchable data requires more memory than 1 TB of searchable data.
Content indexed in the searchable material: Indexing all fields in a collection of logs, email messages, or Wikipedia entries requires more memory than indexing only the Date Created field.
The level of performance required: If the system must be stable and respond quickly, more memory may help. If slow responses are acceptable, you may be able to use less memory.

For more information refer to Deployment Planning for Cloudera Search.

Sentry

Component	Java Heap	CPU	Disk
Sentry Server	Minimum: 576 MB Recommended: 2.5 GB per million objects in the Hive database (servers, databases, tables, partitions, columns, URIs, and views) Set this value with the Java Heap Size of Sentry Server in Bytes Sentry configuration property.	Minimum: 4

Component

Java Heap

CPU

Disk

Sentry Server

Minimum: 576 MB
Recommended: 2.5 GB per million objects in the Hive database (servers, databases, tables, partitions, columns, URIs, and views)

Set this value with the Java Heap Size of Sentry Server in Bytes Sentry configuration property.

Minimum: 4

For more information about Sentry requirements, see Before You Install Sentry.

Spark

Component	Java Heap	CPU	Disk
Spark History Server	Minimum: 512 MB Set this value using the Java Heap Size of History Server in Bytes Spark configuration property.	1 Important: Cloudera recommends that you adjust the number of CPUs and memory for the Spark History Server based on your specific cluster usage patterns.	Minimum 1 disk for log files.

YARN

Component	Java Heap	CPU	Other Recommendations
Job History Server	Minimum: 1 GB Increase memory by 1.6 GB for each 100,000 tasks kept in memory. For example: 5 jobs @ 100, 000 mappers + 20,000 reducers = 600,000 total tasks requiring 9.6 GB of heap. See the Other Recommendations column for additional tuning suggestions. Set this value using the Java Heap Size of JobHistory Server in Bytes YARN configuration property.	Minimum: 1 core	Set the `mapreduce.jobhistory.jhist.format` property to binary (history files will load about 2x-3x faster with this setting). Available in CDH 5.5.0 or higher only. Set the `mapreduce.jobhistory.loadedtasks.cache.size` property to a total loaded task count. Using the example in the Java Heap column to the left, of 650,000 total tasks, you can set it to 700,000 to allow for some safety margin. This should also prevent the JobHistoryServer from hanging during garbage collection, since the job count limit does not have a task limit. It is strongly recommended to enable G1GC if the heap is large (larger than 10 GB). It is strongly recommended to set `mapreduce.jobhistory.loadedjob.tasks.max` so that jobs that have too many tasks will not be loaded and cached in memory. Such jobs can be retrieved by using the command line: `mapred job -history <jobId>`
NodeManager	Minimum: 1 GB Configure additional heap memory for the following conditions: Large number of containers Large shuffle sizes in Spark or MapReduce Set this value using the Java Heap Size of NodeManager in Bytes YARN configuration property.	Minimum: 8-16 cores Recommended: 32-64 cores	Disks: Minimum: 8 disks Recommended: 12 or more disks Networking: Minimum: Dual 1Gbps or faster Recommended: Single/Dual 10 Gbps or faster
ResourceManager	Minimum: 1 GB Configure additional heap memory for the following conditions: More jobs Larger cluster size Number of retained finished applications (configured with the `yarn.resourcemanager.max-completed-applications` property. Scheduler configuration Set this value using the Java Heap Size of ResourceManager in Bytes YARN configuration property.	Minimum: 1 core
Other Settings	Set the ApplicationMaster Memory YARN configuration property to 512 MB Set the Container Memory Minimum YARN configuration property to 1 GB.	N/A	N/A

For more information, see Tuning YARN.

ZooKeeper

Component	Java Heap	CPU	Disk
ZooKeeper Server	Minimum: 1 GB Increase heap size when watching 10,000 - 100,000 ephemeral znodes and are using 1,000 or more clients. Set this value using the Java Heap Size of ZooKeeper Server in Bytes ZooKeeper configuration property.	Minimum: 4 cores	ZooKeeper was not designed to be a low-latency service and does not benefit from the use of SSD drives. The ZooKeeper access patterns – append-only writes and sequential reads – were designed with spinning disks in mind. Therefore Cloudera recommends using HDD drives.

Component

Java Heap

CPU

Disk

ZooKeeper Server

Minimum: 1 GB
Increase heap size when watching 10,000 - 100,000 ephemeral znodes and are using 1,000 or more clients.

Set this value using the Java Heap Size of ZooKeeper Server in Bytes ZooKeeper configuration property.

Minimum: 4 cores

ZooKeeper was not designed to be a low-latency service and does not benefit from the use of SSD drives. The ZooKeeper access patterns – append-only writes and sequential reads – were designed with spinning disks in mind. Therefore Cloudera recommends using HDD drives.

Additional information:

Requirements and Supported Versions

Version and Download Information