Cloudera Enterprise 5.2.x

MapReduce Properties in CDH 5.0.0

Role groups:

failovercontrollerdefaultgroup
gatewaydefaultgroup
jobtrackerdefaultgroup
service_wide
tasktrackerdefaultgroup

failovercontrollerdefaultgroup

Advanced

Display Name	Description	Related Name	Default Value	API Name	Required
Java Configuration Options for Failover Controller	These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here.			`failover_controller_java_opts`	false
Failover Controller Advanced Configuration Snippet (Safety Valve) for mapred-site.xml	For advanced use only, a string to be inserted into mapred-site.xml for this role only.			`fc_config_safety_valve`	false
Failover Controller Logging Advanced Configuration Snippet (Safety Valve)	For advanced use only, a string to be inserted into log4j.properties for this role only.			`log4j_safety_valve`	false
Heap Dump Directory	Path to the directory where heap dumps are generated when java.lang.OutOfMemoryError is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role.		/tmp	`oom_heap_dump_dir`	false
Dump Heap When Out of Memory	When set, generates heap dump file when java.lang.OutOfMemoryError is thrown.		false	`oom_heap_dump_enabled`	true
Kill When Out of Memory	When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown.		true	`oom_sigkill_enabled`	true
Automatically Restart Process	When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure.		false	`process_auto_restart`	true

Logs

Display Name	Description	Related Name	Default Value	API Name	Required
Failover Controller Log Directory	Directory where Failover Controller will place its log files.		/var/log/hadoop-0.20-mapreduce	`failover_controller_log_dir`	false
Failover Controller Logging Threshold	The minimum log level for Failover Controller logs.		INFO	`log_threshold`	false
Failover Controller Maximum Log File Backups	The maximum number of rolled log files to keep for Failover Controller logs. Typically used by log4j.		10	`max_log_backup_index`	false
Failover Controller Max Log Size	The maximum size, in megabytes, per log file for Failover Controller logs. Typically used by log4j.		200 MiB	`max_log_size`	false

Monitoring

Display Name	Description	Related Name	Default Value	API Name	Required
Enable Health Alerts for this Role	When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting `eventserver_health_events_alert_threshold`.		true	`enable_alerts`	false
Enable Configuration Change Alerts	When set, Cloudera Manager will send alerts when this entity's configuration changes.		false	`enable_config_alerts`	false
File Descriptor Monitoring Thresholds	The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit.		Warning: 50.0 %, Critical: 70.0 %	`failovercontroller_fd_thresholds`	false
Failover Controller Host Health Test	When computing the overall Failover Controller health, consider the host's health.		true	`failovercontroller_host_health_enabled`	false
Failover Controller Process Health Test	Enables the health test that the Failover Controller's process state is consistent with the role configuration		true	`failovercontroller_scm_health_enabled`	false
Log Directory Free Space Monitoring Absolute Thresholds	The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory.		Warning: 10 GiB, Critical: 5 GiB	`log_directory_free_space_absolute_thresholds`	false
Log Directory Free Space Monitoring Percentage Thresholds	The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured.		Warning: Never, Critical: Never	`log_directory_free_space_percentage_thresholds`	false
Rules to Extract Events from Log Files	This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of the following fields: `alert` - whether or not events generated from this rule should be promoted to alerts. A value of "true" will cause alerts to be generated. If not specified, the default is "false". `rate` (mandatory) - the maximum number of log messages matching this rule that may be sent as events every minute. If more than `rate` matching log messages are received in a single minute, the extra messages are ignored. If `rate` is less than 0, the number of messages per minute is unlimited. `periodminutes` - the number of minutes during which the publisher will only publish `rate` events or fewer. If not specified, the default is one minute. `threshold` - apply this rule only to messages with this log4j severity level or above. An example is "WARN" for warning level messages or higher. `content` - match only those messages whose contents match this regular expression. `exceptiontype` - match only those messages which are part of an exception message. The exception type must match this regular expression. Example: `{"alert": false, "rate": 10, "exceptiontype": "java.lang.StringIndexOutOfBoundsException"}` This rule will send events to Cloudera Manager for every `StringIndexOutOfBoundsException`, up to a maximum of 10 every minute.		{"version": "0", "rules": [{"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]}	`log_event_whitelist`	false
Role Triggers	The configured triggers for this role. This is a JSON formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: `triggerName` (mandatory) - the name of the trigger. This value must be unique for the specific role. `triggerExpression` (mandatory) - a tsquery expression representing the trigger. `streamThreshold` (optional) - the maximum number of streams that can satisfy a condition of a trigger before the condition fires. By default set to 0, and any stream returned causes the condition to fire. `enabled` (optional) - by default set to 'true'. If set to 'false' the trigger will not be evaluated. For example, here is a JSON formatted trigger configured for a DataNode that fires if the DataNode has more than 1500 file descriptors open: `[{"triggerName": "sample-trigger", "triggerExpression": "IF (SELECT fd_open WHERE roleName=$ROLENAME and last(fd_open) > 1500) DO health:bad", "streamThreshold": 0, "enabled": "true"}]` Consult the trigger rules documentation for more details on how to write triggers using tsquery. The JSON format is evolving and may change in the future; as a result, backward compatibility is not guaranteed between releases at this time.		[]	`role_triggers`	true
Unexpected Exits Thresholds	The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.		Warning: Never, Critical: Any	`unexpected_exits_thresholds`	false
Unexpected Exits Monitoring Period	The period to review when computing unexpected exits.		5 minute(s)	`unexpected_exits_window`	false
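
The rule and trigger formats described in the table above are plain JSON, so they can be built and sanity-checked before being pasted into Cloudera Manager. The sketch below is illustrative only: the helper names `make_rule` and `make_trigger` are hypothetical, and the checks cover only the constraints stated in the descriptions (a mandatory `rate`, and a mandatory `triggerName` and `triggerExpression`).

```python
import json


def make_rule(rate, alert=False, periodminutes=None, threshold=None,
              content=None, exceptiontype=None):
    """Build one log-to-event rule; `rate` is the only mandatory field."""
    if not isinstance(rate, int):
        raise ValueError("rate is mandatory and must be an integer")
    rule = {"alert": alert, "rate": rate}
    optional = {
        "periodminutes": periodminutes,
        "threshold": threshold,
        "content": content,
        "exceptiontype": exceptiontype,
    }
    # Leave out unset optional fields so the documented defaults apply.
    rule.update({k: v for k, v in optional.items() if v is not None})
    return rule


def make_trigger(name, expression, stream_threshold=0, enabled=True):
    """Build one role trigger; name and expression are mandatory."""
    if not name or not expression:
        raise ValueError("triggerName and triggerExpression are mandatory")
    return {
        "triggerName": name,
        "triggerExpression": expression,
        "streamThreshold": stream_threshold,
        # The documented example passes this flag as a string.
        "enabled": "true" if enabled else "false",
    }


rules = [make_rule(rate=10,
                   exceptiontype="java.lang.StringIndexOutOfBoundsException")]
triggers = [make_trigger(
    "sample-trigger",
    "IF (SELECT fd_open WHERE roleName=$ROLENAME and last(fd_open) > 1500) "
    "DO health:bad",
)]

rules_json = json.dumps(rules)
triggers_json = json.dumps(triggers)
print(rules_json)
print(triggers_json)
```

Round-tripping the output through `json.loads` is a cheap way to catch quoting mistakes before the value reaches the configuration field.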

Performance

Display Name	Description	Related Name	Default Value	API Name	Required
Maximum Process File Descriptors	If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value.			`rlimit_fds`	false

Ports and Addresses

Display Name	Description	Related Name	Default Value	API Name	Required
Failover controller port	The ZooKeeper failover controller port.	`mapred.ha.zkfc.port`	8018	`mapred_ha_zkfc_port`	false

Resource Management

Display Name	Description	Related Name	Default Value	API Name	Required
Java Heap Size of Failover Controller in Bytes	Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx.		256 MiB	`failover_controller_java_heapsize`	false
Cgroup CPU Shares	Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager.	`cpu.shares`	1024	`rm_cpu_shares`	true
Cgroup I/O Weight	Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager.	`blkio.weight`	500	`rm_io_weight`	true
Cgroup Memory Hard Limit	Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit.	`memory.limit_in_bytes`	-1 MiB	`rm_memory_hard_limit`	true
Cgroup Memory Soft Limit	Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit.	`memory.soft_limit_in_bytes`	-1 MiB	`rm_memory_soft_limit`	true
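
As a rough illustration of how Cgroup CPU Shares behave under contention, the sketch below computes each role's proportional share when every listed process is competing for CPU at the same time. The role names and share values are made-up inputs, and real scheduling also depends on the cgroup hierarchy and on which processes are actually runnable.

```python
def cpu_fractions(shares):
    """Return each role's fraction of CPU when all roles are CPU-bound."""
    for role, value in shares.items():
        # Range stated in the Cgroup CPU Shares description.
        if not 2 <= value <= 262144:
            raise ValueError(f"{role}: shares must be between 2 and 262144")
    total = sum(shares.values())
    return {role: value / total for role, value in shares.items()}


# Hypothetical host with three roles competing for CPU.
fractions = cpu_fractions({"failovercontroller": 1024,
                           "jobtracker": 2048,
                           "tasktracker": 1024})
for role, fraction in fractions.items():
    print(f"{role}: {fraction:.0%}")
```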

Stacks Collection

Display Name	Description	Related Name	Default Value	API Name	Required
Stacks Collection Data Retention	The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted.	`stacks_collection_data_retention`	100 MiB	`stacks_collection_data_retention`	false
Stacks Collection Directory	The directory in which stacks logs will be placed. If not set, stacks will be logged into a `stacks` subdirectory of the role's log directory.	`stacks_collection_directory`		`stacks_collection_directory`	false
Stacks Collection Enabled	Whether or not periodic stacks collection is enabled.	`stacks_collection_enabled`	false	`stacks_collection_enabled`	true
Stacks Collection Frequency	The frequency with which stacks will be collected.	`stacks_collection_frequency`	5.0 second(s)	`stacks_collection_frequency`	false
Stacks Collection Method	The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stacks traces of all threads. When the servlet method is selected that HTTP endpoint is periodically scraped.	`stacks_collection_method`	jstack	`stacks_collection_method`	false

gatewaydefaultgroup

Advanced

Display Name	Description	Related Name	Default Value	API Name	Required
Deploy Directory	The directory where the client configs will be deployed		/etc/hadoop	`client_config_root_dir`	true
MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml	For advanced use only, a string to be inserted into the client configuration for mapred-site.xml.			`mapreduce_client_config_safety_valve`	false
MapReduce Client Environment Advanced Configuration Snippet for hadoop-env.sh (Safety Valve)	For advanced use only, key-value pairs (one on each line) to be inserted into the client configuration for hadoop-env.sh			`mapreduce_client_env_safety_valve`	false
Client Java Configuration Options	These are Java command line arguments. Commonly, garbage collection flags or extra debugging flags would be passed here.		-Djava.net.preferIPv4Stack=true	`mapreduce_client_java_opts`	false

Compression

Display Name	Description	Related Name	Default Value	API Name	Required
Use Compression on Map Outputs	If enabled, uses compression on the map outputs before they are sent across the network. Will be part of generated client configuration.	`mapred.compress.map.output`	true	`mapred_compress_map_output`	false
Compression Codec of MapReduce Map Output	For MapReduce map outputs that are compressed, specify the compression codec to use. Will be part of generated client configuration.	`mapred.map.output.compression.codec`	org.apache.hadoop.io.compress.SnappyCodec	`mapred_map_output_compression_codec`	false
Compress MapReduce Job Output	Compress the output of MapReduce jobs. Will be part of generated client configuration.	`mapred.output.compress`	false	`mapred_output_compress`	false
Compression Codec of MapReduce Job Output	For MapReduce job outputs that are compressed, specify the compression codec to use. Will be part of generated client configuration.	`mapred.output.compression.codec`	org.apache.hadoop.io.compress.DefaultCodec	`mapred_output_compression_codec`	false
Compression Type of MapReduce Job Output	For MapReduce job outputs that are compressed as SequenceFiles, you can select one of these compression type options: NONE, RECORD or BLOCK. Cloudera recommends BLOCK. Will be part of generated client configuration.	`mapred.output.compression.type`	BLOCK	`mapred_output_compression_type`	false
Compression Level of Codecs	Compression level for the codec used to compress MapReduce outputs. Default compression is a balance between speed and compression ratio.	`zlib.compress.level`	DEFAULT_COMPRESSION	`zlib_compress_level`	false
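
The Related Name column gives the Hadoop property that each setting maps to. As a hedged sketch of what the corresponding `mapred-site.xml` entries could look like, the snippet below renders the compression defaults from the table above as Hadoop `<property>` elements. The helper name `render_properties` is hypothetical, and the exact file that Cloudera Manager generates may differ.

```python
import xml.etree.ElementTree as ET


def render_properties(props):
    """Render a dict of Hadoop properties as <property> elements."""
    root = ET.Element("configuration")
    for name, value in props.items():
        # Hadoop expects lowercase "true"/"false" for booleans.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = text
    return ET.tostring(root, encoding="unicode")


# Default values copied from the Compression table.
compression_defaults = {
    "mapred.compress.map.output": True,
    "mapred.map.output.compression.codec":
        "org.apache.hadoop.io.compress.SnappyCodec",
    "mapred.output.compress": False,
    "mapred.output.compression.codec":
        "org.apache.hadoop.io.compress.DefaultCodec",
    "mapred.output.compression.type": "BLOCK",
}

xml_text = render_properties(compression_defaults)
print(xml_text)
```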

Jobs

Display Name	Description	Related Name	Default Value	API Name	Required
Number of Tasks to Run per JVM	Number of tasks to run per JVM. If set to -1, there is no limit. Will be part of generated client configuration.	`mapred.job.reuse.jvm.num.tasks`	1	`mapred_job_reuse_jvm_num_tasks`	false
Map Tasks Speculative Execution	If enabled, multiple instances of some map tasks may be executed in parallel.	`mapred.map.tasks.speculative.execution`	false	`mapred_map_tasks_speculative_execution`	false
Number of Map Tasks to Complete Before Reduce Tasks	Fraction of the number of map tasks in the job which should be completed before reduce tasks are scheduled for the job.	`mapred.reduce.slowstart.completed.maps`	0.8	`mapred_reduce_slowstart_completed_maps`	false
Default Number of Reduce Tasks per Job	The default number of reduce tasks per job. Will be part of generated client configuration.	`mapred.reduce.tasks`	1	`mapred_reduce_tasks`	false
Reduce Tasks Speculative Execution	If enabled, multiple instances of some reduce tasks may be executed in parallel.	`mapred.reduce.tasks.speculative.execution`	false	`mapred_reduce_tasks_speculative_execution`	false
Maximum Time to Retain User Logs	The maximum time, in hours, to retain the user logs after job completion.	`mapred.userlog.retain.hours`	1 day(s)	`mapred_userlog_retain_hours`	false
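
To make the slow-start fraction concrete, the sketch below estimates how many map tasks must finish before reduce tasks become eligible for scheduling. Rounding up is an assumption made here for illustration; the table only states that the value is a fraction of the job's map tasks.

```python
import math


def maps_before_reduces(num_map_tasks, slowstart=0.8):
    """Map tasks that must complete before reduces are scheduled."""
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0 and 1")
    # Rounding up is an assumption, not something the table specifies.
    return math.ceil(slowstart * num_map_tasks)


print(maps_before_reduces(200))       # default fraction of 0.8
print(maps_before_reduces(200, 0.5))  # reduces start at the halfway point
```

A lower fraction lets reduce tasks start copying map output earlier, at the cost of occupying reduce slots while maps are still running.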

Monitoring

Display Name	Description	Related Name	Default Value	API Name	Required
Enable Configuration Change Alerts	When set, Cloudera Manager will send alerts when this entity's configuration changes.		false	`enable_config_alerts`	false

Other

Display Name	Description	Related Name	Default Value	API Name	Required
Alternatives Priority	The priority level that the client configuration will have in the Alternatives system on the hosts. Higher priority levels will cause Alternatives to prefer this configuration over any others.		91	`client_config_priority`	true
Mapreduce Submit Replication	The replication level for submitted job files.	`mapred.submit.replication`	10	`mapred_submit_replication`	false
Mapreduce Task Timeout	The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.	`mapred.task.timeout`	10 minute(s)	`mapred_task_timeout`	false

Performance

Display Name	Description	Related Name	Default Value	API Name	Required
I/O Sort Factor	The number of streams to merge at the same time while sorting files. That is, the number of sort heads to use during the merge sort on the reducer side. This determines the number of open file handles. Merging more files in parallel reduces merge sort iterations and improves run time by eliminating disk I/O. Note that merging more files in parallel uses more memory. If 'io.sort.factor' is set too high or the maximum JVM heap is set too low, excessive garbage collection will occur. The Hadoop default is 10, but Cloudera recommends a higher value. Will be part of generated client configuration.	`io.sort.factor`	64	`io_sort_factor`	false
I/O Sort Memory Buffer (MiB)	The total amount of memory buffer, in megabytes, to use while sorting files. Note that this memory comes out of the user JVM heap size (meaning total user JVM heap - this amount of memory = total user usable heap space). Note that Cloudera's default differs from Hadoop's default; Cloudera uses a bigger buffer by default because modern machines often have more RAM. The smallest value across all TaskTrackers will be part of generated client configuration.	`io.sort.mb`	256 MiB	`io_sort_mb`	false
I/O Sort Record Percent	The percentage of 'io.sort.mb' dedicated to tracking record boundaries. If this value is represented as 'r', and 'io.sort.mb' is represented as 'x', then the maximum number of records collected before the collection thread must block is equal to (r * x) / 4. The syntax is in decimal units; the default is 5% and is formatted 0.05. Will be part of generated client configuration.	`io.sort.record.percent`	0.05	`io_sort_record_percent`	false
I/O Sort Spill Percent	The soft limit in either the buffer or record collection buffers. When this limit is reached, a thread will begin to spill the contents to disk in the background. Note that this does not imply any chunking of data to the spill. A value less than 0.5 is not recommended. The syntax is in decimal units; the default is 80% and is formatted 0.8. Will be part of generated client configuration.	`io.sort.spill.percent`	0.8	`io_sort_spill_percent`	false
MapReduce Child Java Opts Base	Java opts for the TaskTracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp pass a value of: "-verbose:gc -Xloggc:/tmp/@taskid@.gc". The configuration variable 'mapred.child.ulimit' can be used to control the maximum virtual memory of the child processes. Note that unlike Hadoop, Cloudera Manager separates the child options into this setting and a separate setting just for the maximum heap size. Will be part of generated client configuration.	`mapred.child.java.opts`	-Djava.net.preferIPv4Stack=true	`mapred_child_java_opts_base`	false
Map Task Java Opts Base	Java opts for the TaskTracker child map processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp pass a value of: "-verbose:gc -Xloggc:/tmp/@taskid@.gc". The configuration variable 'Map Task Maximum Virtual Memory' can be used to control the maximum virtual memory of the map processes. This takes precedence over the generic 'mapred.child.java.opts'. Will be part of generated client configuration.	`mapred.map.child.java.opts`		`mapred_map_task_java_opts`	false
Default Number of Parallel Transfers During Shuffle	The default number of parallel transfers run by reduce during the copy (shuffle) phase. This number should be between `sqrt(nodes * number_of_map_slots_per_node)` and `nodes * number_of_map_slots_per_node / 2`. Will be part of generated client configuration.	`mapred.reduce.parallel.copies`	10	`mapred_reduce_parallel_copies`	false
Reduce Task Java Opts Base	Java opts for the TaskTracker child reduce processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp pass a value of: "-verbose:gc -Xloggc:/tmp/@taskid@.gc". The configuration variable 'Reduce Task Maximum Virtual Memory' can be used to control the maximum virtual memory of the reduce processes. This takes precedence over the generic 'mapred.child.java.opts'. Will be part of generated client configuration.	`mapred.reduce.child.java.opts`		`mapred_reduce_task_java_opts`	false
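
Several descriptions in the table above contain small formulas. The sketch below works through three of them: the `(r * x) / 4` record bound for `io.sort.record.percent`, the suggested range for `mapred.reduce.parallel.copies` (read here as `sqrt(nodes * slots)` up to `nodes * slots / 2`), and the `@taskid@` substitution in the Java opts settings. Treating `x` as a byte count, the function names, the sample cluster size, and the sample task ID are all assumptions made for illustration.

```python
import math


def max_records_before_block(io_sort_mb, record_percent=0.05):
    """Apply the (r * x) / 4 bound, taking x as io.sort.mb in bytes."""
    buffer_bytes = io_sort_mb * 1024 * 1024  # unit choice is an assumption
    return int(record_percent * buffer_bytes / 4)


def parallel_copies_range(nodes, map_slots_per_node):
    """Suggested bounds: sqrt(nodes * slots) up to nodes * slots / 2."""
    total_slots = nodes * map_slots_per_node
    return math.sqrt(total_slots), total_slots / 2


def interpolate_taskid(java_opts, task_id):
    """Replace @taskid@ and leave any other '@' untouched."""
    return java_opts.replace("@taskid@", task_id)


print(max_records_before_block(256))
low, high = parallel_copies_range(nodes=20, map_slots_per_node=5)
print(low, high)
print(interpolate_taskid("-verbose:gc -Xloggc:/tmp/@taskid@.gc",
                         "attempt_201401010000_0001_m_000000_0"))
```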

Resource Management

Display Name	Description	Related Name	Default Value	API Name	Required
MapReduce Child Java Maximum Heap Size	The maximum heap size, in bytes, of the Java child process. This number will be formatted and concatenated with the 'base' setting for 'mapred.child.java.opts' to pass to Hadoop. Will be part of generated client configuration.		1 GiB	`mapred_child_java_opts_max_heap`	false
MapReduce Maximum Virtual Memory (KiB)	The maximum virtual memory, in KiB, of a process launched by the MapReduce framework. This can be used to control both the MapReduce tasks and applications using Hadoop Pipes, Hadoop Streaming, and so on. By default, it is left unspecified to allow administrators to control it via 'limits.conf' and other mechanisms. Note: 'mapred.child.ulimit' must be greater than or equal to approximately 1.5 times the -Xmx passed to JavaVM, or else the VM might not start. Will be part of generated client configuration.	`mapred.child.ulimit`		`mapred_child_ulimit`	false
Map Task Maximum Heap Size	The maximum heap size, in bytes, of the child map processes. This number will be formatted and concatenated with 'Map Task Java Opts Base' to pass to Hadoop. Will be part of generated client configuration.			`mapred_map_task_max_heap`	false
Map Task Maximum Virtual Memory (KiB)	The maximum virtual memory, in KiB, available to map tasks. Note: this must be greater than or equal to the -Xmx passed to the JavaVM via 'Map Task Java Opts', or else the VM might not start. This takes precedence over the generic 'mapred.child.ulimit'. Will be part of generated client configuration.	`mapred.map.child.ulimit`		`mapred_map_task_ulimit`	false
Reduce Task Maximum Heap Size	The maximum heap size, in bytes, of the child reduce processes. This number will be formatted and concatenated with 'Reduce Task Java Opts Base' to pass to Hadoop. Will be part of generated client configuration.			`mapred_reduce_task_max_heap`	false
Reduce Task Maximum Virtual Memory (KiB)	The maximum virtual memory, in KiB, available to reduce tasks. Note: this must be greater than or equal to the -Xmx passed to the JavaVM via 'Reduce Task Java Opts', or else the VM might not start. This takes precedence over the generic 'mapred.child.ulimit'. Will be part of generated client configuration.	`mapred.reduce.child.ulimit`		`mapred_reduce_task_ulimit`	false
Client Java Heap Size in Bytes	Maximum size in bytes for the Java process heap memory. Passed to Java -Xmx.		256 MiB	`mapreduce_client_java_heapsize`	false
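
The heap-size settings in the table above are combined with the corresponding "base" Java opts, and the virtual memory limits are expressed in KiB relative to `-Xmx`. The sketch below illustrates both relationships. The `-Xmx<bytes>` formatting is an assumption, since the table only says the number is "formatted and concatenated"; the 1.5 factor comes from the note on `mapred.child.ulimit`. Both function names are hypothetical.

```python
import math


def child_java_opts(base_opts, max_heap_bytes):
    """Append a heap flag to the base opts (the format is an assumption)."""
    return f"{base_opts} -Xmx{max_heap_bytes}".strip()


def min_child_ulimit_kib(max_heap_bytes, factor=1.5):
    """Smallest ulimit in KiB that is at least `factor` times the heap."""
    return math.ceil(max_heap_bytes * factor / 1024)


one_gib = 1024 ** 3  # default MapReduce Child Java Maximum Heap Size
print(child_java_opts("-Djava.net.preferIPv4Stack=true", one_gib))
print(min_child_ulimit_kib(one_gib))
```

With the 1 GiB default heap, a `mapred.child.ulimit` below roughly 1.5 GiB (expressed in KiB) would risk the child JVM failing to start.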

jobtrackerdefaultgroup

Advanced

Display Name	Description	Related Name	Default Value	API Name	Required
Hadoop Metrics Advanced Configuration Snippet (Safety Valve)	Advanced Configuration Snippet (Safety Valve) for Hadoop Metrics. Properties will be inserted into hadoop-metrics.properties for this role only. Note that Cloudera Manager tunes hadoop-metrics.properties to work optimally with its Service Monitoring features. By overriding the default, Cloudera Manager might not be able to provide accurate monitoring information, health tests or alerts.			`hadoop_metrics_safety_valve`	false
JobTracker Advanced Configuration Snippet (Safety Valve) for mapred-site.xml	For advanced use only, a string to be inserted into mapred-site.xml for this role only.			`jobtracker_config_safety_valve`	false
JobTracker Advanced Configuration Snippet (Safety Valve) for mapred_hosts_allow.txt	For advanced use only, a string to be inserted into mapred_hosts_allow.txt for this role only.			`jobtracker_hosts_allow_safety_valve`	false
JobTracker Advanced Configuration Snippet (Safety Valve) for mapred_hosts_exclude.txt	For advanced use only, a string to be inserted into mapred_hosts_exclude.txt for this role only.			`jobtracker_hosts_exclude_safety_valve`	false
Java Configuration Options for JobTracker	These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here.		-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled	`jobtracker_java_opts`	false
JobTracker Logging Advanced Configuration Snippet (Safety Valve)	For advanced use only, a string to be inserted into log4j.properties for this role only.			`log4j_safety_valve`	false
JobTracker Client Connection Retries	The maximum number of times to retry between failovers.	`mapred.client.failover.connection.retries`	0	`mapred_client_failover_connection_retries`	false
JobTracker Client Max Retries	The maximum number of times to retry on timeouts between failovers.	`mapred.client.failover.connection.retries.on.timeouts`	0	`mapred_client_failover_connection_retries_on_timeouts`	false
JobTracker Client Max Failover Attempt	The maximum number of times a client of JobTracker tries to fail over.	`mapred.client.failover.max.attempts`	15	`mapred_client_failover_max_attempts`	false
JobTracker Client Base Sleep	The time in milliseconds to wait before the first failover.	`mapred.client.failover.sleep.base.millis`	500 millisecond(s)	`mapred_client_failover_sleep_base_millis`	false
JobTracker Client Maximum Sleep	The maximum amount of time in milliseconds to wait between failovers (for exponential backoff).	`mapred.client.failover.sleep.max.millis`	1.5 second(s)	`mapred_client_failover_sleep_max_millis`	false
Heap Dump Directory	Path to the directory where heap dumps are generated when java.lang.OutOfMemoryError is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role.		/tmp	`oom_heap_dump_dir`	false
Dump Heap When Out of Memory	When set, generates heap dump file when java.lang.OutOfMemoryError is thrown.		false	`oom_heap_dump_enabled`	true
Kill When Out of Memory	When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown.		true	`oom_sigkill_enabled`	true
Automatically Restart Process	When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure.		false	`process_auto_restart`	true
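
The JobTracker client failover settings in the table above describe a bounded exponential backoff. The following is a simplified sketch of such a schedule using the table's defaults (500 ms base, 1.5 s maximum, 15 attempts). Doubling the delay on each attempt and omitting random jitter are assumptions; the actual client retry policy may compute delays differently.

```python
def failover_delays(base_ms=500, max_ms=1500, max_attempts=15):
    """Capped exponential backoff: base * 2**n, never above max_ms."""
    delays = []
    for attempt in range(max_attempts):
        delays.append(min(base_ms * 2 ** attempt, max_ms))
    return delays


delays = failover_delays()
print(delays[:4])
print(sum(delays) / 1000, "seconds of total sleep across all attempts")
```

Because the maximum is only three times the base, the delay reaches its cap by the third attempt under these assumptions.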

Classes

Display Name	Description	Related Name	Default Value	API Name	Required
Hadoop Socket Factory for Job Submission	Socket Factory to use to connect to a MapReduce master (JobTracker). If null or empty, then use hadoop.rpc.socket.factory.class.default.	`hadoop.rpc.socket.factory.class.JobSubmissionProtocol`		`hadoop_rpc_socket_factory_class_job_submission_protocol`	false
Task Scheduler	The class responsible for scheduling tasks. Cloudera recommends the Fair Scheduler. The JobQueueTaskScheduler is often referred to as the FIFO scheduler.	`mapred.jobtracker.taskScheduler`	org.apache.hadoop.mapred.FairScheduler	`mapred_jobtracker_taskScheduler`	false

Jobs

Display Name	Description	Related Name	Default Value	API Name	Required
Capacity Scheduler Configuration	Enter an XML string that represents the Capacity Scheduler configuration.		<?xml version="1.0"?> <!-- This is the configuration file for the resource manager in Hadoop. --> <!-- You can configure various scheduling parameters related to queues. --> <!-- The properties for a queue follow a naming convention, such as, --> <!-- mapred.capacity-scheduler.queue.<queue-name>.property-name. --> <configuration> <property> <name>mapred.capacity-scheduler.queue.default.capacity</name> <value>100</value> <description>Percentage of the number of slots in the cluster that are to be available for jobs in this queue. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.maximum-capacity</name> <value>-1</value> <description> maximum-capacity defines a limit beyond which a queue cannot use the capacity of the cluster. This provides a means to limit how much excess capacity a queue can use. By default, there is no limit. The maximum-capacity of a queue can only be greater than or equal to its minimum capacity. Default value of -1 implies a queue can use complete capacity of the cluster. This property could be to curtail certain jobs which are long running in nature from occupying more than a certain percentage of the cluster, which in the absence of pre-emption, could lead to capacity guarantees of other queues being affected. One important thing to note is that maximum-capacity is a percentage , so based on the cluster's capacity the max capacity would change. So if large no of nodes or racks get added to the cluster , max Capacity in absolute terms would increase accordingly. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.supports-priority</name> <value>false</value> <description>If true, priorities of jobs will be taken into account in scheduling decisions. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name> <value>100</value> <description> Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is competition for them. This user limit can vary between a minimum and maximum value. The former depends on the number of users who have submitted jobs, and the latter is set to this property value. For example, suppose the value of this property is 25. If two users have submitted jobs to a queue, no single user can use more than 50% of the queue resources. If a third user submits a job, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queue's resources. A value of 100 implies no user limits are imposed. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.maximum-initialized-jobs-per-user</name> <value>2</value> <description>The maximum number of jobs to be pre-initialized for a user of the job queue. </description> </property> <!-- The default configuration settings for the capacity task scheduler --> <!-- The default values would be applied to all the queues which don't have --> <!-- the appropriate property for the particular queue --> <property> <name>mapred.capacity-scheduler.default-supports-priority</name> <value>false</value> <description>If true, priorities of jobs will be taken into account in scheduling decisions by default in a job queue. </description> </property> <property> <name>mapred.capacity-scheduler.default-minimum-user-limit-percent</name> <value>100</value> <description>The percentage of the resources limited to a particular user for the job queue at any given point of time by default. </description> </property> <property> <name>mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user</name> <value>2</value> <description>The maximum number of jobs to be pre-initialized for a user of the job queue. </description> </property> <!-- Capacity scheduler Job Initialization configuration parameters --> <property> <name>mapred.capacity-scheduler.init-poll-interval</name> <value>5000</value> <description>The amount of time in miliseconds which is used to poll the job queues for jobs to initialize. </description> </property> <property> <name>mapred.capacity-scheduler.init-worker-threads</name> <value>5</value> <description>Number of worker threads which would be used by Initialization poller to initialize jobs in a set of queue. If number mentioned in property is equal to number of job queues then a single thread would initialize jobs in a queue. If lesser then a thread would get a set of queues assigned. If the number is greater then number of threads would be equal to number of job queues. </description> </property> </configuration>	`mapred_capacity_scheduler_configuration`	false
Fair Scheduler Allocation	Enter an XML string that represents the Fair Scheduler allocation pools.		<?xml version=1.0?> <allocations> </allocations>	`mapred_fairscheduler_allocation`	false
Fair Scheduler Allow Undeclared Pools	Enable job submission to pools not declared in the allocation file.	`mapred.fairscheduler.allow.undeclared.pools`	true	`mapred_fairscheduler_allow_undeclared_pools`	false
Fair Scheduler Assign Multiple Tasks	Allows the Fair Scheduler to assign both a map task and a reduce task on each Cloudera Agent heartbeat, which improves cluster throughput when there are many small tasks to run.	`mapred.fairscheduler.assignmultiple`	true	`mapred_fairscheduler_assignmultiple`	false
Fair Scheduler Pool Name Property	Specify the 'jobconf' property that determines the pool that a job belongs in. The default is 'user.name' (one pool for each user). If you want to use MapReduce's "queue" system to enable authorization for the Fair Scheduler, specify 'mapred.job.queue.name'. This requires adding the Fair Scheduler's pool names to 'mapred.queue.names' and users to submit jobs using the 'mapred.job.queue.name' property instead of the 'mapred.fairscheduler.pool' property. Note that 'mapred.fairscheduler.poolnameproperty' is used only for jobs in which 'mapred.fairscheduler.pool' is not explicitly set.	`mapred.fairscheduler.poolnameproperty`	user.name	`mapred_fairscheduler_poolnameproperty`	false
Fair Scheduler Preemption	Enables Fair Scheduler preemption. If a pool's minimum share is not met for some period of time, the Fair Scheduler optionally supports preemption of jobs in other pools. The pool will be allowed to kill tasks from other pools to make room to run. Preemption can be used to guarantee that production jobs are not starved while also allowing the Hadoop cluster to be used for experimental and research jobs. In addition, a pool can also be allowed to preempt tasks if it is below half of its fair share for a configurable timeout (generally set larger than the minimum share preemption timeout). When choosing tasks to kill, the Fair Scheduler picks the most-recently launched tasks from over-allocated jobs, to minimize wasted computation. Preemption does not cause the preempted jobs to fail because Hadoop jobs tolerate losing tasks; it only makes them take longer to finish.	`mapred.fairscheduler.preemption`	false	`mapred_fairscheduler_preemption`	false
Fair Scheduler Weight Adjuster	An extension point that lets you specify a class to adjust the weights of running jobs. This class should implement the WeightAdjuster interface. There is currently one example implementation, NewJobWeightBooster, which increases the weight of jobs for their first 5 minutes to let short jobs finish faster. To use it, set the weightadjuster property to the full class name, org.apache.hadoop.mapred.NewJobWeightBooster. NewJobWeightBooster itself provides two parameters for setting the duration and boost factor. mapred.newjobweightbooster.factor: Factor by which a new job's weight should be boosted. Default is 3. mapred.newjobweightbooster.duration: Boost duration in milliseconds. Default is 300000 (5 minutes).	`mapred.fairscheduler.weight.adjuster`		`mapred_fairscheduler_weight_adjuster`	false
Persist JobTracker Job Status	If enabled, job status information is persisted.	`mapred.job.tracker.persist.jobstatus.active`	false	`mapred_job_tracker_persist_jobstatus_active`	false
Directory for JobTracker Job Status Persistence	The HDFS directory in which job status information is kept persistently. The directory must exist and be owned by the mapred user.	`mapred.job.tracker.persist.jobstatus.dir`	/jobtracker/jobsInfo	`mapred_job_tracker_persist_jobstatus_dir`	false
Time Limit of JobTracker Job Status Persistence	The number of hours job status information is persisted in HDFS. The job status information will be available after it drops out of the memory queue and between JobTracker restarts. If zero is specified for this property, the job status information is not persisted.	`mapred.job.tracker.persist.jobstatus.hours`	0	`mapred_job_tracker_persist_jobstatus_hours`	false
Maximum Completed User Jobs	The maximum number of completed jobs per user to retain before delegating them to the job history.	`mapred.jobtracker.completeuserjobs.maximum`	5	`mapred_jobtracker_completeuserjobs_maximum`	false
Enable Job Recovery Upon Restart	Enables job recovery upon restart. If the property is set to true, then if and when the JobTracker stops while a job is running, it will resubmit the job on restart.	`mapred.jobtracker.restart.recover`	false	`mapred_jobtracker_restart_recover`	false
JobTracker Retire Job Interval (milliseconds)	Number of milliseconds job history objects are kept.	`mapred.jobtracker.retirejob.interval`	86400000	`mapred_jobtracker_retirejob_interval`	false
MapReduce Queue Names	Comma separated list of queues configured for the JobTracker in this service instance. Jobs are added to queues. Schedulers can configure different scheduling properties for the queues specified in this list. You can configure queue properties that are common to all schedulers, by using the naming convention 'mapred.queue.$QUEUE-NAME.$PROPERTY-NAME' in this property (for example, 'mapred.queue.default.submit-job-acl'). The number of queues configured in this property depends on the type of scheduler specified in 'mapred.jobtracker.taskScheduler'. The default scheduler JobQueueTaskScheduler supports a single queue only. Before adding more queues to this property, make sure that the scheduler in 'mapred.jobtracker.taskScheduler' supports multiple queues. This property can also be populated with the Fair Scheduler's pool names to enable authorization of the Fair Scheduler. This requires setting 'mapred.fairscheduler.poolnameproperty' to 'mapred.job.queue.name' and users to submit jobs to the right queue by setting the 'mapred.job.queue.name' property in their jobs.	`mapred.queue.names`	default	`mapred_queue_names_list`	false
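
The description of `minimum-user-limit-percent` in the default Capacity Scheduler configuration above works through an example with a value of 25. The following Python sketch reproduces only that arithmetic; the function name and the formula (the larger of an equal split and the configured minimum) are assumptions inferred from the example, not the scheduler's actual code.

```python
def per_user_limit_percent(active_users: int, minimum_user_limit_percent: int) -> float:
    """Approximate the largest share of a queue one user may hold under contention.

    Hypothetical helper: it mirrors the worked example in the property
    description, not the Capacity Scheduler implementation itself.
    """
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    if not 1 <= minimum_user_limit_percent <= 100:
        raise ValueError("minimum_user_limit_percent must be between 1 and 100")
    # Resources are split evenly among active users, but a user's cap
    # never drops below the configured minimum percentage.
    equal_share = 100.0 / active_users
    return max(equal_share, float(minimum_user_limit_percent))


if __name__ == "__main__":
    for users in (2, 3, 4, 8):
        limit = per_user_limit_percent(users, 25)
        print(f"{users} users -> {limit:.0f}% of the queue per user")
```

With the default value of 100, the function returns 100 for any number of users, which matches the statement that no user limits are imposed.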

Logs

Display Name	Description	Related Name	Default Value	API Name	Required
JobTracker Log Directory	Directory where JobTracker will place its log files.	`hadoop.log.dir`	/var/log/hadoop-0.20-mapreduce	`jobtracker_log_dir`	false
JobTracker Logging Threshold	The minimum log level for JobTracker logs		INFO	`log_threshold`	false
JobTracker Maximum Log File Backups	The maximum number of rolled log files to keep for JobTracker logs. Typically used by log4j.		10	`max_log_backup_index`	false
JobTracker Max Log Size	The maximum size, in megabytes, per log file for JobTracker logs. Typically used by log4j.		200 MiB	`max_log_size`	false
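
When sizing the log directory, it can help to estimate the worst-case disk usage implied by the two rolling settings above. The sketch below assumes log4j `RollingFileAppender` behavior, where the active log file is kept alongside up to `max_log_backup_index` rolled files; the function name is hypothetical.

```python
def max_log_footprint_mib(max_log_size_mib: int, max_log_backup_index: int) -> int:
    """Estimate the worst-case disk usage of one role's rolled logs, in MiB.

    Assumption: one active file plus up to max_log_backup_index backups,
    each at most max_log_size_mib in size.
    """
    if max_log_size_mib < 0 or max_log_backup_index < 0:
        raise ValueError("both settings must be non-negative")
    return max_log_size_mib * (max_log_backup_index + 1)


if __name__ == "__main__":
    # Defaults from the table: 200 MiB per file, 10 backups.
    print(max_log_footprint_mib(200, 10), "MiB")
```

With the defaults, this estimate comes to 2200 MiB, which is well under the 5 GiB critical threshold for log directory free space listed under Monitoring.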

Metrics

Display Name	Description	Related Name	Default Value	API Name	Required
Hadoop Metrics Class	The metrics implementation that daemons use to report internal statistics. The default (NoEmitMetricsContext) displays metrics at /metrics on the status port. The GangliaContext and GangliaContext31 classes report metrics to the Ganglia Monitoring Daemons (gmond) that you specify. The Ganglia wire format changed incompatibly at version 3.1.0. If you are running Ganglia 3.1.0 or newer, use the GangliaContext31 metric class; otherwise, use the GangliaContext metric class.		org.apache.hadoop.metrics.spi.NoEmitMetricsContext	`hadoop_metrics_class`	false
Hadoop Metrics Output Directory	If using FileContext, directory to write metrics to.		/tmp/metrics	`hadoop_metrics_dir`	false
Hadoop Metrics Ganglia Servers	If using GangliaContext, a comma-delimited list of host:port pairs pointing to 'gmond' servers you would like to publish metrics to. In practice, this set of 'gmond' should match the set of 'gmond' in your 'gmetad' datasource list for the cluster.			`hadoop_metrics_ganglia_servers`	false
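
The choice between the two Ganglia classes depends only on the Ganglia version, as described for Hadoop Metrics Class. A minimal Python sketch of that rule follows; the function name is hypothetical, and the version string is assumed to be purely numeric and dot-separated.

```python
def ganglia_metrics_class(ganglia_version: str) -> str:
    """Return the short metrics class name suited to a Ganglia version.

    Assumption: ganglia_version looks like "3.0.7" or "3.1"; suffixes
    such as "-rc1" are not handled.
    """
    parts = tuple(int(p) for p in ganglia_version.split(".")[:3])
    # Pad short versions such as "3.1" to three components.
    parts = parts + (0,) * (3 - len(parts))
    # The wire format changed incompatibly at 3.1.0.
    return "GangliaContext31" if parts >= (3, 1, 0) else "GangliaContext"


if __name__ == "__main__":
    for version in ("3.0.7", "3.1.0", "3.6.0"):
        print(version, "->", ganglia_metrics_class(version))
```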

Monitoring

Display Name	Description	Related Name	Default Value	API Name	Required
Enable Health Alerts for this Role	When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold		true	`enable_alerts`	false
Enable Configuration Change Alerts	When set, Cloudera Manager will send alerts when this entity's configuration changes.		false	`enable_config_alerts`	false
File Descriptor Monitoring Thresholds	The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit.		Warning: 50.0 %, Critical: 70.0 %	`jobtracker_fd_thresholds`	false
Garbage Collection Duration Thresholds	The health test thresholds for the weighted average time spent in Java garbage collection. Specified as a percentage of elapsed wall clock time.		Warning: 30.0, Critical: 60.0	`jobtracker_gc_duration_thresholds`	false
Garbage Collection Duration Monitoring Period	The period to review when computing the moving average of garbage collection time.		5 minute(s)	`jobtracker_gc_duration_window`	false
JobTracker Host Health Test	When computing the overall JobTracker health, consider the host's health.		true	`jobtracker_host_health_enabled`	false
JobTracker Process Health Test	Enables the health test that the JobTracker's process state is consistent with the role configuration		true	`jobtracker_scm_health_enabled`	false
Health Check Startup Tolerance	The amount of time allowed after this role is started that failures of health checks that rely on communication with this role will be tolerated.		5 minute(s)	`jobtracker_startup_tolerance`	false
Web Metric Collection	Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server.		true	`jobtracker_web_metric_collection_enabled`	false
Web Metric Collection Duration	The health test thresholds on the duration of the metrics request to the web server.		Warning: 10 second(s), Critical: Never	`jobtracker_web_metric_collection_thresholds`	false
Log Directory Free Space Monitoring Absolute Thresholds	The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory.		Warning: 10 GiB, Critical: 5 GiB	`log_directory_free_space_absolute_thresholds`	false
Log Directory Free Space Monitoring Percentage Thresholds	The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured.		Warning: Never, Critical: Never	`log_directory_free_space_percentage_thresholds`	false
Rules to Extract Events from Log Files	This file contains the rules which govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of the following fields: `alert` - whether or not events generated from this rule should be promoted to alerts. A value of "true" will cause alerts to be generated. If not specified, the default is "false". `rate` (mandatory) - the maximum number of log messages matching this rule that may be sent as events every minute. If more than `rate` matching log messages are received in a single minute, the extra messages are ignored. If `rate` is less than 0, the number of messages per minute is unlimited. `periodminutes` - the number of minutes during which the publisher will only publish `rate` events or fewer. If not specified, the default is one minute. `threshold` - apply this rule only to messages with this log4j severity level or above. An example is "WARN" for warning level messages or higher. `content` - match only those messages whose contents match this regular expression. `exceptiontype` - match only those messages which are part of an exception message. The exception type must match this regular expression. Example: `{"alert": false, "rate": 10, "exceptiontype": "java.lang.StringIndexOutOfBoundsException"}` This rule will send events to Cloudera Manager for every `StringIndexOutOfBoundsException`, up to a maximum of 10 every minute.		version: 0, rules: [ alert: false, rate: 1, periodminutes: 1, threshold:FATAL, alert: false, rate: 0, threshold:WARN, content: .* is deprecated. Instead, use ., alert: false, rate: 0, threshold:WARN, content: . is deprecated. 
Use .* instead, alert: false, rate: 0, exceptiontype: java.io.IOException, alert: false, rate: 0, exceptiontype: java.net.SocketException, alert: false, rate: 0, exceptiontype: java.net.SocketClosedException, alert: false, rate: 0, exceptiontype: java.io.EOFException, alert: false, rate: 0, exceptiontype: java.nio.channels.CancelledKeyException, alert: false, rate: 1, periodminutes: 2, exceptiontype: ., alert: false, rate: 0, threshold:WARN, content:Unknown job [^ ]+ being deleted., alert: false, rate: 0, threshold:WARN, content:Error executing shell command .+ No such process.+, alert: false, rate: 0, threshold:WARN, content:.attempt to override final parameter.+, alert: false, rate: 0, threshold:WARN, content:[^ ]+ is a deprecated filesystem name. Use., alert: false, rate: 5, periodminutes: 1, threshold:WARN ]	`log_event_whitelist`	false
Role Triggers	The configured triggers for this role. This is a JSON formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: `triggerName` (mandatory) - the name of the trigger. This value must be unique for the specific role. `triggerExpression` (mandatory) - a tsquery expression representing the trigger. `streamThreshold` (optional) - the maximum number of streams that can satisfy a condition of a trigger before the condition fires. By default set to 0, and any stream returned causes the condition to fire. `enabled` (optional) - by default set to 'true'. If set to 'false' the trigger will not be evaluated. For example, here is a JSON formatted trigger configured for a DataNode that fires if the DataNode has more than 1500 open file descriptors: `[{"triggerName": "sample-trigger", "triggerExpression": "IF (SELECT fd_open WHERE roleName=$ROLENAME and last(fd_open) > 1500) DO health:bad", "streamThreshold": 0, "enabled": "true"}]` Consult the trigger rules documentation for more details on how to write triggers using tsquery. The JSON format is evolving and may change in the future; as a result, backward compatibility is not guaranteed between releases at this time.		[]	`role_triggers`	true
Unexpected Exits Thresholds	The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.		Warning: Never, Critical: Any	`unexpected_exits_thresholds`	false
Unexpected Exits Monitoring Period	The period to review when computing unexpected exits.		5 minute(s)	`unexpected_exits_window`	false
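
The Role Triggers value is a JSON list in which every entry needs a `triggerName` and a `triggerExpression`, and names must be unique per role. The Python sketch below shows one way to assemble such a value before pasting it into the setting. The helper names are hypothetical and the tsquery expression is copied from the example in the Role Triggers description; nothing here calls a Cloudera Manager API.

```python
import json


def make_trigger(name, expression, stream_threshold=0, enabled=True):
    """Build one trigger entry using the field names from the description."""
    if not name or not expression:
        raise ValueError("triggerName and triggerExpression are mandatory")
    return {
        "triggerName": name,
        "triggerExpression": expression,
        "streamThreshold": stream_threshold,
        # The documented example stores the flag as a string.
        "enabled": "true" if enabled else "false",
    }


def triggers_to_json(triggers):
    """Serialize a trigger list, rejecting duplicate trigger names."""
    names = [t["triggerName"] for t in triggers]
    if len(names) != len(set(names)):
        raise ValueError("triggerName values must be unique for the role")
    return json.dumps(triggers)


if __name__ == "__main__":
    expr = ("IF (SELECT fd_open WHERE roleName=$ROLENAME "
            "and last(fd_open) > 1500) DO health:bad")
    print(triggers_to_json([make_trigger("sample-trigger", expr)]))
```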

Other

Display Name	Description	Related Name	Default Value	API Name	Required
JobTracker Logical Name	For High Availability, this is the logical name for the JobTracker active-standby pair. This name is serialized as part of the path of the ZooKeeper node storing high availability data. Renaming the JobTracker requires re-initializing the ZooKeeper state.		logicaljt	`job_tracker_name`	false
JobTracker Local Data Directory	Directory on the local filesystem where the JobTracker stores job configuration data. Directories that do not exist are ignored. A single directory is sufficient; a list of multiple directories will not cause problems.	`mapred.local.dir`		`jobtracker_mapred_local_dir_list`	true
Job History Files Cleaner Interval	Time interval for history cleaner to check for files to delete. Files are only deleted if they are older than mapreduce.jobhistory.max-age-ms.	`mapreduce.jobhistory.cleaner.interval`	1 day(s)	`mapreduce_jobhistory_cleaner_interval`	false
Job History Files Maximum Age	Job history files older than this time duration will be deleted when the history cleaner runs.	`mapreduce.jobhistory.max-age-ms`	7 day(s)	`mapreduce_jobhistory_max_age_ms`	false

Paths

Display Name	Description	Related Name	Default Value	API Name	Required
Running Job History Location	Location to store the job history files of running jobs. This is a path on the host where the JobTracker is running.	`hadoop.job.history.location`	/var/log/hadoop-0.20-mapreduce/history	`hadoop_job_history_dir`	false
Completed Job History Location	Location to store the job history files of completed jobs. If a location is not specified, the job history files of completed jobs are stored in a subdirectory of the 'Running Job History Location'. If set, completed jobs will be moved into this directory in HDFS.	`mapred.job.tracker.history.completed.location`		`mapred_job_tracker_history_completed_dir`	false
MapReduce JobTracker Staging Root Directory	The root HDFS directory of the staging area for users' MapReduce jobs; for example /user. The staging directories are always named after the user.	`mapreduce.jobtracker.staging.root.dir`	/user	`mapreduce_jobtracker_staging_root_dir`	false

Performance

Display Name	Description	Related Name	Default Value	API Name	Required
Hue Thrift Server Max Threadcount	Maximum number of running threads for the Hue Thrift server running on the JobTracker.	`dfs.thrift.threads.max`	20	`dfs_thrift_threads_max`	false
Hue Thrift Server Min Threadcount	Minimum number of running threads for the Hue Thrift server running on the JobTracker.	`dfs.thrift.threads.min`	10	`dfs_thrift_threads_min`	false
Hue Thrift Server Timeout	Timeout in seconds for the Hue Thrift server running on the JobTracker.	`dfs.thrift.timeout`	60	`dfs_thrift_timeout`	false
JobTracker Handler Count	The number of server threads for the JobTracker. This should be approximately 20 * ln(the number of TaskTracker nodes).	`mapred.job.tracker.handler.count`	10	`mapred_job_tracker_handler_count`	false
Maximum Tasks per Job	The maximum number of tasks for a single job. Use a value of -1 to specify no maximum. Note that allowing jobs with a large number of tasks increases memory usage by the JobTracker.	`mapred.jobtracker.maxtasks.per.job`		`mapred_jobtracker_maxtasks_per_job`	false
User JobConf Limit	The maximum allowed size of the user jobconf.	`mapred.user.jobconf.limit`	5 MiB	`mapred_user_jobconf_limit`	false
JobTracker MetaInfo Maxsize	The maximum permissible size of the split metainfo file. The JobTracker won't attempt to read split metainfo files bigger than the configured value. No limits if set to -1.	`mapreduce.jobtracker.split.metainfo.maxsize`	10000000	`mapreduce_jobtracker_split_metainfo_maxsize`	false
Maximum Process File Descriptors	If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value.			`rlimit_fds`	false
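
The JobTracker Handler Count guideline of roughly 20 * ln(number of TaskTracker nodes) is easy to evaluate for a given cluster size. The Python sketch below does that; the function name and the rounding and minimum-of-one choices are assumptions, since the guideline itself only says "approximately".

```python
import math


def suggested_handler_count(tasktracker_nodes: int) -> int:
    """Evaluate the 20 * ln(N) guideline for the JobTracker handler count.

    Hypothetical helper: rounds to the nearest integer and never
    returns less than 1 (ln(1) is 0).
    """
    if tasktracker_nodes < 1:
        raise ValueError("tasktracker_nodes must be at least 1")
    return max(1, round(20 * math.log(tasktracker_nodes)))


if __name__ == "__main__":
    for nodes in (10, 100, 1000):
        print(f"{nodes} TaskTrackers -> about {suggested_handler_count(nodes)} handlers")
```

By this guideline the default of 10 corresponds to a very small cluster, so larger deployments may want to raise `mapred.job.tracker.handler.count` accordingly.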

Plugins

Display Name	Description	Related Name	Default Value	API Name	Required
Enable JobTracker Plugins Required for Hue	If enabled, adds 'org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin' to the 'mapred.jobtracker.plugins' configuration. This property must be enabled to allow Hue to operate.		true	`hue_jobtracker_plugin`	false
MapReduce JobTracker Plugins	mapred.jobtracker.plugins: Comma-separated list of JobTracker plugins to be activated. If one plugin cannot be loaded, all plugins are ignored. Note that there are separate controls below to enable the Hue Thrift plugin.	`mapred.jobtracker.plugins`		`mapred_jobtracker_plugins_list`	false

Ports and Addresses

Display Name	Description	Related Name	Default Value	API Name	Required
JobTracker Port for HA	Port of the High Availability service protocol for the JobTracker. The JobTracker listens on a separate port for High Availability operations which is why this property exists in addition to 'mapred.job.tracker'.	`mapred.ha.job.tracker`	8023	`ha_job_tracker_port`	false
Bind JobTracker to Wildcard Address	If enabled, the JobTracker binds to the wildcard address ("0.0.0.0") on all of its ports.		false	`job_tracker_bind_wildcard`	false
JobTracker Port	Port for the internal JobTracker protocol.	`mapred.job.tracker`	8021	`job_tracker_port`	false
JobTracker HTTP Server Address	The address where the JobTracker HTTP server listens. The default address, 0.0.0.0, binds to all interfaces.		0.0.0.0	`mapred_job_tracker_http_host`	false
JobTracker HTTP Server Port	The port where the JobTracker HTTP server listens. If the port is 0, the server starts on a free port.	`mapred.job.tracker.http.address`	50030	`mapred_job_tracker_http_port`	false
Hue Thrift Plugin Port	Port to use for 'org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin' that is used by Hue's NameNode plugin.	`jobtracker.thrift.address`	9290	`mapred_jobtracker_hue_thrift_plugin_port`	false

Resource Management

Display Name	Description	Related Name	Default Value	API Name	Required
Java Heap Size of Jobtracker in Bytes	Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx.		1 GiB	`jobtracker_java_heapsize`	false
Cgroup CPU Shares	Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager.	`cpu.shares`	1024	`rm_cpu_shares`	true
Cgroup I/O Weight	Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager.	`blkio.weight`	500	`rm_io_weight`	true
Cgroup Memory Hard Limit	Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit.	`memory.limit_in_bytes`	-1 MiB	`rm_memory_hard_limit`	true
Cgroup Memory Soft Limit	Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit.	`memory.soft_limit_in_bytes`	-1 MiB	`rm_memory_soft_limit`	true

Security

Display Name	Description	Related Name	Default Value	API Name	Required
Enable MapReduce ACLs	Specifies whether ACLs should be checked for authorization of users who are doing various queue and job-level operations. ACLs are disabled by default. If enabled, the JobTracker and TaskTracker perform access control checks when users make requests for queue and job operations. Examples of queue operations are submitting a job to the queue and killing a job in the queue. Examples of job operations are viewing the job details (mapreduce.job.acl-view-job), modifying the job (mapreduce.job.acl-modify-job), or using MapReduce APIs, RPCs, or the console and web user interfaces.	`mapred.acls.enabled`	false	`mapred_acls_enabled`	false
MapReduce Queue ACLs	String representing an XML file that controls, per queue, which users are allowed to submit and administer jobs in that queue. The default setting is that all users and groups are allowed to submit jobs to queue 'default' and no users or groups are allowed to administer jobs other than their own that are submitted to queue 'default'.		<?xml version=1.0?> <?xml-stylesheet type=text/xsl href=configuration.xsl?> <configuration> <property> <name>mapred.queue.default.acl-submit-job</name> <value>*</value> </property> <property> <name>mapred.queue.default.acl-administer-jobs</name> <value> </value> </property> </configuration>	`mapred_queue_acls`	false
Web Interface Private Actions	If enabled, administrative actions such as 'kill job' will be displayed in the JobTracker's web interface. These actions can then be triggered by anyone who has access to the web interface.	`webinterface.private.actions`	false	`webinterface_private_actions`	false

Stacks Collection

Display Name	Description	Related Name	Default Value	API Name	Required
Stacks Collection Data Retention	The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted.	`stacks_collection_data_retention`	100 MiB	`stacks_collection_data_retention`	false
Stacks Collection Directory	The directory in which stacks logs will be placed. If not set, stacks will be logged into a `stacks` subdirectory of the role's log directory.	`stacks_collection_directory`		`stacks_collection_directory`	false
Stacks Collection Enabled	Whether or not periodic stacks collection is enabled.	`stacks_collection_enabled`	false	`stacks_collection_enabled`	true
Stacks Collection Frequency	The frequency with which stacks will be collected.	`stacks_collection_frequency`	5.0 second(s)	`stacks_collection_frequency`	false
Stacks Collection Method	The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped.	`stacks_collection_method`	jstack	`stacks_collection_method`	false

service_wide

Advanced

Display Name	Description	Related Name	Default Value	API Name	Required
System User's Home Directory	The home directory of the system user on the local filesystem. This setting must reflect the system's configured value - only changing it here will not change the actual home directory.		/var/lib/hadoop-mapreduce	`hdfs_user_home_dir`	true
MapReduce Service Advanced Configuration Snippet (Safety Valve) for core-site.xml	For advanced use only, a string to be inserted into core-site.xml. Applies to configurations of all roles in this service except client configuration.			`mapreduce_core_site_safety_valve`	false
MapReduce Service Advanced Configuration Snippet (Safety Valve) for hadoop-policy.xml	For advanced use only, a string to be inserted into hadoop-policy.xml. Applies to configurations of all roles in this service except client configuration.			`mapreduce_hadoop_policy_config_safety_valve`	false
MapReduce Service Advanced Configuration Snippet (Safety Valve) for mapred-site.xml	For advanced use only, a string to be inserted into mapred-site.xml. Applies to configurations of all roles in this service except client configuration.			`mapreduce_service_config_safety_valve`	false
MapReduce Service Environment Advanced Configuration Snippet (Safety Valve)	For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of all roles in this service except client configuration.			`mapreduce_service_env_safety_valve`	false
MapReduce Service Advanced Configuration Snippet (Safety Valve) for ssl-client.xml	For advanced use only, a string to be inserted into ssl-client.xml. Applies to configurations of all roles in this service except client configuration.			`mapreduce_ssl_client_safety_valve`	false
MapReduce Service Advanced Configuration Snippet (Safety Valve) for ssl-server.xml	For advanced use only, a string to be inserted into ssl-server.xml. Applies to configurations of all roles in this service except client configuration.			`mapreduce_ssl_server_safety_valve`	false
System Group	The group that this service's processes should run as.		hadoop	`process_groupname`	true
System User	The user that this service's processes should run as.		mapred	`process_username`	true

Monitoring

Display Name	Description	Related Name	Default Value	API Name	Required
Enable Log Event Capture	When set, each role identifies important log events and forwards them to Cloudera Manager.		true	`catch_events`	false
Enable Service Level Health Alerts	When set, Cloudera Manager will send alerts when the health of this service reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold		true	`enable_alerts`	false
Enable Configuration Change Alerts	When set, Cloudera Manager will send alerts when this entity's configuration changes.		false	`enable_config_alerts`	false
Failover Controllers Healthy	Enables the health check that verifies that the failover controllers associated with this service are healthy and running.		true	`failover_controllers_healthy_enabled`	false
Activity Duration Rules	To generate an event when certain activities are running slowly, enter rules for the activities in this setting. The syntax for a rule is '`regex`=`number`' where `number` is in minutes. Enter one rule per line in this text box. When a new activity starts, each `regex` expression is tested against the name of the activity for a match. The first rule that matches is used. If an activity matches a rule and runs longer than the `number` of minutes, an event will be sent.			`firehose_activity_duration_rules`	false
Alert on Activity Failure	If enabled, an alert will be generated when any activity fails.		true	`firehose_activity_failure_alert`	false
Alert on Slow Activities	If enabled, an alert will be generated when an activity has been running longer than the duration specified in the 'Activity Duration Rules' setting.		true	`firehose_activity_slow_alert`	false
Log Event Retry Frequency	The frequency, in seconds, with which the log4j event publication appender retries sending undelivered log events to the Event Server.		30	`log_event_retry_frequency`	false
Active JobTracker Detection Window	The tolerance window that will be used in MapReduce service tests that depend on detection of the active JobTracker.		3 minute(s)	`mapreduce_active_jobtracker_detecton_window`	false
JobTracker Activation Startup Tolerance	The amount of time after JobTracker(s) start that the lack of an active JobTracker will be tolerated. This is intended to allow either the auto-failover daemon to make a JobTracker active, or a specifically issued failover command to take effect. This is an advanced option that does not often need to be changed.		3 minute(s)	`mapreduce_jobtracker_activation_startup_tolerance`	false
JobTracker Role Health Test	When computing the overall MapReduce cluster health, consider the JobTracker's health		true	`mapreduce_jobtracker_health_enabled`	false
Standby JobTracker Health Test	When computing the overall cluster health, consider the health of the standby JobTracker.		true	`mapreduce_standby_jobtrackers_health_enabled`	false
Healthy TaskTracker Monitoring Thresholds	The health test thresholds of the overall TaskTracker health. The check returns "Concerning" health if the percentage of "Healthy" TaskTrackers falls below the warning threshold. The check is unhealthy if the total percentage of "Healthy" and "Concerning" TaskTrackers falls below the critical threshold.		Warning: 95.0 %, Critical: 90.0 %	`mapreduce_tasktrackers_healthy_thresholds`	false
Service Triggers	The configured triggers for this service. This is a JSON formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has all of the following fields: `triggerName` (mandatory) - the name of the trigger. This value must be unique for the specific service. `triggerExpression` (mandatory) - a tsquery expression representing the trigger. `streamThreshold` (optional) - the maximum number of streams that can satisfy a condition of a trigger before the condition fires. By default set to 0, and any stream returned causes the condition to fire. `enabled` (optional) - by default set to 'true'. If set to 'false' the trigger will not be evaluated. For example, here is a JSON formatted trigger that fires if there are more than 10 DataNodes with more than 500 file-descriptors opened: `[{"triggerName": "sample-trigger", "triggerExpression": "IF (SELECT fd_open WHERE roleType = DataNode and last(fd_open) > 500) DO health:bad", "streamThreshold": 10, "enabled": "true"}]` Consult the trigger rules documentation for more details on how to write triggers using tsquery. The JSON format is evolving and may change in the future; as a result, backward compatibility is not guaranteed between releases at this time.		[]	`service_triggers`	true
Service Monitor Client Config Overrides	For advanced use only, a list of configuration properties that will be used by the Service Monitor instead of the current client configuration for the service.		<property><name>mapreduce.jobclient.rpc.timeout</name><value>10000</value></property><property><name>ipc.ping.interval</name><value>10000</value></property><property><name>ipc.client.connect.timeout</name><value>10000</value></property><property><name>ipc.client.connect.max.retries</name><value>0</value></property><property><name>ipc.client.connect.max.retries.on.timeouts</name><value>0</value></property><property><name>mapreduce.job.counters.limit</name><value>12000</value></property><property><name>mapreduce.job.counters.max</name><value>12000</value></property><property><name>mapreduce.job.counters.group.name.max</name><value>12800</value></property><property><name>mapreduce.job.counters.counter.name.max</name><value>12800</value></property><property><name>mapreduce.job.counters.groups.max</name><value>5000</value></property>	`smon_client_config_overrides`	false
Service Monitor Derived Configs Advanced Configuration Snippet (Safety Valve)	For advanced use only, a list of derived configuration properties that will be used by the Service Monitor instead of the default ones.			`smon_derived_configs_safety_valve`	false
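
The first-match behavior described for Activity Duration Rules can be illustrated with a small Python sketch. This is not Cloudera Manager's implementation: the function names are hypothetical, and two details are assumptions the table does not specify, namely that each line is split on its last '=' and that a regex matching anywhere in the activity name counts as a match.

```python
import re
from typing import List, Optional, Pattern, Tuple

Rule = Tuple[Pattern, float]

def parse_duration_rules(text: str) -> List[Rule]:
    """Parse one 'regex=number' rule per line; number is a duration in minutes."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: split on the LAST '=' so the regex itself may contain '='.
        pattern, _, minutes = line.rpartition("=")
        rules.append((re.compile(pattern), float(minutes)))
    return rules

def duration_limit(rules: List[Rule], activity_name: str) -> Optional[float]:
    """Return the limit (minutes) of the first rule whose regex matches, else None."""
    for pattern, minutes in rules:
        # Assumption: a match anywhere in the name is enough (re.search).
        if pattern.search(activity_name):
            return minutes
    return None

def is_slow(rules: List[Rule], activity_name: str, running_minutes: float) -> bool:
    """True if the activity matched a rule and has run longer than its limit."""
    limit = duration_limit(rules, activity_name)
    return limit is not None and running_minutes > limit

# Hypothetical rules: nightly jobs get 120 minutes, everything else 30.
rules = parse_duration_rules("nightly-.*=120\n.*=30")
print(is_slow(rules, "nightly-etl", 90))   # first rule applies, limit 120
print(is_slow(rules, "adhoc-query", 45))   # falls through to the catch-all, limit 30
```

Because the first matching rule is used, more specific expressions belong above catch-all expressions such as `.*`.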

Other

Display Name	Description	Related Name	Default Value	API Name	Required
HDFS Service	Name of the HDFS service that this MapReduce service instance depends on			`hdfs_service`	true
ZooKeeper Service	Name of the ZooKeeper service that this MapReduce service instance depends on			`zookeeper_service`	false

Paths

Display Name	Description	Related Name	Default Value	API Name	Required
MapReduce System Directory	The HDFS directory where the MapReduce service stores system files. This directory must be accessible from both the server and client machines. For example: /hadoop/mapred/system/	`mapred.system.dir`	/tmp/mapred/system	`mapred_system_dir`	false

Performance

Display Name	Description	Related Name	Default Value	API Name	Required
Enable HDFS Short Circuit Read	Enable HDFS short circuit read. This allows a client co-located with the DataNode to read HDFS file blocks directly. This gives a performance boost to distributed clients that are aware of locality.	`dfs.client.read.shortcircuit`	false	`dfs_client_read_shortcircuit`	false
SequenceFile I/O Buffer Size	Size of buffer for read and write operations of SequenceFiles.	`io.file.buffer.size`	64 KiB	`io_file_buffer_size`	false
Job Counters Limit	Limit on the number of counters allowed per job.	`mapreduce.job.counters.max`	120	`mapreduce_job_counters_limit`	false

Security

Display Name	Description	Related Name	Default Value	API Name	Required
Enable Authentication for HTTP Web-Consoles	Enables authentication for Hadoop HTTP web-consoles for all roles of this service. Note: This is effective only if security is enabled for the HDFS service.		false	`hadoop_secure_web_ui`	false
Hue's Kerberos Principal Short Name	The short name of Hue's Kerberos principal	`hue.kerberos.principal.shortname`	hue	`hue_kerberos_principal_shortname`	false
SSL Client Truststore File Location	Path to the truststore file used when roles of this service act as SSL clients. Overrides the cluster-wide default truststore location set in HDFS. This truststore must be in JKS format. The truststore contains certificates of trusted servers, or of Certificate Authorities trusted to identify servers. The contents of the truststore can be modified without restarting any roles. By default, changes to its contents are picked up within ten seconds. If not set, the default Java truststore is used to verify certificates.	`ssl.client.truststore.location`		`ssl_client_truststore_location`	false
SSL Client Truststore File Password	Password for the SSL client truststore. Overrides the cluster-wide default truststore password set in HDFS.	`ssl.client.truststore.password`		`ssl_client_truststore_password`	false
Hadoop SSL Server Keystore Key Password	Password that protects the private key contained in the server keystore used for encrypted shuffle and encrypted web UIs. Applies to all configurations of daemon roles of this service.	`ssl.server.keystore.keypassword`		`ssl_server_keystore_keypassword`	false
Hadoop SSL Server Keystore File Location	Path to the keystore file containing the server certificate and private key used for encrypted shuffle and encrypted web UIs. Applies to configurations of all daemon roles of this service.	`ssl.server.keystore.location`		`ssl_server_keystore_location`	false
Hadoop SSL Server Keystore File Password	Password for the server keystore file used for encrypted shuffle and encrypted web UIs. Applies to configurations of all daemon roles of this service.	`ssl.server.keystore.password`		`ssl_server_keystore_password`	false

tasktrackerdefaultgroup

Advanced

Display Name	Description	Related Name	Default Value	API Name	Required
Hadoop Metrics Advanced Configuration Snippet (Safety Valve)	Advanced Configuration Snippet (Safety Valve) for Hadoop Metrics. Properties will be inserted into hadoop-metrics.properties for this role only. Note that Cloudera Manager tunes hadoop-metrics.properties to work optimally with its Service Monitoring features. By overriding the default, Cloudera Manager might not be able to provide accurate monitoring information, health tests or alerts.			`hadoop_metrics_safety_valve`	false
TaskTracker Logging Advanced Configuration Snippet (Safety Valve)	For advanced use only, a string to be inserted into log4j.properties for this role only.			`log4j_safety_valve`	false
Healthchecker Script Arguments	Comma-separated list of arguments which are to be passed to node health script when it is being launched.	`mapred.healthChecker.script.args`		`mapred_healthchecker_script_args`	false
Healthchecker Script Path	Absolute path to the script which is periodically run by the node health monitoring service to determine if the node is healthy or not. If the value of this key is empty or the file does not exist in the location configured here, the node health monitoring service is not started.	`mapred.healthChecker.script.path`		`mapred_healthchecker_script_path`	false
Heap Dump Directory	Path to directory where heap dumps are generated when java.lang.OutOfMemoryError error is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, role user must have write access to this directory. If this directory is shared amongst multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role.		/tmp	`oom_heap_dump_dir`	false
Dump Heap When Out of Memory	When set, generates heap dump file when java.lang.OutOfMemoryError is thrown.		false	`oom_heap_dump_enabled`	true
Kill When Out of Memory	When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown.		true	`oom_sigkill_enabled`	true
Automatically Restart Process	When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure.		true	`process_auto_restart`	true
TaskTracker Advanced Configuration Snippet (Safety Valve) for taskcontroller.cfg	For advanced use only, a string to be inserted into taskcontroller.cfg for this role only.			`taskcontroller_config_safety_valve`	false
TaskTracker Advanced Configuration Snippet (Safety Valve) for mapred-site.xml	For advanced use only, a string to be inserted into mapred-site.xml for this role only.			`tasktracker_config_safety_valve`	false
Java Configuration Options for TaskTracker	These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here.		-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled	`tasktracker_java_opts`	false

Classes

Display Name	Description	Related Name	Default Value	API Name	Required
TaskTracker Instrumentation Class	The instrumentation class to associate with each TaskTracker. If using Cloudera's Activity Monitor, adjust this to use org.apache.hadoop.mapred.TaskTrackerCmonInst.	`mapred.tasktracker.instrumentation`	org.apache.hadoop.mapred.TaskTrackerMetricsInst	`mapred_tasktracker_instrumentation`	false

Compression

Display Name	Description	Related Name	Default Value	API Name	Required
Compression Codecs (Client Override)	Comma-separated list of compression codecs that can be used in job or map compression.	`io.compression.codecs`		`override_io_compression_codecs`	false
Use Compression on Map Outputs (Client Override)	If enabled, uses compression on the map outputs before they are sent across the network. Will override value in client configuration.	`mapred.compress.map.output`	no_override	`override_mapred_compress_map_output`	false
Compression Codec of MapReduce Map Output (Client Override)	For MapReduce map outputs that are compressed, specify the compression codec to use. Will override value in client configuration.	`mapred.map.output.compression.codec`		`override_mapred_map_output_compression_codec`	false
Compress MapReduce Job Output (Client Override)	Compress the output of MapReduce jobs. Will override value in client configuration.	`mapred.output.compress`	no_override	`override_mapred_output_compress`	false
Compression Codec of MapReduce Job Output (Client Override)	For MapReduce job outputs that are compressed, specify the compression codec to use. Will override value in client configuration.	`mapred.output.compression.codec`		`override_mapred_output_compression_codec`	false
Compression Type of MapReduce Job Output (Client Override)	For MapReduce job outputs that are compressed as SequenceFiles, you can select one of these compression type options: NONE, RECORD or BLOCK. Cloudera recommends BLOCK. Will override value in client configuration.	`mapred.output.compression.type`		`override_mapred_output_compression_type`	false

Jobs

Display Name	Description	Related Name	Default Value	API Name	Required
Number of Tasks to Run per JVM (Client Override)	Number of tasks to run per JVM. If set to -1, there is no limit. Will override value in client configuration.	`mapred.job.reuse.jvm.num.tasks`		`override_mapred_job_reuse_jvm_num_tasks`	false
Map Tasks Speculative Execution (Client Override)	If enabled, multiple instances of some map tasks may be executed in parallel.	`mapred.map.tasks.speculative.execution`	no_override	`override_mapred_map_tasks_speculative_execution`	false
Number of Map Tasks to Complete Before Reduce Tasks (Client Override)	Fraction of the number of map tasks in the job which should be completed before reduce tasks are scheduled for the job.	`mapred.reduce.slowstart.completed.maps`		`override_mapred_reduce_slowstart_completed_maps`	false
Reduce Tasks Speculative Execution (Client Override)	If enabled, multiple instances of some reduce tasks may be executed in parallel.	`mapred.reduce.tasks.speculative.execution`	no_override	`override_mapred_reduce_tasks_speculative_execution`	false
Mapreduce Submit Replication (Client Override)	The replication level for submitted job files.	`mapred.submit.replication`		`override_mapred_submit_replication`	false
Maximum Time to Retain User Logs (Client Override)	The maximum time, in hours, to retain the user logs after job completion.	`mapred.userlog.retain.hours`		`override_mapred_userlog_retain_hours`	false

Logs

Display Name	Description	Related Name	Default Value	API Name	Required
TaskTracker Logging Threshold	The minimum log level for TaskTracker logs		INFO	`log_threshold`	false
TaskTracker Maximum Log File Backups	The maximum number of rolled log files to keep for TaskTracker logs. Typically used by log4j.		10	`max_log_backup_index`	false
TaskTracker Max Log Size	The maximum size, in megabytes, per log file for TaskTracker logs. Typically used by log4j.		200 MiB	`max_log_size`	false
TaskTracker Log Directory	Directory where TaskTracker will place its log files.	`hadoop.log.dir`	/var/log/hadoop-0.20-mapreduce	`tasktracker_log_dir`	false

Metrics

Display Name	Description	Related Name	Default Value	API Name	Required
Hadoop Metrics Class	Implementation daemons will use to report some internal statistics. The default (NoEmitMetricsContext) will display metrics on /metrics on the status port. The GangliaContext and GangliaContext31 classes will report metrics to your specified Ganglia Monitoring Daemons (gmond). The ganglia wire format changed incompatibly at version 3.1.0. If you are running any version of ganglia 3.1.0 or newer, use the GangliaContext31 metric class; otherwise, use the GangliaContext metric class.		org.apache.hadoop.metrics.spi.NoEmitMetricsContext	`hadoop_metrics_class`	false
Hadoop Metrics Output Directory	If using FileContext, directory to write metrics to.		/tmp/metrics	`hadoop_metrics_dir`	false
Hadoop Metrics Ganglia Servers	If using GangliaContext, a comma-delimited list of host:port pairs pointing to 'gmond' servers you would like to publish metrics to. In practice, this set of 'gmond' should match the set of 'gmond' in your 'gmetad' datasource list for the cluster.			`hadoop_metrics_ganglia_servers`	false

Monitoring

Display Name	Description	Related Name	Default Value	API Name	Required
Enable Health Alerts for this Role	When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold		false	`enable_alerts`	false
Enable Configuration Change Alerts	When set, Cloudera Manager will send alerts when this entity's configuration changes.		false	`enable_config_alerts`	false
Log Directory Free Space Monitoring Absolute Thresholds	The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory.		Warning: 10 GiB, Critical: 5 GiB	`log_directory_free_space_absolute_thresholds`	false
Log Directory Free Space Monitoring Percentage Thresholds	The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured.		Warning: Never, Critical: Never	`log_directory_free_space_percentage_thresholds`	false
Rules to Extract Events from Log Files	This file contains the rules which govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of the following fields: `alert` - whether or not events generated from this rule should be promoted to alerts. A value of "true" will cause alerts to be generated. If not specified, the default is "false". `rate` (mandatory) - the maximum number of log messages matching this rule that may be sent as events every minute. If more than rate matching log messages are received in a single minute, the extra messages are ignored. If rate is less than 0, the number of messages per minute is unlimited. `periodminutes` - the number of minutes during which the publisher will only publish rate events or fewer. If not specified, the default is one minute. `threshold` - apply this rule only to messages with this log4j severity level or above. An example is "WARN" for warning level messages or higher. `content` - match only those messages whose contents match this regular expression. `exceptiontype` - match only those messages which are part of an exception message. The exception type must match this regular expression. Example: `{"alert": false, "rate": 10, "exceptiontype": "java.lang.StringIndexOutOfBoundsException"}` This rule will send events to Cloudera Manager for every `StringIndexOutOfBoundsException`, up to a maximum of 10 every minute.		version: 0, rules: [ alert: false, rate: 0, threshold:ERROR, content:/mapOutput., alert: false, rate: 1, periodminutes: 1, threshold:FATAL, alert: false, rate: 0, threshold:WARN, content: . is deprecated. Instead, use ., alert: false, rate: 0, threshold:WARN, content: . is deprecated. Use .* instead, alert: false, rate: 0, exceptiontype: java.io.IOException, alert: false, rate: 0, exceptiontype: java.net.SocketException, alert: false, rate: 0, exceptiontype: java.net.SocketClosedException, alert: false, rate: 0, exceptiontype: java.io.EOFException, alert: false, rate: 0, exceptiontype: org.mortbay.jetty.EofException, alert: false, rate: 0, exceptiontype: java.nio.channels.CancelledKeyException, alert: false, rate: 1, periodminutes: 2, exceptiontype: ., alert: false, rate: 0, threshold:WARN, content:Unknown job [^ ]+ being deleted., alert: false, rate: 0, threshold:WARN, content:Error executing shell command .+ No such process.+, alert: false, rate: 0, threshold:WARN, content:Error sending signal TERM to process group.No such process., alert: false, rate: 0, threshold:WARN, content:Exit code from task is :., alert: false, rate: 0, threshold:WARN, content:TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled., alert: false, rate: 0, threshold:WARN, content:.attempt to override final parameter.+, alert: false, rate: 0, threshold:WARN, content:[^ ]+ is a deprecated filesystem name. Use., alert: false, rate: 1, threshold:INFO, content:.failed to report status for.*Killing!, alert: false, rate: 1, periodminutes: 1, threshold:WARN ]	`log_event_whitelist`	false
Role Triggers	The configured triggers for this role. This is a JSON formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has all of the following fields: `triggerName` (mandatory) - the name of the trigger. This value must be unique for the specific role. `triggerExpression` (mandatory) - a tsquery expression representing the trigger. `streamThreshold` (optional) - the maximum number of streams that can satisfy a condition of a trigger before the condition fires. By default set to 0, and any stream returned causes the condition to fire. `enabled` (optional) - by default set to 'true'. If set to 'false' the trigger will not be evaluated. For example, here is a JSON formatted trigger configured for a DataNode that fires if the DataNode has more than 1500 file-descriptors opened: `[{"triggerName": "sample-trigger", "triggerExpression": "IF (SELECT fd_open WHERE roleName=$ROLENAME and last(fd_open) > 1500) DO health:bad", "streamThreshold": 0, "enabled": "true"}]` Consult the trigger rules documentation for more details on how to write triggers using tsquery. The JSON format is evolving and may change in the future; as a result, backward compatibility is not guaranteed between releases at this time.		[]	`role_triggers`	true
TaskTracker Blacklisted Health Test	Enables the health test that the TaskTracker is not blacklisted		true	`tasktracker_blacklisted_health_enabled`	false
TaskTracker Connectivity Health Test	Enables the health test that the TaskTracker is connected to the JobTracker		true	`tasktracker_connectivity_health_enabled`	false
TaskTracker Connectivity Tolerance at Startup	The amount of time to wait for the TaskTracker to fully start up and connect to the JobTracker before enforcing the connectivity check.		3 minute(s)	`tasktracker_connectivity_tolerance`	false
File Descriptor Monitoring Thresholds	The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit.		Warning: 50.0 %, Critical: 70.0 %	`tasktracker_fd_thresholds`	false
Garbage Collection Duration Thresholds	The health test thresholds for the weighted average time spent in Java garbage collection. Specified as a percentage of elapsed wall clock time.		Warning: 30.0, Critical: 60.0	`tasktracker_gc_duration_thresholds`	false
Garbage Collection Duration Monitoring Period	The period to review when computing the moving average of garbage collection time.		5 minute(s)	`tasktracker_gc_duration_window`	false
TaskTracker Host Health Test	When computing the overall TaskTracker health, consider the host's health.		true	`tasktracker_host_health_enabled`	false
TaskTracker Process Health Test	Enables the health test that the TaskTracker's process state is consistent with the role configuration		true	`tasktracker_scm_health_enabled`	false
Web Metric Collection	Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server.		true	`tasktracker_web_metric_collection_enabled`	false
Web Metric Collection Duration	The health test thresholds on the duration of the metrics request to the web server.		Warning: 10 second(s), Critical: Never	`tasktracker_web_metric_collection_thresholds`	false
Unexpected Exits Thresholds	The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role.		Warning: Never, Critical: Any	`unexpected_exits_thresholds`	false
Unexpected Exits Monitoring Period	The period to review when computing unexpected exits.		5 minute(s)	`unexpected_exits_window`	false
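
The field semantics listed for Rules to Extract Events from Log Files (`threshold`, `content`, `exceptiontype`) can be illustrated with a short Python sketch. It is only an approximation of how a message might be tested against a rule list, not the appender's actual code. The function names and sample rules are hypothetical. The sketch assumes that the first matching rule decides the outcome and that regular expressions may match anywhere in the text. It does not model rate limiting (`rate`, `periodminutes`).

```python
import re
from typing import List, Optional

# log4j severities in increasing order, used for the `threshold` comparison.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

def rule_matches(rule: dict, level: str, message: str,
                 exception_type: Optional[str] = None) -> bool:
    """Check a single rule; fields absent from the rule place no constraint."""
    if "threshold" in rule and LEVELS.index(level) < LEVELS.index(rule["threshold"]):
        return False
    if "content" in rule and not re.search(rule["content"], message):
        return False
    if "exceptiontype" in rule:
        # The rule applies only to messages that carry an exception.
        if exception_type is None or not re.search(rule["exceptiontype"], exception_type):
            return False
    return True

def first_matching_rule(rules: List[dict], level: str, message: str,
                        exception_type: Optional[str] = None) -> Optional[dict]:
    """Assumption: rules are tried in order and the first match wins."""
    for rule in rules:
        if rule_matches(rule, level, message, exception_type):
            return rule
    return None

# Hypothetical rule list in the documented JSON shape.
rules = [
    {"alert": False, "rate": 0, "exceptiontype": r"java\.io\.IOException"},
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "WARN"},
]

print(first_matching_rule(rules, "ERROR", "read failed", "java.io.IOException"))
print(first_matching_rule(rules, "WARN", "disk nearly full"))
print(first_matching_rule(rules, "INFO", "started"))
```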

Other

Display Name	Description	Related Name	Default Value	API Name	Required
TaskTracker Local Data Directory List	List of directories on the local filesystem where a TaskTracker stores intermediate data files. To spread disk I/O, enter a comma-separated list of directories on different devices. Directories that do not exist are ignored. Typical values are /data/N/mapred/local for N = 1, 2, 3...	`mapred.local.dir`		`tasktracker_mapred_local_dir_list`	true

Performance

Display Name	Description	Related Name	Default Value	API Name	Required
I/O Sort Factor (Client Override)	The number of streams to merge at once while sorting files. That is, the number of sort heads to use during the merge sort on the reducer side. This determines the number of open file handles. Merging more files in parallel reduces merge sort iterations and improves run time by eliminating disk I/O. Note that merging more files in parallel uses more memory. If 'io.sort.factor' is set too high or the maximum JVM heap is set too low, excessive garbage collection will occur. The Hadoop default is 10, but Cloudera recommends a higher value. Will override value in client configuration.	`io.sort.factor`		`override_io_sort_factor`	false
I/O Sort Memory Buffer (MiB) (Client Override)	The total amount of memory buffer, in megabytes, to use while sorting files. Note that this memory comes out of the user JVM heap size (meaning total user JVM heap - this amount of memory = total user usable heap space). Note that Cloudera's default differs from Hadoop's default; Cloudera uses a bigger buffer by default because modern machines often have more RAM. Will override value in client configuration.	`io.sort.mb`		`override_io_sort_mb`	false
I/O Sort Record Percent (Client Override)	The percentage of 'io.sort.mb' dedicated to tracking record boundaries. If this value is represented as 'r', and 'io.sort.mb' is represented as 'x', then the maximum number of records collected before the collection thread must block is equal to (r * x) / 4. The syntax is in decimal units; the default is 5% and is formatted 0.05. Will override value in client configuration.	`io.sort.record.percent`		`override_io_sort_record_percent`	false
I/O Sort Spill Percent (Client Override)	The soft limit in either the buffer or record collection buffers. When this limit is reached, a thread will begin to spill the contents to disk in the background. Note that this does not imply any chunking of data to the spill. A value less than 0.5 is not recommended. The syntax is in decimal units; the default is 80% and is formatted 0.8. Will override value in client configuration.	`io.sort.spill.percent`		`override_io_sort_spill_percent`	false
MapReduce Child Java Opts Base (Client Override)	Java opts for the TaskTracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp pass a value of: "-verbose:gc -Xloggc:/tmp/@taskid@.gc". The configuration variable 'mapred.child.ulimit' can be used to control the maximum virtual memory of the child processes. Note that unlike Hadoop, Cloudera Manager separates the child options into this setting and a separate setting just for the maximum heap size. Will override value in client configuration.	`mapred.child.java.opts`		`override_mapred_child_java_opts_base`	false
Map Task Java Opts Base (Client Override)	Java opts for the TaskTracker child map processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp pass a value of: "-verbose:gc -Xloggc:/tmp/@taskid@.gc". The configuration variable 'Map Task Maximum Virtual Memory' can be used to control the maximum virtual memory of the map processes. This takes precedence over the generic 'mapred.child.java.opts'.	`mapred.map.child.java.opts`		`override_mapred_map_task_java_opts`	false
Default Number of Parallel Transfers During Shuffle (Client Override)	The default number of parallel transfers run by reduce during the copy (shuffle) phase. This number should be between sqrt(nodes*number_of_map_slots_per_node) and nodes*number_of_map_slots_per_node/2. Will override value in client configuration.	`mapred.reduce.parallel.copies`		`override_mapred_reduce_parallel_copies`	false
Reduce Task Java Opts Base (Client Override)	Java opts for the TaskTracker child reduce processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp pass a value of: "-verbose:gc -Xloggc:/tmp/@taskid@.gc". The configuration variable 'Reduce Task Maximum Virtual Memory' can be used to control the maximum virtual memory of the reduce processes. This takes precedence over the generic 'mapred.child.java.opts'.	`mapred.reduce.child.java.opts`		`override_mapred_reduce_task_java_opts`	false
Maximum Process File Descriptors	If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value.			`rlimit_fds`	false
Number of TaskTracker HTTP Threads	The number of worker threads for the HTTP server. This is used for map output fetching.	`tasktracker.http.threads`	80	`tasktracker_http_threads`	false
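
The `@taskid@` interpolation described for the child Java opts settings amounts to a plain string substitution. The Python sketch below only illustrates that behavior and is not Hadoop's code; the function name and the sample task ID are hypothetical.

```python
def interpolate_child_opts(opts: str, task_id: str) -> str:
    """Replace every '@taskid@' token; any other '@' is left unchanged."""
    return opts.replace("@taskid@", task_id)

# The example value from the table, with a made-up task ID.
opts = "-verbose:gc -Xloggc:/tmp/@taskid@.gc"
print(interpolate_child_opts(opts, "attempt_201401010000_0001_m_000000_0"))
```

With this value, each task writes its GC log to a separate file under /tmp, so concurrent tasks on the same host do not overwrite one another's logs.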

Ports and Addresses

Display Name	Description	Related Name	Default Value	API Name	Required
TaskTracker Activity Monitor Instrumentation Plugin Address	Address where TaskTracker Activity Monitor instrumentation plugin listens for requests. This setting is ignored unless the TaskTracker Instrumentation Class is set to org.apache.hadoop.mapred.TaskTrackerCmonInst. This is usually set to 127.0.0.1.	`mapred.tasktracker.instrumentation.cmon.jettyhost`	127.0.0.1	`mapred_tasktracker_instrumentation_cmon_jettyhost`	false
TaskTracker Activity Monitor Instrumentation Plugin Port	Port where TaskTracker Activity Monitor instrumentation plugin listens for requests. This setting is ignored unless the TaskTracker Instrumentation Class is set to org.apache.hadoop.mapred.TaskTrackerCmonInst.	`mapred.tasktracker.instrumentation.cmon.jettyport`	4867	`mapred_tasktracker_instrumentation_cmon_jettyport`	false
TaskTracker Web UI Address	Address where TaskTracker listens for web requests		0.0.0.0	`task_tracker_http_address`	false
TaskTracker Web UI Port	Port where TaskTracker listens for web requests	`mapred.task.tracker.http.address`	50060	`task_tracker_http_port`	false

Resource Management

Display Name	Description	Related Name	Default Value	API Name	Required
Maximum Number of Simultaneous Map Tasks	The maximum number of map tasks that a TaskTracker can run simultaneously. Sometimes referred to as "map slots."	`mapred.tasktracker.map.tasks.maximum`	2	`mapred_tasktracker_map_tasks_maximum`	false
Maximum Number of Simultaneous Reduce Tasks	The maximum number of reduce tasks that a TaskTracker can run simultaneously. Sometimes referred to as "reduce slots."	`mapred.tasktracker.reduce.tasks.maximum`	2	`mapred_tasktracker_reduce_tasks_maximum`	false
MapReduce Child Java Maximum Heap Size (Client Override)	The maximum heap size, in bytes, of the Java child process. This number will be formatted and concatenated with the 'base' setting for 'mapred.child.java.opts' to pass to Hadoop. Will override value in client configuration.			`override_mapred_child_java_opts_max_heap`	false
MapReduce Maximum Virtual Memory (KiB) (Client Override)	The maximum virtual memory, in KiB, of a process launched by the MapReduce framework. This can be used to control both the MapReduce tasks and applications using Hadoop Pipes, Hadoop Streaming, and so on. By default, it is left unspecified to allow administrators to control it via 'limits.conf' and other mechanisms. Note: 'mapred.child.ulimit' must be greater than or equal to approximately 1.5 times the -Xmx passed to JavaVM, or else the VM might not start. Will override value in client configuration.	`mapred.child.ulimit`		`override_mapred_child_ulimit`	false
Map Task Maximum Heap Size (Client Override)	The maximum heap size, in bytes, of the child map processes. This number will be formatted and concatenated with 'Map Task Java Opts Base' to pass to Hadoop. Will override value in client configuration.			`override_mapred_map_task_max_heap`	false
Map Task Maximum Virtual Memory (KiB) (Client Override)	The maximum virtual memory, in KiB, available to map tasks. Note: this must be greater than or equal to the -Xmx passed to the JavaVM via 'Map Task Java Opts', or else the VM might not start. This takes precedence over the generic 'mapred.child.ulimit'. Will override value in client configuration.	`mapred.map.child.ulimit`		`override_mapred_map_task_ulimit`	false
Reduce Task Maximum Heap Size (Client Override)	The maximum heap size, in bytes, of the child reduce processes. This number will be formatted and concatenated with 'Reduce Task Java Opts Base' to pass to Hadoop. Will override value in client configuration.			`override_mapred_reduce_task_max_heap`	false
Reduce Task Maximum Virtual Memory (KiB) (Client Override)	The maximum virtual memory, in KiB, available to reduce tasks. Note: this must be greater than or equal to the -Xmx passed to the JavaVM via 'Reduce Task Java Opts', or else the VM might not start. This takes precedence over the generic 'mapred.child.ulimit'. Will override value in client configuration.	`mapred.reduce.child.ulimit`		`override_mapred_reduce_task_ulimit`	false
Cgroup CPU Shares	Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager.	`cpu.shares`	1024	`rm_cpu_shares`	true
Cgroup I/O Weight	Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager.	`blkio.weight`	500	`rm_io_weight`	true
Cgroup Memory Hard Limit	Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit.	`memory.limit_in_bytes`	-1 MiB	`rm_memory_hard_limit`	true
Cgroup Memory Soft Limit	Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit.	`memory.soft_limit_in_bytes`	-1 MiB	`rm_memory_soft_limit`	true
Java Heap Size of TaskTracker in Bytes	Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx.		1 GiB	`task_tracker_java_heapsize`	false
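
The heap size properties above are expressed in bytes, while the virtual memory (ulimit) properties are expressed in KiB, so comparing the two requires a unit conversion. The following is a minimal sketch of that arithmetic, assuming the approximate 1.5x guideline given for 'mapred.child.ulimit'; the function name and the `factor` parameter are illustrative only and are not part of Cloudera Manager or Hadoop.

```python
import math

KIB = 1024  # bytes per KiB


def min_child_ulimit_kib(max_heap_bytes: int, factor: float = 1.5) -> int:
    """Return the smallest ulimit, in KiB, that is at least
    `factor` times a maximum heap size given in bytes.

    factor=1.5 follows the guideline quoted for mapred.child.ulimit;
    factor=1.0 corresponds to the "greater than or equal to -Xmx"
    wording used for the map- and reduce-specific limits.
    """
    return math.ceil(max_heap_bytes * factor / KIB)


heap_bytes = 1024 ** 3  # a 1 GiB heap, chosen only as an example
print(min_child_ulimit_kib(heap_bytes))       # 1572864
print(min_child_ulimit_kib(heap_bytes, 1.0))  # 1048576
```

Because the 1.5x figure is described as approximate, a value computed this way is a lower bound to start from rather than a guaranteed safe setting.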

Security

Display Name	Description	Related Name	Default Value	API Name	Required
Users Banned from Job Submission	Comma-separated list of users banned from submitting MapReduce jobs to this TaskTracker. Only applies when the TaskTracker is running in secure mode.	`banned.users`	mapred, hdfs, bin	`taskcontroller_banned_users`	false
Task Controller Group	The system group that owns the task-controller binary. This does not need to be changed unless the ownership of the binary is explicitly changed.	`mapreduce.tasktracker.group`	mapred	`taskcontroller_group`	false
Minimum User ID for Job Submission	The lowest user ID (UID) that a user may have in order to submit a job to this TaskTracker. Only applies when the TaskTracker is running in secure mode.	`min.user.id`	1000	`taskcontroller_min_user_id`	false
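
The two submission restrictions above combine into a single check: a user is rejected if the name appears in the banned list or if the UID is below the minimum. The sketch below only mirrors the property descriptions using the default values from the table; it is not the task-controller implementation, and the function name `may_submit` is hypothetical.

```python
def may_submit(user: str, uid: int,
               banned_users: str = "mapred, hdfs, bin",
               min_user_id: int = 1000) -> bool:
    """Approximate the secure-mode submission rule described above.

    banned_users is a comma-separated string, as in the property value;
    surrounding whitespace around each name is ignored.
    """
    banned = {name.strip() for name in banned_users.split(",") if name.strip()}
    return user not in banned and uid >= min_user_id


print(may_submit("alice", 1500))  # True
print(may_submit("hdfs", 1500))   # False: name is in the banned list
print(may_submit("alice", 500))   # False: UID is below the minimum
```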

Stacks Collection

Display Name	Description	Related Name	Default Value	API Name	Required
Stacks Collection Data Retention	The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted.	`stacks_collection_data_retention`	100 MiB	`stacks_collection_data_retention`	false
Stacks Collection Directory	The directory in which stacks logs will be placed. If not set, stacks will be logged into a `stacks` subdirectory of the role's log directory.	`stacks_collection_directory`		`stacks_collection_directory`	false
Stacks Collection Enabled	Whether or not periodic stacks collection is enabled.	`stacks_collection_enabled`	false	`stacks_collection_enabled`	true
Stacks Collection Frequency	The frequency with which stacks will be collected.	`stacks_collection_frequency`	5.0 second(s)	`stacks_collection_frequency`	false
Stacks Collection Method	The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped.	`stacks_collection_method`	jstack	`stacks_collection_method`	false
