Important Note:

Cloudera Manager version 3 and CDH3 have reached End of Maintenance (EOM) as of June 20th, 2013. Cloudera will not support or provide patches for any of the Cloudera Manager version 3 and CDH3 releases.

Tips and Guidelines

Setting the vm.swappiness Linux Kernel Parameter

vm.swappiness controls how aggressively memory pages are swapped to disk. It can be set to a value from 0 to 100; the higher the value, the more aggressively the kernel seeks out inactive memory pages and swaps them to disk.

You can see the current value of vm.swappiness by reading /proc/sys/vm/swappiness; for example:

cat /proc/sys/vm/swappiness

On most systems, it is set to 60 by default. This is not suitable for Hadoop cluster nodes, because it can cause processes to be swapped out even when free memory is available. Swapping can lead to lengthy garbage collection pauses in important system daemons and can degrade stability and performance. Cloudera recommends that you set this parameter to 0; for example:

# sysctl -w vm.swappiness=0 
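
The check and the recommended setting can be combined in a small script. The following is a minimal sketch rather than an official Cloudera tool: the function name check_swappiness and its optional file argument are hypothetical, introduced only so that the logic can be tried against a test file. Note that sysctl -w changes the running kernel only; on most Linux distributions the value must also be recorded in a file such as /etc/sysctl.conf to survive a reboot, as shown in the closing comment.

```shell
#!/bin/sh
# Sketch: report whether vm.swappiness matches the recommended value of 0.
# check_swappiness and its optional file argument are hypothetical names.

check_swappiness() {
    # Default to the live kernel value; another file can be passed for testing.
    file="${1:-/proc/sys/vm/swappiness}"
    current=$(cat "$file") || return 2
    if [ "$current" -eq 0 ]; then
        echo "vm.swappiness is 0 (recommended)"
    else
        echo "vm.swappiness is $current; recommended value is 0"
        return 1
    fi
}

# To apply the recommended value now and keep it across reboots (as root):
#   sysctl -w vm.swappiness=0
#   echo 'vm.swappiness = 0' >> /etc/sysctl.conf
```

Calling check_swappiness with no argument reads the live value on the current node; the function returns a non-zero status when the value differs from 0, so it can be used in a node health check.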

Changing the Logging Level on a Job

As of CDH3u5, you can change the logging level for an individual job. You do this by setting the following properties in the job configuration (JobConf):

  • mapreduce.map.log.level
  • mapreduce.reduce.log.level

Valid values are NONE, INFO, WARN, DEBUG, TRACE, and ALL.

Example:

JobConf conf = new JobConf();
...

// Log map tasks at DEBUG and reduce tasks at TRACE for this job only.
conf.set("mapreduce.map.log.level", "DEBUG");
conf.set("mapreduce.reduce.log.level", "TRACE");
...
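
The same two properties can, in principle, be supplied on the command line instead of in code. The sketch below assumes that the job's driver parses Hadoop generic options (for example, by running through ToolRunner); if it does not, the -D options are not applied. The helper log_level_opts is hypothetical and not part of Hadoop; it only checks each value against the list given in this section and prints the matching options.

```shell
#!/bin/sh
# Sketch: build -D options for per-job log levels.
# log_level_opts is a hypothetical helper, not part of Hadoop.

log_level_opts() {
    map_level="$1"
    reduce_level="$2"
    for level in "$map_level" "$reduce_level"; do
        case "$level" in
            # Accept only the values listed in this section.
            NONE|INFO|WARN|DEBUG|TRACE|ALL) ;;
            *) echo "invalid log level: $level" >&2; return 1 ;;
        esac
    done
    echo "-Dmapreduce.map.log.level=$map_level -Dmapreduce.reduce.log.level=$reduce_level"
}
```

A job could then be launched with a command such as hadoop jar myjob.jar MyJob $(log_level_opts DEBUG TRACE) input output, where myjob.jar, MyJob, input, and output are placeholder names.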

When SSH Is and Is Not Used

It is a good idea to use SSH for remote administration (instead of rlogin, for example), but note that Hadoop itself and its related services do not normally use SSH for communication. Some scripts, in particular the Hadoop start-all and stop-all scripts, do use SSH. Otherwise, SSH is not used for communication among the following:

  • DataNode
  • NameNode
  • TaskTracker
  • JobTracker
  • /etc/init.d scripts (which start daemons locally)