Cloudera Manager Metrics

This section provides information on metrics supported by Cloudera Manager.

A metric is a property that can be measured to quantify the state of an entity or activity. Metrics include properties such as the number of open file descriptors or the CPU utilization percentage across your cluster.

Cloudera Manager monitors a number of performance metrics for services and role instances running on your clusters. These metrics are monitored against configurable thresholds and can be used to indicate whether a host is functioning as expected. You can view these metrics in the Cloudera Manager Admin Console, which displays metrics about your jobs (such as the number of currently running jobs and their CPU and memory usage), Hadoop services (such as the average HDFS I/O latency and the number of concurrent jobs), your clusters (such as the average CPU load across all your hosts), and so on.

Cloudera Manager pre-aggregates metrics from their generating entity to the entities that they are part of. For example, metrics generated by disks, network interfaces, and filesystems are aggregated to their respective hosts and clusters. See Metric Aggregation for more details.
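The roll-up from disks to hosts to clusters can be pictured with a small sketch. The Python below only illustrates the idea and is not Cloudera Manager's implementation: the sample data, the entity names, and the choice of a plain sum as the aggregate are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical per-disk samples: (cluster, host, disk, value).
# All names and numbers are made up for illustration.
DISK_SAMPLES = [
    ("cluster1", "host1", "sda", 100.0),
    ("cluster1", "host1", "sdb", 50.0),
    ("cluster1", "host2", "sda", 25.0),
]


def aggregate(samples):
    """Roll disk-level values up to their hosts and clusters by summing.

    Summation is an assumption of this sketch; see Metric Aggregation
    for how Cloudera Manager actually aggregates.
    """
    per_host = defaultdict(float)
    per_cluster = defaultdict(float)
    for cluster, host, _disk, value in samples:
        per_host[(cluster, host)] += value
        per_cluster[cluster] += value
    return dict(per_host), dict(per_cluster)


host_totals, cluster_totals = aggregate(DISK_SAMPLES)
```

Each disk value contributes once to its host total and once to its cluster total, which mirrors the "generating entity to containing entity" direction described above.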

In the Cloudera Manager Admin Console, you can discover which metrics are collected by Cloudera Manager in either of the following ways:
  • View the list of metrics:
    1. Select Charts > Chart Builder.
    2. Click the question mark icon to the right of the Build Chart button.
    3. Click the List of Metrics link.
  • Use the tsquery language to retrieve all metrics for the type of entity you are interested in. tsquery is the language used to specify statements that retrieve time series data, that is, a stream of metric data points, each containing a timestamp and the value of the metric at that timestamp.
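
As a rough sketch of the tsquery option, the Python below builds a statement that asks for every metric of one role type and places it in a request URL for a Cloudera Manager API time series endpoint. The host name, port, API version (`v19`), role type, and helper function names are all assumptions chosen for illustration. The statement form follows the style of published tsquery examples and should be checked against the tsquery language reference for your release. The code only builds strings and sends no request.

```python
from urllib.parse import urlencode


def build_tsquery(metric="*", role_type="DATANODE"):
    """Build a tsquery statement (hypothetical helper).

    "*" is assumed to select all metrics for the matching entities.
    """
    return f"select {metric} where roleType={role_type}"


def timeseries_url(base_url, api_version, query):
    """Build a URL for the API's timeseries endpoint (hypothetical helper).

    base_url and api_version are placeholders; use the values that
    match your own deployment.
    """
    return f"{base_url}/api/{api_version}/timeseries?{urlencode({'query': query})}"


query = build_tsquery()
url = timeseries_url("http://cm-host.example.com:7180", "v19", query)
```

The same statement can also be typed directly into the Chart Builder query box instead of being sent through the API.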

You can chart metrics over a time range. See Viewing Charts for Cluster, Service, Role, and Host Instances for more details. The metrics listed in this guide include a short description, their units, and the version of CDH to which they apply. Most of the units are self-explanatory. The unit "CPU seconds per second" is the number of CPU seconds used per second of wall-clock time. For example, on a single-host cluster with 16 cores running 16 tasks that each use one full core, the value of the metric is 16.
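
The 16-core example can be checked with a few lines of arithmetic. This is a minimal sketch: the function name and the 60-second measurement interval are assumptions made for illustration, not part of Cloudera Manager.

```python
def cpu_seconds_per_second(cpu_seconds_used, elapsed_seconds):
    """Return CPU seconds consumed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds_used / elapsed_seconds


tasks = 16        # one task per core, each keeping its core fully busy
interval = 60.0   # assumed measurement interval, in seconds

# Every task accumulates one CPU second per wall-clock second.
total_cpu_seconds = tasks * interval
rate = cpu_seconds_per_second(total_cpu_seconds, interval)
```

Here `rate` evaluates to 16.0, matching the example. If each task used only half a core, the same calculation would give 8.0.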

The sampling rate for all metrics is one minute.

Continue reading: