Metric Aggregation

In addition to collecting and storing raw metric values, the Cloudera Manager Service Monitor and Host Monitor produce a number of aggregate metrics from the raw metric data. Where a raw data point is a timestamp value pair, an aggregate metric point is a timestamp paired with a bundle of statistics including the minimum, maximum, average, and standard deviation of the data points considered by the aggregate.

Individual metric streams are aggregated across time to produce statistical summaries at different data granularities. For example, an individual metric stream of the number of open file descriptors on a host will be aggregated over time to the ten-minute, hourly, six-hourly, daily and weekly data granularities. A point in the hourly aggregate stream will include the maximum number of open file descriptors seen during that hour, the minimum, the average and so on. When servicing a time-series request, either for the Cloudera Manager UI or API, the Service Monitor and Host Monitor automatically choose the appropriate data granularity based on the time-range requested.

Cross-Time Aggregate Example

Consider the following fd_open raw metric values for a host:

9:00, 100 fds
9:01, 101 fds
9:02, 102 fds
. . . 
9:09, 109 fds

The ten minutely cross-time aggregate point covering the ten-minute window from 9:00 - 9:10 would have the following statistics and metadata:

min: 100 fds
min timestamp: 9:00
max 109 fds
max timestamp 9:09
mean 104.5 fds
standard deviation: 3.02765 fds
count: 10 points
sample: 109 fds
sample timestamp: 9:09

The Service Monitor and Host Monitor also produce cross-entity aggregates for a number of entities in the system. Cross-entity aggregates are produced by considering the metric value of a particular metric across a number of entities of the same type at a particular time. For each stream considered, two metrics are produced. The first tracks statistics such as the minimum, maximum, average and standard deviation across all considered entities as well as the identities of the entities that had the minimum and maximum values. The second tracks the sum of the metric across all considered entities.

An example of the first type of cross-entity aggregate is the fd_open_across_datanodes metric. For an HDFS service this metric contains aggregate statistics on the fd_open metric value for all the DataNodes in the service. For a rack this metric contains statistics for all the DataNodes within that rack, and so on. An example of the second type of cross-entity aggregate is the total_fd_open_across_datanodes metric. For an HDFS service this metric contains the total number of file descriptors open by all the DataNodes in the service. For a rack this metric contains the total number of file descriptors open by all the DataNodes within the rack, and so on. Note that unlike the first type of cross-entity aggregate, this total type of cross-entity aggregate is a simple timestamp, value pair and not a bundle of statistics.

Cross-Entity Aggregate Example

Consider the following fd_open raw metric values for a set of ten DataNodes in an HDFS service at a given timestamp:

datanode-0, 200 fds
datanode-1, 201 fds
datanode-2, 202 fds
…
datanode-9, 209 fds
The cross-entity aggregate fd_open_across_datanodes point for that HDFS service at that time would have the following statistics and metadata:
min: 200 fds
min entity: datanode-0
max: 209 fds
max entity: datanode-9
mean: 204.5 fds
standard deviation: 3.02765 fds
count: 10 points
sample: 209 fds
sample entity: datanode-9

Just like every other metric, cross-entity aggregates are aggregated across time. For example, a point in the hourly aggregate of fd_open_across_datanodes for an HDFS service will include the maximum fd_open value of any DataNode in that service over that hour, the average value over the hour, and so on. A point in the hourly aggregate of total_fd_open_across_datanodes for an HDFS service will contain statistics on the value of the total_fd_open_across_datanodes for that service over the hour.

Presentation of Aggregate Data

Aggregate data points returned from the Cloudera Manager API appear as shown in this section.

A cross-time aggregate:

{
        "timestamp" : "2014-02-24T00:00:00.000Z",
        "value" : 0.014541698027508003,
        "type" : "SAMPLE",
        "aggregateStatistics" : {
          "sampleTime" : "2014-02-23T23:59:35.000Z",
          "sampleValue" : 0.0,
          "count" : 360,
          "min" : 0.0,
          "minTime" : "2014-02-23T18:00:35.000Z",
          "max" : 2.9516129032258065,
          "maxTime" : "2014-02-23T19:37:36.000Z",
          "mean" : 0.014541698027508003,
          "stdDev" : 0.17041289765265377
        }
      }

A raw cross-entity aggregate:

{
        "timestamp" : "2014-03-26T00:50:15.725Z",
        "value" : 3288.0,
        "type" : "SAMPLE",
        "aggregateStatistics" : {
          "sampleTime" : "2014-03-26T00:49:19.000Z",
          "sampleValue" : 7232.0,
          "count" : 4,
          "min" : 1600.0,
          "minTime" : "2014-03-26T00:49:42.000Z",
          "max" : 7232.0,
          "maxTime" : "2014-03-26T00:49:19.000Z",
          "mean" : 3288.0,
          "stdDev" : 2656.7549127961856,
          "crossEntityMetadata" : {
            "maxEntityDisplayName" : "cleroy-9-1.ent.cloudera.com",
            "minEntityDisplayName" : "cleroy-9-4.ent.cloudera.com",
            "numEntities" : 4.0
          }
        }
      }

A cross-time, cross-entity aggregate:

{
        "timestamp" : "2014-03-11T00:00:00.000Z",
        "value" : 3220.818863879957,
        "type" : "SAMPLE",
        "aggregateStatistics" : {
          "sampleTime" : "2014-03-10T22:28:48.000Z",
          "sampleValue" : 7200.0,
          "count" : 933,
          "min" : 1536.0,
          "minTime" : "2014-03-10T21:02:17.000Z",
          "max" : 7200.0,
          "maxTime" : "2014-03-10T22:28:48.000Z",
          "mean" : 3220.818863879957,
          "stdDev" : 2188.6143063503378,
          "crossEntityMetadata" : {
            "maxEntityDisplayName" : "cleroy-9-1.ent.cloudera.com",
            "minEntityDisplayName" : "cleroy-9-4.ent.cloudera.com",
            "numEntities" : 3.9787037037037036
          }
        }
      }

These differ from non-aggregate data points by having the aggregateStatistics structure. Note that the value field in the point structure will always be the same as the aggregteStatistics mean field. The Cloudera Manager UI presents aggregate statistics in a number of ways. First, aggregate statistics are made available in the hover detail and chart popover when dealing with aggregate data. Second, it is possible to turn on and turn off the display of minimum and maximum time-series streams in line charts of aggregate data. These streams are displayed using dotted lines and give a visual indication of the underlying metric values data range over the time considered, entities considered or both. These lines are displayed by default for single stream line charts of aggregate data. For all line charts this behavior can be turned on and turned off using the chart popover.



Accessing Aggregate Statistics Through tsquery

The stats function can be used to access aggregate statistics directly in tsquery. For example, select stats(fd_open_across_datanodes, max) where category = service and serviceDisplayName = “my-hdfs-service” will return a single time-series stream containing the just the maximum statistic values from the fd_open_across_datanodes stream. The following statistics are available through the stats function: min, max, avg, std_dev, and sample. See tsquery Language for more details on the stats function.