Cloudera Director Usage Bundles

A Cloudera Director usage bundle is a JSON document representing a snapshot in time of Cloudera Manager and cluster usage. It contains three sections: a metadata section, a Cloudera Manager block, and a Cloudera Director block.
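
Because a usage bundle is plain JSON, its top-level layout can be inspected with any JSON tool. The following is a minimal sketch in Python that deliberately makes no assumption about the actual key names in a bundle. The function name summarize_bundle and the sample keys in the usage note are hypothetical and serve only as illustration.

```python
import json


def summarize_bundle(bundle_text):
    """Map each top-level key of a usage bundle to its JSON value type.

    Makes no assumption about the real key names; it only reports what
    the document contains at the top level.
    """
    bundle = json.loads(bundle_text)
    if not isinstance(bundle, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in bundle.items()}
```

For example, calling summarize_bundle on a logged bundle would list one entry per top-level section, which is a quick way to confirm that the metadata section and both blocks are present before digging further.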

Metadata Section

The short metadata section in a usage bundle contains:
  • The version of the metadata structure
  • The complete license key and billing ID for the deployment
  • The creation time of the bundle
  • A message ID structure, used by Cloudera's metering service for context and sequencing

Cloudera Manager Block

The Cloudera Manager block contains information queried from Cloudera Manager about itself and clusters that it manages.

Initial metadata in the block includes the metadata structure version and the host, port, and API version of Cloudera Manager itself.

Next, the block contains details about Cloudera Manager itself, as retrieved from its /cm/deployment API endpoint; see the Cloudera Manager REST API documentation for complete information. The data is retrieved using the API's export_redacted view, which omits sensitive configuration values such as passwords. The deployment data covers all clusters and their services, all hosts, and all management services. Some specific data items in the details are:
  • Cluster, service, and role names
  • Service and role configurations (redacted) and health statuses
  • Cloudera Manager's internal user accounts, with redacted passwords
  • Instances' Cloudera Manager host identifiers, private IP addresses, and private host names
  • Instances' core counts and memory sizes
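
To see the same kind of deployment data outside of a bundle, the endpoint can be queried directly. The sketch below assumes that the redacted view is requested with a view=export_redacted query parameter and that Cloudera Manager accepts HTTP basic authentication. The host name, port, API version, and function names are placeholders chosen for illustration, not values taken from a real bundle.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def deployment_url(host, port, api_version, use_tls=False):
    """Build the URL for the redacted deployment export of Cloudera Manager."""
    scheme = "https" if use_tls else "http"
    # Assumption: the redacted export is selected with view=export_redacted.
    query = urlencode({"view": "export_redacted"})
    return f"{scheme}://{host}:{port}/api/{api_version}/cm/deployment?{query}"


def fetch_deployment(url, username, password):
    """Fetch and decode the deployment JSON using HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = Request(url, headers={"Authorization": f"Basic {token}"})
    with urlopen(request) as response:
        return json.load(response)
```

The host, port, and API version needed for deployment_url correspond to the initial metadata that the Cloudera Manager block records about Cloudera Manager itself.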

Finally, the block includes time series data for the capacity and used capacity of each filesystem associated with the Cloudera Manager instance and with every instance that is part of a cluster. The data covers the five minutes prior to the bundle's creation. See the Cloudera Manager REST API documentation for complete information on the data structures in a time series. Instance private IP addresses and host names are included in the time series data.
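
The five-minute window can be reproduced when querying time series data by hand. The sketch below computes the window boundaries and packages them as parameters for the Cloudera Manager time series endpoint. The parameter names query, from, and to, and the example query string in the usage note, are assumptions for illustration; the exact query that Cloudera Director issues is not documented here.

```python
from datetime import datetime, timedelta, timezone


def timeseries_params(created_at, tsquery, window=timedelta(minutes=5)):
    """Return request parameters covering the window that ends at created_at.

    created_at should be a timezone-aware datetime, such as the creation
    time of the bundle.
    """
    start = created_at - window
    return {
        "query": tsquery,                # assumed parameter name
        "from": start.isoformat(),       # start of the five-minute window
        "to": created_at.isoformat(),    # bundle creation time
    }
```

A call such as timeseries_params(datetime.now(timezone.utc), "select capacity, capacity_used where category = FILESYSTEM") would then describe a request for filesystem capacity over the preceding five minutes, assuming those metric names apply to the deployment.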

Cloudera Director Block

The Cloudera Director block contains information queried from Cloudera Director itself about Cloudera Manager installations and clusters that it manages. For complete information on the data structures described here, consult the Cloudera Director API documentation or explore using the API console included with Cloudera Director, at the /api-console URL.

Initial metadata in the block includes the metadata structure version and the host, port, and API version of Cloudera Director itself. Ensuing details begin with the version of Cloudera Director and the time when the block was created.

Next, the block includes the deployment template used to create the Cloudera Manager installation. The data retrieved from Cloudera Director here is redacted, eliminating potentially sensitive information such as external database account details and inline scripts. Some specific data items are:
  • Redacted license and billing ID (which are available unredacted in the usage bundle metadata)
  • External Cloudera Manager database templates, if any
  • The Cloudera Manager instance template

After some deployment health and status information, details about the running deployment are included. As with the deployment template information, deployment information is redacted to eliminate potentially sensitive information such as the Cloudera Manager administrator password. Some specific data items are:
  • The Cloudera Manager version and private IP address
  • Details about the instance running Cloudera Manager, including its public and private IP addresses and host names, information specific to the cloud provider such as virtual network and subnet identifiers, its installed software capabilities, and its instance template
  • The Cloudera Manager port and administrative username

Next, the cluster templates for each of the clusters created by Cloudera Director are listed. As with other Cloudera Director API calls, cluster template information such as inline scripts is redacted for security. Some specific data items are:
  • The cluster template name and list of services deployed
  • External service database templates, if any
  • Virtual instance groups and associated instance templates

Finally, after some cluster status and health information, details about each bootstrapped cluster are provided. As with the other query results, sensitive information is omitted. Some specific data items are:
  • Overall cluster health and individual service health checks
  • Installed software capabilities of each cluster instance

Usage Logging

Cloudera Director is capable of logging usage bundles and heartbeats as they exist immediately before submission to Cloudera's metering service. The logging is disabled by default, but it can be enabled and configured to provide visibility into precisely what Cloudera Director is sending out.

To enable usage logging in the Cloudera Director server, locate the logback.xml file used to configure its logging system. The file is normally in /usr/lib64/cloudera-director/server/etc. Look for the configurations for the following loggers:
  • com.cloudera.director.metering.heartbeats
  • com.cloudera.director.metering.bundles

Change the level for each logger to INFO to enable usage logging. To disable usage logging, change the level back to ERROR. After changing the level, restart Cloudera Director so that the change takes effect.
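
The level change can also be scripted. The sketch below uses Python's standard XML library to set the level attribute on the two metering loggers, assuming they are declared as standard Logback logger elements with name and level attributes. The function name is hypothetical. Note that this approach drops XML comments and the XML declaration when rewriting the file, so editing by hand may be preferable for a heavily commented configuration.

```python
import xml.etree.ElementTree as ET

METERING_LOGGERS = (
    "com.cloudera.director.metering.heartbeats",
    "com.cloudera.director.metering.bundles",
)


def set_logger_levels(logback_xml, level, names=METERING_LOGGERS):
    """Set the level of the named loggers in a Logback configuration string.

    Returns the modified XML text and the number of loggers changed.
    """
    root = ET.fromstring(logback_xml)
    changed = 0
    for logger in root.iter("logger"):
        if logger.get("name") in names:
            logger.set("level", level)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

Passing "INFO" enables usage logging and passing "ERROR" disables it again. A returned count other than 2 suggests that the expected logger entries were not found. Cloudera Director still needs a restart after the file is written back.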

The logging configuration writes the JSON for heartbeats and usage bundles to a dedicated log file. Those comfortable with configuring the Logback logging system can make further changes to have the information written elsewhere. Consult Logback documentation for the options available.

Usage logging increases the demand for file storage on the Cloudera Director instance. To avoid running out of disk space, do not leave it enabled for long periods of time.
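
While usage logging is enabled, it can help to keep an eye on free space where the logs are written. The following is a minimal sketch of such a check. The function name is hypothetical, and the directory to check depends on where the logging configuration writes its files.

```python
import shutil


def free_fraction(path):
    """Return the fraction of free space on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

A periodic check that warns when free_fraction for the log directory drops below a chosen threshold, for example 0.1, is one simple way to notice when usage logging should be turned off.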