Cloudera Manager provides many features for monitoring the health and performance of the components of your clusters (hosts, service daemons) as well as the performance and resource demands of the jobs running on your clusters:
- Monitor Services - Service monitoring lets you view the results of health checks at both the service and role instance level. Various types of metrics are displayed in charts that help with problem diagnosis. Health checks include advice about actions you can take if the health of a component becomes concerning or bad. You can also view the history of actions performed on a service or role, and can view an Audit log of configuration changes.
- Monitor Hosts - Host monitoring lets you view information pertaining to all the hosts on your cluster: which hosts are up or down, current resident and virtual memory consumption for a host, what role instances are running on a host, which hosts are assigned to different racks, and so on. You can look at a summary view for all hosts in your cluster or drill down for extensive details about an individual host, including charts that provide a visual overview of key metrics on your host.
- Monitor Activities - Activity monitoring lets you see who is running which activities on the cluster, both at the current time and through views of historical activity, and provides many statistics, in both tabular displays and charts, about the resources used by individual jobs. You can compare the performance of similar jobs and view the performance of individual task attempts across a job to help diagnose behavior or performance problems.
- Events - The Event Server aggregates relevant Hadoop events and makes them available for alerting and for searching, giving you a view into the history of all relevant events that occur cluster-wide. You can also filter event entries by time range, service, host, keyword, and so on.
- Alerts - You can configure Cloudera Manager to generate alerts from a variety of events. You can set thresholds for certain types of events, enable and disable them, and configure alert notifications by email or SNMP trap for critical events. You can also suppress alerts temporarily for individual roles, services, hosts, or even the entire cluster to allow system maintenance or troubleshooting without generating excessive alert traffic.
- Audit Events - Audits pages display audit events such as creating a role or service, making configuration revisions for a role or service, decommissioning and recommissioning hosts, and running commands. You can also filter audit event entries by time range, service, host, keyword, and so on.
- Chart Time-Series Data - Cloudera Manager enables you to search metric data, create charts of the data, group (facet) the data, and save those charts to user-defined views.
- Logs - Cloudera Manager provides access to logs in a variety of ways that take into account the current context you are viewing. For example, when monitoring a service, you can easily click a single link to view the log entries related to that specific service, through the same user interface. When viewing information about a user's activity, you can easily view the relevant log entries that occurred on the hosts used by the job while the job was running.
- Reports - Reports provide a historical view into disk utilization by user, user group, and directory. You can manage your HDFS directories as well, including searching and setting quotas. You can also view cluster job activity by user, group, or job ID. These reports are aggregated over selected time periods (hourly, daily, weekly, and so on) and can be exported as XLS or CSV files.
- Troubleshooting Cluster Configuration and Operation - Contains solutions to some common problems that prevent you from using Cloudera Manager and describes how to use Cloudera Manager log and notification management tools to diagnose problems.
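
The Events and Audit Events entries above describe filtering by time range, service, host, and keyword. The following is a minimal, self-contained sketch of that kind of filtering. It does not call Cloudera Manager: the `Event` record, its field names, the `filter_events` helper, and the sample data are all hypothetical and chosen only to illustrate how the filters combine.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape; this is NOT the Cloudera Manager event schema.
    timestamp: datetime
    service: str
    host: str
    message: str


def filter_events(
    events: Iterable[Event],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    service: Optional[str] = None,
    host: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[Event]:
    """Return the events that match every filter that was supplied.

    The time range is inclusive of `start` and exclusive of `end`.
    The keyword match is a case-insensitive substring test on the message.
    """
    matches = []
    for event in events:
        if start is not None and event.timestamp < start:
            continue
        if end is not None and event.timestamp >= end:
            continue
        if service is not None and event.service != service:
            continue
        if host is not None and event.host != host:
            continue
        if keyword is not None and keyword.lower() not in event.message.lower():
            continue
        matches.append(event)
    return matches


# Made-up sample data, for illustration only.
EVENTS = [
    Event(datetime(2024, 1, 1, 9, 0), "hdfs", "host1.example.com", "DataNode health is bad"),
    Event(datetime(2024, 1, 1, 10, 30), "yarn", "host2.example.com", "NodeManager restarted"),
    Event(datetime(2024, 1, 2, 8, 15), "hdfs", "host2.example.com", "DataNode health is good"),
]

# Combine a time-range filter with a service filter.
recent_hdfs = filter_events(EVENTS, start=datetime(2024, 1, 1, 12, 0), service="hdfs")
for event in recent_hdfs:
    print(event.timestamp.isoformat(), event.host, event.message)
```

Each filter is optional, and an event is returned only if it satisfies all of the filters that are set, which mirrors narrowing a search by adding criteria one at a time.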