Adding Audit Filters
Audit filters control what information Navigator Audit Server collects. The following sections describe cases where a filter can allow you to remove noise from the audits or otherwise improve the quality of the audit trail.
The components of audit events differ based on the service that produces the event. See Service Audit Events for a list of the event fields. You can modify audit filters for services in the Cloudera Manager configuration property Audit Event Filter (navigator.event.filter) located in the Cloudera Navigator category in each service's configuration page. See Configuring Service Auditing Properties.
Customizing the Default Audit Filters
Cloudera Manager includes audit filters for HDFS, HBase, and Hive by default. You may be able to make these filters more effective by applying what you know about your system. Check the Audits tab in the Navigator console to verify the exact text of the fields so that your filters match the right events. For example, check usernames for specific syntax, such as Kerberized names.
Two kinds of changes are most likely to improve the effectiveness of the default audit filters:
- Default system usernames
If your system uses other usernames for key system users, you can improve the default filters by replacing the generic user names with the specific names used in your environment. For example, if you have a system user that performs HDFS operations that is named something other than hdfs, modify the default filters to use the specific name.
- Events from unneeded operations
You can improve the default filters by extending them to discard system events that don't add value to your audit tracking.
The default HDFS filters are the following:
- Excludes events produced by typical system user roles.
Action: discard Fields: username: (?:cloudera-scm|dr.who|hbase|hive|impala|mapred|solr|spark)(?:/.+)?
- Excludes common events that don't add value to an audit log.
Action: discard Fields: username: (?:hdfs)(?:/.+)?, operation: (?:listStatus|listCachePools|listCacheDirectives|getfileinfo)
- Excludes activity in /tmp and Hue system directories
Action: discard Fields: src: /user/hue/\.cloudera_manager_hive_metastore_canary(?:/.*)?
Action: discard Fields: src: /user/hue/\.Trash/Current/user/hue/\.cloudera_manager_hive_metastore_canary(?:/.*)?
Action: discard Fields: src: /tmp(?:/.*)?
The default HBase filter is the following:
- Excludes activity on the HBase system tables
Action: discard Fields: tableName: (?:-ROOT-|.META.|_acl_|hbase:meta|hbase:acl)
The default Hive filter is the following:
- Excludes queries against the internal Hive /tmp data
Action: discard Fields: operation: QUERY, objectType: DFS_DIR, resourcePath: /tmp/hive-(?:.+)?/hive_(?:.+)?/-mr-.*
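Before customizing the default patterns, it can help to try them against field values copied from the Audits tab. The following is a minimal sketch in Python, not part of Navigator. It assumes that a pattern must match the entire field value and that matching ignores case; the `matches` helper and the sample values are hypothetical.

```python
import re

# Patterns copied from the default filters listed above.
HDFS_SYSTEM_USERS = r"(?:cloudera-scm|dr.who|hbase|hive|impala|mapred|solr|spark)(?:/.+)?"
HDFS_TMP_SRC = r"/tmp(?:/.*)?"
HBASE_SYSTEM_TABLES = r"(?:-ROOT-|.META.|_acl_|hbase:meta|hbase:acl)"


def matches(pattern: str, value: str) -> bool:
    """Assumption: the whole field value must match, ignoring case."""
    return re.fullmatch(pattern, value, flags=re.IGNORECASE) is not None


# Hypothetical field values, as they might appear in the Audits tab.
samples = [
    (HDFS_SYSTEM_USERS, "hive"),
    (HDFS_SYSTEM_USERS, "hive/host1.example.com"),
    (HDFS_SYSTEM_USERS, "alice"),
    (HDFS_TMP_SRC, "/tmp/staging/part-0"),
    (HDFS_TMP_SRC, "/tmpdata/file"),
    (HBASE_SYSTEM_TABLES, "hbase:meta"),
]

results = [(value, matches(pattern, value)) for pattern, value in samples]
for value, matched in results:
    print(f"{value!r}: {'discard filter matches' if matched else 'no match'}")
```

Under these assumptions, the optional `(?:/.+)?` suffix lets the username pattern cover names that carry a host component, and `/tmp(?:/.*)?` covers `/tmp` and everything below it without also matching sibling paths such as `/tmpdata`.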
Safeguarding Audit Events for Important Operations
To make sure your audit history includes the critical events that you want to track, such as HDFS delete or rename operations, consider setting the first filter in the list for a service to something like the following:
Action: accept Fields: operation: delete|rename
This filter ensures that any delete or rename operation gets accepted, irrespective of other filters.
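The effect of putting the accept filter first can be sketched as a small Python model. This is an illustration only, not Navigator's implementation: it assumes that filters are evaluated in list order, that the first filter whose fields all match decides the action, and that unmatched events are accepted. The `FILTERS` layout and the `decide` function are hypothetical.

```python
import re

# Hypothetical in-memory form of a filter list: (action, {field: pattern}).
FILTERS = [
    ("accept", {"operation": r"delete|rename"}),
    ("discard", {"username": r"(?:hdfs)(?:/.+)?"}),
]


def decide(event: dict, filters=FILTERS, default: str = "accept") -> str:
    """Return the action of the first filter whose fields all match the event.

    Assumptions: ordered evaluation, whole-value matching, case-insensitive,
    and a default action of "accept" when nothing matches.
    """
    for action, fields in filters:
        if all(
            re.fullmatch(pattern, event.get(name, ""), flags=re.IGNORECASE)
            for name, pattern in fields.items()
        ):
            return action
    return default


hdfs_delete = {"username": "hdfs", "operation": "delete"}
hdfs_list = {"username": "hdfs", "operation": "listStatus"}

print(decide(hdfs_delete))                            # accept filter is first
print(decide(hdfs_list))                              # falls through to discard
print(decide(hdfs_delete, list(reversed(FILTERS))))   # discard filter is first
```

In this model, a delete performed by the hdfs user is kept only when the accept filter precedes the discard filter, which is why the accept filter belongs at the top of the list.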
Removing Audit Events for Automated Tasks
If you find that the audit events include references to automated tasks against HDFS or other services that you don't want to include in your audit, you can include the tasks in a "discard" filter. For example, to discard all tasks performed against HDFS by the user under which the automated tasks are performed, create an HDFS filter:
Action: Discard Fields: username: (?:email@example.com)
This rule would discard any event performed by the automationuser account, where the user is specified by itself or with a Kerberos realm designation. Note that the filter is not case sensitive. As a performance optimization, the expression is enclosed in parentheses that begin with ?: (a non-capturing group) to indicate that the matched values are not retained.
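The difference between a group that starts with ?: and a plain parenthesized group can be seen in any regular expression engine. The following Python sketch is an illustration only; the realm EXAMPLE.COM is a placeholder, and the case-insensitive, whole-value matching is an assumption about how the filter is applied.

```python
import re

# Hypothetical patterns: the realm EXAMPLE.COM is a placeholder.
NON_CAPTURING = r"(?:automationuser|automationuser@EXAMPLE.COM)"
CAPTURING = r"(automationuser|automationuser@EXAMPLE.COM)"

non_capturing = re.compile(NON_CAPTURING, re.IGNORECASE)
capturing = re.compile(CAPTURING, re.IGNORECASE)

# Number of groups whose matched text the engine has to retain.
print(non_capturing.groups)  # 0
print(capturing.groups)      # 1

# Both forms accept and reject the same values; only the bookkeeping differs.
names = ["automationuser", "AutomationUser@example.com", "alice"]
verdicts = [
    (name, bool(non_capturing.fullmatch(name)), bool(capturing.fullmatch(name)))
    for name in names
]
for name, without_capture, with_capture in verdicts:
    print(name, without_capture, with_capture)
```

Because a filter only needs a yes or no answer, the non-capturing form gives the same result without storing the matched text.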
Removing Audit Events for Low Priority Operations
You may find that your audit log includes references to events that you are not interested in tracking. You can prevent these events from being included in the audit log by filtering them out by operation. For example, suppose that the audit event that you don't want to track is one where the HDFS operation getfileinfo ran against the file status for the user someuser on a node in the cluster, and the user account is indicated with a Kerberos realm.
To remove the HDFS getfileinfo accesses by this user, the audit filter would reference the user (by itself or with a Kerberos realm name) and the getfileinfo operation:
Action: Discard Fields: username: (?:username|username@MYREALM.com) operation: (?:getfileinfo)
As a performance optimization, the expressions are enclosed in parentheses that begin with ?: to indicate that the matched values are not retained.
To include more than one operation in the same filter, separate the operations with the OR operator (|):
Action: Discard Fields: username: (?:username|username@MYREALM.com) operation: (?:getfileinfo|liststatus)
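To check which events a two-field filter like this one would remove, you can model it in a few lines of Python. This is a sketch under stated assumptions, not Navigator's implementation: every field in the filter must match for the event to be discarded, each pattern must match the whole field value, and matching ignores case. The `is_discarded` helper and the sample events are hypothetical.

```python
import re

# Field patterns copied from the filter above.
FILTER_FIELDS = {
    "username": r"(?:username|username@MYREALM.com)",
    "operation": r"(?:getfileinfo|liststatus)",
}


def is_discarded(event: dict) -> bool:
    """True only if every field pattern matches the event (assumed AND semantics)."""
    return all(
        re.fullmatch(pattern, event.get(field, ""), flags=re.IGNORECASE) is not None
        for field, pattern in FILTER_FIELDS.items()
    )


# Hypothetical events carrying only the fields that the filter inspects.
events = [
    {"username": "username@MYREALM.com", "operation": "listStatus"},
    {"username": "username", "operation": "getfileinfo"},
    {"username": "username", "operation": "open"},
    {"username": "otheruser", "operation": "getfileinfo"},
]

discarded = [is_discarded(event) for event in events]
for event, flag in zip(events, discarded):
    print(event["username"], event["operation"], "discard" if flag else "keep")
```

In this model, the same operations performed by another user are kept, and other operations by the filtered user are kept as well, so the filter removes only the combination that was identified as noise.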