Introducing Cloudera Navigator

Cloudera Navigator is the first fully integrated data management tool for the Hadoop platform. Cloudera Navigator 1.0 provides data governance capabilities such as verifying access privileges and auditing access to all data stored in Hadoop. These capabilities are critical for enterprise customers that are in highly regulated industries and have stringent compliance requirements.

Cloudera Navigator tracks access permissions and actual accesses to all data objects in Hive, HBase, and HDFS. This helps answer questions such as: who has access to which data objects, which data objects a user accessed, when a data object was accessed and by whom, which data assets were accessed through a service, and which device was used for the access. In the current release, Cloudera Navigator supports tracking access to:

  • HDFS data accessed through HDFS, Hive, and HBase operations
  • Hive metadata

Cloudera Navigator allows administrators to configure, collect, and view audit events, to understand who accessed what data and how. The information in an audit event includes:

  • Timestamp - The date and time of the access.
  • Operation - The operation performed on the object. For example, list an HDFS directory, create a Hive table, or put an HBase object.
  • Object accessed - The object that was accessed. For example, a Hive table, an HDFS file or directory, or an HBase table.
  • User - The principal that accessed the object. Typically, this is a username. Where appropriate, this is annotated with the authentication mechanism.
  • IP address - The address of the machine that accessed the object.
  • Service - The service instance through which the data was accessed. For example, a Hive service instance.
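
As an illustration of how these fields fit together, the following minimal Python sketch models an audit event as a record and answers two of the questions mentioned earlier (which objects a user accessed, and who accessed an object and when). The class name, field names, helper functions, and sample values are all hypothetical and chosen for this example; they are not Cloudera Navigator's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical record mirroring the audit event fields listed above."""
    timestamp: datetime    # date and time of the access
    operation: str         # operation performed on the object
    object_accessed: str   # e.g. a Hive table, an HDFS path, or an HBase table
    user: str              # principal that accessed the object
    ip_address: str        # address of the machine that accessed the object
    service: str           # service instance through which the data was accessed


def objects_accessed_by(events: Iterable[AuditEvent], user: str) -> List[str]:
    """Return the distinct objects a user accessed, in first-seen order."""
    seen: List[str] = []
    for event in events:
        if event.user == user and event.object_accessed not in seen:
            seen.append(event.object_accessed)
    return seen


def accesses_to(events: Iterable[AuditEvent], obj: str) -> List[Tuple[datetime, str]]:
    """Return (timestamp, user) pairs for one object, oldest first."""
    return sorted((e.timestamp, e.user) for e in events if e.object_accessed == obj)


# Made-up sample events; operation and service names are placeholders.
events = [
    AuditEvent(datetime(2013, 1, 15, 9, 30), "list directory",
               "/user/alice/reports", "alice", "10.0.0.5", "hdfs1"),
    AuditEvent(datetime(2013, 1, 15, 9, 45), "create table",
               "default.sales", "bob", "10.0.0.6", "hive1"),
    AuditEvent(datetime(2013, 1, 15, 10, 0), "read file",
               "/user/alice/reports", "bob", "10.0.0.6", "hdfs1"),
]

bob_objects = objects_accessed_by(events, "bob")
report_accesses = accesses_to(events, "/user/alice/reports")
```

Here `bob_objects` holds the objects accessed by the user "bob", and `report_accesses` holds the time-ordered list of who accessed the reports directory. In the product itself, these questions are answered through the audit views in the Cloudera Manager Admin Console rather than through code like this.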

Cloudera Navigator allows administrators to generate reports that list the HDFS access permissions granted to groups.

Cloudera Navigator Architecture

The architecture of Cloudera Navigator is illustrated below.

[Figure: Cloudera Navigator architecture (images/image1.jpeg)]

Cloudera Navigator is implemented as an add-on to Cloudera Manager 4.5; all Cloudera Navigator functions (installation, configuration, and audit log review) are accessed through the Cloudera Manager Admin Console.

When Cloudera Navigator is installed, plug-ins that enable collection of audit events are added to each audited service. When data is accessed through a service for which auditing is enabled in Cloudera Navigator, audit events are generated and sent to the Navigator Server, which stores the events securely and durably in a database.

Service Versions and Audited Operations

This section describes the service versions and audited operations supported by Cloudera Navigator.

HDFS

Minimum supported CDH version: 4.0

The captured operations are:

  • Operations that access or modify a file's or directory's data or metadata
  • Operations denied due to lack of privileges

HBase

Minimum supported CDH version: 4.0

The captured operations are:

  • Operations that require a privilege (except balance, balance switch, and append)
  • Operations denied due to lack of privileges
  Note:
  • In CDH versions earlier than 4.2, the operation recorded in log events for grant and revoke operations is "ADMIN".
  • In simple authentication mode, if the HBase Secure RPC Engine property is "false" (the default), the username in log events is "UNKNOWN". To see a meaningful user name:
    1. Click the HBase service.
    2. Select Configuration > View and Edit > Service-wide > Security.
    3. Set the HBase Secure RPC Engine property to "true".
    4. Save the change and restart the service.

Hive

Minimum supported CDH version: 4.2

The captured operations are:

  • Operations (except grant, revoke, and metadata access only) sent to HiveServer2
  Note:
  • Operations denied due to lack of privileges are not captured.
  • Access via the Hive CLI is not supported.
  • In simple authentication mode, the username in log events is the username passed in the HiveServer2 connect command. If you do not pass a username in the connect command, the username in log events is "anonymous".

Hue

Minimum supported CDH version: 4.2

The captured operations are:

  • Operations (except grant, revoke, and metadata access only) sent to Beeswax Server
  Note: You do not directly configure the Hue service for auditing. Instead, when you configure the Hive service for auditing, operations sent to the Hive service through Beeswax appear in the Hue service audit log.
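
The minimum versions stated in this section can be summarized as a small lookup. The Python sketch below is only an illustration: the dictionary and function names are hypothetical, and it assumes a CDH version is written as dot-separated integers with at least a major and minor component (for example "4.2" or "4.1.3"). It compares versions as integer tuples rather than strings, so that a value such as "4.10" is not treated as older than "4.2".

```python
# Minimum supported CDH versions, as listed in this section.
MIN_CDH_VERSION = {
    "HDFS": (4, 0),
    "HBase": (4, 0),
    "Hive": (4, 2),
    "Hue": (4, 2),
}


def parse_version(text):
    """Turn 'major.minor[.patch]' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def auditing_supported(service, cdh_version):
    """Return True if cdh_version meets the minimum listed for the service."""
    major_minor = parse_version(cdh_version)[:2]
    return major_minor >= MIN_CDH_VERSION[service]


hive_on_41 = auditing_supported("Hive", "4.1.3")
hive_on_42 = auditing_supported("Hive", "4.2.0")
hdfs_on_40 = auditing_supported("HDFS", "4.0")
```

With the table above, `hive_on_41` is False because Hive auditing requires CDH 4.2 or later, while `hive_on_42` and `hdfs_on_40` are True.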