Cloudera Data Management
This guide describes how to perform data management using Cloudera Navigator. Data management activities include auditing access to data residing in HDFS and Hive metastores, reviewing and updating metadata, and discovering the lineage of data objects.
Cloudera Navigator is a fully integrated data management tool for the Hadoop platform. Data management capabilities are critical for enterprise customers that are in highly regulated industries and have stringent compliance requirements.
- Auditing data access and verifying access privileges - The goal of auditing is to capture a complete and immutable record of all activity within a system. While Hadoop has historically lacked centralized cross-component audit capabilities, products such as Cloudera Navigator add secured, real-time audit components to key data and access frameworks. Cloudera Navigator allows administrators to configure, collect, and view audit events, to understand who accessed what data and how. Cloudera Navigator also allows administrators to generate reports that list the HDFS access permissions granted to groups.
Cloudera Navigator tracks access permissions and actual accesses to all entities in HDFS, Hive, HBase, Impala, and Sentry to help answer questions such as: who has access to which entities, which entities a user accessed, when an entity was accessed and by whom, which entities were accessed using a service, and which device was used for the access. Cloudera Navigator auditing supports tracking access to:
- HDFS data accessed through the HDFS, Hive, HBase, and Cloudera Impala services
- HBase and Impala operations
- Hive metadata
- Sentry access
- Searching metadata and visualizing lineage - Cloudera Navigator metadata management features allow DBAs, data modelers, business analysts, and data scientists to search for, amend the properties of, and tag data entities.
In addition, to satisfy risk and compliance audits and data retention policies, Cloudera Navigator can answer questions such as: where did the data come from, where is it used, and what are the consequences of purging or modifying a set of data entities? Cloudera Navigator supports tracking the lineage of HDFS files and directories, Hive tables and columns, MapReduce and YARN jobs, Hive queries, Pig scripts, Sqoop jobs, and Oozie workflows.
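The kinds of questions listed above can be illustrated with a small, self-contained Python sketch. It does not use the Cloudera Navigator API: the `AuditEvent` record shape, the sample events, and the lineage edges are all hypothetical, chosen only to show how audit records answer "who accessed what" and how lineage edges answer "what is affected if this entity is purged".

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    # Hypothetical record shape, not the actual Navigator audit schema.
    timestamp: str   # ISO 8601 in UTC, so string order matches time order
    user: str
    service: str
    operation: str
    entity: str
    allowed: bool


def entities_accessed_by(events, user):
    """Which entities did this user access? Returns a sorted list."""
    return sorted({e.entity for e in events if e.user == user})


def accesses_to(events, entity):
    """When was this entity accessed, by whom, and through which service?"""
    return sorted((e.timestamp, e.user, e.service)
                  for e in events if e.entity == entity)


def downstream(edges, start):
    """Everything derived from `start`, found by a breadth-first walk
    over (source, derived) lineage edges."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen - {start})


# Illustrative sample data (assumed, not taken from a real cluster).
EVENTS = [
    AuditEvent("2015-03-01T10:00:00Z", "alice", "hdfs", "open",
               "/data/raw/sales.csv", True),
    AuditEvent("2015-03-01T10:05:00Z", "bob", "hive", "QUERY",
               "default.sales", True),
    AuditEvent("2015-03-01T10:07:00Z", "alice", "impala", "QUERY",
               "default.sales", False),
]

LINEAGE = [
    ("/data/raw/sales.csv", "default.sales"),
    ("default.sales", "default.sales_by_region"),
    ("default.sales_by_region", "/reports/q1.parquet"),
]

if __name__ == "__main__":
    print(entities_accessed_by(EVENTS, "alice"))
    print(accesses_to(EVENTS, "default.sales"))
    print(downstream(LINEAGE, "/data/raw/sales.csv"))
```

In this sketch, calling `downstream` on the raw HDFS file returns every table and file derived from it, which is the set to review before purging or modifying that file.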