Cloudera Data Management

This guide describes how to perform data management using Cloudera Navigator. Data management activities include auditing access to data residing in HDFS and Hive metastores, reviewing and updating metadata, and discovering the lineage of data objects.

Cloudera Navigator is a fully integrated data management and security system for the Hadoop platform. Cloudera Navigator features address the needs of a broad range of stakeholders interacting with data at scale:

  • Compliance groups must track and protect access to sensitive data. Their concerns focus on being prepared for an audit, tracking who is accessing what data and what they are doing with it, and ensuring that sensitive data is governed and protected.
  • Hadoop administrators and DBAs are responsible for boosting user productivity and cluster performance. These users are concerned with how data is being used and how it can be optimized for future workloads.
  • Data stewards and curators manage and organize data assets at Hadoop scale. Their tasks involve managing the data lifecycle efficiently, from ingest to purge.
  • Data scientists and BI users need to find the data that matters most. They want to be able to explore data, trust what they find, and visualize relationships between datasets.
To address the requirements of all these users, Cloudera Navigator provides the following categories of functionality:
  • Data Management - Data management provides visibility into and control over the data residing in Hadoop datastores and the computations performed on that data. The Cloudera Navigator features that address the data management needs of Hadoop administrators, data stewards, and data scientists are:
    • Auditing data access and verifying access privileges - The goal of auditing is to capture a complete and immutable record of all activity within a system. Cloudera Navigator auditing features add secured, real-time audit components to key data and access frameworks. Cloudera Navigator allows compliance groups to configure, collect, and view audit events, and to understand who accessed what data and how.
    • Searching metadata and visualizing lineage - Cloudera Navigator metadata management features allow DBAs, data stewards, business analysts, and data scientists to define, search for, and tag data entities, amend their properties, and view relationships between datasets.
    • Policies - Cloudera Navigator policy features enable data stewards to specify automated actions based on data access or on a schedule to add metadata, create alerts, and move or purge data.
    • Analytics - Cloudera Navigator analytics features enable Hadoop administrators to examine data usage patterns and create policies based on those patterns.
  • Data Encryption - Data encryption and key management provide a critical layer of protection against potential threats by malicious actors on the network or in the data center. They are also required for meeting key compliance initiatives and ensuring the integrity of your enterprise data. The following Cloudera Navigator components enable compliance groups to manage encryption:
    • Cloudera Navigator Encrypt transparently encrypts and secures data at rest without requiring changes to your applications and ensures there is minimal performance lag in the encryption or decryption process.
    • Cloudera Navigator Key Trustee Server is an enterprise-grade virtual safe-deposit box that stores and manages cryptographic keys and other security artifacts.
    • Cloudera Navigator Key HSM allows Cloudera Navigator Key Trustee Server to seamlessly integrate with a hardware security module (HSM).

Cloudera Navigator data management and data encryption components can be installed independently.