Cloudera Navigator support for Virtual Private Clusters
Cloudera Manager supports deploying workloads in virtual private compute clusters, allowing administrators to access resources during high-demand periods or to isolate workloads. In this environment, Cloudera Navigator continues to extract metadata and track audit events from services running on the Base cluster. It does not extract metadata or track audit events from services running on the Compute cluster.
When you use Compute clusters, you define a data context to control how data is shared between a Compute cluster and the Base cluster. Because the Compute cluster and the Base cluster interact through the data context, some of the activity that occurs on a Compute cluster affects the metadata collected in Navigator. For example, if you create Hive data assets using HiveServer2 or SparkSQL on the Compute cluster and you have Hive in your data context, you will see entities for the new Hive data assets in Navigator. You will not see lineage for how these assets were created, because the operations on the Compute cluster are not extracted. You will not see audits for the events that created the assets, because audits are not collected from the services running on the Compute cluster. The following tables describe the behavior of Navigator metadata and audit collection in Base and Compute clusters for the services Navigator supports.
Navigator Auditing in Virtual Private Compute Clusters
No audits appear in Navigator for events that occur on a Compute cluster, with one exception: if Sentry is included in the data context for the cluster, you will see audit events for Sentry actions when those actions are performed in HiveServer2 or Impala on the Compute cluster.
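The audit rule above can be summarized as a small decision function. This is only an illustrative model, not part of any Navigator or Cloudera Manager API: the `AuditEvent` class, the `audit_visible` function, and the lowercase service names are hypothetical, and the sketch assumes the service is one that Navigator supports for auditing.

```python
from dataclasses import dataclass

# Hypothetical model of an audited action; not a Navigator data structure.
@dataclass(frozen=True)
class AuditEvent:
    cluster: str                    # "base" or "compute"
    service: str                    # where the action was performed, e.g. "hiveserver2"
    is_sentry_action: bool = False  # True for Sentry actions (assumed flag)

# Services through which Sentry actions on a Compute cluster are still audited.
SENTRY_FRONTENDS = {"hiveserver2", "impala"}

def audit_visible(event: AuditEvent, data_context: set) -> bool:
    """Return True if the event would appear in Navigator under the rule above."""
    if event.cluster == "base":
        # Audit events from Base cluster services continue to be tracked.
        return True
    # Compute cluster: only Sentry actions performed in HiveServer2 or Impala,
    # and only when Sentry is part of the cluster's data context.
    return bool(
        event.is_sentry_action
        and "sentry" in data_context
        and event.service in SENTRY_FRONTENDS
    )
```

For instance, under these assumptions a Sentry action issued through Impala on a Compute cluster is visible only when `"sentry"` is in that cluster's data context, while an ordinary HiveServer2 query on the same cluster is never audited.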
Navigator Metadata and Lineage Extraction in Virtual Private Compute Clusters
No metadata is extracted from services running on a Compute cluster. However, if HDFS or Hive is included in the data context for a Compute cluster, Navigator shows entities created or updated on a Compute cluster and stored in HDFS or Hive Metastore (HMS) on the Base cluster. For example, when directories or files are created from actions on a Compute cluster with HDFS in its data context, the directories and files are stored in HDFS on the Base cluster. Navigator collects the metadata from the Base cluster HDFS and creates entities for the directories and files. Similarly, when Hive databases, tables, views, or partitions are created or modified by HiveServer2, Impala, or SparkSQL operations on a Compute cluster and Hive is included in the data context for that cluster, the updated metadata is extracted from HMS on the Base cluster and collected by Navigator. Because Navigator does not extract metadata directly from the Compute cluster, the operations and operation executions that created the data assets are not collected; therefore, Navigator does not calculate lineage for these data assets.
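The metadata behavior described above can be modeled in the same illustrative way. The function name `navigator_view`, its parameters, and the returned keys are hypothetical and exist only for this sketch; it assumes assets are stored in either HDFS or Hive and that the service is one Navigator supports.

```python
def navigator_view(asset_store: str, created_on: str, data_context: set) -> dict:
    """Model what Navigator collects for a data asset.

    asset_store  -- "hdfs" or "hive": where the asset is stored
    created_on   -- "base" or "compute": cluster whose service created the asset
    data_context -- services in the Compute cluster's data context
    """
    if created_on == "base":
        # Base cluster services are extracted directly, including the
        # operations and operation executions that lineage is built from.
        return {"entity": True, "operations_extracted": True}
    # Compute cluster: the entity appears only because the asset lands in
    # HDFS or Hive Metastore on the Base cluster through the data context.
    # Operations on the Compute cluster are never extracted, so no lineage.
    return {
        "entity": asset_store in data_context,
        "operations_extracted": False,
    }
```

Under these assumptions, a Hive table created by SparkSQL on a Compute cluster with Hive in its data context yields an entity but no extracted operations, which matches the missing lineage described above.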