Known Issues and Workarounds in Cloudera Navigator 2 Data Management

The following sections describe the current known issues in Cloudera Navigator 2.

Discrepancies between data shown in Search and dashboard

Because Navigator uses Audit Server data for the "Database Creation" count, the Solr dataset and the Audit dataset might not match. The Audit dataset captures events continuously, whereas the Solr dataset is based on Hive extractions that run periodically. If a table is created or dropped between two Hive extractions, the change does not show up in Solr, but it does show up in the audit records. As a result, the dashboard might report more dropped tables than created tables.

When changing S3 key to a different account, Navigator does not create a queue in the new account

Navigator does not support changing keys from one S3 account to another. If a new key is provided to Navigator in Cloudera Manager, the key must belong to the same S3 account as the previous key.

Navigator does not extract unnamed folders in S3

Navigator does not extract folders that have no name, although it does extract their contents. For example, if the top-level folder in a bucket has no name, as in /bucket//folder/file, the object is extracted as /bucket/folder/file.
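To predict how an S3 object whose key contains an unnamed folder will be listed in Navigator, you can collapse repeated slashes in its path. The following POSIX shell sketch only mimics the behavior described above; it is not Navigator's extraction code, and the function name navigator_s3_path is hypothetical.

```shell
# Hypothetical helper: approximates the path Navigator reports for an S3 object
# whose key contains an unnamed (empty) folder segment.
# It only squeezes runs of "/" into a single "/"; it does not contact AWS or Navigator.
navigator_s3_path() {
    printf '%s\n' "$1" | tr -s '/'
}

# Example: the unnamed top-level folder disappears from the reported path.
navigator_s3_path '/bucket//folder/file'   # prints /bucket/folder/file
```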

Implicit folders are not marked as deleted in Navigator

If an implicit folder is deleted in S3, it does not appear as deleted in Navigator.

Workaround: To prevent folders deleted in S3 from appearing in Navigator Search results, include implicit:false in the search query.
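If you query metadata through the Navigator API instead of the Search page, the same filter can be added to the query string. The sketch below assumes that the /api/v9/entities/ endpoint (the one used for the sourceId lookup under "Purge specifications for Navigator") accepts the same query syntax as Search, including parentheses, and that HTTP basic authentication is enabled. NAV_URL, NAV_USER, NAV_PASS, and both helper names are hypothetical placeholders.

```shell
# Hypothetical helper: wraps an existing search query so that implicit folders
# (which may already be deleted in S3) are excluded from the results.
exclude_implicit() {
    printf '(%s) AND implicit:false\n' "$1"
}

# Hypothetical wrapper around the Navigator entities API.
# Assumes NAV_URL (the Navigator Metadata Server base URL), NAV_USER and NAV_PASS
# are set, and that the API accepts the same query syntax as the Search page.
# -G turns the --data-urlencode pairs into a URL-encoded query string.
search_without_implicit() {
    curl -sS -G -u "$NAV_USER:$NAV_PASS" \
        "$NAV_URL/api/v9/entities/" \
        --data-urlencode "query=$(exclude_implicit "$1")" \
        --data-urlencode "limit=100" \
        --data-urlencode "offset=0"
}
```

For example, exclude_implicit 'sourceType:S3' produces (sourceType:S3) AND implicit:false; the sourceType:S3 clause here is only an illustrative query.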

Inconsistencies in AWS can cause Navigator extraction to stop

Inconsistencies that occur in AWS (for example, due to eventual consistency) can delay Navigator extraction of S3 data. When Navigator detects an inconsistency, extraction may stop until the inconsistency is resolved in AWS. Navigator will retry at the next scheduled extraction.

User might see a "not authorized" message when logging in

The Navigator UI saves the URL of the last page you accessed when you log out, and returns you to that page on your next login. If two different users log in to Navigator from the same browser tab, the second user inherits the saved state of the first. If the second user does not have permission to view that page, that user receives an error message.

Workaround: Close the browser tab and log in on a new tab. The state is cleared, and the access error message does not appear.

Purge specifications for Navigator

Policies cannot use cluster names in queries. Cluster name is a derived attribute and cannot be used as-is.

Workaround: When setting move actions for Cloudera Navigator, if there is only one cluster known to the Navigator instance, remove the clusterName clause.

If there is more than one cluster known to the Navigator instance, replace clusterName with sourceId. To get the sourceId, issue a query in this format:
curl '<nav-url>/api/v9/entities/?query=type%3Asource&limit=100&offset=0'
Use the identity of the matching HDFS service for this cluster as the sourceId.
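The lookup and the substitution can be scripted. This is a sketch under stated assumptions, not a Cloudera-provided tool: it assumes HTTP basic authentication, that jq is installed, that the response is a JSON array, and that each source entity exposes sourceType, identity, clusterName, and sourceUrl fields; verify these names against the API output of your Navigator version. NAV_URL, NAV_USER, NAV_PASS, the helper names, and the sample policy queries are hypothetical.

```shell
# Hypothetical helper: lists HDFS sources known to Navigator as
# "<identity><TAB><clusterName><TAB><sourceUrl>" lines so you can pick the
# identity that belongs to your cluster. Field names are assumptions.
list_hdfs_sources() {
    curl -sS -u "$NAV_USER:$NAV_PASS" \
        "$NAV_URL/api/v9/entities/?query=type%3Asource&limit=100&offset=0" |
        jq -r '.[] | select(.sourceType == "HDFS")
               | [.identity, (.clusterName // ""), (.sourceUrl // "")] | @tsv'
}

# Hypothetical helper: replaces a clusterName clause in a policy query with a
# sourceId clause. Handles clusterName:Value and clusterName:"Quoted Value".
# Assumes the identity contains no "/", "&" or "\" characters, which are
# special in sed replacement text.
replace_cluster_clause() {
    printf '%s\n' "$1" |
        sed -E "s/clusterName:(\"[^\"]*\"|[^[:space:]()\"]+)/sourceId:$2/g"
}
```

For example, replace_cluster_clause 'clusterName:Cluster1 AND sourceType:HDFS' abc123 prints sourceId:abc123 AND sourceType:HDFS. The single-cluster case, where the clusterName clause is simply removed, is not handled by this sketch.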

Lineage returns deleted entities even if the Hide deleted entities option is selected

Purge appears suspended while extraction is running

When you issue a purge command, the purge does not start while extractors are running. During this time, the maintenance page indicates that maintenance tasks are not running. After the extraction completes and the purge starts, the page shows the status of the purge operations.

Audit logs are not drained when the audited process is stopped

If an audited role is deleted or migrated to a different host while audits are still waiting to be transferred to Audit Server, those pending audits might not be transferred. Audits become pending when they cannot be transferred, either because Audit Server is down or because it is unreachable due to a network issue. During role migration, make sure that Audit Server is healthy so that all audited actions reach it.

Spurious errors about missing database connectors are reported in the Metadata Server log file

Workaround: Ignore the errors.

Audit CSV has extra columns and is missing some data

When you export audits to CSV, Sentry data is not visible in the generated CSV file. Also, some column names appear twice (Operation Text, Database Name, Object Type, and so on), but the data appears in only one of the duplicated columns.

Workaround: Export audits to JSON format to see Sentry data.

Metadata component in Cloudera Navigator 1.2 (included with Cloudera Manager 5.0) cannot be upgraded to 2.0

Cloudera does not provide an upgrade path from the Navigator Metadata Server that was a beta release in Cloudera Navigator 1.2 to the Cloudera Navigator 2 release. If you are upgrading from Cloudera Navigator 1.2 (included with Cloudera Manager 5.0), you must perform a clean install of Cloudera Navigator 2.

Workaround:

  1. Delete the Navigator Metadata Server role.
  2. Remove the contents of the Navigator Metadata Server storage directory.
  3. Add the Navigator Metadata Server role according to the process described in Adding the Navigator Metadata Server.
  4. Clear the cache of any browser that used the 1.2 release of the Navigator Metadata component. Otherwise, you may see errors in the Navigator Metadata UI.
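Step 2 can be scripted on the Navigator Metadata Server host. The sketch below empties a directory while keeping the directory itself, with guards against clearing the wrong path. The function name is hypothetical, and the path you pass must be the storage directory configured for the Navigator Metadata Server role in Cloudera Manager; the sketch does not look it up. Run it only after the role has been deleted in step 1.

```shell
# Hypothetical helper: removes everything inside a directory but keeps the
# directory itself (so its ownership and permissions are preserved).
# Refuses an empty argument, "/", and paths that are not existing directories.
clear_storage_dir() {
    dir=$1
    if [ -z "$dir" ] || [ "$dir" = "/" ]; then
        echo "refusing to clear '$dir'" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -mindepth 1 skips the directory itself; -delete removes files and
    # subdirectories depth-first (supported by GNU and BSD find).
    find "$dir" -mindepth 1 -delete
}
```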

The Hive extractor does not handle all Hive statements

The Hive extractor does not handle the following cases:

  • Table-generating functions
  • Lateral views
  • Transform clauses
  • Regular expressions in the select clause

If a query involves any of the above, lineage will not be complete for that Hive query.
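To find queries whose lineage might be incomplete, you can scan the query text for the constructs listed above. The following sketch is a rough, case-insensitive pattern match, not a HiveQL parser: it can miss cases (for example, table-generating functions other than the built-in ones it names) and can report false positives (for example, matches inside string literals or comments). The function name is hypothetical.

```shell
# Hypothetical helper: prints one line per construct found in a Hive query that
# the Hive extractor does not handle. Returns 0 if anything was found, 1 otherwise.
# Heuristic only: it matches text patterns and does not parse HiveQL.
hive_lineage_gaps() {
    # Join multi-line queries so patterns such as "LATERAL VIEW" match across lines.
    query=$(printf '%s' "$1" | tr '\n' ' ')
    found=1
    # Built-in table-generating functions (UDTFs) followed by "(".
    if printf '%s\n' "$query" | grep -Eiq '(^|[^[:alnum:]_])(explode|posexplode|inline|stack|json_tuple|parse_url_tuple)[[:space:]]*\('; then
        echo "table-generating function"; found=0
    fi
    if printf '%s\n' "$query" | grep -Eiq 'lateral[[:space:]]+view'; then
        echo "lateral view"; found=0
    fi
    if printf '%s\n' "$query" | grep -Eiq '(^|[^[:alnum:]_])transform[[:space:]]*\('; then
        echo "transform clause"; found=0
    fi
    # Regex column specification: a backquoted name containing regex metacharacters.
    if printf '%s\n' "$query" | grep -Eq '`[^`]*[|*+?()][^`]*`'; then
        echo "regular expression in select clause"; found=0
    fi
    return $found
}
```

For example, a query containing LATERAL VIEW explode(items) is reported twice, once as a table-generating function and once as a lateral view.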

Workaround: None.

The IP address in a Hue service audit log shows as "unknown"

The IP address in a Hue service audit log shows as "unknown".

Severity: Low

Workaround: None.

Hive service configuration impact on Hue service auditing

If the audit configuration for a Hive service is changed, Beeswax must be restarted to pick up the change in the Hue service audit log.

Severity: Low

Workaround: None.

Hive service configuration in auditing component

For Hive services, the auditing component does not support the "Shutdown" option for the "Queue Policy" property.

Severity: Low

Workaround: None.

Restrictions on Lineage for Spark

Spark support for lineage diagrams in Cloudera Navigator has the following limitations and restrictions:
  • Spark lineage information is produced only for data that is read, written, and processed using the DataFrame and Spark SQL APIs. Lineage is not available for data that is read, written, or processed using Spark's RDD APIs.

  • Spark lineage information is not produced for calls to aggregation functions such as groupBy().

  • The lineage feature is not available for Spark when Cloudera Manager is running in single user mode.

  • The lineage feature is only supported for Spark 1.6 included with CDH 5.11 and higher. It is not supported with Cloudera Distribution of Apache Spark 2.0 or 2.1.

  • Prior to CDH 5.11, a rudimentary Spark extractor could be enabled in the safety valve by setting nav.spark.extraction.enable=true. Remove this setting from the safety valve when upgrading to CDH 5.11 or higher. This setting and the original Spark extractor are deprecated and might be removed in a future release.

  • Changing the lineage directory for Spark on YARN has no effect; the default directory remains in use.

  • The default lineage directory for Spark on YARN is /var/log/spark/lineage. No process or user should write any file to this directory; doing so could cause agent failures.
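When upgrading to CDH 5.11 or higher, you can check a saved copy of your safety-valve text for the deprecated Spark extractor setting before pasting it back into Cloudera Manager. The sketch below assumes the safety-valve contents are plain key=value lines copied into a local file; the function name and this copy-and-edit workflow are hypothetical, and the change itself must still be made in Cloudera Manager.

```shell
# Hypothetical helper: reads safety-valve properties on stdin and writes them to
# stdout without the deprecated nav.spark.extraction.enable setting.
# Assumes one key=value (or key = value) pair per line.
strip_deprecated_spark_extractor() {
    sed '/^[[:space:]]*nav\.spark\.extraction\.enable[[:space:]]*=/d'
}

# Example (file names are placeholders):
# strip_deprecated_spark_extractor < safety-valve.txt > safety-valve.new
```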