Known Issues in Cloudera Navigator 6.0.0

Authentication and Authorization

Errors when using local login return the browser to the SAML login page

With SAML authentication enabled for Navigator, administrators are allowed to use locallogin.html to log in with local credentials instead of SAML. However, if the administrator enters a wrong username or password, the page is redirected to login.html?error=true.

At that point, login.html is no longer a local login page, and the browser is redirected to the IdP address for SAML authentication.

Workaround: After the login failure, the URL changes to something similar to:

https://hostname:7187/login.html?error=true

To return to the local login page, change the browser address to a URL similar to:

https://hostname:7187/locallogin.html
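If the address needs to be corrected in a script rather than by hand, the rewrite can be sketched as follows. This is a minimal illustration that assumes only the two page names shown above; the helper name local_login_url is hypothetical and is not part of Navigator.

```python
from urllib.parse import urlsplit, urlunsplit


def local_login_url(failed_url: str) -> str:
    """Return the local login URL on the same host as a failed-login URL."""
    parts = urlsplit(failed_url)
    # Keep the scheme and host:port, swap the path, and drop "?error=true".
    return urlunsplit((parts.scheme, parts.netloc, "/locallogin.html", "", ""))


print(local_login_url("https://hostname:7187/login.html?error=true"))
```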

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: N/A

Cloudera Issue: NAV-5824

Cloudera Manager Configuration

Adding a blank audit filter removes filter configuration property

In Cloudera Manager, adding an empty rule to a service's Audit Event Filter and then saving the change causes all existing audit event filters to be lost. The filter configuration property is removed from Cloudera Manager's list of configuration properties. Reverting the change in History and Rollback neither restores the previous filters nor re-creates the filter property.

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: Cloudera Navigator 6.2.1, 6.3.1

Cloudera Issue: NAV-6096

Overriding safety valve settings disables audit and lineage features

Customers or third-party applications such as Unravel may require that hive.exec.post.hooks is configured in a HiveServer2 safety valve. If audit or lineage is enabled for Hive, Cloudera Manager comments out the hive.exec.post.hooks value that it would otherwise configure. The safety valve content shows the commented code:

<!--'hive.exec.post.hooks', originally set to
'com.cloudera.navigator.audit.hive.HiveExecHookContext,org.apache.hadoop.hive.ql.hooks.LineageLogger'
(non-final), is overridden below by a safety valve-->

This automated change disables Navigator's auditing and lineage features without notification.

At this time, there is no workaround.

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: N/A

Cloudera Issue: NAV-5331

Cloudera Manager audit events case-sensitive when using PostgreSQL

When Cloudera Manager and Navigator Audit Server are installed using PostgreSQL databases, the behavior of queries run from the Navigator console is different between the two databases. The result is that Cloudera Manager events are returned only if the query values match the case of the event values as they are stored in the Cloudera Manager database.

For example, the Hive operation "HiveReplicationCommand" is audited by Cloudera Manager; the audit log shows the command as HIVEREPLICATIONCOMMAND but querying with upper case fails to return the corresponding audit events. However, querying as operation = HiveReplicationCommand does return results.

Audit events other than those for Cloudera Manager are not affected.

Affected Versions: Cloudera Navigator 6.0.0, 6.0.1

Fixed Versions: Cloudera Navigator 6.1.0

Cloudera Issue: NAV-6141, NAV-6795

Hive, Hue, Impala

ClassCastException in Navigator Metadata Server log

When a Hive view is created, then dropped, and then subsequently recreated as a table with the same name as that of the original view, the Hive extraction process shows this exception in Navigator Metadata Server logs:

java.lang.ClassCastException: com.cloudera.nav.hive.model.HView cannot be cast to com.cloudera.nav.hive.model.HTable

Affected Versions: Cloudera Navigator 6.0.0, 6.0.1

Fixed Versions: Cloudera Navigator 6.1.0

Cloudera Issue: NAV-5939

ConsoleAppender error in HiveServer2 log

After running a query in Hive, the HiveServer2 stderr log includes the following error:

log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "com.cloudera.navigator.shaded.log4j.Appender" variable.
log4j:ERROR The class "com.cloudera.navigator.shaded.log4j.Appender" was loaded by
log4j:ERROR [sun.misc.Launcher$AppClassLoader@47d384ee] whereas object of type
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [sun.misc.Launcher$AppClassLoader@47d384ee].
log4j:ERROR Could not instantiate appender named "out".

This problem occurs because the Navigator Audit Server plugin for HiveServer2 expects a different log4j integration than the one HiveServer2 is using. The result is that no Navigator Audit messages appear in the HiveServer2 logs. HiveServer2 auditing is not affected by this problem.

Affected Versions: Cloudera Navigator 6.0.0, 6.0.1

Fixed Versions: Cloudera Navigator 6.1.0

Cloudera Issue: NAV-6523

Overriding safety valve settings disables audit and lineage features

Customers or third-party applications such as Unravel may require that hive.exec.post.hooks is configured in a HiveServer2 safety valve. If audit or lineage is enabled for Hive, Cloudera Manager comments out the hive.exec.post.hooks value that it would otherwise configure. The safety valve content shows the commented code:

<!--'hive.exec.post.hooks', originally set to
'com.cloudera.navigator.audit.hive.HiveExecHookContext,org.apache.hadoop.hive.ql.hooks.LineageLogger'
(non-final), is overridden below by a safety valve-->

This automated change disables Navigator's auditing and lineage features without notification.

Affected Versions: Cloudera Navigator 6.0.0 and later

Workaround: To fix this problem, manually merge the original HiveServer2 safety valve content for hive.exec.post.hooks with the new value. For example, in the case of Unravel, the new safety valve would look like the following:

<property>
  <name>hive.exec.post.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.HivePostHook,com.cloudera.navigator.audit.hive.HiveExecHookContext,org.apache.hadoop.hive.ql.hooks.LineageLogger</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
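The manual merge described above amounts to appending the Navigator hooks to whatever custom hooks are already listed. The following is a minimal sketch of that step, assuming the two Navigator class names shown in the commented-out value; the function name merge_post_hooks is hypothetical.

```python
# Hooks that Cloudera Manager would have configured, taken from the
# commented-out value shown in the safety valve.
NAVIGATOR_HOOKS = [
    "com.cloudera.navigator.audit.hive.HiveExecHookContext",
    "org.apache.hadoop.hive.ql.hooks.LineageLogger",
]


def merge_post_hooks(custom_value: str, required=NAVIGATOR_HOOKS) -> str:
    """Append any missing required hooks to a comma-separated hook list."""
    hooks = [h.strip() for h in custom_value.split(",") if h.strip()]
    for hook in required:
        if hook not in hooks:  # avoid duplicates if already merged
            hooks.append(hook)
    return ",".join(hooks)


print(merge_post_hooks("com.unraveldata.dataflow.hive.hook.HivePostHook"))
```

The printed string is the text to place in the <value> element. Running the function on an already merged value leaves it unchanged.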

Cloudera Issue: NAV-5331

Impala and Hive audit events fail to be captured when one audit event includes 4-byte characters, such as an emoji

This problem applies when the Navigator Audit Server database is a MySQL database configured to use the "UTF8" character set.

When a query includes an emoji or other Unicode supplementary-plane character that is encoded as four bytes in UTF-8, Navigator Audit Server fails to process the event and any following events from the same service.

Workaround: You can resolve this problem by configuring MySQL v5.5 or later to use the "UTF8MB4" character set. The error is described in this Stack Overflow article:

https://stackoverflow.com/questions/13653712/java-sql-sqlexception-incorrect-string-value-xf0-x9f-x91-xbd-xf0-x9f

The solution is described in the MySQL documentation topic The UTF8MB4 Character Set (4-Byte UTF-8 Unicode Encoding). Changing the character set requires restarting the MySQL server. It doesn't affect the Navigator Audit Server data.
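To see why such characters fail, note that MySQL's "UTF8" character set stores at most three bytes per character, whereas characters outside the Basic Multilingual Plane need four bytes in UTF-8. The sketch below checks a query string for such characters. The helper names are hypothetical, and the sample character is the one whose bytes appear in the Stack Overflow URL above.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


alien = "\U0001F47D"  # encodes as F0 9F 91 BD in UTF-8
print(utf8_byte_length("a"), utf8_byte_length("\u00e9"), utf8_byte_length(alien))
print(needs_utf8mb4("SELECT 'caf\u00e9'"), needs_utf8mb4("SELECT '" + alien + "'"))
```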

Affected Versions: Cloudera Navigator 6.0.0, 6.0.1

Fixed Versions: Cloudera Navigator 6.1.0

Cloudera Issue: NAV-4845

Viewing Navigator tags in Hue overloads Metadata Server heap

When viewing Cloudera Navigator tags through Hue, Navigator uses more memory than usual and does not release the memory after logging out of Hue. Eventually, the calls between Hue and Navigator will occupy the majority of the heap space allocated to Navigator Metadata Server.

Workaround: Restart the Navigator Metadata Server periodically to clear the heap usage.

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: N/A

Cloudera Issue: NAV-4326

Lineage not generated for Pig operations on Hive tables using HCatalog loader

When accessing a Hive table using Pig, lineage is generated in Navigator when using physical file loads, such as:

A = LOAD '/user/hive/warehouse/navigator_demo.db/salesdata';
B = LIMIT A 16;
STORE B INTO '/user/hive/warehouse/navigator_demo.db/salesdata_sample_file' using PigStorage (';');

However, when accessing the Hive table using the HCatalog loader, lineage for the Pig operation is not generated when browsing the source table lineage. For example:

A = LOAD 'navigator_demo.salesdata' using org.apache.hive.hcatalog.pig.HCatLoader();
B = LIMIT A 16;
STORE B INTO 'navigator_demo.salesdata_sample_hcatalog' using org.apache.hive.hcatalog.pig.HCatStorer();

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: N/A

Cloudera Issue: NAV-3411

Impala lineage delay when running queries from Hue

When using Hue to perform Impala queries, after running the query, the lineage doesn't show up in Navigator until Impala determines that the query is complete. Hue gives users the opportunity to pull another set of results on the same query, so Impala holds the query open. Lineage metadata is sent after Impala reaches its configured query timeout or an event such as another query or logging out of Hue occurs.

Workaround: Set low timeouts for queries in Hue, or add an Impala query timeout to the Hue safety valve and set it to 3-5 minutes so that queries show up in Navigator after Hue has been idle for some time. Hue will notify users that the query needs to be run again, but it also releases the query resources. Here are the options:

HiveServer1 and Hive CLI support removed

Cloudera Navigator requires HiveServer2 for complete governance of Hive queries. Cloudera Navigator does not capture audit events for queries that are run on HiveServer1/Hive CLI, and lineage is not captured for certain types of operations that are run on HiveServer1.

If you use Cloudera Navigator to capture auditing, lineage, and metadata for Hive operations, upgrade to HiveServer2 if you have not done so already.

Affected Versions: Cloudera Navigator 6.x

Fixed Versions: N/A

Cloudera Issue: TSB-185

Streaming Audit Events

Error blocks second streaming target

When streaming audit messages to both Flume and Kafka, if the Flume client throws an exception, Navigator Audit Server does not send the same messages to Kafka. To recover from this problem, the Flume client needs to be working.

Affected Versions: Cloudera Navigator 6.x

Fixed Versions: N/A

Cloudera Issue: NAV-7143

Navigator Metadata Server

Navigator does not mark HDFS entities as deleted when bulk extraction takes too long to complete

In large HDFS deployments, the fsimage takes a long time to index. When an HDFS checkpoint occurs, it creates a new fsimage. However, if the previous fsimage is still being indexed, Navigator cannot use the incremental changes found in the inotify stream because the stream refers to the newly created fsimage.

When this happens, Navigator attempts to start indexing the newer fsimage, creating a loop where Navigator can never take advantage of the more efficient change processing through inotify. The immediate fallout of this delay is that no HDFS entities deleted in the cluster will be marked as deleted in Navigator.

Affected Versions: Cloudera Navigator 6.0.0, 6.0.1

Fixed Versions: Cloudera Navigator 6.1.0

Cloudera Issue: NAV-6456

International characters not supported in tags or property names

Navigator tags and the key portion of user-defined and managed properties do not support Unicode characters beyond ASCII. Only ASCII text can be used in the text of a tag or the name of a property. Property values can include international characters.

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: Tags are fixed in Cloudera Navigator 6.2.0

Cloudera Issue: NAV-7011, NAV-7044

Navigator Embedded Solr can reach its limit on number of documents it can store

Navigator Metadata Server extracts HDFS entities by performing a one-time bulk extraction and then switching to incremental extraction. In Cloudera Manager releases 5.10.0, 5.10.1, and 5.11.0 (Navigator releases 2.9.0, 2.9.1, and 2.10.0), a problem causes HDFS bulk extraction to be run more than one time, resulting in duplicate relations created for HDFS. Over time, embedded Solr runs out of document IDs that it can assign to new relations and fails with the following error:

Caused by: java.lang.IllegalArgumentException: Too many documents, composite IndexReaders cannot exceed 2147483519

When this happens, Navigator stops extracting data because no new documents can be added to Solr.
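The number in the error message is the largest value of a signed 32-bit Java integer minus 128. As a rough illustration of how much room is left before the limit is reached, the following sketch computes the headroom for a given document count. The function name and the sample count are hypothetical, and how the current count is obtained is outside the scope of this example.

```python
JAVA_INT_MAX = 2**31 - 1      # largest signed 32-bit integer
SOLR_DOC_LIMIT = 2147483519   # value quoted in the error message above


def remaining_documents(current_docs: int) -> int:
    """Documents that can still be added before the limit is reached."""
    return max(SOLR_DOC_LIMIT - current_docs, 0)


print(JAVA_INT_MAX - SOLR_DOC_LIMIT)       # gap between int max and the limit
print(remaining_documents(2_000_000_000))  # hypothetical current count
```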

After upgrading to this release, there is an additional recovery step, described in "Repairing metadata in the storage directory after upgrading" in Troubleshooting Navigator Data Management.

Affected Versions: Versions prior to Cloudera Manager 5.10 upgraded to Cloudera Manager 5.10 or higher

Fixed Versions: N/A

Cloudera Issue: NAV-5600

Log includes the error "EndPoint1 must not be null"

The following error may appear in the Navigator Metadata Server log in systems upgraded from Cloudera Manager version 5.x:

2017-10-17 13:00:23,007 ERROR com.cloudera.nav.hive.extractor.AbstractHiveExtractor [CDHExecutor-0-CDHUrlClassLoader@14784b7b]: Unable to parse hive view query *: EndPoint1 must not be null or empty
java.lang.IllegalStateException: EndPoint1 must not be null or empty

This error occurs because the Hive pull extraction for creating a Hive view produces an incorrect lineage relationship for the Hive view. However, Navigator also receives information for the view creation through the push extractor, which correctly produces the lineage relation. You can safely ignore this error.

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: N/A

Cloudera Issue: NAV-4224

Purge

Navigator Metadata Server purge jobs may not run if there are policies configured

Navigator Metadata Server purge can produce messages such as "Checking if maintenance is running" and then fail to run during the available window. This problem occurs when a scheduled purge job waits for extraction tasks to finish but while waiting, a policy job starts, preventing the purge job from running.

Workaround: Specifically, to stop policies from running when you are trying to ensure that purge jobs will run, you can delete policies or temporarily change them to run infrequently to give the purge job time to run. When purge jobs have caught up to their backlog of work, you can change the policies back to running more frequently. Note that simply disabling the policies is not sufficient.

More generally, if you find that scheduled purge jobs are not running because there are other Navigator tasks in progress, consider stopping Navigator extractors and setting policies to run much less frequently. Then manually run the metadata purge using an API call to match your Navigator version:

curl -X POST -u user:password "https://navigator_host:7187/api/vXX/maintenance/purge?deleteTimeThresholdMinutes=duration"

API versions correspond to Navigator versions as described in Mapping API Versions to Product Versions.
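When the purge call is scripted, the request URL can be assembled from its parts. The sketch below only builds the URL used in the curl command above and does not send the request. The function name purge_url is hypothetical, and vXX remains a placeholder for the API version that matches your Navigator release.

```python
from urllib.parse import urlencode


def purge_url(host: str, api_version: str, threshold_minutes: int,
              port: int = 7187) -> str:
    """Build the maintenance purge URL for a Navigator Metadata Server."""
    query = urlencode({"deleteTimeThresholdMinutes": threshold_minutes})
    return f"https://{host}:{port}/api/{api_version}/maintenance/purge?{query}"


# "vXX" is a placeholder; the threshold value is only an example.
print(purge_url("navigator_host", "vXX", 86400))
```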

Affected Versions: Cloudera Navigator 6.0.0, 6.0.1, 6.1.0, 6.1.1

Fixed Versions: Cloudera Navigator 6.2.0

Cloudera Issue: NAV-7037

First purge job may run twice

Navigator purge jobs are scheduled using UTC. However, the first time Navigator runs a purge, the scheduler triggers the job twice: once in the UTC timezone and a second time in the local timezone. After that, the schedule is triggered as expected. Other than the first purge running at an unexpected time, there are no side effects of this issue.
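To anticipate when the extra run might occur, it can help to see how one scheduled wall-clock time maps to two different instants when read in UTC and in the local timezone. This is only an illustration of the arithmetic, not of Navigator's scheduler. The function name, the sample date, and the fixed UTC offset are assumptions.

```python
from datetime import datetime, timedelta, timezone


def first_run_instants(wall_clock: datetime, local_offset_hours: float):
    """Return the distinct UTC instants for one wall-clock schedule time."""
    local_tz = timezone(timedelta(hours=local_offset_hours))
    as_utc = wall_clock.replace(tzinfo=timezone.utc)
    as_local = wall_clock.replace(tzinfo=local_tz).astimezone(timezone.utc)
    # A set removes the duplicate when the local timezone is UTC itself.
    return sorted({as_utc, as_local})


# Hypothetical schedule of midnight on a server that is 7 hours behind UTC.
for instant in first_run_instants(datetime(2018, 6, 2, 0, 0), -7):
    print(instant.isoformat())
```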

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: N/A

Cloudera Issue: NAV-6666

Purge can create data that's too big for Solr to process

Solr's POST request payload limit is set to 2 MB, which can be exceeded when purging a large Navigator metadata storage directory. The purge job fails with an error similar to the following:

2018-05-31 02:42:23,959 ERROR com.cloudera.nav.maintenance.purge.hiveandimpala.PurgeHiveOrImpalaSelectOperations [scheduler_Worker-1]:
Failed to purge operations for DELETE_HIVE_AND_IMPALA_SELECT_OPERATIONS with error
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
     Expected mime type application/octet-stream but got application/xml.

To work around this problem, set the following options in the Navigator Metadata Server Advanced Configuration Snippet (Safety Valve) for cloudera-navigator.properties in Cloudera Manager:

nav.solr.commit_batch_size=50000
nav.solr.batch_size=50000

Restart Navigator Metadata Server. Leave these options in place until more than one purge job has run successfully, then remove the options and restart Navigator Metadata Server.

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: N/A

Cloudera Issue: NAV-6452

Policy specifications and cluster names affect purge

Policies cannot use cluster names in queries. Cluster name is a derived attribute and cannot be used as-is.

Workaround: When setting move actions for Cloudera Navigator, if there is only one cluster known to the Navigator instance, remove the clusterName clause.

If there is more than one cluster known to the Navigator instance, replace clusterName with sourceId. To get the sourceId, issue a query in this format:

curl '<nav-url>/api/v9/entities/?query=type%3Asource&limit=100&offset=0'

Use the identity of the matching HDFS service for this cluster as the sourceId.
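The lookup can be sketched in two steps: build the query URL, then pick the HDFS entry for the cluster from the parsed response. This is a minimal sketch, not a definitive client. The function names are hypothetical, and the sourceType and clusterName field names and the sample entries are assumptions about the response shape; only the identity field is named in the text above.

```python
from urllib.parse import urlencode


def source_query_url(nav_url: str, limit: int = 100, offset: int = 0) -> str:
    """Build the URL that lists the sources known to Navigator."""
    params = urlencode({"query": "type:source", "limit": limit, "offset": offset})
    return f"{nav_url.rstrip('/')}/api/v9/entities/?{params}"


def hdfs_source_id(sources, cluster_name):
    """Return the identity of the HDFS source for a cluster, if present.

    Assumes each entry carries "sourceType" and "clusterName" fields.
    """
    for src in sources:
        if src.get("sourceType") == "HDFS" and src.get("clusterName") == cluster_name:
            return src.get("identity")
    return None


# Hypothetical parsed response, for illustration only.
sample = [
    {"identity": "5", "sourceType": "HIVE", "clusterName": "Cluster 1"},
    {"identity": "7", "sourceType": "HDFS", "clusterName": "Cluster 1"},
]
print(source_query_url("https://hostname:7187"))
print(hdfs_source_id(sample, "Cluster 1"))
```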

Affected Versions: Cloudera Navigator 6.0.0 and later

Fixed Versions: N/A

Cloudera Issue: NAV-3537

Spark

Spark Lineage Limitations and Requirements

Spark lineage diagrams are supported in the Cloudera Navigator 6.0 release. Spark lineage is supported for Spark 1.6 and Spark 2.3. Lineage is not available for Spark when Cloudera Manager is running in single user mode. In addition to these requirements, Spark lineage has the following limitations:
  • Lineage is produced only for data that is read/written and processed using the DataFrame and SparkSQL APIs. Lineage is not available for data that is read/written or processed using Spark's RDD APIs.
  • Lineage information is not produced for calls to aggregation functions such as groupBy().
  • The default lineage directory for Spark on YARN is /var/log/spark/lineage. No process or user should write files to this directory; doing so can cause agent failures. In addition, changing the Spark on YARN lineage directory has no effect: the default remains /var/log/spark/lineage.

Navigator doesn't recognize local files in Spark jobs

Spark jobs can use files on the local filesystem as job inputs or outputs. Navigator, however, only supports HDFS, Hive, and S3 assets as job inputs or outputs. When Navigator extracts metadata from Spark and encounters a local source type, the metadata is discarded and the following error appears in the Navigator Metadata Server log:

2018-10-11 12:14:26,192 WARN com.cloudera.nav.api.ApiExceptionMapper [qtp1574898980-23815]: Unexpected exception.
          java.lang.RuntimeException: Source LOCAL isn't supported for Spark Lineage

Affected Versions: Cloudera Navigator 6.0.0, 6.0.1, 6.1.0, 6.1.1

Fixed Versions: Cloudera Navigator 6.2.0

Cloudera Issue: NAV-6811

Spark extractor enabled using safety valve deprecated

The Spark extractor that was included prior to CDH 5.11 and enabled by setting the safety valve nav.spark.extraction.enable=true is being deprecated and could be removed completely in a future release. If you are upgrading from CDH 5.10 or earlier and were using the extractor configured with this safety valve, be sure to remove the setting when you upgrade.

Upgrade Issues and Limitations