Navigator Metadata Server Tuning

This page can help you tune your Navigator Metadata Server instance for peak performance. See also Setting up Navigator Role Instances on Different Hosts.

Memory Sizing Considerations for Navigator Metadata Server

Unlike Navigator Audit Server, for which the configured Java heap size is rarely an issue, Navigator Metadata Server runs two processes that have a direct impact on memory:
  • Extracting metadata from the cluster and creating relationships among the metadata entities (facilitating lineage)
  • Querying to find entities

Navigator Metadata Server uses Solr to index, store, and query metadata. Indexing occurs during the extraction process, and the resulting Solr documents (the data structure Solr uses for indexing and search) are stored in the specified Navigator Metadata Server Storage Dir. Because the metadata is indexed, querying is fast and efficient. However, Solr indexing runs in-process with Navigator Metadata Server, so the amount of memory configured for the Java heap is critical.

There is a direct correlation between the number of Solr documents contained in the index and the size of the Java heap required by Navigator Metadata Server, so this setting may need to be increased over time as the number of Solr documents in the index grows. To calculate the optimal Java heap setting for your system, see Estimating Optimal Java Heap Size Using Solr Document Counts.

Estimating Optimal Java Heap Size Using Solr Document Counts

Each time it starts up, Navigator Metadata Server counts and logs the number of Solr documents in the datadir (the Navigator Metadata Server storage directory), as shown below:
2016-11-11 09:24:58,013 INFO com.cloudera.nav.server.NavServerUtil:
  Found 68813088 documents in solr core nav_elements
2016-11-11 09:24:58,705 INFO com.cloudera.nav.server.NavServerUtil:
  Found 78813930 documents in solr core nav_relations

These counts can be used to estimate optimal Java heap size for the server, as detailed below. If your normal setup provides less than 8 GB for the Navigator Metadata Server heap, consider increasing the heap before performing an upgrade. See Setting the Navigator Metadata Server Java Heap Size for details about using Cloudera Manager Admin Console to modify this setting when needed.
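
As an illustration, the counts can be pulled out of a log with a short script. The following is a minimal sketch, assuming the log message has exactly the form shown in the sample above; the names COUNT_RE and latest_solr_counts are hypothetical and are not part of Cloudera Navigator.

```python
import re

# Matches the count message from the sample log, for example:
# "Found 68813088 documents in solr core nav_elements"
COUNT_RE = re.compile(r"Found (\d+) documents in solr core (\w+)")


def latest_solr_counts(log_text):
    """Return {core_name: document_count}, keeping the last count seen per core."""
    # The sample wraps each entry across two lines, so collapse all
    # whitespace (including newlines) to single spaces before matching.
    flattened = " ".join(log_text.split())
    counts = {}
    for count, core in COUNT_RE.findall(flattened):
        counts[core] = int(count)  # counts from later startups overwrite earlier ones
    return counts


sample = """\
2016-11-11 09:24:58,013 INFO com.cloudera.nav.server.NavServerUtil:
  Found 68813088 documents in solr core nav_elements
2016-11-11 09:24:58,705 INFO com.cloudera.nav.server.NavServerUtil:
  Found 78813930 documents in solr core nav_relations
"""
print(latest_solr_counts(sample))
```

Because the server logs the counts at every startup, the sketch keeps only the most recent value for each core.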

To estimate the Java heap size:
  1. Open the log file for Navigator Metadata Server. By default, logs are located in /var/log/cloudera-scm-navigator.
  2. Find the line in the log that reports the number of documents in solr core nav_elements.
  3. Find the line in the log that reports the number of documents in solr core nav_relations. This count is needed only for the upgrade calculation.
  4. Multiply the total number of element documents by 200 bytes per document and add the result to a baseline of 2 GB:
    (num_nav_elements * 200 bytes) + 2 GB
    For example, using the log shown above, the recommended Java heap size is about 14.7 GiB:
    (68813088 * 200 bytes) + 2 GB
    = 13762617600 bytes (~12.8 GiB) + 2 GB (~1.9 GiB)
    = ~14.7 GiB
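
The arithmetic in step 4 can be reproduced with a few lines of Python. This is a sketch only: the function and constant names are hypothetical, and it assumes that the 2 GB baseline means decimal gigabytes (2 * 10^9 bytes), which is consistent with the GiB conversion in the example above.

```python
BYTES_PER_DOCUMENT = 200      # per-document figure from the formula in step 4
BASELINE_BYTES = 2 * 10**9    # 2 GB baseline, assumed to be decimal gigabytes
GIB = 2**30                   # bytes in one GiB


def estimate_heap_bytes(num_nav_elements):
    """Apply (num_nav_elements * 200 bytes) + 2 GB."""
    return num_nav_elements * BYTES_PER_DOCUMENT + BASELINE_BYTES


heap = estimate_heap_bytes(68813088)
print(f"{heap} bytes = {heap / GIB:.1f} GiB")  # 15762617600 bytes = 14.7 GiB
```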

Reducing the Metadata Collected by Navigator Metadata Server

You can leave some file system paths out of the scope of information tracked in Cloudera Navigator. Cloudera Manager provides a blacklist where you can specify file system paths that should be filtered out of the metadata extracted from HDFS and S3.

To filter file system paths from tracked metadata:

  1. Log in to Cloudera Manager Admin Console.
  2. Select Clusters > Cloudera Management Service.
  3. Click the Configuration tab.
  4. Select Navigator Metadata Server for the Scope filter.
  5. Select Extractor Filter for the Category filter.
  6. Enable the filter:
    • HDFS Filter Enable
    • S3 Filter Enable
  7. In the appropriate filter list, include the file system path that you want to exclude from Navigator Metadata Server tracking:
    • HDFS Filter Blacklist
    • S3 Filter list

    The entry can be a specific path or a Java regular expression specifying a path. For example, to specify a directory and all subdirectories, use an expression such as:

    /path/to/dir(?:/.*)?
  8. To add more entries to the filter list, click the plus (+) icon to open another entry.
  9. For S3, set the S3 Filter Default Action to DISCARD.
  10. Click Save Changes.
  11. Click the Instances tab.
  12. Restart the role.
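
Before adding an expression to a filter list, it can help to check which paths it matches. The sketch below tests the example expression from step 7 using Python's re module rather than Java; this particular pattern uses only syntax that behaves the same way in both. It assumes that the expression must match the entire path, which may not be how Navigator applies filter entries, and the function name is_filtered is hypothetical.

```python
import re

# Example expression from step 7: a directory and everything beneath it.
pattern = re.compile(r"/path/to/dir(?:/.*)?")


def is_filtered(path):
    """Return True if the expression matches the whole path."""
    # Assumption: the filter is applied to the full path, hence fullmatch().
    return pattern.fullmatch(path) is not None


for candidate in [
    "/path/to/dir",               # the directory itself: matches
    "/path/to/dir/sub/file.txt",  # anything below it: matches
    "/path/to/dir2",              # sibling with a longer name: no match
    "/other/path/to/dir",         # same suffix under another root: no match
]:
    print(candidate, is_filtered(candidate))
```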

Java heap size calculations for Navigator Metadata Server:
  • Normal operation
    (num_nav_elements * 200 bytes) + 2 GB
  • Upgrade between CM 5.9 and 5.10
    ((num_nav_elements + num_nav_relations) * 200 bytes) + 2 GB
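
To see how much the upgrade formula raises the requirement, both calculations can be run against the document counts from the sample log earlier on this page. This is a sketch with hypothetical names; it assumes that 2 GB means 2 * 10^9 bytes.

```python
BYTES_PER_DOCUMENT = 200
BASELINE_BYTES = 2 * 10**9  # 2 GB baseline, assumed to be decimal gigabytes
GIB = 2**30


def heap_bytes(num_nav_elements, num_nav_relations=0, upgrade=False):
    """Normal operation counts only elements; the upgrade case adds relations."""
    documents = num_nav_elements
    if upgrade:
        documents += num_nav_relations
    return documents * BYTES_PER_DOCUMENT + BASELINE_BYTES


normal = heap_bytes(68813088)
during_upgrade = heap_bytes(68813088, 78813930, upgrade=True)
print(f"normal operation: {normal / GIB:.1f} GiB")          # 14.7 GiB
print(f"upgrade:          {during_upgrade / GIB:.1f} GiB")  # 29.4 GiB
```

With these counts the upgrade estimate is roughly double the estimate for normal operation.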

Purging the Navigator Metadata Server of Deleted and Stale Metadata

Administrators can manage Java heap requirements by purging stale and deleted metadata from Navigator Metadata Server before an upgrade or whenever system performance seems slow. Purging also speeds up the display of lineage diagrams. A purge fully removes metadata that has been deleted.

For Cloudera Navigator 2.11 and higher releases: The Cloudera Navigator console (Administration tab) provides a fully configurable Purge Settings page. See Managing Metadata Storage with Purge for details.

For Cloudera Navigator 2.10 and prior releases: The Purge capability can be invoked directly using the Cloudera Navigator APIs. See Using the Purge APIs for Metadata Maintenance Tasks for details.