Before You Install Sentry

Before you install Sentry, verify that your cluster meets the prerequisites and review the performance guidelines.

Prerequisites

Verify the following prerequisites:

  • CDH 5.1.0 or higher managed by Cloudera Manager 5.1.0 or higher.
  • If you want to configure high availability for Sentry, you must have CDH 5.13.0 or higher and Cloudera Manager 5.13.0 or higher installed.
  • If you want to enable Sentry high availability, you must use a relational database, not a flat file, for the Sentry service database.
  • You must have a Java version installed that includes the fix for JDK-8055949.
  • HiveServer2 and the Hive Metastore (HMS) running with strong authentication. For HiveServer2, strong authentication is either Kerberos or LDAP. For the Hive Metastore, only Kerberos is considered strong authentication (to override, see Securing the Hive Metastore).
  • If you want to use Sentry with Impala, you must have Impala 1.4.0 or higher running with strong authentication. With Impala, either Kerberos or LDAP can be configured to achieve strong authentication.
  • If you want to use Sentry with Cloudera Search, the Sentry service must be configured with a database, and you must have Cloudera Search for CDH 5.1.0 or higher installed. Solr supports Sentry beginning with CDH 5.1.0. The following features were added in different releases:
    • Sentry with policy files was added in CDH 5.1.0. Note that you cannot configure Sentry high availability with policy files because high availability requires Sentry to use a relational database.
    • Sentry with config support was added in CDH 5.5.0.
    • Sentry with a relational database-backed Sentry service was added in CDH 5.8.0. If you want to use high availability for Sentry with Solr, you must use Cloudera Search for CDH 5.8.0 or higher because Sentry must be configured with a relational database.
  • Implement Kerberos authentication on your cluster. For instructions, see Enabling Kerberos Authentication for CDH.
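
The version requirements in the list above can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, it is not a Cloudera tool, and it assumes that versions are plain three-part numeric strings such as 5.13.0. It covers only the version minimums, not the authentication, database, or Java requirements.

```python
# Minimum versions taken from the prerequisites list above.
MIN_BASE = (5, 1, 0)     # CDH and Cloudera Manager, without high availability
MIN_HA = (5, 13, 0)      # CDH and Cloudera Manager, with Sentry high availability
MIN_IMPALA = (1, 4, 0)   # only relevant if Sentry is used with Impala


def parse_version(text):
    """Turn a dotted version string such as '5.13.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def fmt(version):
    """Format a version tuple back into dotted form."""
    return ".".join(str(part) for part in version)


def check_versions(cdh, cm, want_ha=False, impala=None):
    """Return a list of problems; an empty list means the versions are OK."""
    problems = []
    required = MIN_HA if want_ha else MIN_BASE
    if parse_version(cdh) < required:
        problems.append(f"CDH {cdh} is lower than required {fmt(required)}")
    if parse_version(cm) < required:
        problems.append(f"Cloudera Manager {cm} is lower than required {fmt(required)}")
    if impala is not None and parse_version(impala) < MIN_IMPALA:
        problems.append(f"Impala {impala} is lower than required {fmt(MIN_IMPALA)}")
    return problems


if __name__ == "__main__":
    # Hypothetical cluster: HA is wanted, but CDH is one minor release too old.
    for problem in check_versions("5.12.1", "5.13.0", want_ha=True):
        print(problem)
```

Comparing tuples of integers avoids the string-comparison pitfall in which "5.9.0" would sort after "5.13.0".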

Performance Guidelines

Use the following guidelines for optimal performance:
  • Creating a large number of roles in Sentry can slow all aspects of Sentry performance. Use 5,000 or fewer roles for best performance.
  • Set the HMS heap size to at least 10 GB. This is required because, by default, Sentry uses 12 connections to communicate with HMS. To verify the HMS heap size, open the Hive service, click the Configuration tab, and search for the Java Heap Size of Hive Metastore Server in Bytes property.
  • Cloudera recommends that for each Sentry host, you have 2.25 GB memory per million objects in the Hive database. Hive objects include servers, databases, tables, partitions, columns, URIs, and views.

    Make sure that the JVM heap size is set to a value that is appropriate for the memory requirements. You can check the heap size in Cloudera Manager. Open the Sentry service, click the Configuration tab, and search for the Java Heap Size of Sentry Server in Bytes property. Set that property to the maximum size for the Java process heap memory.

    The amount of memory that Sentry requires increases linearly as the number of objects in the Hive database increases. The graph below shows the memory required for Sentry based on the number of Hive objects.

    Sentry Memory Usage Based on Hive Objects
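
Because both heap size properties are expressed in bytes, the sizing guidelines above can be turned into a quick calculation. The following Python sketch is a rough estimate under stated assumptions: it treats 1 GB as 1024^3 bytes, applies the 2.25 GB per million objects figure as a strict linear rule, and uses hypothetical function names. It does not read or change any Cloudera Manager setting.

```python
GIB = 1024 ** 3  # assumption: "GB" in the guidelines is read as 1024^3 bytes

# 2.25 GB per million Hive objects, kept as an integer number of bytes.
SENTRY_BYTES_PER_MILLION = 9 * GIB // 4

# Minimum HMS heap size from the guidelines (10 GB).
HMS_MIN_HEAP_BYTES = 10 * GIB


def sentry_memory_bytes(hive_objects):
    """Estimate Sentry memory in bytes for a given number of Hive objects.

    Linear in the object count, rounded up to a whole byte.
    """
    total = hive_objects * SENTRY_BYTES_PER_MILLION
    return -(-total // 1_000_000)  # ceiling division on integers


def hms_heap_ok(heap_bytes):
    """Return True if the configured HMS heap meets the 10 GB minimum."""
    return heap_bytes >= HMS_MIN_HEAP_BYTES


if __name__ == "__main__":
    objects = 4_000_000  # hypothetical count of Hive objects
    needed = sentry_memory_bytes(objects)
    print(f"{objects} objects: about {needed} bytes ({needed / GIB:.2f} GB)")
    print("HMS heap of 8 GB sufficient:", hms_heap_ok(8 * GIB))
```

The byte value can be compared against the Java Heap Size of Sentry Server in Bytes property. Treat the result as a starting point; actual usage depends on the workload.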