Before You Install Sentry

Before you install Sentry, verify that your cluster meets the prerequisites and review the performance guidelines.

Prerequisites

Before you install Sentry, verify the following prerequisites:

  • CDH 5.1.0 or higher managed by Cloudera Manager 5.1.0 or higher.
  • If you want to configure high availability for Sentry, you must have CDH 5.13.0 or higher and Cloudera Manager 5.13.0 or higher installed.
  • If you want to enable Sentry high availability, you must use a relational database, not a flat file, for the Sentry service database.
  • You must have a Java version installed in which JDK-8055949 is fixed.
  • HiveServer2 and the Hive Metastore must be running with strong authentication. For HiveServer2, strong authentication is either Kerberos or LDAP. For the Hive Metastore, only Kerberos is considered strong authentication (to override, see Securing the Hive Metastore).
  • If you want to use Sentry with Impala, you must have Impala 1.4.0 or higher running with strong authentication. With Impala, either Kerberos or LDAP can be configured to achieve strong authentication.
  • If you want to use Sentry with Cloudera Search, the Sentry service must be configured with a database, and you must have Cloudera Search for CDH 5.1.0 or higher installed. Solr supports Sentry beginning with CDH 5.1.0. The following features were added in different releases:
    • Sentry with policy files was added in CDH 5.1.0. Note that you cannot configure Sentry high availability with policy files because high availability requires Sentry to use a relational database.
    • Sentry with config support was added in CDH 5.5.0.
    • Sentry with a relational database-backed Sentry service was added in CDH 5.8.0. If you want to use high availability for Sentry with Solr, you must use the Solr version included in CDH 5.8.0 or higher because Sentry must be configured with a relational database.
  • Implement Kerberos authentication on your cluster. For instructions, see Enabling Kerberos Authentication Using the Wizard.
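
The version minimums above can be checked with a short script before installation. The following is a minimal sketch, not a Cloudera tool: the function names `parse_version` and `check_versions` are hypothetical, and it assumes that you supply the CDH and Cloudera Manager version strings yourself in plain dotted form (for example, `5.13.0`). It covers only the two version rules for a basic install and for high availability.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.13.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum versions taken from the prerequisites list.
MIN_BASE = "5.1.0"   # CDH and Cloudera Manager, any Sentry install
MIN_HA = "5.13.0"    # CDH and Cloudera Manager, Sentry high availability


def check_versions(cdh, cloudera_manager, want_ha=False):
    """Return a list of problems; an empty list means both versions pass.

    Hypothetical helper: the caller passes the version strings in,
    nothing is queried from a live cluster.
    """
    required = MIN_HA if want_ha else MIN_BASE
    problems = []
    for name, version in (("CDH", cdh), ("Cloudera Manager", cloudera_manager)):
        # Tuples compare element by element, so (5, 13, 0) > (5, 8, 0).
        if parse_version(version) < parse_version(required):
            problems.append(name + " " + version + " is lower than " + required)
    return problems


if __name__ == "__main__":
    for problem in check_versions("5.12.2", "5.13.0", want_ha=True):
        print(problem)
```

Comparing integer tuples rather than raw strings avoids treating 5.8.0 as higher than 5.13.0, which a plain string comparison would do.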

Performance Guidelines

Use the following guidelines for optimal performance:
  • Creating a large number of roles in Sentry can slow all aspects of Sentry performance. Use 5,000 or fewer roles for best performance.
  • Cloudera recommends that each Sentry host have 2.25 GB of memory per million objects in the Hive database. Hive objects include servers, databases, tables, partitions, columns, URIs, and views.

    The amount of memory required for Sentry increases linearly as the number of objects in the Hive database increases. The graph below shows the memory required for Sentry based on the number of Hive objects.

    Figure: Sentry Memory Usage Based on Hive Objects
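
The two guidelines above translate into simple arithmetic. The sketch below is an illustration only: the function names are hypothetical, and it assumes a purely linear relationship of 2.25 GB per million Hive objects with no fixed baseline, which is a simplification of the recommendation rather than a published formula.

```python
GB_PER_MILLION_OBJECTS = 2.25   # per-host recommendation from the guideline
MAX_RECOMMENDED_ROLES = 5000    # role count suggested for best performance


def sentry_memory_gb(hive_objects):
    """Estimate memory per Sentry host, in GB, for a given Hive object count.

    Assumes memory scales linearly from zero (no baseline overhead).
    """
    return GB_PER_MILLION_OBJECTS * hive_objects / 1_000_000


def roles_within_guideline(role_count):
    """Return True if the role count is at or below the suggested maximum."""
    return role_count <= MAX_RECOMMENDED_ROLES


if __name__ == "__main__":
    objects = 4_000_000
    print("Estimated memory per Sentry host:", sentry_memory_gb(objects), "GB")
    print("6,200 roles within guideline:", roles_within_guideline(6200))
```

Under these assumptions, a Hive database with 4 million objects works out to about 9 GB per Sentry host.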