Configuring the Sentry Service

Enabling the Sentry Service for Hive

Prerequisites

  • Ensure all the action items under Prerequisites are complete.
  • The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
    • Permissions on the warehouse directory must be set as follows (see following Note for caveats):
      • 771 on the directory itself (for example, /user/hive/warehouse)
      • 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
      • All files and subdirectories should be owned by hive:hive
      For example:
      $ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
      $ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
  • Disable impersonation for HiveServer2 in the Cloudera Manager Admin Console:
    1. Go to the Hive service.
    2. Click the Configuration tab.
    3. Under the HiveServer2 role group, uncheck the HiveServer2 Enable Impersonation property, and click Save Changes.
  • Enable the Hive user to submit MapReduce jobs.
    1. Go to the MapReduce service.
    2. Click the Configuration tab.
    3. Under a TaskTracker role group, go to the Security category.
    4. Set the Minimum User ID for Job Submission property to zero (the default is 1000) and click Save Changes.
    5. Repeat steps 1-4 for every TaskTracker role group for the MapReduce service that is associated with Hive, if more than one exists.
    6. Restart the MapReduce service.
  • Enable the Hive user to submit YARN jobs.
    1. Go to the YARN service.
    2. Click the Configuration tab.
    3. Under a NodeManager role group, go to the Security category.
    4. Ensure the Allowed System Users property includes the hive user. If not, add hive and click Save Changes.
    5. Repeat steps 1-4 for every NodeManager role group for the YARN service that is associated with Hive, if more than one exists.
    6. Restart the YARN service.
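
On a cluster where hive-site.xml is maintained by hand rather than through Cloudera Manager, the impersonation prerequisite above would typically correspond to the standard Hive property hive.server2.enable.doAs. The fragment below is a minimal sketch under that assumption; when you uncheck HiveServer2 Enable Impersonation in Cloudera Manager, the setting is written for you and no manual edit should be needed.

```xml
<!-- hive-site.xml (HiveServer2). Assumed manual equivalent of unchecking
     "HiveServer2 Enable Impersonation"; queries then run as the hive user. -->
<property>
   <name>hive.server2.enable.doAs</name>
   <value>false</value>
</property>
```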

Enabling the Sentry service for Hive

  1. Go to the Hive service.
  2. Click the Configuration tab.
  3. In the Service-Wide category, set the Sentry Service property to Sentry.
  4. Restart the Hive service.

Configuring HiveServer2 for the Sentry Service

Add the following properties to hive-site.xml to allow the Hive service to communicate with the Sentry policy store.
<property>
   <name>hive.security.authorization.task.factory</name>
   <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
   <name>hive.server2.session.hook</name>
   <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
   <name>hive.sentry.conf.url</name>
   <value>file:///{{CMF_CONF_DIR}}/sentry-site.xml</value>
</property>

Configuring the Hive Metastore for the Sentry Service

Configuring Pig and HCatalog for the Sentry Service

Once you have the Sentry service up and running, and Hive has been configured to use the Sentry service, there are some configuration changes you must make to your cluster to allow Pig, MapReduce (using HCatLoader, HCatStorer) and WebHCat queries to access Sentry-secured data stored in Hive.

With HDFS extended ACLs enabled, Cloudera recommends you set the permissions for the Hive warehouse directory, /user/hive/warehouse, to 771 so that users other than the owner and group have only execute permission. Because the /user/hive/warehouse directory is owned by hive:hive by default, this also restricts requests from any other user at the HDFS level.
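
The recommendation above presumes that extended ACLs are already switched on in HDFS. As a rough sketch, in a hand-edited hdfs-site.xml this is usually controlled by the NameNode property dfs.namenode.acls.enabled. On a cluster managed by Cloudera Manager, the equivalent setting is made through the HDFS service configuration instead, so treat the fragment as illustrative only.

```xml
<!-- hdfs-site.xml (NameNode). Illustrative only: turns on support for
     extended ACLs; a NameNode restart is normally required afterwards. -->
<property>
   <name>dfs.namenode.acls.enabled</name>
   <value>true</value>
</property>
```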

With these permissions, other user requests may fail, such as commands coming through Pig jobs, WebHCat queries, and MapReduce jobs. In order to give these users access, perform the following configuration changes:
  • Use HDFS ACLs to define permissions on a specific directory or file in HDFS. This directory or file generally maps to a database, table, partition, or data file.
  • Users running these jobs must have the required permissions in Sentry to add new metadata to, or read metadata from, the Hive Metastore Server. For instructions on how to set up the required permissions, see Hive SQL Syntax for Use with Sentry. You can use Beeline, HiveServer2's command-line interface, to update the Sentry database with the user privileges.
Examples:
  • A user running Pig HCatLoader requires read permissions on a specific table or partition. In this case, GRANT read access to the user in Sentry and set the ACL on the file being accessed to read and execute.
  • A user running Pig HCatStorer requires ALL permissions on a specific table. In this case, GRANT ALL access to the user in Sentry and set the ACL on the table being used to write and execute.

Configuring the Hive Metastore to Communicate with Sentry

Add the following properties to hive-site.xml to allow the Hive metastore to communicate with the Sentry policy store.
<property>
    <name>hive.metastore.client.impl</name>
    <value>org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient</value>
    <description>Sets custom Hive Metastore client which Sentry uses to filter out metadata.</description>
</property>

<property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>
    <description>list of comma separated listeners for metastore events.</description>
</property>

<property>
    <name>hive.metastore.event.listeners</name>
    <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>
    <description>list of comma separated listeners for metastore, post events.</description>
</property>

Securing the Hive Metastore

It's important that the Hive metastore be secured. If you want to override the Kerberos prerequisite for the Hive metastore, set the sentry.hive.testing.mode property to true, which allows Sentry to work with weaker authentication mechanisms. Add the following property to the sentry-site.xml used by HiveServer2 and the Hive metastore:
<property>
  <name>sentry.hive.testing.mode</name>
  <value>true</value>
</property>
Impala does not require this flag to be set.
You can turn on Hive metastore security using the instructions in Cloudera Security.
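
The authoritative procedure is the one in Cloudera Security. Purely as orientation, a Kerberos-secured metastore in a hand-edited hive-site.xml usually involves the three standard Hive properties sketched below. The keytab path and the realm YOUR-REALM.COM are hypothetical placeholders, not values taken from this guide.

```xml
<!-- hive-site.xml (Hive metastore). Sketch only; the keytab path and realm
     below are hypothetical placeholders that must match your environment. -->
<property>
   <name>hive.metastore.sasl.enabled</name>
   <value>true</value>
</property>
<property>
   <name>hive.metastore.kerberos.keytab.file</name>
   <value>/etc/hive/conf/hive.keytab</value>
</property>
<property>
   <name>hive.metastore.kerberos.principal</name>
   <value>hive/_HOST@YOUR-REALM.COM</value>
</property>
```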

Configuring Impala for the Sentry Service

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Enabling the Sentry Service for Impala

To use the Sentry service:
  1. Enable the Sentry service for Hive. For details on how to do this, see Enabling the Sentry Service for Hive.
  2. Go to the Impala service.
  3. Click the Configuration tab.
  4. In the Service-Wide category, set the Sentry Service property to Sentry.
  5. Restart Impala.

Configuring Impala as a Client for the Sentry Service

Set the following configuration properties in sentry-site.xml.
<property>
   <name>sentry.service.client.server.rpc-port</name>
   <value>3893</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-address</name>
   <value>hostname</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-connection-timeout</name>
   <value>200000</value>
</property>
<property>
   <name>sentry.service.security.mode</name>
   <value>none</value>
</property>
Other configuration changes required include:
  • To enable the Sentry policy service, the following flag should be set on the catalogd and the impalad.
    --sentry_config=<absolute path to sentry service configuration file>
  • To enable authorization based on policy server metadata, set the following flag on the impalad.
    --server_name=<server name>
  • To enable authorization based on a file-based policy, set the following flags on the impalad.
    --server_name=<server name>
    --authorization_policy_file=<path to policy file>

    If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise, the policy server metadata approach will be used to implement authorization.

  • The impala user must also be added to the list of administrative users of the Sentry Policy Server. For more details, see SENTRY-191.
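
One way the last item might be expressed, assuming the Sentry server reads its administrators from the sentry.service.admin.group property in its own sentry-site.xml, is sketched below. That property lists groups rather than individual users, so the fragment further assumes the impala user belongs to a group named impala; the hive entry is likewise an assumption about what is already configured.

```xml
<!-- sentry-site.xml (Sentry server). Sketch only: comma-separated list of
     admin groups. "hive" is assumed to be present already; "impala" is the
     addition and presumes the impala user is a member of that group. -->
<property>
   <name>sentry.service.admin.group</name>
   <value>hive,impala</value>
</property>
```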