Configuring the Sentry Service

This topic describes how to enable the Sentry service for Hive and Impala, and configuring the Hive metastore to communicate with the service.

Enabling the Sentry Service Using Cloudera Manager
Enabling the Sentry Service Using the Command Line
HiveServer2 Restricted Properties
Configuring Pig and HCatalog for the Sentry Service
Securing the Hive Metastore
Using User-Defined Functions with HiveServer2

Enabling the Sentry Service Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Before Enabling the Sentry Service
Enabling the Sentry Service for Hive
Enabling the Sentry Service for Impala
Enabling the Sentry Service for Hue
Add the Hive and Hue Groups to Sentry's Admin Groups

Before Enabling the Sentry Service

Ensure you satisfy all the Prerequisites for the Sentry service.
Important: If you are going to enable HDFS/Sentry synchronization, you do not need to perform the following step to explicitly set permissions for the Hive warehouse directory. With synchronization enabled, all Hive databases and tables will automatically be owned by hive:hive, and Sentry permissions on tables are translated to HDFS ACLs for the underlying table files.
The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
- Using the default Hive warehouse: Permissions on the warehouse directory must be set as follows (see following Note for caveats):
  - 771 on the directory itself (for example, /user/hive/warehouse)
  - 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
  - All files and subdirectories should be owned by hive:hive
  For example:
```
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
```
  If you have enabled Kerberos on your cluster, you must kinit as the hdfs user before you set permissions. For example:
```
sudo -u hdfs kinit -kt <hdfs.keytab> hdfs
sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
```
  Note:
  - If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the permissions on the subdirectories will be set when you set permissions on the warehouse directory itself.
  - If a user has access to any object in the warehouse, that user will be able to execute use default. This ensures that use default commands issued by legacy applications work when Sentry is enabled.
  - The instructions described above for modifying permissions on the Hive warehouse directory override the recommendations in the Hive section of the CDH 5 Installation Guide.
- Using a non-default Hive warehouse: If you would like to use a different directory as the Hive warehouse, update the hive.metastore.warehouse.dir property, and make sure you set the required permissions on the new directory. For example, if the new warehouse directory is /data, set the permissions as follows:
```
$ hdfs dfs -chown hive:hive /data
$ hdfs dfs -chmod 771 /data
```
  Note that when you update the default Hive warehouse, previously created tables will not be moved over automatically. Therefore, tables created before the update will remain at /user/hive/warehouse/<old_table>. However, after the update, any new tables created in the default location will be found at /data/<new_table>.
  
  For Sentry/HDFS sync to work as expected, add the new warehouse URL to the list of Sentry Synchronization Path Prefixes.
Disable impersonation for HiveServer2 in the Cloudera Manager Admin Console:
1. Go to the Hive service.
2. Click the Configuration tab.
3. Select Scope > HiveServer2.
4. Select Category > Main.
5. Uncheck the HiveServer2 Enable Impersonation checkbox.
6. Click Save Changes to commit the changes.
If you are using MapReduce, enable the Hive user to submit MapReduce jobs.
1. Open the Cloudera Manager Admin Console and go to the MapReduce service.
2. Click the Configuration tab.
3. Select Scope > TaskTracker.
4. Select Category > Security.
5. Set the Minimum User ID for Job Submission property to zero (the default is 1000).
6. Click Save Changes to commit the changes.
7. Repeat steps 1-6 for every TaskTracker role group for the MapReduce service that is associated with Hive, if more than one exists.
8. Restart the MapReduce service.
If you are using YARN, enable the Hive user to submit YARN jobs.
1. Open the Cloudera Manager Admin Console and go to the YARN service.
2. Click the Configuration tab.
3. Select Scope > NodeManager.
4. Select Category > Security.
5. Ensure the Allowed System Users property includes the hive user. If not, add hive.
6. Click Save Changes to commit the changes.
7. Repeat steps 1-6 for every NodeManager role group for the YARN service that is associated with Hive, if more than one exists.
8. Restart the YARN service.
Block the external applications from accessing the Hive metastore:
1. In the Cloudera Manager Admin Console, select the Hive service.
2. On the Hive service page, click the Configuration tab.
3. In the search well on the right half of the Configuration page, search for Hive Metastore Access Control and Proxy User Groups Override to locate the hadoop.proxyuser.hive.groups parameter and click the plus sign.
4. Enter hive into the text box and click the plus sign again.
5. Enter hue into the text box.
6. Click Save Changes.
Setting this parameter blocks access to the Hive metastore for non-service users. This effectively disables Hive CLI, Spark, and Sqoop applications from interacting with the Hive service. These application will still run, but after setting this parameter as described here, they will no longer be able to access the Hive metastore and all Hive queries will fail. Users running these tools must be part of the hive or hue groups to access the Hive service. To allow greater access, additional user groups must be added to the proxy list.

Enabling the Sentry Service for Hive

Go to the Hive service.
Click the Configuration tab.
Select Scope > Hive (Service-Wide).
Select Category > Main.
Locate the Sentry Service property and select Sentry.
Click Save Changes to commit the changes.
Restart the Hive service.

Enabling Sentry on Hive service places several HiveServer2 properties on a restricted list properties that cannot be modified at runtime by clients. See HiveServer2 Restricted Properties.

Enabling the Sentry Service for Impala

Enable the Sentry service for Hive (as instructed above).
Go to the Impala service.
Click the Configuration tab.
Select Scope > Impala (Service-Wide).
Select Category > Main.
Locate the Sentry Service property and select Sentry.
Click Save Changes to commit the changes.
Restart Impala.

Enabling the Sentry Service for Hue

Hue uses a Security app to make it easier to interact with Sentry. When you set up Hue to manage Sentry permissions, make sure that users and groups are set up correctly. Every Hue user connecting to Sentry must have an equivalent OS-level user account on all hosts so that Sentry can authenticate Hue users. Each OS-level user should also be part of an OS-level group with the same name as the corresponding user's group in Hue.

For more information on using the Security app, see the related blog post.

Enable the Sentry service as follows:

Enable the Sentry service for Hive and Impala (as instructed above).
Go to the Hue service.
Click the Configuration tab.
Select Scope > Hue (Service-Wide).
Select Category > Main.
Locate the Sentry Service property and select Sentry.
Click Save Changes to commit the changes.
Restart Hue.

Add the Hive and Hue Groups to Sentry's Admin Groups

Go to the Sentry service.
Click the Configuration tab.
Select Scope > Sentry (Service-Wide).
Select Category > Main.
Locate the Admin Groups property and add the hive and hue groups to the list. If an end user is in one of these admin groups, that user has administrative privileges on the Sentry Server.
Click Save Changes to commit the changes.

Enabling the Sentry Service Using the Command Line

Before Enabling the Sentry Service
Configuring the Sentry Server
Configuring HiveServer2 for the Sentry Service
Configuring the Hive Metastore for the Sentry Service
Configuring Impala as a Client for the Sentry Service

Before Enabling the Sentry Service

Important: If you are going to enable HDFS/Sentry synchronization, you do not need to perform the following step to explicitly set permissions for the Hive warehouse directory. With synchronization enabled, all Hive databases and tables will automatically be owned by hive:hive, and Sentry permissions on tables are translated to HDFS ACLs for the underlying table files.
The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
- Using the default Hive warehouse: Permissions on the warehouse directory must be set as follows (see following Note for caveats):
  - 771 on the directory itself (for example, /user/hive/warehouse)
  - 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
  - All files and subdirectories should be owned by hive:hive
  For example:
```
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
```
  If you have enabled Kerberos on your cluster, you must kinit as the hdfs user before you set permissions. For example:
```
sudo -u hdfs kinit -kt <hdfs.keytab> hdfs
sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
```
  Note:
  - If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the permissions on the subdirectories will be set when you set permissions on the warehouse directory itself.
  - If a user has access to any object in the warehouse, that user will be able to execute use default. This ensures that use default commands issued by legacy applications work when Sentry is enabled.
  - The instructions described above for modifying permissions on the Hive warehouse directory override the recommendations in the Hive section of the CDH 5 Installation Guide.
- Using a non-default Hive warehouse: If you would like to use a different directory as the Hive warehouse, update the hive.metastore.warehouse.dir property, and make sure you set the required permissions on the new directory. For example, if the new warehouse directory is /data, set the permissions as follows:
```
$ hdfs dfs -chown hive:hive /data
$ hdfs dfs -chmod 771 /data
```
  Note that when you update the default Hive warehouse, previously created tables will not be moved over automatically. Therefore, tables created before the update will remain at /user/hive/warehouse/<old_table>. However, after the update, any new tables created in the default location will be found at /data/<new_table>.
  
  For Sentry/HDFS sync to work as expected, add the new warehouse URL to the list of Sentry Synchronization Path Prefixes.
HiveServer2 impersonation must be turned off.
If you are using MapReduce, you must enable the Hive user to submit MapReduce jobs. You can ensure that this is true by setting the minimum user ID for job submission to 0. Edit the taskcontroller.cfg file and set min.user.id=0.
If you are using YARN, you must enable the Hive user to submit YARN jobs, add the user hive to the allowed.system.users configuration property. Edit the container-executor.cfg file and add hive to the allowed.system.users property. For example,
```
allowed.system.users = nobody,impala,hive
```
Important: You must restart the cluster and HiveServer2 after changing these values.
Block the Hive CLI user from accessing the Hive metastore by setting the following property in the cluster's core-site.xml file:
```
<property>
     <name>hadoop.proxyuser.hive.groups</name>
     <value>hive,hue</value>
     <description>Sets groups from which the hive user can impersonate other users.</description>
</property>
            
```
Setting this parameter blocks access to the Hive metastore for the user running the Hive CLI if they are not part of the hive or the hue groups. The Hive CLI can still run, but after setting this parameter as described here, the hive user can impersonate only members of the hive or the hue groups. If you are using Sqoop, the Sqoop user must also have access to the Hive metastore.
Add the hive, impala and hue groups to Sentry's sentry.service.admin.group in the sentry-site.xml file. If an end user is in one of these admin groups, that user has administrative privileges on the Sentry Server.
```
<property>
  <name>sentry.service.admin.group</name>
  <value>hive,impala,hue</value>
 </property>
```

Configuring the Sentry Server

Configure the following properties in sentry-site.xml on the Sentry Server host.

<property>
    <name>sentry.verify.schema.version</name>
    <value> </value>
    <description>
    value: true, false
    true Sentry store will verify the schema version in backed DB with expected version in jar.
    The service won't start if there's a mismatch
    </description>
  </property>

  <property>
    <name>sentry.service.server-max-threads</name>
    <value> </value>
    <description> Number of threads 500 Max worker threads to serve client requests</description>
  </property>

  <property>
    <name>sentry.service.server-min-threads</name>
    <value> </value>
    <description>Number of threads 10 Min worker threads to serve client requests</description>
  </property>

  <property>
    <name>sentry.service.allow.connect</name>
    <value> </value>
    <description>comma separated list of users - List of users that are allowed to connect to the service (eg Hive, Impala) </description>
  </property>

  <property>
    <name>sentry.store.jdbc.url</name>
    <value> </value>
    <description>JDBC connection URL for the backed DB</description>
  </property>

  <property>
    <name>sentry.store.jdbc.user</name>
    <value>sentry</value>
    <description>Userid for connecting to backend db </description>
  </property>

  <property>
    <name>sentry.store.jdbc.password</name>
    <value>Sentry</value>
    <description>Sentry password for backend JDBC user </description>
  </property>

  <property>
    <name>sentry.service.server.keytab</name>
    <value></value>
    <description>Keytab for service principal</description>
  </property>

  <property>
    <name>sentry.service.server.rpcport</name>
    <value>8038</value>
    <description> TCP port number for service</description>
  </property>

  <property>
    <name>sentry.service.server.rpcaddress</name>
    <value>0.0.0.0</value>
    <description> TCP interface for service to bind to</description>
  </property>

  <property>
    <name>sentry.store.jdbc.driver</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Backend JDBC driver - org.apache.derby.jdbc.EmbeddedDriver (only when dbtype = derby) JDBC Driver class for the backed DB</description>
  </property>

  <property>
    <name>sentry.service.admin.group</name>
    <value> </value>
    <description>Comma separates list of groups.  List of groups allowed to make policy updates</description>
  </property>

  <property>
    <name>sentry.store.group.mapping</name>
    <value>org.apache.sentry.provider.common.HadoopGroupMappingService</value>
    <description>

Group mapping class for Sentry service. org.apache.sentry.provider.file.LocalGroupMapping service can be used for local group mapping. </description>
  </property>

  <property>
    <name>sentry.store.group.mapping.resource</name>
    <value> </value>
    <description> Policy file for group mapping. Policy file path for local group mapping, when sentry.store.group.mapping is set to LocalGroupMapping Service class.</description>
  </property>

  <property>
    <name>sentry.service.security.mode</name>
    <value>kerberos</value>
    <description>Options: kerberos, none.  Authentication mode for Sentry service. Currently supports Kerberos and trusted mode </description>
  </property>

  <property>
    <name>sentry.service.server.principal</name>
    <value> </value>
    <description>Service Kerberos principal</description>
  </property>

Configuring HiveServer2 for the Sentry Service

Configure the following properties in sentry-site.xml on the HiveServer2 host.

<property>
    <name>hive.sentry.server</name>
    <value>server1</value>
</property>
<property>
    <name>sentry.service.server.principal</name>
    <value>sentry/_HOST@EXAMPLE.COM</value>
</property>
<property>
    <name>sentry.service.security.mode</name>
    <value>kerberos</value>
</property>
<property>
    <name>sentry.hive.provider.backend</name>
    <value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
</property>
<property>
    <name>sentry.service.client.server.rpc-address</name>
    <value>example.cloudera.com</value>
</property>
<property>
    <name>sentry.service.client.server.rpc-port</name>
    <value>8038</value>
</property>
<property>
    <name>hive.sentry.provider</name>
    <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
    <name>hive.sentry.failure.hooks</name>
    <value>com.cloudera.navigator.audit.hive.HiveSentryOnFailureHook</value>
</property>

Add the following properties to hive-site.xml to allow the Hive service to communicate with the Sentry service.

<property>
   <name>hive.security.authorization.task.factory</name>
   <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
   <name>hive.server2.session.hook</name>
   <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
   <name>hive.sentry.conf.url</name>
   <value>file:///{{PATH/TO/DIR}}/sentry-site.xml</value>
</property>

Enabling Sentry on Hive service places several HiveServer2 properties on a restricted list properties that cannot be modified at runtime by clients. See HiveServer2 Restricted Properties.

Configuring the Hive Metastore for the Sentry Service

Add the following properties to hive-site.xml to allow the Hive metastore to communicate with the Sentry service.

<property>
<name>hive.metastore.filter.hook</name>
<value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
</property>

<property>  
    <name>hive.metastore.pre.event.listeners</name>  
    <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>  
    <description>list of comma separated listeners for metastore events.</description>
</property>

<property>
    <name>hive.metastore.event.listeners</name>  
    <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>  
    <description>list of comma separated listeners for metastore, post events.</description>
</property>

Configuring Impala as a Client for the Sentry Service

Set the following configuration properties in ../impala-conf/sentry-site.xml on the Catalog Server.

<property>
   <name>sentry.service.client.server.rpc-port</name>
   <value>8038</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-address</name>
   <value>hostname</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-connection-timeout</name>
   <value>200000</value>
</property>
<property>
   <name>sentry.service.security.mode</name>
   <value>kerberos</value>
</property>

You must also add the following configuration properties to Impala's /etc/default/impala file. For more information , see Configuring Impala Startup Options through the Command Line.

On the catalogd and the impalad.

--sentry_config=<absolute path to sentry service configuration file>

On the impalad.
```
--server_name=<server name>
```
If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise, the database-backed approach will be used to implement authorization.

HiveServer2 Restricted Properties

Enabling Sentry on Hive service places several HiveServer2 properties on a restricted list properties that cannot be modified at runtime by clients. This list is denoted by the hive.conf.restricted.list property and these properties are only configurable on the server side. The list includes:

hive.enable.spark.execution.engine
hive.semantic.analyzer.hook
hive.exec.pre.hooks
hive.exec.scratchdir
hive.exec.local.scratchdir
hive.metastore.uris,
javax.jdo.option.ConnectionURL
hadoop.bin.path
hive.session.id
hive.aux.jars.path
hive.stats.dbconnectionstring
hive.scratch.dir.permission
hive.security.command.whitelist
hive.security.authorization.task.factory
hive.entity.capture.transform
hive.access.conf.url
hive.sentry.conf.url
hive.access.subject.name
hive.sentry.subject.name
hive.sentry.active.role.set

Configuring Pig and HCatalog for the Sentry Service

Once you have the Sentry service up and running, and Hive has been configured to use the Sentry service, there are some configuration changes you must make to your cluster to allow Pig, MapReduce (using HCatLoader, HCatStorer) and WebHCat queries to access Sentry-secured data stored in Hive.

Since the Hive warehouse directory is owned by hive:hive, with its permissions set to 771, with these settings, other user requests such as commands coming through Pig jobs, WebHCat queries, and MapReduce jobs, may fail. To give these users access, perform the following configuration changes:

Use HDFS ACLs to define permissions on a specific directory or file of HDFS. This directory/file is generally mapped to a database, table, partition, or a data file.
Users running these jobs should have the required permissions in Sentry to add new metadata or read metadata from the Hive Metastore Server. For instructions on how to set up the required permissions, see Hive SQL Syntax for Use with Sentry. You can use HiveServer2's command line interface, Beeline to update the Sentry database with the user privileges.

Examples:

A user who is using Pig HCatLoader will require read permissions on a specific table or partition. In such a case, you can GRANT read access to the user in Sentry and set the ACL to read and execute, on the file being accessed.
A user who is using Pig HCatStorer will require ALL permissions on a specific table. In this case, you GRANT ALL access to the user in Sentry and set the ACL to write and execute, on the table being used.

Securing the Hive Metastore

It's important that the Hive metastore be secured. If you want to override the Kerberos prerequisite for the Hive metastore, set the sentry.hive.testing.mode property to true to allow Sentry to work with weaker authentication mechanisms. Add the following property to the HiveServer2 and Hive metastore's sentry-site.xml:

<property>
  <name>sentry.hive.testing.mode</name>
  <value>true</value>
</property>

Impala does not require this flag to be set.

You can also set the property in Cloudera Manager. Go to the Hive service and open the Configuration tab. Search for the Hive Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml. Click the plus sign (+) to add a new property with the following values:

Name: sentry.hive.testing.mode
Value: true

You canturn on Hive metastore security using the instructions in Cloudera Security. To secure the Hive metastore; see Hive Metastore Server Security Configuration.

Using User-Defined Functions with HiveServer2

The ADD JAR command does not work with HiveServer2 and the Beeline client when Beeline runs on a different host. As an alternative to ADD JAR, Hive's auxiliary paths functionality should be used. There are some differences in the procedures for creating permanent functions and temporary functions when Sentry is enabled. For detailed instructions, see:

Migrating from Sentry Policy Files to the Sentry Service

Sentry Debugging and Failure Scenarios