Configuring the Sentry Service

Enabling the Sentry Service Using Cloudera Manager

Before Enabling the Sentry Service

  • Ensure you satisfy all the Prerequisites for the Sentry service.
  • The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
    • Permissions on the warehouse directory must be set as follows (see following Note for caveats):
      • 771 on the directory itself (for example, /user/hive/warehouse)
      • 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
      • All files and subdirectories should be owned by hive:hive
      For example:
      $ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
      $ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
  • Disable impersonation for HiveServer2 in the Cloudera Manager Admin Console:
    1. Go to the Hive service.
    2. Click the Configuration tab.
    3. Select Scope > HiveServer2.
    4. Select Category > Main.
    5. Uncheck the HiveServer2 Enable Impersonation checkbox.
    6. Click Save Changes to commit the changes.
  • If you are using MapReduce, enable the Hive user to submit MapReduce jobs.
    1. Open the Cloudera Manager Admin Console and go to the MapReduce service.
    2. Click the Configuration tab.
    3. Select Scope > TaskTracker.
    4. Select Category > Security.
    5. Set the Minimum User ID for Job Submission property to zero (the default is 1000).
    6. Click Save Changes to commit the changes.
    7. Repeat steps 1-6 for every TaskTracker role group for the MapReduce service that is associated with Hive, if more than one exists.
    8. Restart the MapReduce service.
  • If you are using YARN, enable the Hive user to submit YARN jobs.
    1. Open the Cloudera Manager Admin Console and go to the YARN service.
    2. Click the Configuration tab.
    3. Select Scope > NodeManager.
    4. Select Category > Security.
    5. Ensure the Allowed System Users property includes the hive user. If not, add hive.
    6. Click Save Changes to commit the changes.
    7. Repeat steps 1-6 for every NodeManager role group for the YARN service that is associated with Hive, if more than one exists.
    8. Restart the YARN service.

Enabling the Sentry Service for Hive

  1. Go to the Hive service.
  2. Click the Configuration tab.
  3. Select Scope > Hive (Service-Wide).
  4. Select Category > Main.
  5. Locate the Sentry Service property and select Sentry.
  6. Click Save Changes to commit the changes.
  7. Restart the Hive service.
Enabling Sentry on Hive service places several HiveServer2 properties on a restricted list properties that cannot be modified at runtime by clients. See HiveServer2 Restricted Properties.

Enabling the Sentry Service for Impala

  1. Enable the Sentry service for Hive (as instructed above).
  2. Go to the Impala service.
  3. Click the Configuration tab.
  4. Select Scope > Impala (Service-Wide).
  5. Select Category > Main.
  6. Locate the Sentry Service property and select Sentry.
  7. Click Save Changes to commit the changes.
  8. Restart Impala.

Enabling the Sentry Service for Hue

To interact with Sentry using Hue, enable the Sentry service as follows:
  1. Enable the Sentry service for Hive and Impala (as instructed above).
  2. Go to the Hue service.
  3. Click the Configuration tab.
  4. Select Scope > Hue (Service-Wide).
  5. Select Category > Main.
  6. Locate the Sentry Service property and select Sentry.
  7. Click Save Changes to commit the changes.
  8. Restart Hue.

Enabling the Sentry Service Using the Command Line

Before Enabling the Sentry Service

  • The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
    • Permissions on the warehouse directory must be set as follows (see following Note for caveats):
      • 771 on the directory itself (for example, /user/hive/warehouse)
      • 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
      • All files and subdirectories should be owned by hive:hive
      For example:
      $ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
      $ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
  • HiveServer2 impersonation must be turned off.
  • If you are using MapReduce, you must enable the Hive user to submit MapReduce jobs. You can ensure that this is true by setting the minimum user ID for job submission to 0. Edit the taskcontroller.cfg file and set min.user.id=0.
    If you are using YARN, you must enable the Hive user to submit YARN jobs, add the user hive to the allowed.system.users configuration property. Edit the container-executor.cfg file and add hive to the allowed.system.users property. For example,
    allowed.system.users = nobody,impala,hive

Configuring HiveServer2 for the Sentry Service

Add the following properties to hive-site.xml to allow the Hive service to communicate with the Sentry service.
<property>
   <name>hive.security.authorization.task.factory</name>
   <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
   <name>hive.server2.session.hook</name>
   <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
   <name>hive.sentry.conf.url</name>
   <value>file:///{{PATH/TO/DIR}}/sentry-site.xml</value>
</property>
Enabling Sentry on Hive service places several HiveServer2 properties on a restricted list properties that cannot be modified at runtime by clients. See HiveServer2 Restricted Properties.

Configuring the Hive Metastore for the Sentry Service

Add the following properties to hive-site.xml to allow the Hive metastore to communicate with the Sentry service.
<property>
<name>hive.metastore.filter.hook</name>
<value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
</property>

<property>  
    <name>hive.metastore.pre.event.listeners</name>  
    <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>  
    <description>list of comma separated listeners for metastore events.</description>
</property>

<property>
    <name>hive.metastore.event.listeners</name>  
    <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>  
    <description>list of comma separated listeners for metastore, post events.</description>
</property>

Configuring Impala as a Client for the Sentry Service

Set the following configuration properties in sentry-site.xml.
<property>
   <name>sentry.service.client.server.rpc-port</name>
   <value>3893</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-address</name>
   <value>hostname</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-connection-timeout</name>
   <value>200000</value>
</property>
<property>
   <name>sentry.service.security.mode</name>
   <value>none</value>
</property>
You must also add the following configuration properties to Impala's /etc/default/impala file. For more information , see Configuring Impala Startup Options through the Command Line.
  • On the catalogd and the impalad.
    --sentry_config=<absolute path to sentry service configuration file>
  • On the impalad.
    --server_name=<server name>
    If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise, the database-backed approach will be used to implement authorization.

HiveServer2 Restricted Properties

Enabling Sentry on Hive service places several HiveServer2 properties on a restricted list properties that cannot be modified at runtime by clients. This list is denoted by the hive.conf.restricted.list property and these properties are only configurable on the server side. The list includes:
hive.enable.spark.execution.engine
hive.semantic.analyzer.hook
hive.exec.pre.hooks
hive.exec.scratchdir
hive.exec.local.scratchdir
hive.metastore.uris,
javax.jdo.option.ConnectionURL
hadoop.bin.path
hive.session.id
hive.aux.jars.path
hive.stats.dbconnectionstring
hive.scratch.dir.permission
hive.security.command.whitelist
hive.security.authorization.task.factory
hive.entity.capture.transform
hive.access.conf.url
hive.sentry.conf.url
hive.access.subject.name
hive.sentry.subject.name
hive.sentry.active.role.set

Configuring Pig and HCatalog for the Sentry Service

Once you have the Sentry service up and running, and Hive has been configured to use the Sentry service, there are some configuration changes you must make to your cluster to allow Pig, MapReduce (using HCatLoader, HCatStorer) and WebHCat queries to access Sentry-secured data stored in Hive.

Since the Hive warehouse directory is owned by hive:hive, with its permissions set to 771, with these settings, other user requests such as commands coming through Pig jobs, WebHCat queries, and MapReduce jobs, may fail. To give these users access, perform the following configuration changes:
  • Use HDFS ACLs to define permissions on a specific directory or file of HDFS. This directory/file is generally mapped to a database, table, partition, or a data file.
  • Users running these jobs should have the required permissions in Sentry to add new metadata or read metadata from the Hive Metastore Server. For instructions on how to set up the required permissions, see Hive SQL Syntax for Use with Sentry. You can use HiveServer2's command line interface, Beeline to update the Sentry database with the user privileges.
Examples:
  • A user who is using Pig HCatLoader will require read permissions on a specific table or partition. In such a case, you can GRANT read access to the user in Sentry and set the ACL to read and execute, on the file being accessed.
  • A user who is using Pig HCatStorer will require ALL permissions on a specific table. In this case, you GRANT ALL access to the user in Sentry and set the ACL to write and execute, on the table being used.

Securing the Hive Metastore

It's important that the Hive metastore be secured. If you want to override the Kerberos prerequisite for the Hive metastore, set the sentry.hive.testing.mode property to true to allow Sentry to work with weaker authentication mechanisms. Add the following property to the HiveServer2 and Hive metastore's sentry-site.xml:
<property>
  <name>sentry.hive.testing.mode</name>
  <value>true</value>
</property>
Impala does not require this flag to be set.
You can also set the property in Cloudera Manager. Go to the Hive service and open the Configuration tab. Search for the Hive Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml. Click the plus sign (+) to add a new property with the following values:
  • Name: sentry.hive.testing.mode
  • Value: true

You canturn on Hive metastore security using the instructions in Cloudera Security. To secure the Hive metastore; see Hive Metastore Server Security Configuration.

Using User-Defined Functions with HiveServer2

The ADD JAR command does not work with HiveServer2 and the Beeline client when Beeline runs on a different host. As an alternative to ADD JAR, Hive's auxiliary paths functionality should be used. There are some differences in the procedures for creating permanent functions and temporary functions when Sentry is enabled. For detailed instructions, see: .