This is the documentation for Cloudera Manager 5.1.x.
Documentation for other versions is available at Cloudera Documentation.

Sentry for Policy File-Based Hive Authorization

  Important: This is the documentation for configuring Hive authorization using Sentry policy files. Cloudera recommends you use the database-backed Sentry service introduced in CDH 5.1 to secure your data. See The Sentry Service for more information.

Sentry's policy file-based authorization is not set up automatically by the Cloudera Manager installation or upgrade wizards. If you enable Sentry authorization without using the Sentry service, enable it on all services, not just Hive. Otherwise, some access is not policed by Sentry, and users can bypass your security policies by, for example, using Impala instead of Hive.

This section describes how to use Sentry's policy file-based approach to secure your data. This approach does not require adding the Sentry service to your cluster. If you attempt to configure the Sentry service while still using the policy file approach, Cloudera Manager reports a validation error.

Sentry policy files enable role-based, fine-grained authorization for HiveServer2 and the Hive Metastore. They provide classic database-style authorization for Hive and Cloudera Impala. For detailed information about Sentry, see the CDH 4 Sentry Guide or CDH 5 Sentry Guide.

When using Sentry, you must use HiveServer2 or Impala to access Hive tables. You can also use Hue Beeswax if Beeswax is configured to use HiveServer2. You cannot use the Hive CLI or WebHCat with Sentry.
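As a minimal sketch of HiveServer2 access on a Kerberos-secured cluster, the helper below assembles a JDBC URL that Beeline can use. The host name, realm, and the function name hs2_jdbc_url are hypothetical placeholders, not values from this guide.

```shell
# Build a HiveServer2 JDBC URL for a Kerberos-secured cluster.
# Arguments: host, port, database, HiveServer2 service principal.
hs2_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s;principal=%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical values; replace them with your own host and realm.
url=$(hs2_jdbc_url hs2.example.com 10000 default hive/hs2.example.com@EXAMPLE.COM)
echo "$url"

# With a valid Kerberos ticket (obtained with kinit), connect through Beeline:
#   beeline -u "$url" -e 'SHOW TABLES;'
```

Because the connection goes through HiveServer2 rather than the Hive CLI, the statements it runs are subject to the Sentry policy.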

Prerequisites

The requirements for using Sentry's policy file approach for Hive authorization are:

  • CDH 4.3.0 or later. Impala 1.2.1 or later. Auditing of authentication failures is supported only with CDH 4.4.0 and Impala 1.2.1 or later.
  • HiveServer2 running with strong authentication (Kerberos or LDAP).
  • A secure Hadoop cluster.
  • The Hive warehouse directory (/user/hive/warehouse or the path you have specified as hive.metastore.warehouse.dir in your Hive configuration) must be owned by the hive user and group.
  • Permissions on the Hive warehouse directory and all subdirectories must be 771. All files and directories should be owned by hive:hive. For example:
    $ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
    $ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
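To confirm the result, one option is to inspect the directory listing. The sketch below checks a single line of hdfs dfs -ls output for mode 771 (shown as drwxrwx--x for a directory) and hive:hive ownership. The function name check_warehouse_entry is hypothetical, and the sketch assumes the usual listing layout of mode, replication, owner, group, size, date, time, and path.

```shell
# Succeeds if one line of `hdfs dfs -ls` output describes a directory
# with mode 771 (drwxrwx--x) that is owned by user hive and group hive.
check_warehouse_entry() {
  printf '%s\n' "$1" |
    awk '{ exit !($1 == "drwxrwx--x" && $3 == "hive" && $4 == "hive") }'
}

# Example against the warehouse directory itself (requires a cluster):
#   check_warehouse_entry "$(sudo -u hdfs hdfs dfs -ls -d /user/hive/warehouse)" \
#     && echo "warehouse directory OK"
```

The check covers only the line it is given. Subdirectories would need the same test applied to each line of a recursive listing.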

Installing Sentry

Sentry is included with CDH 4.4.0 or later. To use Sentry policy files with CDH 4.3.0, install Sentry manually as follows:
  1. Do one of the following:
      • Select Administration > Settings, then click Parcels.
      • In the top navigation bar, click the parcel icon, then click Edit Settings.
  2. In the Remote Parcel Repository URLs property, add a new entry and type http://archive.cloudera.com/sentry/parcels/latest/ in the text box.
  3. Click Save Changes.
  4. In the top navigation bar, click the parcel icon. The Sentry parcel should be in the Downloadable section of the Parcels page.
  5. Download, distribute, and activate the parcel. See Parcels for details about parcels.
If you have upgraded to CDH 4.4.0 from CDH 4.3.0 and had installed the separate Sentry parcel with CDH 4.3.0, you must remove the stand-alone parcel:
  1. Do one of the following:
      • Select Administration > Settings, then click Parcels.
      • In the top navigation bar, click the parcel icon, then click Edit Settings.
  2. Deactivate, remove, and delete the parcel.

Sentry Policy Files

See the Policy File section in the CDH 4 Sentry Guide or CDH 5 Sentry Guide for detailed information. The policy file must be owned by the hive user and the hive group, with permissions set to 640.

  Note: Using Sentry you can assign privileges only to groups, not individual users.
The following is an example of a simple policy file:
[groups]
# Assigns each Hadoop group to its set of roles. For example, the first group, manager, has been assigned the default_admin role. 
manager=default_admin
analyst=sample_reader
admin=admin_role
[roles]
# can read both sample tables
sample_reader = server=server1->db=default->table=sample_07->action=select, \
server=server1->db=default->table=sample_08->action=select
# implies everything on server1, default db
default_admin = server=server1->db=default
# implies everything on server1
admin_role = server=server1
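A common mistake in a file like this is assigning a group to a role that is never defined. The following is a minimal sketch of a consistency check, not part of Sentry itself. The function name check_policy_roles is hypothetical, and the sketch assumes the layout shown above: a [groups] section, a [roles] section, # comments, and backslash line continuations.

```shell
# Report every role that is assigned in [groups] but not defined in [roles].
# Exits non-zero if at least one undefined role is found.
check_policy_roles() {
  awk '
    /^[ \t]*(#|$)/       { next }                             # comments, blank lines
    cont                 { cont = ($0 ~ /\\[ \t]*$/); next }  # continuation of previous entry
    /^[ \t]*\[groups\]/  { section = "groups"; next }
    /^[ \t]*\[roles\]/   { section = "roles"; next }
    /^[ \t]*\[/          { section = "other"; next }
    {
      cont = ($0 ~ /\\[ \t]*$/)            # entry continues on the next line
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1); gsub(/[ \t]/, "", key)
      if (section == "roles") {
        defined[key] = 1
      } else if (section == "groups") {
        n = split(substr($0, eq + 1), roles, ",")
        for (i = 1; i <= n; i++) {
          r = roles[i]; gsub(/[ \t]/, "", r)
          if (r != "") used[r] = 1
        }
      }
    }
    END {
      bad = 0
      for (r in used) if (!(r in defined)) { print "undefined role: " r; bad = 1 }
      exit bad
    }
  ' "$1"
}

# Usage against a local copy of the policy file:
#   check_policy_roles sentry-provider.ini && echo "all roles defined"
```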
By default Cloudera Manager assumes the policy file is in the HDFS location /user/hive/sentry. To configure the location:
  1. Go to the Hive service.
  2. Click the Configuration tab.
  3. Under the Service-Wide category, select Sentry and modify the path in the Sentry Global Policy File property.
  4. Click Save Changes.
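The sketch below shows one way to place a local policy file in the default HDFS location with the required hive:hive ownership and 640 permissions. The helper names run and install_policy_file are hypothetical. By default the commands are only printed for review, and setting DRY_RUN=0 executes them. The sketch also assumes the hdfs superuser can read the local file.

```shell
# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Upload a local policy file and apply the required owner and mode.
# Arguments: local file, HDFS directory (defaults to /user/hive/sentry).
install_policy_file() {
  src=$1
  dir=${2:-/user/hive/sentry}
  dest=$dir/$(basename "$src")
  run sudo -u hdfs hdfs dfs -mkdir -p "$dir"
  run sudo -u hdfs hdfs dfs -put "$src" "$dest"
  run sudo -u hdfs hdfs dfs -chown hive:hive "$dest"
  run sudo -u hdfs hdfs dfs -chmod 640 "$dest"
}

install_policy_file sentry-provider.ini
```

If you pass a different directory as the second argument, set the Sentry Global Policy File property to match.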
To enable URI support in Sentry's per-db policy files:
  1. Go to the Hive service.
  2. Click the Configuration tab.
  3. Under the Service-Wide category, go to the Sentry section and check the Allow URIs in Database Policy File property.
  4. Click Save Changes.
  5. Restart the Hive service.
      Important: Enabling URIs in per-DB policy files introduces a security risk: the owner of a db-level policy file can grant themselves load privileges to anything the hive user has read permissions for in HDFS (including data in other databases controlled by different db-level policy files).

Enabling Sentry Authorization using Policy Files

  1. Ensure the requirements are satisfied.
  2. (CDH 4.3) Install Sentry if it has not been installed.
  3. Disable impersonation for HiveServer2 in the Cloudera Manager Admin Console:
    1. Go to the Hive service.
    2. Click the Configuration tab.
    3. Under the HiveServer2 role group, uncheck the HiveServer2 Enable Impersonation property, and click Save Changes.
  4. Create the Sentry policy file sentry-provider.ini as an HDFS file.
  5. Make sure the Hive warehouse directory ownership and permissions are as described in Prerequisites.
  6. To enable the Hive user to submit MapReduce jobs, under TaskTracker role group(s) set the Minimum User ID for Job Submission to 0. You must do this for every TaskTracker role group for the MapReduce service that is associated with Hive, if more than one exists.
    1. Go to the MapReduce service.
    2. Click the Configuration tab.
    3. Under a TaskTracker role group go to the Security category.
    4. Set the Minimum User ID for Job Submission property to zero (the default is 1000) and click Save Changes.
    5. Restart the MapReduce service.
  7. To enable the Hive user to submit YARN jobs, ensure the Allowed System Users property includes the hive user. You must do this for every NodeManager role group for the YARN service that is associated with Hive, if more than one exists.
    1. Go to the YARN service.
    2. Click the Configuration tab.
    3. Under a NodeManager role group go to the Security category.
    4. Ensure the Allowed System Users property includes the hive user. If not, add hive and click Save Changes.
    5. Restart the YARN service.
  8. Go to your Hive service.
  9. Click the Configuration tab.
  10. Under the Service-Wide category, go to the Policy File Based Sentry section.
  11. Check Enable Sentry Authorization Using Policy Files, then click Save Changes.
  12. Restart the Hive service.
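To double-check the YARN step above, you can inspect the container executor configuration that the NodeManager uses. The sketch below assumes the Allowed System Users property is written as the allowed.system.users key in a container-executor.cfg style file, alongside a banned.users key. The function name user_allowed is hypothetical, and the location of the file on your hosts is not covered here.

```shell
# Succeeds if the user is listed in allowed.system.users and is not
# listed in banned.users in a container-executor.cfg style file.
# Arguments: user name, path to the configuration file.
user_allowed() {
  awk -F= -v user="$1" '
    $1 == "allowed.system.users" {
      n = split($2, a, ",")
      for (i = 1; i <= n; i++) if (a[i] == user) ok = 1
    }
    $1 == "banned.users" {
      n = split($2, b, ",")
      for (i = 1; i <= n; i++) if (b[i] == user) banned = 1
    }
    END { exit !(ok && !banned) }
  ' "$2"
}

# Usage, with the path to your NodeManager configuration file:
#   user_allowed hive /path/to/container-executor.cfg && echo "hive may submit jobs"
```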

Configuring Sentry to Enable BDR Replication

Cloudera recommends the following steps when you configure Sentry on clusters where data replication is enabled.
  • Manage group membership outside of Sentry (as OS groups, LDAP groups, and so on are typically managed), and handle replication of group membership outside of Cloudera Manager.
  • In Cloudera Manager, set up HDFS replication for the Sentry policy files of the databases that are being replicated. The databases themselves are replicated separately through Hive replication.
  • On the source cluster:
    • Use a separate Sentry policy file for every database
    • Avoid placing any group or role info (except for server admin info) in the global Sentry policy file (to avoid manual replication/merging with the global file on the target cluster)
    • In order to avoid manual fix up of URI privileges, ensure that the URIs for the data are the same on both the source and target cluster
  • On the target cluster:
    • In the global Sentry policy file, manually add the DB name - DB file mapping entries for the databases being replicated
    • Manually copy the server admin info from the global Sentry policy file on the source to the policy on the target cluster
    • For the databases being replicated, avoid adding more privileges (adding tables specific to target cluster may sometimes require adding extra privileges to allow access to those tables). If any target cluster specific privileges absolutely need to be added for a database, add them to the global Sentry policy file on the target cluster since the per database files would be overwritten periodically with source versions during scheduled replication.
For policy file examples, see Sentry Policy Files and the links therein.
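As an illustration of the DB name to DB file mapping entries mentioned above, the sketch below prints a [databases] section for the global policy file on the target cluster. It assumes the mapping uses a [databases] section with one name = file URI entry per database. The function name db_policy_entries, the nameservice, the directory, and the database names are hypothetical.

```shell
# Print [databases] entries that map each database name to its
# per-database policy file under a common HDFS directory.
# Arguments: base URI, then one or more database names.
db_policy_entries() {
  base=$1
  shift
  echo "[databases]"
  for db in "$@"; do
    printf '%s = %s/%s.ini\n' "$db" "$base" "$db"
  done
}

# Hypothetical base URI and database names.
db_policy_entries hdfs://nameservice1/user/hive/sentry sales marketing
```

Keeping the same URIs on the source and target clusters, as recommended above, means the generated entries need no adjustment after replication.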

Configuring Group Access to the Hive Metastore

You can configure the Hive Metastore to reject connections from users not listed in the Hive group proxy list. If you don't configure this override, the Hive Metastore will use the value in the core-site HDFS configuration. To configure the Hive group proxy list:
  1. Go to the Hive service.
  2. Click the Configuration tab.
  3. Click the Proxy category.
  4. In the Hive Metastore Access Control and Proxy User Groups Override property, specify a list of groups whose users are allowed to access the Hive Metastore. Unless you specify "*" (wildcard), Cloudera Manager warns you if the list does not include hive and, when the Impala service is configured, impala.
  5. Click Save Changes.
  6. Restart the Hive service.
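Before saving a group list, you can run the same kind of check that the warning performs. This is a minimal sketch with a hypothetical function name, groups_cover. It treats "*" as matching everything and otherwise requires each named group to appear in the comma-separated list.

```shell
# Succeeds if a comma-separated group list is "*" or contains every
# required group. Arguments: the list, then the required groups.
groups_cover() {
  list=$(printf '%s' "$1" | tr -d ' ')   # ignore spaces around commas
  shift
  [ "$list" = "*" ] && return 0
  for g in "$@"; do
    case ",$list," in
      *",$g,"*) ;;
      *) echo "missing group: $g"; return 1 ;;
    esac
  done
  return 0
}

# Example: a list that omits impala on a cluster that runs Impala.
groups_cover "hive,hue" hive impala || echo "adjust the list before saving"
```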

Configuring Sentry to Use LDAP User-to-Group Mappings

You can configure Sentry to use either Hadoop groups or groups defined in the policy file. By default, Sentry looks up groups locally, but it can be configured to look up Hadoop groups using LDAP (for Active Directory). Local groups will be looked up on the host Sentry runs on. For Hive, this will be the host running HiveServer2. Group mappings in Sentry can be summarized as in the figure below:

  Important: You can use either Hadoop groups or local groups, but not both at the same time. Use local groups if you want to do a quick proof-of-concept. For production, use Hadoop groups. Refer to Configuring LDAP Group Mappings for details on configuring LDAP group mappings in Hadoop.
To configure Sentry to use Hadoop groups:
  1. Go to the Hive service.
  2. Click the Configuration tab.
  3. Under the Service-Wide category, go to the Sentry section.
  4. Set the Sentry User to Group Mapping Class property to org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider.
  5. Click Save Changes.
  6. Restart the Hive service.
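After the restart, it can help to confirm which groups Hadoop resolves for a user, since those are the groups Sentry matches against the policy file. The sketch below parses the "user : group1 group2" line printed by the hdfs groups command. The function name in_hadoop_group and the user alice are hypothetical.

```shell
# Succeeds if a "user : group1 group2 ..." line, as printed by
# `hdfs groups <user>`, lists the given group.
# Arguments: the output line, the group to look for.
in_hadoop_group() {
  printf '%s\n' "$1" |
    awk -v g="$2" '{ for (i = 3; i <= NF; i++) if ($i == g) found = 1 } END { exit !found }'
}

# Usage on a cluster host (alice and analyst are placeholders):
#   in_hadoop_group "$(hdfs groups alice)" analyst && echo "alice is in analyst"
```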