This is the documentation for CDH 4.5.0.
Documentation for other versions is available at Cloudera Documentation.

Configuring Sentry

Sentry enables role-based, fine-grained authorization for HiveServer2 and Cloudera Impala. It provides classic database-style authorization for Hive and Impala. Follow the instructions below to install and configure Sentry manually under the current CDH release.

Prerequisites

Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:

  • CDH4.3.0 or later.
  • HiveServer2 with strong authentication (Kerberos or LDAP).
  • A secure Hadoop cluster.

    This is to prevent a user bypassing the authorization and gaining direct access to the underlying data.

In addition, make sure that the following are true:

  • The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
    • Permissions on the warehouse directory must be set as follows:
      • 770 on the directory itself (for example, /user/hive/warehouse)
      • 770 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
      For example:
      $ sudo -u hdfs hdfs dfs -chmod -R 770 /user/hive/warehouse/*
        Note: If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the permissions on the subdirectories will be set when you set permissions on the warehouse directory itself.
        Important: These instructions override the recommendations in the Hive section of the CDH4 Installation Guide.
  • HiveServer2 impersonation must be turned off.
  • The Hive user must be able to submit MapReduce jobs. You can ensure that this is true by setting the minimum user ID for job submission to 0. Set this value in Cloudera Manager under MapReduce Properties, or (if you are not using Cloudera Manager) edit the taskcontroller.cfg file and set min.user.id=0.
    Important:
      • You must restart the cluster and HiveServer2 after changing this value, whether you use Cloudera Manager or edit taskcontroller.cfg.
      • These instructions override the instructions under "Configuring MRv1 Security" in the CDH4 Security Guide.
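The ownership and permission requirements on the warehouse directory can be applied in one pass. The following is a minimal sketch, assuming the Hive user and group are both named hive and the warehouse is at the default /user/hive/warehouse. WAREHOUSE_DIR, DRY_RUN, run, and secure_warehouse are illustrative names, not part of CDH.

```shell
# Illustrative variable: override it if hive.metastore.warehouse.dir differs.
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

secure_warehouse() {
  # Assumption: the Hive user and group are both called "hive".
  run sudo -u hdfs hdfs dfs -chown -R hive:hive "$WAREHOUSE_DIR" &&
  # 770 on the directory itself and, recursively, on everything below it.
  run sudo -u hdfs hdfs dfs -chmod -R 770 "$WAREHOUSE_DIR"
}
```

Setting DRY_RUN=1 before calling secure_warehouse prints the two commands instead of running them, which makes it possible to review them first.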

Roles and Privileges

Sentry uses a role-based privilege model. A role is a collection of rules for accessing a given Hive object. The objects supported in the current release are server, database, table, and URI. Access to each object is governed by privileges: Select, Insert, or All.

  Note: All is not supported explicitly in the table scope; you have to specify Select and Insert explicitly.
For example, a rule for the Select privilege on the customer table in the sales database would be formulated as follows:
server=server1->db=sales->table=customer->action=Select
Each object must be specified as a hierarchy of the containing objects, from server to table, followed by the privilege granted for that object. A role can contain multiple such rules, separated by commas. For example, a role might contain the Select privilege for the customer and items tables in the sales database, and the Insert privilege for the sales_insights table in the reports database. You would specify this as follows:
sales_reporting = \
server=server1->db=sales->table=customer->action=Select, \
server=server1->db=sales->table=items->action=Select, \
server=server1->db=reports->table=sales_insights->action=Insert
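A typo in a rule, such as > in place of ->, is easy to miss by eye. The sketch below is one way to check a single rule with grep before it goes into a policy file. It assumes object names contain only letters, digits, and underscores, and it accepts only the rule shapes that appear in this document. is_valid_rule is a hypothetical helper, not a Sentry tool.

```shell
is_valid_rule() {
  # Accepted shapes (case-insensitive):
  #   server=S
  #   server=S->db=D
  #   server=S->db=D->table=T->action=select|insert   (T may be *)
  #   server=S->uri=U
  # "All" is rejected at table scope, matching the note above.
  printf '%s\n' "$1" | grep -Eiq \
    '^server=[A-Za-z0-9_]+(->db=[A-Za-z0-9_]+(->table=[A-Za-z0-9_*]+->action=(select|insert))?|->uri=[^ ,]+)?$'
}

# Example: report the result for one rule.
if is_valid_rule 'server=server1->db=sales->table=customer->action=Select'; then
  echo "rule looks well formed"
fi
```

The function returns a zero exit status for a well-formed rule and non-zero otherwise, so it can be used directly in an if statement or a loop over the rules of a role.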

Privilege Model

The Sentry privilege model has the following characteristics:

  • Allows any user to execute show function, desc function, and show locks
  • Allows the user to see only those tables and databases for which this user has privileges
  • Requires a user to have the necessary privileges on the URI to execute HiveQL operations that take in a location. Examples of such operations include LOAD, IMPORT, and EXPORT.

For more information, see Appendix: Authorization Privilege Model for Hive.

Users and Groups

  • A user is an entity that is permitted by the authentication subsystem to access the Hive service. This entity can be a Kerberos principal, an LDAP userid, or an artifact of some other pluggable authentication system supported by HiveServer2.
  • A group connects the authentication system with the authorization system. It is a collection of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
  • A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups. For example, a group-to-role mapping in the policy file looks like this:
    analyst = sales_reporting, data_export, audit_report

Here the group analyst is granted the roles sales_reporting, data_export, and audit_report. The members of this group can run the HiveQL statements that are allowed by these roles. If this is an HDFS-backed group, then all the users belonging to the HDFS group analyst can run such queries.
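To see which roles a group receives, the [groups] section of a policy file can be read directly. The following is a rough sketch using awk, assuming each group mapping fits on a single line. roles_for_group is a hypothetical helper name, and the file path in the usage line is only an example.

```shell
roles_for_group() {
  # $1 = policy file on the local file system, $2 = group name
  awk -v grp="$2" '
    /^\[/ { section = $1; next }          # remember the current [section]
    section == "[groups]" && /^[^#]/ {    # skip comments and blank lines
      split($0, kv, "=")
      key = kv[1]; gsub(/[ \t]/, "", key)
      if (key == grp) {
        val = kv[2]; gsub(/[ \t]/, "", val)
        print val                         # comma-separated role list
      }
    }
  ' "$1"
}

# Usage (path is illustrative):
#   roles_for_group /etc/sentry/sentry-provider.ini analyst
```

The function prints nothing when the group has no entry, which makes an unmapped group easy to detect in a script.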

User to Group Mapping

You can configure Sentry to use either Hadoop groups or groups defined in the policy file.

  Important: You can use either Hadoop groups or local groups, but not both at the same time. Use local groups if you want to do a quick proof-of-concept. For production, use Hadoop groups.

To configure Hadoop groups:

Set the hive.sentry.provider property in sentry-site.xml to org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider.

OR

To configure local groups:

  1. Define local groups in a [users] section of the Policy file. For example:
    [users]
    user1 = group1, group2, group3
    user2 = group2, group3
  2. In sentry-site.xml, set hive.sentry.provider as follows:
    <property>
      <name>hive.sentry.provider</name>
      <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
    </property>
    

Setup and Configuration

This release of Sentry stores the configuration as well as privilege policies in files. The sentry-site.xml file contains configuration options such as group association provider, privilege policy file location, and so on. The Policy file contains the privileges and groups. It has a .ini file format and can be stored on a local file system or HDFS.

Sentry is plugged into Hive as session hooks, which you configure in hive-site.xml. The sentry package must be installed; it contains the required JAR files. You must also configure properties in the Sentry Configuration File.

Installing Sentry

  Important:

If you have not already done so, install Cloudera's yum, zypper/YaST or apt repository before using the following commands. For instructions, see CDH4 Installation.

  1. Install Sentry as follows, depending on your operating system:
    • On Red Hat and similar systems:
      $ sudo yum install sentry
    • On SLES systems:
      $ sudo zypper install sentry
    • On Ubuntu and Debian systems:
      $ sudo apt-get update; sudo apt-get install sentry

Policy File

The sections that follow contain notes on creating and maintaining the policy file.

  Warning: An invalid configuration will disable all authorization while logging an exception. (But note that, as of CDH4.4, if only the per-DB policy file is invalid, it will invalidate only the policies in that file.)

Storing the Policy File

Considerations for storing the policy file(s) in HDFS include:

  1. Replication count - Because the file is read for each query, you should increase this; 10 is a reasonable value.
  2. Updating the file - Updates to the file are reflected immediately, so you should write them to a temporary copy of the file first, and then replace the existing file with the temporary one after all the updates are complete. This avoids race conditions caused by reads on an incomplete file.
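The two considerations above can be combined into one publishing step. The following is a sketch under stated assumptions, not a definitive procedure: the .tmp suffix, the DRY_RUN switch, and the run and publish_policy_file names are all illustrative. It also assumes that hdfs dfs -mv does not overwrite an existing destination, so the old file is removed just before the rename; that leaves a brief window in which no policy file exists.

```shell
run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

publish_policy_file() {
  policy_src="$1"                 # finished copy on the local file system
  policy_dest="$2"                # policy file path in HDFS
  policy_tmp="${policy_dest}.tmp" # hypothetical temporary name

  # Upload the complete file under a temporary name first.
  run hdfs dfs -put "$policy_src" "$policy_tmp" &&
  # Raise the replication count, since the file is read for each query.
  run hdfs dfs -setrep 10 "$policy_tmp" &&
  # Swap the new file into place only after the upload has finished.
  run hdfs dfs -rm "$policy_dest" &&
  run hdfs dfs -mv "$policy_tmp" "$policy_dest"
}
```

Running it once with DRY_RUN=1 shows the four commands in order without touching HDFS.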

Defining Roles

Keep in mind that role definitions are not cumulative; the newer definition replaces the older one. For example, the following results in role1 having privilege2, not privilege1 and privilege2.
role1 = privilege1
role1 = privilege2
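Because a repeated role name silently replaces the earlier definition, it can be worth scanning a policy file for duplicates before deploying it. The sketch below is one possible check using awk; duplicate_roles is a hypothetical helper name, and the script assumes continuation lines are marked with a trailing backslash, as in the examples in this document.

```shell
duplicate_roles() {
  # $1 = policy file on the local file system.
  # Prints each role name that is defined more than once in [roles].
  awk '
    /^\[/ { section = $1; cont = 0; next }      # new [section]
    section == "[roles]" && !/^[ \t]*(#|$)/ {   # skip comments and blanks
      if (!cont) {                              # first line of a definition
        name = $0
        sub(/[ \t]*=.*/, "", name)              # keep text before the first "="
        gsub(/^[ \t]+/, "", name)
        if (++seen[name] == 2) print name
      }
      cont = ($0 ~ /\\[ \t]*$/)                 # trailing backslash continues
    }
  ' "$1"
}
```

An empty result means every role in the file is defined exactly once.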

URIs

URIs must start with either hdfs:// or file://. If a URI starts with anything else, it will cause an exception and the policy file will be invalid.

When defining URIs for HDFS, you must also specify the NameNode. For example:
data_read = server=server1->uri=file:///path/to/dir,\
server=server1->uri=hdfs://namenode:port/path/to/dir
  Important: Because the NameNode host and port must be specified, Cloudera strongly recommends you use High Availability (HA). This ensures that the URI will remain constant even if the NameNode changes.
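The scheme requirement is simple enough to check before a URI is written into a policy file. The following is a minimal sketch; is_valid_policy_uri is a hypothetical helper, and it checks only the hdfs:// or file:// prefix, not whether the NameNode or path actually exists.

```shell
is_valid_policy_uri() {
  # Succeeds only for URIs that begin with hdfs:// or file://.
  case "$1" in
    hdfs://*|file://*) return 0 ;;
    *)                 return 1 ;;
  esac
}

# Example: warn about a URI with an unsupported scheme.
if ! is_valid_policy_uri '/path/to/dir'; then
  echo "URI must start with hdfs:// or file://"
fi
```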

Sample Configuration

This section provides a sample configuration.

Policy Files

The following is an example of a policy file with a per-DB policy file. In this example, the first policy file, sentry-provider.ini, would exist in HDFS; hdfs://ha-nn-uri/etc/sentry/sentry-provider.ini might be an appropriate location. The per-DB policy file is for the customers database. It is located at hdfs://ha-nn-uri/etc/sentry/customers.ini.

sentry-provider.ini
[databases]
# Defines the location of the per DB policy file for the customers DB/schema
customers = hdfs://ha-nn-uri/etc/sentry/customers.ini

[groups]
# Assigns each Hadoop group to its set of roles 
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role 

[roles]
# The URIs below define a landing skid which
# the user can use to import or export data from the system.
# Since the server runs as the user "hive", files in that directory
# must either have the group hive and read/write set or
# be world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1 

# Implies everything on server1 -> customers. Privileges for
# customers can be defined in the global policy file even though
# customers has its own policy file. Note that the privileges from
# both the global policy file and the per-DB policy file
# are merged. There is no overriding.
customers_admin_role = server=server1->db=customers 

# Implies everything on server1.
admin_role = server=server1

customers.ini
[groups]
manager = customers_insert_role, customers_select_role
analyst = customers_select_role 

[roles]
customers_insert_role = server=server1->db=customers->table=*->action=insert
customers_select_role = server=server1->db=customers->table=*->action=select

Sentry Configuration File

The following is an example of a sentry-site.xml file.

  Important: If you are using Cloudera Manager, make sure you do not store sentry-site.xml in /etc/hive/conf; that directory is regenerated whenever the Hive client configurations are redeployed. Instead, use a directory such as /etc/sentry to store the sentry file.

sentry-site.xml

<configuration>
  <property>
    <name>hive.sentry.provider</name>
    <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
  </property>

  <property>
    <name>hive.sentry.provider.resource</name>
    <value>/path/to/authz-provider.ini</value>
    <!-- 
       If the hdfs-site.xml points to HDFS, the path will be in HDFS;
       alternatively you could specify a full path, e.g.:
       hdfs://namenode:port/path/to/authz-provider.ini
       file:///path/to/authz-provider.ini
    -->
  </property>

  <property>
    <name>hive.sentry.server</name>
    <value>server1</value>
  </property>
</configuration>

Enabling Sentry in HiveServer2

To enable Sentry, add the following properties to hive-site.xml:
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>

<property>
  <name>hive.sentry.conf.url</name>
  <value></value>
  <description>sentry-site.xml file location</description>
</property>

Securing the Hive Metastore

It's important that the Hive metastore be secured. Do this by turning on Hive metastore security, using the instructions in the CDH4 Security Guide.

Appendix: Authorization Privilege Model for Hive

Privileges can be granted on different objects in the Hive warehouse. Any privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a container object in the hierarchy, the base object automatically inherits it. For instance, if a user has ALL privileges on the database scope, then (s)he has ALL privileges on all of the base objects contained within that scope.
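The inheritance rule can be pictured as a prefix check on the object path: a grant on a container covers any object whose path starts with that container. The sketch below illustrates only this idea. It is deliberately simplified (it ignores the * wildcard and the action component), and implies is a hypothetical helper, not Sentry's actual evaluation logic.

```shell
implies() {
  # $1 = granted object path, $2 = requested object path.
  # Succeeds when the request is the granted object itself
  # or lies underneath it in the hierarchy.
  case "$2" in
    "$1"|"$1->"*) return 0 ;;
    *)            return 1 ;;
  esac
}

# Example: a grant on the customers database covers a table inside it.
if implies 'server=server1->db=customers' \
           'server=server1->db=customers->table=orders->action=select'; then
  echo "covered by the database-level grant"
fi
```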

Object hierarchy in Hive

Server
     Database
         Table
             Partition
             Columns
         View
         Index
     Function/Routine
     Lock

Table 1. Valid Privilege types and objects they apply to
Privilege | Object
INSERT | TABLE, URI
SELECT | TABLE, VIEW, URI
ALL | SERVER, DB, URI

Table 2. Privilege hierarchy
Base Object | Granular privileges on object | Container object that contains the base object | Privileges on container object that implies privileges on the base object
DATABASE | ALL | SERVER | ALL
TABLE | INSERT | DATABASE | ALL
TABLE | SELECT | DATABASE | ALL
VIEW | SELECT | DATABASE | ALL

Table 3. Privilege table for HiveQL operations
Hive SQL Operation | Privileges | Scope
EXPLAIN | SELECT | Table
LOAD | INSERT, SELECT@URI | Table
EXPORT | SELECT, INSERT@URI | Table
IMPORT | ALL, SELECT@URI | Server
ANALYZE | INSERT, SELECT | Table
CREATEDATABASE | ALL | Server
DROPDATABASE | ALL | Server
SWITCHDATABASE | Any | Any Table, View in the DB
DROPTABLE | ALL | Database
DESCTABLE | SELECT or INSERT | Table
DESCFUNCTION | Not Restricted |
MSCK | ALL | Database
ALTER TABLE ADD COLS | ALL | Database
ALTER TABLE REPLACE COLS | ALL | Database
ALTER TABLE RENAME COL | ALL | Database
ALTER TABLE RENAME PART | ALL | Database
ALTER TABLE RENAME | ALL | Database
ALTER TABLE DROP PART | ALL | Database
ALTER TABLE ADD PART | ALL, ALL@URI | Database
ALTER TABLE ARCHIVE | ALL | Database
ALTER TABLE UNARCHIVE | ALL | Database
ALTER TABLE PROPERTIES | ALL | Database
ALTER TABLE SERIALIZER | ALL | Database
ALTER PARTITION SERIALIZER | ALL | Database
ALTER TABLE SERDEPROPS | ALL | Database
ALTER PARTITION SERDEPROPS | ALL | Database
ALTER TABLE CLUSTER SORT | ALL | Database
SHOW DATABASE | Any Privilege | Any obj in the DB
SHOW TABLES | SELECT or INSERT | Table
SHOW COLUMNS | SELECT or INSERT | Table
SHOW TABLE STATUS | SELECT or INSERT | Table
SHOW TABLE PROPERTIES | SELECT or INSERT | Table
SHOW CREATE TABLE | SELECT or INSERT | Table
SHOW FUNCTIONS* | Not restricted |
SHOW PARTITIONS | SELECT or INSERT | Table
SHOW INDEXES | SELECT or INSERT | Table
SHOW LOCKS | Not Restricted |
CREATE FUNCTION | ALL | Server
CREATE VIEW | ALL | Database
CREATE INDEX | ALL | Database
DROP FUNCTION | Any Privilege | Any Object
DROP VIEW | ALL | Database
DROP INDEX | ALL | Database
ALTER INDEX REBUILD | ALL | Database
ALTER VIEW PROPERTIES | ALL | Database
GRANT PRIVILEGE | Allowed, but has no effect on HS2 auth |
REVOKE PRIVILEGE | Allowed, but has no effect on HS2 auth |
SHOW GRANTS | Allowed, but has no effect on HS2 auth |
ALTER TABLE PROTECT MODE | ALL | Database
ALTER TABLE FILE FORMAT | ALL | Database
ALTER TABLE LOCATION* | ALL | Server
ALTER PARTITION PROTECT MODE | ALL | Database
ALTER PARTITION FILE FORMAT | ALL | Database
ALTER PARTITION LOCATION | ALL | Server
CREATE TABLE | ALL | Database
CREATE TABLE EXTERNAL | ALL, SELECT@URI | Database
CREATE TABLE AS SELECT | ALL, SELECT | Database, Table/View
QUERY | SELECT | Table, View
ALTER INDEX PROP | ALL | Database
ALTER DATABASE | ALL | Database
DESC DATABASE | ALL | Database
ALTER TABLE MERGE FILE | ALL | Database
ALTER PARTITION MERGE FILE | ALL | Database
ALTER TABLE SKEWED | ALL | Database
ALTER TABLE PARTITION SKEWED LOCATION | ALL | Database
ADD JAR | Restricted unless hive.server2.authorization.external.exec = true |
TRANSFORM | ALL | Server