This is the documentation for CDH 5.1.x.
Documentation for other versions is available at Cloudera Documentation.

Sentry Service Configuration

  Important: This is the documentation for the Sentry service introduced in CDH 5.1. If you want to use Sentry's previous policy file approach to secure your data, see Sentry Policy File Configuration for more information.

The Sentry service is an RPC server that stores the authorization metadata in an underlying relational database and provides RPC interfaces to retrieve and manipulate privileges. It supports secure access to services using Kerberos. The Hive and Impala services are clients of this service. The service provides authorization metadata from the database-backed storage; it does not handle actual privilege validation.

The motivation behind introducing a new Sentry service is to make it easier to handle user privileges than with the existing policy file approach. Backing the service with a database allows you to use the more traditional GRANT/REVOKE statements to modify privileges.


Prerequisites

Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:

  • CDH 5.1.x
  • HiveServer2 with strong authentication (Kerberos or LDAP)
  • A secure Hadoop cluster

    This is to prevent a user bypassing the authorization and gaining direct access to the underlying data.

In addition to the above, make sure that the following are true:
  • The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
    • Permissions on the warehouse directory must be set as follows (see following Note for caveats):
      • 771 on the directory itself (for example, /user/hive/warehouse)
      • 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
      • All files and subdirectories should be owned by hive:hive
      For example:
      $ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
      $ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
        Note:
      • If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the permissions on the subdirectories will be set when you set permissions on the warehouse directory itself.
      • If a user has access to any object in the warehouse, that user will be able to execute use default. This ensures that use default commands issued by legacy applications work when Sentry is enabled. Note that you can protect objects in the default database (or any other database) by means of a policy file.
        Important: These instructions override the recommendations in the Hive section of the CDH 5 Installation Guide.
  • HiveServer2 impersonation must be turned off.
  • The Hive user must be able to submit MapReduce jobs. You can ensure that this is true by setting the minimum user ID for job submission to 0. Edit the taskcontroller.cfg file and set min.user.id=0.
    To enable the Hive user to submit YARN jobs, add the user hive to the allowed.system.users configuration property. Edit the container-executor.cfg file and add hive to the allowed.system.users property. For example,
    allowed.system.users=nobody,impala,hive
      Important:
    • You must restart the cluster and HiveServer2 after changing this value, whether you use Cloudera Manager or not.
    • These instructions override the instructions under Configuring MRv1 Security.
    • These instructions override the instructions under Configuring YARN Security.
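
The YARN prerequisite above can be checked with a short script. The following Python sketch assumes container-executor.cfg consists of simple key=value lines; the function name hive_can_submit_yarn_jobs is hypothetical and not part of any Hadoop or Sentry tooling.

```python
def hive_can_submit_yarn_jobs(cfg_text, user="hive"):
    """Return True if `user` is listed in allowed.system.users."""
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "allowed.system.users":
            users = [u.strip() for u in value.split(",")]
            return user in users
    # The property is absent, so the user is not explicitly allowed.
    return False


# Sample content modeled on the example above (assumed layout).
sample = "allowed.system.users=nobody,impala,hive\n"
print(hive_can_submit_yarn_jobs(sample))
```

To check a real host, read the file contents and pass them in; the location of container-executor.cfg depends on your installation.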

Privilege Model

With CDH 5.1, the privilege model has undergone changes to accommodate the new grant/revoke syntax that is used with the Sentry service. These changes are common to both the new database-backed Sentry service and the previous policy file approach.

The Sentry privilege model has the following characteristics:
  • Allows any user to execute show function, desc function, and show locks.
  • Allows the user to see only those tables and databases for which this user has privileges.
  • Requires a user to have the necessary privileges on the URI to execute HiveQL operations that take in a location. Examples of such operations include LOAD, IMPORT, and EXPORT.
  Important: When Sentry is enabled, a user with no privileges on a database will not be allowed to connect to HiveServer2. The connection fails because the use <database> command is now executed as part of connecting to HiveServer2. See HIVE-4256.

For more information, see Appendix: Authorization Privilege Model for Hive and Impala.

Users and Groups

  • A user is an entity that is permitted by the authentication subsystem to access the Hive service. This entity can be a Kerberos principal, an LDAP userid, or an artifact of some other pluggable authentication system supported by HiveServer2.
  • A group connects the authentication system with the authorization system. It is a collection of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
  • A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups.

User to Group Mapping

You can configure Sentry to use either Hadoop groups or groups defined in the policy file. By default, Sentry looks up groups locally, but it can be configured to look up Hadoop groups using LDAP (for Active Directory). Local groups will be looked up on the host Sentry runs on. For Hive, this will be the host running HiveServer2.

Group mappings in Sentry can be summarized as in the figure below:

  Important: You can use either Hadoop groups or local groups, but not both at the same time. Use local groups if you want to do a quick proof-of-concept. For production, use Hadoop groups. Refer to Appendix I - Configuring LDAP Group Mappings for details on configuring LDAP group mappings in Hadoop.

Configuring Hadoop Groups

Set the hive.sentry.provider property in sentry-site.xml.
<property>
<name>hive.sentry.provider</name>
<value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>

Configuring Local Groups

  1. Define local groups in a [users] section of the Policy file. For example:
    [users]
    user1 = group1, group2, group3
    user2 = group2, group3
  2. In sentry-site.xml, set hive.sentry.provider as follows:
    <property>
    <name>hive.sentry.provider</name>
    <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
    </property>
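
To review which groups each user will resolve to before enabling local groups, the [users] section can be read with Python's standard configparser module. This is only an inspection sketch, not how Sentry itself parses the policy file; load_local_groups is a hypothetical helper, and it assumes the section lines are not indented in the actual file.

```python
import configparser


def load_local_groups(policy_text):
    """Return {user: [group, ...]} from the [users] section of a policy file."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep user names case-sensitive
    parser.read_string(policy_text)
    if not parser.has_section("users"):
        return {}
    return {
        user: [g.strip() for g in groups.split(",") if g.strip()]
        for user, groups in parser.items("users")
    }


policy = "[users]\nuser1 = group1, group2, group3\nuser2 = group2, group3\n"
for user, groups in load_local_groups(policy).items():
    print(user, groups)
```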

Setup and Configuration

Installing and Upgrading Sentry

Upgrading Sentry from CDH 4 to CDH 5

To upgrade Sentry from CDH 4 to CDH 5, you must uninstall the old version and install the new version. If you have already performed the steps to uninstall CDH 4 and all components, as described under Upgrading from CDH 4 to CDH 5, you can skip Step 1 below and proceed with installing the new CDH 5 version of Sentry.

  1. Remove the CDH 4 Version of Sentry

    Remove Sentry as follows, depending on your operating system:

    OS Command
    RHEL
    $ sudo yum remove sentry
    SLES
    $ sudo zypper remove sentry
    Ubuntu or Debian
    $ sudo apt-get remove sentry
  2. Install the New Version of Sentry

    Follow instructions in the next section to install the CDH 5 version of Sentry.

      Important: Configuration files
    • If you install a newer version of a package that is already on the system, configuration files that you have modified will remain intact.
    • If you uninstall a package, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. If you then re-install the package (probably to install a new version) the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades, you will be prompted if you have made changes to a file for which there is a new version; for details, see Automatic handling of configuration files by dpkg.

    The upgrade is now complete.

Installing Sentry

Install Sentry as follows, depending on your operating system:
OS Command
RHEL
$ sudo yum install sentry
SLES
$ sudo zypper install sentry
Ubuntu or Debian
$ sudo apt-get update
$ sudo apt-get install sentry

Starting the Sentry Service

Perform the following steps to start the Sentry service on your cluster.
  1. Set the SENTRY_HOME and HADOOP_HOME parameters.
  2. Create the Sentry database schema using the Sentry schematool. Sentry, by default, does not initialize the schema. The schematool is a built-in way for you to deploy the backend schema required by the Sentry service. For example, the following command uses the schematool to initialize the schema for a MySQL database.
    bin/sentry --command schema-tool --conffile <sentry-site.xml> --dbType mysql --initSchema
    Alternatively, you can set the sentry.verify.schema.version configuration property to false. However, this is not recommended.
  3. Start the Sentry service.
    bin/sentry --command service --conffile <sentry-site.xml>

Hive SQL Syntax

Permissions stored in the Sentry service are configured through Grant and Revoke statements issued either interactively or programmatically through the HiveServer2 SQL command line interface, Beeline. The syntax described below is very similar to the GRANT/REVOKE commands available in well-established relational database systems.

CREATE ROLE Statement

The CREATE ROLE statement creates a role to which privileges can be granted. Privileges can be granted to roles, which can then be assigned to users. A user that has been assigned a role will only be able to exercise the privileges of that role.

Only users that have administrative privileges can create/drop roles. By default, the hive, impala and hue users have admin privileges in Sentry.

CREATE ROLE [role_name];

DROP ROLE Statement

The DROP ROLE statement can be used to remove a role from the database. Once dropped, the role will be revoked for all users to whom it was previously assigned. Queries that are already executing will not be affected. However, since Hive checks user privileges before executing each query, active user sessions in which the role has already been enabled will be affected.
DROP ROLE [role_name];

GRANT ROLE Statement

The GRANT ROLE statement can be used to grant roles to groups. Only sentry admin users can grant the role to a group.
GRANT ROLE role_name [, role_name]    
    TO GROUP <groupName> [,GROUP <groupName>]

REVOKE ROLE Statement

The REVOKE ROLE statement can be used to revoke roles from groups. Only sentry admin users can revoke the role from a group.
REVOKE ROLE role_name [, role_name]    
    FROM GROUP <groupName> [,GROUP <groupName>]

GRANT <PRIVILEGE> Statement

In order to grant privileges on an object to a role, the user must be a sentry admin user.
GRANT    
    <PRIVILEGE> [, <PRIVILEGE> ]    
    ON <OBJECT> <object_name>    
    TO ROLE <roleName> [,ROLE <roleName>]

REVOKE <PRIVILEGE> Statement

Because only authorized admin users can create roles, only Sentry admin users can revoke privileges from a role.
REVOKE    
    <PRIVILEGE> [, <PRIVILEGE> ]    
    ON <OBJECT> <object_name>    
    FROM ROLE <roleName> [,ROLE <roleName>]

SET ROLE Statement

The SET ROLE statement can be used to specify a role to be enabled for the current session. A user can only enable a role that has been granted to them. Any roles not listed and not already enabled are disabled for the current session. If no roles are enabled, the user will have the privileges granted by any of the roles that (s)he belongs to.
To enable a specific role:
SET ROLE <roleName>;
To enable all roles:
SET ROLE ALL;
To disable all roles:
SET ROLE NONE;

SHOW Statement

To list all the roles in the system (only for sentry admin users):
SHOW ROLES;
To list all the roles in effect for the current user session:
SHOW CURRENT ROLES;
To list all the roles assigned to the given <groupName> (only for sentry admin users):
SHOW ROLE GRANT GROUP <groupName>;

The SHOW statement can also be used to list the privileges that have been granted to a role or all the grants given to a role for a particular object.

To list all the grants for the given <roleName> (only for sentry admin users):
SHOW GRANT ROLE <roleName>;
To list all the grants for a role on the given <objectName> (only for sentry admin users):
SHOW GRANT ROLE <roleName> ON OBJECT <objectName>;

Example: Using Grant/Revoke Statements to Match an Existing Policy File

Here is a sample policy file:
[groups] 
# Assigns each Hadoop group to its set of roles  
manager = analyst_role, junior_analyst_role 
analyst = analyst_role 
jranalyst = junior_analyst_role 
customers_admin = customers_admin_role 
admin = admin_role  

[roles]
# The URIs below define a landing skid which
# the user can use to import or export data from the system. 
# Since the server runs as the user "hive" files in that directory 
# must either have the group hive and read/write set or 
# be world read/write. 
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \     
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1  

# Implies everything on server1. 
admin_role = server=server1

The following sections show how you can use the new GRANT statements to assign privileges to roles (and assign roles to groups) to match the sample policy file above.

Grant privileges to analyst_role:
CREATE ROLE analyst_role;
GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role;
GRANT SELECT ON DATABASE jranalyst1 TO ROLE analyst_role;
GRANT ALL ON URI 'hdfs://ha-nn-uri/landing/analyst1' \
TO ROLE analyst_role;
Grant privileges to junior_analyst_role:
CREATE ROLE junior_analyst_role;
GRANT ALL ON DATABASE jranalyst1 TO ROLE junior_analyst_role;
GRANT ALL ON URI 'hdfs://ha-nn-uri/landing/jranalyst1' \
TO ROLE junior_analyst_role;
Grant privileges to admin_role:
CREATE ROLE admin_role;
GRANT ALL ON SERVER server1 TO ROLE admin_role;
Grant roles to groups:
GRANT ROLE admin_role TO GROUP admin;
GRANT ROLE analyst_role TO GROUP analyst;
GRANT ROLE junior_analyst_role TO GROUP jranalyst;
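
For larger policy files, the same translation can be scripted. The Python sketch below converts one policy-file privilege string into a GRANT statement and one [groups] entry into a GRANT ROLE statement, following the syntax shown in this section. The helper names privilege_to_grant and grant_roles are hypothetical. It assumes that table=* is treated as a database-level grant, as in the example above, and that a table-level statement is run after a use <database> command.

```python
def privilege_to_grant(role, privilege):
    """Translate one policy-file privilege string into a GRANT statement."""
    # "server=server1->db=analyst1" -> {"server": "server1", "db": "analyst1"}
    parts = dict(item.strip().split("=", 1) for item in privilege.strip().split("->"))
    action = parts.get("action", "*").upper()
    if action == "*":
        action = "ALL"  # no action (or a wildcard) means every privilege
    if "uri" in parts:
        target = "URI '{}'".format(parts["uri"])
    elif parts.get("table", "*") != "*":
        # Assumption: the statement is run after "use <database>".
        target = "TABLE {}".format(parts["table"])
    elif "db" in parts:
        target = "DATABASE {}".format(parts["db"])
    else:
        target = "SERVER {}".format(parts["server"])
    return "GRANT {} ON {} TO ROLE {};".format(action, target, role)


def grant_roles(group, roles):
    """Build a GRANT ROLE statement for one [groups] entry."""
    return "GRANT ROLE {} TO GROUP {};".format(", ".join(roles), group)


print(privilege_to_grant("analyst_role", "server=server1->db=analyst1"))
print(grant_roles("manager", ["analyst_role", "junior_analyst_role"]))
```

Review the generated statements before running them in Beeline; the sketch does not create the roles or validate object names.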

Configuring HiveServer2 for the Sentry Service

Add the following properties to hive-site.xml to allow the Hive service to communicate with the Sentry policy store.
<property>
   <name>hive.security.authorization.task.factory</name>
   <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
   <name>hive.server2.session.hook</name>
   <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
   <name>hive.sentry.conf.url</name>
   <value>file:///{{CMF_CONF_DIR}}/sentry-site.xml</value>
</property>
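
Before restarting HiveServer2, it can help to confirm that hive-site.xml contains each of these properties exactly once. The following Python sketch uses the standard xml.etree.ElementTree module; it assumes the file has the usual <configuration> root element, and check_hive_site is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Property names taken from the listing above.
REQUIRED = (
    "hive.security.authorization.task.factory",
    "hive.server2.session.hook",
    "hive.sentry.conf.url",
)


def check_hive_site(xml_text):
    """Return (missing, duplicated) property names for the Sentry settings."""
    root = ET.fromstring(xml_text)
    names = [p.findtext("name", "").strip() for p in root.iter("property")]
    counts = Counter(names)
    missing = [n for n in REQUIRED if counts[n] == 0]
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return missing, duplicated


sample = (
    "<configuration>"
    "<property><name>hive.server2.session.hook</name>"
    "<value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value></property>"
    "</configuration>"
)
print(check_hive_site(sample))
```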

Configuring the Hive Metastore for the Sentry Service

Configuring Pig and HCatalog for the Sentry Service

Once you have the Sentry service up and running, and Hive has been configured to use the Sentry service, there are some configuration changes you must make to your cluster to allow Pig, MapReduce (using HCatLoader, HCatStorer) and WebHCat queries to access Sentry-secured data stored in Hive.

With HDFS extended ACLs enabled, Cloudera recommends you set the permissions for the Hive warehouse directory, /user/hive/warehouse, to 771 so that users other than the owner and group have only execute permissions. Because the /user/hive/warehouse directory is owned by hive:hive by default, this also restricts requests from any other users at the HDFS level.

With these permissions, other user requests may fail, such as commands coming through Pig jobs, WebHCat queries, and MapReduce jobs. In order to give these users access, perform the following configuration changes:
  • Use HDFS ACLs to define permissions on a specific directory or file of HDFS. This directory/file is generally mapped to a database, table, partition, or a data file.
  • Users running these jobs should have the required permissions in Sentry to add new metadata or read metadata from the Hive Metastore Server. For instructions on how to set up the required permissions, see Hive SQL Syntax. You can use HiveServer2's command line interface, Beeline, to update the Sentry database with the user privileges.
Examples:
  • A user who is using Pig HCatLoader will require read permissions on a specific table or partition. In such a case, you can GRANT read access to the user in Sentry and set the ACL to read and execute, on the file being accessed.
  • A user who is using Pig HCatStorer will require ALL permissions on a specific table. In this case, you GRANT ALL access to the user in Sentry and set the ACL to write and execute, on the table being used.
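
The HDFS side of these examples can be scripted by building hdfs dfs -setfacl commands. The Python sketch below only constructs the argument list and does not run it. The helper name setfacl_command, the user name alice, and the table path are hypothetical; r-x corresponds to the read and execute case (HCatLoader) and -wx to the write and execute case (HCatStorer).

```python
def setfacl_command(user, perms, path, recursive=False):
    """Build an `hdfs dfs -setfacl -m` argument list for one user ACL entry."""
    # perms must look like "r-x": each position is its flag or a dash.
    if len(perms) != 3 or not all(c in (flag, "-") for c, flag in zip(perms, "rwx")):
        raise ValueError("perms must be a 3-character string such as 'r-x'")
    cmd = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        cmd.append("-R")
    cmd += ["-m", "user:{}:{}".format(user, perms), path]
    return cmd


# Hypothetical user and table path, for illustration only.
print(" ".join(setfacl_command("alice", "r-x", "/user/hive/warehouse/mytable")))
```

The resulting list can be passed to subprocess.run on a host with the HDFS client installed, typically as a user allowed to change ACLs on the warehouse directory.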

Configuring the Hive Metastore to Communicate with Sentry

Add the following properties to hive-site.xml to allow the Hive Metastore to communicate with the Sentry policy store.
<property>  
    <name>hive.metastore.pre.event.listeners</name>  
    <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>  
    <description>list of comma separated listeners for metastore events.</description>
</property>

<property>
    <name>hive.metastore.event.listeners</name>  
    <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>  
    <description>list of comma separated listeners for metastore, post events.</description>
</property>

Configuring Impala for the Sentry Service

To configure Impala as a client of the Sentry service, set the following configuration properties in sentry-site.xml.
<property>
   <name>sentry.service.client.server.rpc-port</name>
   <value>3893</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-address</name>
   <value>hostname</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-connection-timeout</name>
   <value>200000</value>
</property>
<property>
   <name>sentry.service.security.mode</name>
   <value>none</value>
</property>
Other configuration changes required include:
  • To enable the Sentry policy service, the following flag should be set on the catalogd and the impalad.
    --sentry_config=<absolute path to sentry service configuration file>
  • To enable authorization based on policy server metadata set the following flag on the impalad.
    --server_name=<server name>
  • To enable authorization based on a file-based policy set the following flags on the impalad.
    --server_name=<server name>
    --authorization_policy_file=<path to policy file>

    If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise, the policy server metadata approach will be used to implement authorization.

  • The impala user also needs to be added to the list of administrative users of the Sentry Policy Server. For more details, see SENTRY-191.
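
The flag precedence described above can be summarized in a few lines of Python. This is a sketch of the documented behavior rather than Impala's implementation. The function name and the returned labels are hypothetical, and treating a missing --server_name as "authorization not enabled" is an assumption drawn from the wording above.

```python
def impalad_authorization_mode(flags):
    """Return the authorization approach implied by a dict of impalad flags.

    `flags` maps flag names without the leading "--" to their values.
    """
    if not flags.get("server_name"):
        return None  # assumption: authorization is not enabled
    if flags.get("authorization_policy_file"):
        return "policy-file"
    return "sentry-service"


print(impalad_authorization_mode({"server_name": "server1"}))
```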

Appendix: Authorization Privilege Model for Hive and Impala

Privileges can be granted on different objects in the Hive warehouse. Any privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a container object in the hierarchy, the base object automatically inherits it. For instance, if a user has ALL privileges on the database scope, then (s)he has ALL privileges on all of the base objects contained within that scope.

Object Hierarchy in Hive

Server
     Database
         Table
             Partition
             Columns
         View
         Index
     Function/Routine
     Lock
Table 1. Valid privilege types and objects they apply to
Privilege | Object
INSERT | DB, TABLE
SELECT | DB, TABLE
ALL | SERVER, TABLE, DB, URI

Table 2. Privilege hierarchy
Base Object | Granular privileges on object | Container object that contains the base object | Privileges on container object that implies privileges on the base object
DATABASE | ALL | SERVER | ALL
TABLE | INSERT | DATABASE | ALL
TABLE | SELECT | DATABASE | ALL
VIEW | SELECT | DATABASE | ALL
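
The inheritance rule described above can be expressed as a small check: a grant covers a request when the granted object contains the requested object and the granted privilege is ALL or matches the requested one. The Python sketch below is a simplified model for reasoning about the hierarchy, not Sentry's actual validation logic. The function name implies and the tuple representation are hypothetical.

```python
def implies(grant, request):
    """Return True if `grant` covers `request`.

    Both arguments are (object_path, privilege) pairs, where object_path is a
    tuple such as ("server1", "analyst1", "mytable").
    """
    g_path, g_priv = grant
    r_path, r_priv = request
    # The granted object must be the requested object or one of its containers.
    covers_object = tuple(r_path[:len(g_path)]) == tuple(g_path)
    covers_privilege = g_priv == "ALL" or g_priv == r_priv
    return covers_object and covers_privilege


db_grant = (("server1", "analyst1"), "ALL")
table_request = (("server1", "analyst1", "mytable"), "SELECT")
print(implies(db_grant, table_request))
```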
Table 3. Privilege table for Hive & Impala operations
Operation | Scope | Privileges | URI | Others
CREATE DATABASE | SERVER | ALL
DROP DATABASE | DATABASE | ALL
CREATE TABLE | DATABASE | ALL
DROP TABLE | TABLE | ALL
CREATE VIEW | DATABASE; SELECT on TABLE | ALL | | SELECT on TABLE
DROP VIEW | VIEW/TABLE | ALL
CREATE INDEX | TABLE | ALL
DROP INDEX | TABLE | ALL
ALTER TABLE .. ADD COLUMNS | TABLE | ALL
ALTER TABLE .. REPLACE COLUMNS | TABLE | ALL
ALTER TABLE .. CHANGE column | TABLE | ALL
ALTER TABLE .. RENAME | TABLE | ALL
ALTER TABLE .. SET TBLPROPERTIES | TABLE | ALL
ALTER TABLE .. SET FILEFORMAT | TABLE | ALL
ALTER TABLE .. SET LOCATION | TABLE | ALL | URI
ALTER TABLE .. ADD PARTITION | TABLE | ALL
ALTER TABLE .. ADD PARTITION location | TABLE | ALL | URI
ALTER TABLE .. DROP PARTITION | TABLE | ALL
ALTER TABLE .. PARTITION SET FILEFORMAT | TABLE | ALL
SHOW TBLPROPERTIES | TABLE | SELECT/INSERT
SHOW CREATE TABLE | TABLE | SELECT/INSERT
SHOW PARTITIONS | TABLE | SELECT/INSERT
DESCRIBE TABLE | TABLE | SELECT/INSERT
DESCRIBE TABLE .. PARTITION | TABLE | SELECT/INSERT
LOAD DATA | TABLE | INSERT | URI
SELECT | TABLE | SELECT
INSERT OVERWRITE TABLE | TABLE | INSERT
CREATE TABLE .. AS SELECT | DATABASE; SELECT on TABLE | ALL | | SELECT on TABLE
USE <dbName> | Any
ALTER TABLE .. SET SERDEPROPERTIES | TABLE | ALL
ALTER TABLE .. PARTITION SET SERDEPROPERTIES | TABLE | ALL

Hive-Only Operations
INSERT OVERWRITE DIRECTORY | TABLE | INSERT | URI
ANALYZE TABLE | TABLE | SELECT + INSERT
IMPORT TABLE | DATABASE | ALL | URI
EXPORT TABLE | TABLE | SELECT | URI
ALTER TABLE TOUCH | TABLE | ALL
ALTER TABLE TOUCH PARTITION | TABLE | ALL
ALTER TABLE .. CLUSTERED BY SORTED BY | TABLE | ALL
ALTER TABLE .. ENABLE/DISABLE | TABLE | ALL
ALTER TABLE .. PARTITION ENABLE/DISABLE | TABLE | ALL
ALTER TABLE .. PARTITION.. RENAME TO PARTITION | TABLE | ALL
ALTER DATABASE | DATABASE | ALL
DESCRIBE DATABASE | DATABASE | SELECT/INSERT
SHOW COLUMNS | TABLE | SELECT/INSERT
SHOW INDEXES | TABLE | SELECT/INSERT
GRANT PRIVILEGE | Allowed only for Sentry admin users
REVOKE PRIVILEGE | Allowed only for Sentry admin users
SHOW GRANTS | Allowed only for Sentry admin users
ADD JAR | Not Allowed
ADD FILE | Not Allowed
DFS | Not Allowed

Impala-Only Operations
EXPLAIN | TABLE | SELECT
INVALIDATE METADATA | SERVER | ALL
INVALIDATE METADATA <table name> | TABLE | SELECT/INSERT
REFRESH <table name> | TABLE | SELECT/INSERT
CREATE FUNCTION | SERVER | ALL
DROP FUNCTION | SERVER | ALL
COMPUTE STATS | TABLE | ALL