This is the documentation for Cloudera 5.2.x.
Documentation for other versions is available at Cloudera Documentation.

Sentry Policy File Authorization

  Important: This is the documentation for configuring Sentry using the policy file approach. Cloudera recommends you use the database-backed Sentry service introduced in CDH 5.1 to secure your data. See The Sentry Service for more information.

Sentry enables role-based, fine-grained authorization for HiveServer2, Cloudera Impala and Cloudera Search.

For more information on installing, upgrading and configuring policy file authorization, see:

Prerequisites

Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:
  • CDH 4.3.0 or later.
  • HiveServer2 and the Hive Metastore running with strong authentication. For HiveServer2, strong authentication is either Kerberos or LDAP. For the Hive Metastore, only Kerberos is considered strong authentication (to override, see Securing the Hive Metastore).
  • Impala 1.2.1 (or later) running with strong authentication. With Impala, either Kerberos or LDAP can be configured to achieve strong authentication. Auditing of authentication failures is supported only with CDH 4.4.0 and Impala 1.2.1 or later.
  • Kerberos authentication implemented on your cluster. This prevents a user from bypassing authorization and gaining direct access to the underlying data.

Roles and Privileges

Sentry uses a role-based privilege model. A role is a collection of rules for accessing a given Hive object. The objects supported in the current release are server, database, table, and URI. Access to each object is governed by privileges: Select, Insert, or All.

  Note: All is not supported in the table scope; you must specify Select and Insert explicitly.
For example, a rule for the Select privilege on table customer from database sales would be formulated as follows:
server=server1->db=sales->table=customer->action=Select
Each object must be specified as a hierarchy of the containing objects, from server to table, followed by the privilege granted for that object. A role can contain multiple such rules, separated by commas. For example, a role might contain the Select privilege for the customer and items tables in the sales database, and the Insert privilege for the sales_insights table in the reports database. You would specify this as follows:
sales_reporting = server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Select, \
    server=server1->db=reports->table=sales_insights->action=Insert
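To make the rule syntax concrete, the following Python sketch splits a role definition into its comma-separated rules and each rule into its ordered object hierarchy. It is a hypothetical illustration only: parse_privilege and parse_role are not part of Sentry, and the check that rejects a leftover "=" inside a value is an assumption added to catch typos such as a missing hyphen in "->".

```python
def parse_privilege(rule):
    """Split 'server=server1->db=sales->...' into ordered (key, value) pairs.

    Hypothetical helper for illustration; Sentry's own parser is not shown here.
    """
    parts = []
    for component in rule.strip().split("->"):
        key, sep, value = component.partition("=")
        key, value = key.strip(), value.strip()
        # Reject a missing "=" or a leftover "=" in the value, as produced by
        # "table=items>action=Select" where the "-" of "->" was dropped.
        if not sep or not key or not value or "=" in value:
            raise ValueError(f"malformed privilege component: {component!r}")
        parts.append((key.lower(), value))
    return parts


def parse_role(definition):
    """Split a role's comma-separated rules and parse each one."""
    return [parse_privilege(rule) for rule in definition.split(",") if rule.strip()]


sales_reporting = (
    "server=server1->db=sales->table=customer->action=Select, "
    "server=server1->db=sales->table=items->action=Select, "
    "server=server1->db=reports->table=sales_insights->action=Insert"
)

for privilege in parse_role(sales_reporting):
    print(privilege)
```

Each printed list reads from the outermost container (server) down to the action, which is the same order the rule must be written in.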

Privilege Model

With CDH 5.1, the privilege model has changed to accommodate the new GRANT/REVOKE syntax used with the Sentry service. These changes apply to both the new database-backed Sentry service and the previous policy file approach.

The Sentry privilege model has the following characteristics:
  • Allows any user to execute show function, desc function, and show locks.
  • Allows the user to see only those tables and databases for which this user has privileges.
  • Requires a user to have the necessary privileges on the URI to execute HiveQL operations that take in a location. Examples of such operations include LOAD, IMPORT, and EXPORT.
  Important: When Sentry is enabled, a user with no privileges on a database will not be allowed to connect to HiveServer2. This is because the USE <database> command is now executed as part of the connection to HiveServer2, and it fails when the user has no privileges on that database. See HIVE-4256.

For more information, see Authorization Privilege Model for Hive and Impala.

Users and Groups

  • A user is an entity that is permitted by the authentication subsystem to access the Hive service. This entity can be a Kerberos principal, an LDAP userid, or an artifact of some other pluggable authentication system supported by HiveServer2.
  • A group connects the authentication system with the authorization system. It is a collection of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
  • A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups.
For example,
analyst = sales_reporting, data_export, audit_report
Here the group analyst is granted the roles sales_reporting, data_export, and audit_report. The members of this group can run the HiveQL statements that are allowed by these roles. If this is an HDFS-backed group, then all the users belonging to the HDFS group analyst can run such queries.
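The lookup this implies, from a user to groups (supplied by the group provider) and from groups to roles, can be sketched in Python as follows. The users alice and mallory and the user-to-group dictionary are hypothetical; in a real deployment that mapping comes from the configured group provider.

```python
# [groups] section from the example above: group -> roles.
GROUP_ROLES = {
    "analyst": ["sales_reporting", "data_export", "audit_report"],
}

# Hypothetical output of a group provider: user -> groups.
USER_GROUPS = {
    "alice": ["analyst"],
}


def roles_for(user, user_groups=USER_GROUPS, group_roles=GROUP_ROLES):
    """Return the union of roles granted to every group the user belongs to.

    A user with no groups, or only groups absent from the policy, gets an
    empty set, matching Sentry's deny-by-default behavior.
    """
    roles = set()
    for group in user_groups.get(user, []):
        roles.update(group_roles.get(group, []))
    return roles


print(sorted(roles_for("alice")))
print(roles_for("mallory"))
```

Roles are attached only to groups, never directly to users, which is why changing a user's group membership is enough to change what they can query.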

User to Group Mapping

You can configure Sentry to use either Hadoop groups or groups defined in the policy file. By default, Sentry looks up groups locally, but it can be configured to look up Hadoop groups using LDAP (for Active Directory). Local groups will be looked up on the host Sentry runs on. For Hive, this will be the host running HiveServer2. Group mappings in Sentry can be summarized as in the figure below:

  Important: You can use either Hadoop groups or local groups, but not both at the same time. Use local groups if you want to do a quick proof-of-concept. For production, use Hadoop groups. Refer to Configuring LDAP Group Mappings for details on configuring LDAP group mappings in Hadoop.

Policy File

The sections that follow contain notes on creating and maintaining the policy file, and using URIs to load external data and JARs.

  Warning: Sentry ignores an invalid policy file and logs an exception. Because Sentry's default behavior is to deny access unless a user has been explicitly granted it, an invalid policy file causes users to lose access to all Sentry-protected data. (If only a per-DB policy file is invalid, only the policies in that file are invalidated.)

Storing the Policy File

Considerations for storing the policy file(s) in HDFS include:

  1. Replication count - Because the file is read for each query in Hive, and read once every five minutes by all Impala daemons, you should increase its replication count. Since it is a small file, setting the replication count equal to the number of slave nodes in the cluster is reasonable.
  2. Updating the file - Updates to the file are reflected immediately, so you should write them to a temporary copy of the file first, and then replace the existing file with the temporary one after all the updates are complete. This avoids race conditions caused by reads on an incomplete file.
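The write-then-replace advice in item 2 can be sketched in Python for a policy file kept on a local file system (a file:// location, which sentry-site.xml also accepts). This illustrates the pattern under that assumption only; for a file in HDFS you would apply the same idea with HDFS tooling, which is not shown here. The function name replace_policy_file is hypothetical.

```python
import os
import shutil
import tempfile


def replace_policy_file(path, new_contents):
    """Write new_contents to a temporary file beside path, then swap it in.

    Readers see either the complete old file or the complete new file,
    never a partially written one. os.replace is atomic on POSIX when the
    source and destination are on the same file system, which is why the
    temporary file is created in the same directory.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".policy-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())
        if os.path.exists(path):
            # mkstemp creates the file with mode 0600; copy the original
            # mode so readers such as hive and impala keep their access.
            shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary copy if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

All edits go into the temporary copy; the single rename at the end is the only moment the live file changes.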

Defining Roles

Keep in mind that role definitions are not cumulative; a definition further down in the file replaces an earlier one. For example, the following results in role1 having privilege2, not privilege1 and privilege2.
role1 = privilege1
role1 = privilege2
Role names are scoped to a specific file. For example, if you give role1 the ALL privilege on db1 in the global policy file and give role1 ALL on db2 in the per-DB policy file for db2, users in groups granted role1 receive both privileges.
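Both rules can be illustrated with Python's standard configparser module, used here purely as a stand-in INI reader; Sentry does not use it, and its syntax differs in places (for example, it does not treat a trailing backslash as a line continuation). With strict=False, a repeated key keeps its last value, mirroring the replacement behavior described above, while files parsed separately never replace each other's entries. The load_roles helper is hypothetical.

```python
import configparser


def load_roles(text):
    """Return the [roles] section of one policy file as {role: definition}."""
    # strict=False tolerates duplicate keys; the last assignment wins.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep role names exactly as written
    parser.read_string(text)
    return dict(parser["roles"])


# Within one file, the later definition replaces the earlier one.
same_file = "[roles]\nrole1 = privilege1\nrole1 = privilege2\n"
print(load_roles(same_file))

# Across files, role names are scoped per file, so both grants survive.
global_policy = "[roles]\nrole1 = server=server1->db=db1\n"
db2_policy = "[roles]\nrole1 = server=server1->db=db2\n"
granted = [load_roles(f)["role1"] for f in (global_policy, db2_policy)]
print(granted)
```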

URIs

Any command that references a URI, such as CREATE EXTERNAL TABLE, LOAD, IMPORT, and EXPORT, as well as CREATE TEMPORARY FUNCTION, requires the URI privilege. This is an important security control: without it, users could create an external table whose location points at the data of an existing table they do not have access to, and so bypass Sentry.

URIs must start with either hdfs:// or file://. If a URI starts with anything else, it will cause an exception and the policy file will be invalid.

When defining URIs for HDFS, you must also specify the NameNode. For example:
data_read = server=server1->uri=file:///path/to/dir,\
server=server1->uri=hdfs://namenode:port/path/to/dir
  Important: Because the NameNode host and port must be specified, Cloudera strongly recommends you use High Availability (HA). This ensures that the URI will remain constant even if the NameNode changes.
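A pre-flight check for these two rules, run over URIs before they are written into a policy file, might look like the following Python sketch. The uri_problems helper is hypothetical; it checks only the scheme and the presence of a NameNode authority, and does not verify that the NameNode or path actually exists.

```python
from urllib.parse import urlparse


def uri_problems(uri):
    """Return a list of reasons the URI would be rejected (empty if none)."""
    problems = []
    if not uri.startswith(("hdfs://", "file://")):
        # Any other prefix causes an exception and invalidates the policy file.
        problems.append("URI must start with hdfs:// or file://")
    elif uri.startswith("hdfs://") and not urlparse(uri).netloc:
        # HDFS URIs must name the NameNode (host:port, or an HA nameservice).
        problems.append("HDFS URI must specify the NameNode")
    return problems


for candidate in (
    "file:///path/to/dir",
    "hdfs://ha-nn-uri/data/landing-skid",
    "hdfs:///path/to/dir",
    "/path/to/dir",
):
    print(candidate, uri_problems(candidate) or "OK")
```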

Loading Data

Data can be loaded using a landing skid, either in HDFS or in a local/NFS directory on the hosts where HiveServer2/Impala run. The following privileges can be used to grant a role access to a landing skid:
  • Load data from a local/NFS directory:
    server=server1->uri=file:///path/to/nfs/local/to/nfs
  • Load data from HDFS (MapReduce, Pig, and so on):
    server=server1->uri=hdfs://ha-nn-uri/data/landing-skid

In addition to the privilege in Sentry, the hive or impala user requires the appropriate file permissions to access the data being loaded. Groups can be used for this purpose. For example, create a group hive-users, and add the hive and impala users, along with the users who will be loading data, to this group.

The example usermod and groupadd commands below are only applicable to locally defined groups on the NameNode, JobTracker, and ResourceManager. If you use another system for group management, equivalent changes should be made in your group management system.
$ groupadd hive-users
$ usermod -a -G someuser,hive-users someuser
$ usermod -a -G hive,hive-users hive
$ usermod -a -G impala,hive-users impala

External Tables

External tables require the ALL@database privilege in addition to the URI privilege. When data is inserted through the CREATE EXTERNAL TABLE statement, or is referenced from an HDFS location outside the normal Hive database directories, the user needs the URI privilege on the corresponding HDFS locations. In addition, the location must either be owned by hive:hive, or the hive and impala users must be members of the group that owns the directory.

You can configure access to the directory using a URI as follows:
[roles]
someuser_home_dir_role = server=server1->uri=hdfs://ha-nn-uri/user/someuser
You should now be able to create an external table:
CREATE EXTERNAL TABLE ... 
LOCATION 'hdfs://ha-nn-uri/user/someuser/mytable';
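The example relies on a grant for a directory also covering locations beneath it. The Python sketch below models that check, assuming a URI privilege covers the granted path and its descendants on the same file system; it compares whole path components so that /user/someuser does not accidentally cover /user/someuser2. It is a simplified model rather than Sentry's implementation, and uri_implies is a hypothetical name.

```python
import posixpath
from urllib.parse import urlparse


def uri_implies(granted, requested):
    """True if the requested location lies at or below the granted URI."""
    g, r = urlparse(granted), urlparse(requested)
    # Scheme and authority (the NameNode) must match exactly.
    if (g.scheme, g.netloc) != (r.scheme, r.netloc):
        return False
    # Normalize ".." and repeated slashes before comparing.
    g_path = posixpath.normpath(g.path or "/")
    r_path = posixpath.normpath(r.path or "/")
    prefix = g_path.rstrip("/") + "/"
    return r_path == g_path or r_path.startswith(prefix)


grant = "hdfs://ha-nn-uri/user/someuser"
print(uri_implies(grant, "hdfs://ha-nn-uri/user/someuser/mytable"))
print(uri_implies(grant, "hdfs://ha-nn-uri/user/someuser2/mytable"))
print(uri_implies(grant, "hdfs://ha-nn-uri/user/someuser/../other"))
```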

Sample Sentry Configuration Files

This section provides a sample configuration.

Policy Files

The following is an example of a global policy file together with a per-DB policy file. In this example, the global policy file, sentry-provider.ini, would exist in HDFS; hdfs://ha-nn-uri/etc/sentry/sentry-provider.ini might be an appropriate location. The per-DB policy file is for the customers database. It is located at hdfs://ha-nn-uri/etc/sentry/customers.ini.

sentry-provider.ini
[databases]
# Defines the location of the per DB policy file for the customers DB/schema 
customers = hdfs://ha-nn-uri/etc/sentry/customers.ini 

[groups]
# Assigns each Hadoop group to its set of roles 
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role 

[roles]
# The URIs below define a landing skid which
# the user can use to import or export data from the system.
# Since the server runs as the user "hive", files in that directory
# must either have the group hive and read/write set or
# be world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

# Implies everything on server1 -> customers. Privileges for
# customers can be defined in the global policy file even though
# customers has its own policy file. Note that the privileges from
# both the global policy file and the per-DB policy file
# are merged. There is no overriding.
customers_admin_role = server=server1->db=customers 

# Implies everything on server1.
admin_role = server=server1
customers.ini
[groups]
manager = customers_insert_role, customers_select_role
analyst = customers_select_role 

[roles]
customers_insert_role = server=server1->db=customers->table=*->action=insert
customers_select_role = server=server1->db=customers->table=*->action=select
  Important: Sentry does not support using the view keyword in policy files. If you want to define a role against a view, use the keyword table instead. For example, to define the role analyst_role against the view col_test_view:
[roles]
analyst_role = server=server1->db=default->table=col_test_view->action=select

Sentry Configuration File

The following is an example of a sentry-site.xml file.

  Important: If you are using Cloudera Manager 4.6 (or earlier), make sure you do not store sentry-site.xml in /etc/hive/conf; that directory is regenerated whenever the Hive client configurations are redeployed. Instead, use a directory such as /etc/sentry to store sentry-site.xml.

If you are using Cloudera Manager 4.7 (or later), Cloudera Manager will create and deploy sentry-site.xml for you. See The Sentry Service for more details on configuring Sentry with Cloudera Manager.

sentry-site.xml

<configuration>
  <property>
    <name>hive.sentry.provider</name>
    <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
  </property>

  <property>
    <name>hive.sentry.provider.resource</name>
    <value>/path/to/authz-provider.ini</value>
    <!--
       If fs.defaultFS in core-site.xml points to HDFS, a path without
       a scheme is resolved in HDFS; alternatively, you can specify a
       fully qualified URI, for example:
       hdfs://namenode:port/path/to/authz-provider.ini
       file:///path/to/authz-provider.ini
    -->
  </property>

  <property>
    <name>sentry.hive.server</name>
    <value>server1</value>
  </property>
</configuration>
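Before deploying a hand-written sentry-site.xml, a quick check that the properties used in this example are present and non-empty can catch typos. The Python sketch below uses the standard xml.etree.ElementTree module; treating exactly these three properties as the ones to check is an assumption based on this example, not a complete list of Sentry settings.

```python
import xml.etree.ElementTree as ET

# Properties used in the example sentry-site.xml above.
EXPECTED = (
    "hive.sentry.provider",
    "hive.sentry.provider.resource",
    "sentry.hive.server",
)


def read_properties(xml_text):
    """Map each <property>'s <name> to its <value>; XML comments are ignored."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(props):
    """Return the expected properties that are absent or empty."""
    return [name for name in EXPECTED if not props.get(name)]


sample = """<configuration>
  <property>
    <name>sentry.hive.server</name>
    <value>server1</value>
  </property>
</configuration>"""
print(missing_properties(read_properties(sample)))
```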

Accessing Sentry-Secured Data Outside Hive/Impala

When Sentry is enabled, the hive user owns all data within the Hive warehouse. However, unlike traditional database systems, the enterprise data hub allows multiple engines to execute over the same dataset.
  Note: Cloudera strongly recommends you use Hive/Impala SQL queries to access data secured by Sentry, as opposed to accessing the data files directly.

However, there are scenarios where fully vetted and reviewed jobs also need to access the data stored in the Hive warehouse. A typical scenario is a secured MapReduce transformation job that runs automatically as an application user. In such cases, it is important to know that the user executing this job will also have full access to the data in the Hive warehouse.

Scenario One: Authorizing Jobs

Problem

A reviewed, vetted, and automated job requires access to the Hive warehouse and cannot use Hive/Impala to access the data.

Solution

Create a group which contains hive, impala, and the user executing the automated job. For example, if the etl user is executing the automated job, you can create a group called hive-users which contains the hive, impala, and etl users.

The example usermod and groupadd commands below are only applicable to locally defined groups on the NameNode, JobTracker, and ResourceManager. If you use another system for group management, equivalent changes should be made in your group management system.
$ groupadd hive-users
$ usermod -a -G hive,impala,hive-users hive
$ usermod -a -G hive,impala,hive-users impala
$ usermod -a -G etl,hive-users etl
Once you have added users to the hive-users group, change the group ownership and permissions of the warehouse directory in HDFS:
$ hadoop fs -chgrp -R hive-users /user/hive/warehouse
$ hadoop fs -chmod -R 770 /user/hive/warehouse
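Why mode 770 admits the etl user but nobody outside hive-users follows from how HDFS evaluates POSIX-style permissions: the owner bits apply if the user owns the file, otherwise the group bits apply if the user is in the file's group, otherwise the "other" bits apply. The Python sketch below models only that rule; it ignores the HDFS superuser, ACLs, and the execute permission needed on parent directories, and the user bob is hypothetical.

```python
def hdfs_mode_allows(user, user_groups, owner, group, mode, action):
    """Model the owner/group/other permission check for one file or directory.

    action is "r", "w", or "x". The HDFS superuser and ACLs are not modeled.
    """
    bit = {"r": 4, "w": 2, "x": 1}[action]
    if user == owner:
        perms = (mode >> 6) & 0o7
    elif group in user_groups:
        perms = (mode >> 3) & 0o7
    else:
        perms = mode & 0o7
    return bool(perms & bit)


# /user/hive/warehouse after chgrp -R hive-users and chmod -R 770.
warehouse = {"owner": "hive", "group": "hive-users", "mode": 0o770}

print(hdfs_mode_allows("etl", {"etl", "hive-users"}, action="r", **warehouse))
print(hdfs_mode_allows("bob", {"bob"}, action="r", **warehouse))
```

The "other" digit of 770 is 0, so any user outside hive-users (other than the owner) is refused regardless of what Sentry would allow.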

Scenario Two: Authorizing Group Access to Databases

Problem

One group of users, grp1, should have full access to the database db1 outside of Sentry. The database db1 should not be accessible to any other groups outside of Sentry. Sentry should be used for all other authorization needs.

Solution

Place the hive and impala users in grp1.
$ usermod -a -G hive,impala,grp1 hive
$ usermod -a -G hive,impala,grp1 impala
Then change the group ownership of all directories and files in db1 to grp1, and modify the directory permissions in HDFS. This example is only applicable to local groups on a single host.
$ hadoop fs -chgrp -R grp1 /user/hive/warehouse/db1.db
$ hadoop fs -chmod -R 770 /user/hive/warehouse/db1.db

Debugging Failed Sentry Authorization Requests

Sentry logs all facts that lead up to authorization decisions at the debug level. If you do not understand why Sentry is denying access, the best way to debug is to temporarily turn on debug logging:
  • In Cloudera Manager, add log4j.logger.org.apache.sentry=DEBUG to the logging settings for your service through the corresponding Logging Safety Valve field for the Impala, HiveServer2, or Solr Server services.
  • On systems not managed by Cloudera Manager, add log4j.logger.org.apache.sentry=DEBUG to the log4j.properties file on each host in the cluster, in the appropriate configuration directory for each service.
Specifically, look for exceptions and messages such as:
FilePermission server..., RequestPermission server...., result [true|false]
which indicate each evaluation Sentry makes. The FilePermission is from the policy file, while RequestPermission is the privilege required for the query. A RequestPermission will iterate over all appropriate FilePermission settings until a match is found. If no matching privilege is found, Sentry returns false indicating "Access Denied".
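The matching loop described above can be modeled in Python as follows. This is a simplified sketch, not Sentry's code: it assumes a granted privilege that stops at a container (for example server=server1->db=customers) covers everything inside it, treats * as a wildcard value and all as any action, compares names case-insensitively, and does not handle uri= privileges. The table name orders is hypothetical.

```python
def parse(rule):
    """Split 'server=s->db=d->...' into ordered (key, value) pairs, lowercased."""
    return [tuple(part.split("=", 1)) for part in rule.lower().split("->")]


def implies(file_permission, request_permission):
    """Does one policy-file privilege cover the privilege a query needs?"""
    granted, requested = parse(file_permission), parse(request_permission)
    for i, (key, value) in enumerate(granted):
        if i >= len(requested):
            # The grant is more specific than the request; only an
            # all-actions grant can still cover it.
            return key == "action" and value in ("*", "all")
        req_key, req_value = requested[i]
        if key != req_key:
            return False
        if value not in ("*", req_value) and not (key == "action" and value == "all"):
            return False
    # Every granted component matched, so the request lies within the grant.
    return True


def authorize(request_permission, file_permissions):
    """Try each file permission in turn; stop at the first match, else deny."""
    return any(implies(fp, request_permission) for fp in file_permissions)


# The two customers.ini roles granted to the manager group.
manager_privileges = [
    "server=server1->db=customers->table=*->action=insert",
    "server=server1->db=customers->table=*->action=select",
]
print(authorize("server=server1->db=customers->table=orders->action=select", manager_privileges))
print(authorize("server=server1->db=sales->table=customer->action=select", manager_privileges))
```

An empty list of file permissions always yields False, which corresponds to the "Access Denied" result in the log.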

Authorization Privilege Model for Hive and Impala

Privileges can be granted on different objects in the Hive warehouse. Any privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a container object in the hierarchy, the base object automatically inherits it. For instance, if a user has ALL privileges on the database scope, then they have ALL privileges on all of the base objects contained within that scope.

Object Hierarchy in Hive

Server
     Database
         Table
             Partition
             Columns
         View
         Index
     Function/Routine
     Lock
Table 1. Valid privilege types and objects they apply to
Privilege | Object
INSERT | DB, TABLE
SELECT | DB, TABLE
ALL | SERVER, TABLE, DB, URI
Table 2. Privilege hierarchy
Base object | Granular privileges on object | Container object that contains the base object | Privileges on container object that imply privileges on the base object
DATABASE | ALL | SERVER | ALL
TABLE | INSERT | DATABASE | ALL
TABLE | SELECT | DATABASE | ALL
VIEW | SELECT | DATABASE | ALL
Table 3. Privilege table for Hive & Impala operations
Operation | Scope | Privileges | URI | Others
CREATE DATABASE | SERVER | ALL
DROP DATABASE | DATABASE | ALL
CREATE TABLE | DATABASE | ALL
DROP TABLE | TABLE | ALL
CREATE VIEW | DATABASE; SELECT on TABLE | ALL | | SELECT on TABLE
DROP VIEW | VIEW/TABLE | ALL
CREATE INDEX | TABLE | ALL
DROP INDEX | TABLE | ALL
ALTER TABLE .. ADD COLUMNS | TABLE | ALL
ALTER TABLE .. REPLACE COLUMNS | TABLE | ALL
ALTER TABLE .. CHANGE column | TABLE | ALL
ALTER TABLE .. RENAME | TABLE | ALL
ALTER TABLE .. SET TBLPROPERTIES | TABLE | ALL
ALTER TABLE .. SET FILEFORMAT | TABLE | ALL
ALTER TABLE .. SET LOCATION | TABLE | ALL | URI
ALTER TABLE .. ADD PARTITION | TABLE | ALL
ALTER TABLE .. ADD PARTITION location | TABLE | ALL | URI
ALTER TABLE .. DROP PARTITION | TABLE | ALL
ALTER TABLE .. PARTITION SET FILEFORMAT | TABLE | ALL
SHOW TBLPROPERTIES | TABLE | SELECT/INSERT
SHOW CREATE TABLE | TABLE | SELECT/INSERT
SHOW PARTITIONS | TABLE | SELECT/INSERT
DESCRIBE TABLE | TABLE | SELECT/INSERT
DESCRIBE TABLE .. PARTITION | TABLE | SELECT/INSERT
LOAD DATA | TABLE | INSERT | URI
SELECT | TABLE | SELECT
INSERT OVERWRITE TABLE | TABLE | INSERT
CREATE TABLE .. AS SELECT | DATABASE; SELECT on TABLE | ALL | | SELECT on TABLE
USE <dbName> | Any
ALTER TABLE .. SET SERDEPROPERTIES | TABLE | ALL
ALTER TABLE .. PARTITION SET SERDEPROPERTIES | TABLE | ALL
Hive-Only Operations
INSERT OVERWRITE DIRECTORY | TABLE | INSERT | URI
ANALYZE TABLE | TABLE | SELECT + INSERT
IMPORT TABLE | DATABASE | ALL | URI
EXPORT TABLE | TABLE | SELECT | URI
ALTER TABLE TOUCH | TABLE | ALL
ALTER TABLE TOUCH PARTITION | TABLE | ALL
ALTER TABLE .. CLUSTERED BY SORTED BY | TABLE | ALL
ALTER TABLE .. ENABLE/DISABLE | TABLE | ALL
ALTER TABLE .. PARTITION ENABLE/DISABLE | TABLE | ALL
ALTER TABLE .. PARTITION .. RENAME TO PARTITION | TABLE | ALL
ALTER DATABASE | DATABASE | ALL
DESCRIBE DATABASE | DATABASE | SELECT/INSERT
SHOW COLUMNS | TABLE | SELECT/INSERT
SHOW INDEXES | TABLE | SELECT/INSERT
GRANT PRIVILEGE | Allowed only for Sentry admin users
REVOKE PRIVILEGE | Allowed only for Sentry admin users
SHOW GRANTS | Allowed only for Sentry admin users
ADD JAR | Not allowed
ADD FILE | Not allowed
DFS | Not allowed
Impala-Only Operations
EXPLAIN | TABLE | SELECT
INVALIDATE METADATA | SERVER | ALL
INVALIDATE METADATA <table name> | TABLE | SELECT/INSERT
REFRESH <table name> | TABLE | SELECT/INSERT
CREATE FUNCTION | SERVER | ALL
DROP FUNCTION | SERVER | ALL
COMPUTE STATS | TABLE | ALL