This is the documentation for Cloudera 5.3.x. Documentation for other versions is available at Cloudera Documentation.

The Sentry Service

  Important: This is the documentation for the Sentry service introduced in CDH 5.1. If you want to use Sentry's previous policy file approach to secure your data, see Sentry Policy File Authorization.

The Sentry service is an RPC server that stores authorization metadata in an underlying relational database and provides RPC interfaces to retrieve and manipulate privileges. It supports secure access to services using Kerberos. The service serves authorization metadata from the database-backed storage; it does not handle actual privilege validation. The Hive and Impala services are clients of this service and enforce Sentry privileges when configured to use Sentry.

The Sentry service was introduced to make user privileges easier to manage than with the existing policy file approach. Because privileges are stored in a database, you can modify them with the more traditional GRANT/REVOKE statements.

For more information on installing, upgrading and configuring the Sentry service, see:

Prerequisites

Privilege Model

With CDH 5.1, the privilege model has changed to accommodate the new GRANT/REVOKE syntax that is used with the Sentry service. These changes apply to both the new database-backed Sentry service and the previous policy file approach.

The Sentry privilege model has the following characteristics:
  • Allows any user to execute SHOW FUNCTION, DESC FUNCTION, and SHOW LOCKS.
  • Allows the user to see only those tables and databases for which this user has privileges.
  • Requires a user to have the necessary privileges on the URI to execute HiveQL operations that take in a location. Examples of such operations include LOAD, IMPORT, and EXPORT.
  Important: When Sentry is enabled, a user with no privileges on a database cannot connect to HiveServer2. The connection fails because the use <database> command is now executed as part of connecting to HiveServer2. See HIVE-4256.

For more information, see Appendix: Authorization Privilege Model for Hive and Impala.

Users and Groups

  • A user is an entity that is permitted by the authentication subsystem to access the Hive service. This entity can be a Kerberos principal, an LDAP userid, or an artifact of some other pluggable authentication system supported by HiveServer2.
  • A group connects the authentication system with the authorization system. It is a collection of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
  • A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups.
For example,
analyst = sales_reporting, data_export, audit_report
Here the group analyst is granted the roles sales_reporting, data_export, and audit_report. The members of this group can run the HiveQL statements that are allowed by these roles. If this is an HDFS-backed group, then all the users belonging to the HDFS group analyst can run such queries.
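
The group-to-role mapping above can be illustrated with a short Python sketch. This is only a toy model of the idea, not Sentry code: the function names, the user names alice and bob, and the in-memory dictionaries are all assumptions made for illustration.

```python
# Toy model of group-to-role resolution; none of these names are Sentry APIs.

def parse_group_roles(line):
    """Parse a line of the form 'group = role1, role2' into (group, [roles])."""
    group, _, roles = line.partition("=")
    return group.strip(), [r.strip() for r in roles.split(",") if r.strip()]


def roles_for_user(user, user_groups, group_roles):
    """Collect every role a user holds through the groups the user belongs to."""
    roles = set()
    for group in user_groups.get(user, []):
        roles.update(group_roles.get(group, []))
    return roles


group, roles = parse_group_roles("analyst = sales_reporting, data_export, audit_report")
group_roles = {group: roles}

# Hypothetical memberships, as a group provider might report them.
user_groups = {"alice": ["analyst"], "bob": []}

alice_roles = roles_for_user("alice", user_groups, group_roles)
bob_roles = roles_for_user("bob", user_groups, group_roles)
```

In this sketch alice receives all three roles because she belongs to analyst, while bob, who belongs to no group, receives none.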

User to Group Mappings

Group mappings in Sentry can be summarized as in the figure below.

The Sentry service, in particular, uses only Hadoop user-group mappings. Refer to Configuring LDAP Group Mappings for details on configuring LDAP group mappings in Hadoop.
  Important: Cloudera strongly recommends against using Hadoop's LdapGroupsMapping provider. LdapGroupsMapping should only be used in cases where OS-level integration is not possible. Production clusters require an identity provider that works well with all applications, not just Hadoop. Hence, often the preferred mechanism is to use tools such as SSSD, VAS or Centrify to replicate LDAP groups.

Debugging Failed Sentry Authorization Requests

Sentry logs all facts that lead up to authorization decisions at the debug level. If you do not understand why Sentry is denying access, the best way to debug is to temporarily turn on debug logging:
  • In Cloudera Manager, add log4j.logger.org.apache.sentry=DEBUG to the logging settings for your service through the corresponding Logging Safety Valve field for Impala or HiveServer2.
  • On systems not managed by Cloudera Manager, add log4j.logger.org.apache.sentry=DEBUG to the log4j.properties file on each host in the cluster, in the appropriate configuration directory for each service.
Specifically, look for exceptions and messages such as:
FilePermission server..., RequestPermission server...., result [true|false]
which indicate each evaluation Sentry makes. The FilePermission comes from the policy file, while the RequestPermission is the privilege required for the query. Sentry compares a RequestPermission against each applicable FilePermission until it finds a match. If no matching privilege is found, Sentry returns false, indicating "Access Denied".
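
When the debug log is large, a small script can help find the requests that never matched. The Python sketch below is an assumption-heavy illustration: the exact log layout is not given in this document beyond the pattern shown above, so the regular expression and the sample lines (including the privilege strings in them) are made up and will likely need adjusting to real log output.

```python
import re

# Assumed line shape: "FilePermission <p>, RequestPermission <r>, result true|false".
# The real Sentry log layout may differ; adjust the pattern to match actual logs.
EVAL_RE = re.compile(
    r"FilePermission\s+(?P<file>.+?),\s*"
    r"RequestPermission\s+(?P<request>.+?),\s*"
    r"result\s+(?P<result>true|false)"
)


def denied_requests(lines):
    """Return each RequestPermission that never matched any FilePermission."""
    matched = {}
    for line in lines:
        m = EVAL_RE.search(line)
        if not m:
            continue
        request = m.group("request")
        matched[request] = matched.get(request, False) or m.group("result") == "true"
    return [request for request, ok in matched.items() if not ok]


# Made-up sample lines, for illustration only.
sample = [
    "DEBUG FilePermission server=server1->db=sales, "
    "RequestPermission server=server1->db=sales->table=orders->action=select, result true",
    "DEBUG FilePermission server=server1->db=sales, "
    "RequestPermission server=server1->db=hr->table=pay->action=select, result false",
]
denied = denied_requests(sample)
```

Here denied contains only the second request, because no evaluation for it returned true.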

Cluster Components' Behavior When Sentry Service Fails

If the Sentry service fails and you attempt to access the Hive warehouse, Hive, Impala and HDFS will behave as follows:
  • Hive: Queries to the Hive warehouse will fail with an authentication error.
  • Impala: The Impala Catalog server caches Sentry privileges. If Sentry goes down, Impala queries will continue to work and will be authorized against this cached copy of the metadata. However, authorization DDLs such as CREATE ROLE or GRANT ROLE will fail.
  • HDFS/Sentry Synchronized Permissions: Affected HDFS files will continue to use a cached copy of the synchronized ACLs for a configurable period of time (by default, 60 seconds), after which they will fall back to NameNode ACLs.
  • Solr: Solr does not use the Sentry service, hence there will be no impact.
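
The HDFS behavior described above, serving a cached copy for a limited time and then falling back, can be sketched as a small Python class. This is a simplified model, not the actual HDFS/Sentry plugin: the class name, method names, and the ACL strings are hypothetical, and treating exactly 60 seconds as still fresh is an assumption.

```python
class SyncedAclCache:
    """Toy model: serve cached Sentry-synchronized ACLs until they go stale."""

    def __init__(self, stale_after_seconds=60):
        # 60 seconds mirrors the default period mentioned above.
        self.stale_after_seconds = stale_after_seconds
        self.cached_acls = None
        self.last_refresh = None

    def refresh(self, acls, now):
        """Record ACLs received from Sentry at time `now` (in seconds)."""
        self.cached_acls = acls
        self.last_refresh = now

    def effective_acls(self, namenode_acls, now):
        """Use the cached copy while it is fresh, else fall back to NameNode ACLs."""
        if self.last_refresh is not None and now - self.last_refresh <= self.stale_after_seconds:
            return self.cached_acls
        return namenode_acls


cache = SyncedAclCache()
cache.refresh(["group:analyst:r-x"], now=0)       # last successful sync with Sentry
during_outage = cache.effective_acls(["user:hdfs:rwx"], now=30)
after_timeout = cache.effective_acls(["user:hdfs:rwx"], now=61)
```

With the last sync at time 0, the lookup at 30 seconds still returns the synchronized ACLs, while the lookup at 61 seconds returns the NameNode ACLs.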

Appendix: Authorization Privilege Model for Hive and Impala

Privileges can be granted on different objects in the Hive warehouse. Any privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a container object in the hierarchy, the base objects it contains automatically inherit it. For instance, a user who has ALL privileges on the database scope has ALL privileges on all of the base objects contained within that scope.

Object Hierarchy in Hive

Server
     Database
         Table
             Partition
             Columns
         View
         Index
     Function/Routine
     Lock

Table 1. Valid privilege types and objects they apply to

| Privilege | Object |
| INSERT | DB, TABLE |
| SELECT | DB, TABLE |
| ALL | SERVER, TABLE, DB, URI |

Table 2. Privilege hierarchy

| Base object | Granular privileges on object | Container object that contains the base object | Privileges on container object that imply privileges on the base object |
| DATABASE | ALL | SERVER | ALL |
| TABLE | INSERT | DATABASE | ALL |
| TABLE | SELECT | DATABASE | ALL |
| VIEW | SELECT | DATABASE | ALL |
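
The inheritance rule in the privilege hierarchy can be sketched in Python as follows. This is an illustrative model only: it works on object types rather than named objects, it encodes only the rows of the privilege hierarchy table, and it assumes that ALL on SERVER carries down through DATABASE to tables and views. The names CONTAINER and implies are hypothetical, and Sentry itself may recognize implications this sketch does not.

```python
# Container of each base object type, taken from the privilege hierarchy table.
CONTAINER = {"DATABASE": "SERVER", "TABLE": "DATABASE", "VIEW": "DATABASE"}


def implies(granted_scope, granted_priv, wanted_scope, wanted_priv):
    """Return True if a grant on granted_scope covers wanted_priv on wanted_scope."""
    if granted_scope == wanted_scope:
        # Same level: ALL covers everything, otherwise the privilege must match.
        return granted_priv in ("ALL", wanted_priv)
    # Walk up through the containers; only ALL on a container is inherited.
    scope = CONTAINER.get(wanted_scope)
    while scope is not None:
        if scope == granted_scope:
            return granted_priv == "ALL"
        scope = CONTAINER.get(scope)
    return False


db_all_covers_select = implies("DATABASE", "ALL", "TABLE", "SELECT")
server_all_covers_view = implies("SERVER", "ALL", "VIEW", "SELECT")
select_covers_insert = implies("TABLE", "SELECT", "TABLE", "INSERT")
```

Under these assumptions the first two checks succeed and the third fails, since SELECT on a table does not cover INSERT on it.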

Table 3. Privilege table for Hive & Impala operations

| Operation | Scope | Privileges | URI | Others |
| CREATE DATABASE | SERVER | ALL | | |
| DROP DATABASE | DATABASE | ALL | | |
| CREATE TABLE | DATABASE | ALL | | |
| DROP TABLE | TABLE | ALL | | |
| CREATE VIEW | DATABASE; SELECT on TABLE | ALL | | SELECT on TABLE |
| DROP VIEW | VIEW/TABLE | ALL | | |
| CREATE INDEX | TABLE | ALL | | |
| DROP INDEX | TABLE | ALL | | |
| ALTER TABLE .. ADD COLUMNS | TABLE | ALL | | |
| ALTER TABLE .. REPLACE COLUMNS | TABLE | ALL | | |
| ALTER TABLE .. CHANGE column | TABLE | ALL | | |
| ALTER TABLE .. RENAME | TABLE | ALL | | |
| ALTER TABLE .. SET TBLPROPERTIES | TABLE | ALL | | |
| ALTER TABLE .. SET FILEFORMAT | TABLE | ALL | | |
| ALTER TABLE .. SET LOCATION | TABLE | ALL | URI | |
| ALTER TABLE .. ADD PARTITION | TABLE | ALL | | |
| ALTER TABLE .. ADD PARTITION location | TABLE | ALL | URI | |
| ALTER TABLE .. DROP PARTITION | TABLE | ALL | | |
| ALTER TABLE .. PARTITION SET FILEFORMAT | TABLE | ALL | | |
| SHOW TBLPROPERTIES | TABLE | SELECT/INSERT | | |
| SHOW CREATE TABLE | TABLE | SELECT/INSERT | | |
| SHOW PARTITIONS | TABLE | SELECT/INSERT | | |
| DESCRIBE TABLE | TABLE | SELECT/INSERT | | |
| DESCRIBE TABLE .. PARTITION | TABLE | SELECT/INSERT | | |
| LOAD DATA | TABLE | INSERT | URI | |
| SELECT | TABLE | SELECT | | |
| INSERT OVERWRITE TABLE | TABLE | INSERT | | |
| CREATE TABLE .. AS SELECT | DATABASE; SELECT on TABLE | ALL | | SELECT on TABLE |
| USE <dbName> | Any | | | |
| ALTER TABLE .. SET SERDEPROPERTIES | TABLE | ALL | | |
| ALTER TABLE .. PARTITION SET SERDEPROPERTIES | TABLE | ALL | | |

Hive-Only Operations

| INSERT OVERWRITE DIRECTORY | TABLE | INSERT | URI | |
| ANALYZE TABLE | TABLE | SELECT + INSERT | | |
| IMPORT TABLE | DATABASE | ALL | URI | |
| EXPORT TABLE | TABLE | SELECT | URI | |
| ALTER TABLE TOUCH | TABLE | ALL | | |
| ALTER TABLE TOUCH PARTITION | TABLE | ALL | | |
| ALTER TABLE .. CLUSTERED BY SORTED BY | TABLE | ALL | | |
| ALTER TABLE .. ENABLE/DISABLE | TABLE | ALL | | |
| ALTER TABLE .. PARTITION ENABLE/DISABLE | TABLE | ALL | | |
| ALTER TABLE .. PARTITION.. RENAME TO PARTITION | TABLE | ALL | | |
| MSCK REPAIR TABLE | TABLE | ALL | | |
| ALTER DATABASE | DATABASE | ALL | | |
| DESCRIBE DATABASE | DATABASE | SELECT/INSERT | | |
| SHOW COLUMNS | TABLE | SELECT/INSERT | | |
| SHOW INDEXES | TABLE | SELECT/INSERT | | |
| GRANT PRIVILEGE | Allowed only for Sentry admin users | | | |
| REVOKE PRIVILEGE | Allowed only for Sentry admin users | | | |
| SHOW GRANTS | Allowed only for Sentry admin users | | | |
| ADD JAR | Not Allowed | | | |
| ADD FILE | Not Allowed | | | |
| DFS | Not Allowed | | | |

Impala-Only Operations

| EXPLAIN | TABLE | SELECT | | |
| INVALIDATE METADATA | SERVER | ALL | | |
| INVALIDATE METADATA <table name> | TABLE | SELECT/INSERT | | |
| REFRESH <table name> | TABLE | SELECT/INSERT | | |
| CREATE FUNCTION | SERVER | ALL | | |
| DROP FUNCTION | SERVER | ALL | | |
| COMPUTE STATS | TABLE | ALL | | |
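
One way to read the operations table is as a lookup from an operation to the scope and privilege it requires. The Python sketch below copies a handful of rows from the table into a dictionary. The names REQUIRED and describe_requirement are hypothetical, and the sketch is a reading aid rather than anything Sentry provides.

```python
# A few rows from the operations table: operation -> (scope, privilege, needs_uri).
REQUIRED = {
    "CREATE DATABASE": ("SERVER", "ALL", False),
    "DROP TABLE": ("TABLE", "ALL", False),
    "LOAD DATA": ("TABLE", "INSERT", True),
    "SELECT": ("TABLE", "SELECT", False),
    "ALTER TABLE .. SET LOCATION": ("TABLE", "ALL", True),
}


def describe_requirement(operation):
    """Summarize what the table lists for an operation as a short sentence."""
    scope, privilege, needs_uri = REQUIRED[operation]
    text = f"{privilege} on {scope}"
    if needs_uri:
        # Operations that take a location also need privileges on the URI.
        text += " plus privileges on the URI"
    return text


load_data_needs = describe_requirement("LOAD DATA")
select_needs = describe_requirement("SELECT")
```

For example, LOAD DATA is summarized as INSERT on TABLE plus privileges on the URI, while SELECT needs only SELECT on TABLE.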