Apache Sentry Known Issues
- YARN Dynamic Resource Pools Do Not Work with Hive When Sentry Is Enabled
- Sentry does not manage permissions of Hive directories after certain commands
- CREATE FUNCTION ... USING JAR does not work on Sentry-secured clusters
- With Sentry enabled, only Hive admin users have access to YARN job logs
- Moving a partitioned table to a new location on the filesystem does not affect ACLs set on the previous location
- Column-level privileges are not supported on Hive Metastore views
- SELECT privilege on all columns does not equate to SELECT privilege on table
- The EXPLAIN SELECT operation works without table or column-level privileges
- With HDFS sync enabled, unexpected directory permissions are set when the NameNode plugin cannot communicate with the Sentry Server
- Hive authorization (Grant/Revoke/Show) statements do not support fully qualified table names (default.tab1)
- Object types Server and URI are not supported in SHOW GRANT ROLE roleName on OBJECT objectName
- Relative URI paths not supported by Sentry
YARN Dynamic Resource Pools Do Not Work with Hive When Sentry Is Enabled
Hive jobs are not submitted into the correct YARN queue when Hive is using Sentry because Hive does not use the YARN API to resolve the user or group of the job's original submitter. This causes the job to be placed in a queue using the placement rules based on the Hive user. The HiveServer2 fair scheduler queue mapping used for "non-impersonation" mode does not handle the primary-secondary queue mappings correctly.
Products Affected: Hive, YARN, Sentry
This issue is fixed by HIVE-8634, which is available in CDH 5.3 and higher. However, in CDH 5.3, 5.4, 5.5, 5.6, and 5.7, you must configure Hive to load the scheduler configuration into HiveServer2. In CDH 5.8 or higher managed clusters, Cloudera Manager automatically deploys the fair scheduler configuration to HiveServer2, but you must make sure that your configuration complies with the Placement Rule and Submission Access requirements described below.
Resolution: Use the workaround for your version of CDH.
Workaround for CDH 5.3, 5.4, 5.5, 5.6, 5.7:
- In the Cloudera Manager Admin Console, click Cluster n, select the YARN service, and click the Instances tab.
- In the YARN Instances page, click the ResourceManager(Active) role, and then click the Processes tab.
- In the ResourceManager Processes page, click fair-scheduler.xml, which opens in a new browser tab. Copy the contents of the fair-scheduler.xml page.
- Create a new file named fair-scheduler.xml and paste the contents of the fair-scheduler.xml browser page into it. Save the file.
- In a terminal window, create a new directory on the HiveServer2 host where you can store the fair-scheduler.xml file you created in the previous step. For example, /etc/hive/fsxml.
- Upload the fair-scheduler.xml file to the directory you created in the previous step.
- In the Cloudera Manager Admin Console, click Cloudera Manager to return to the home page, and then open the Hive service Configuration page.
- On the Hive service Configuration page, select Service-Wide scope and Advanced category.
- In the panel on the right, scroll down until you find the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, and add the yarn.scheduler.fair.allocation.file property with the path to the fair-scheduler.xml file you created in the previous steps. For example, if you put the fair-scheduler.xml file in the /etc/hive/fsxml directory, add the following:
<property> <name>yarn.scheduler.fair.allocation.file</name> <value>/etc/hive/fsxml/fair-scheduler.xml</value> </property>
- Click Save Changes.
- Restart the Hive service.
On a cluster that is not managed by Cloudera Manager, use the command line instead:
- Log in to a node that runs the ResourceManager.
- Navigate to the $HADOOP_HOME/conf/ directory and locate the fair-scheduler.xml file.
- Copy the fair-scheduler.xml file to the HiveServer2 node in the /etc/hive/conf directory.
- Open the /etc/hive/conf/hive-site.xml file and add the following property:
<property> <name>yarn.scheduler.fair.allocation.file</name> <value>/etc/hive/conf/fair-scheduler.xml</value> </property>
- Save the hive-site.xml file.
- Restart the Hive service:
$ sudo service hive-server2 stop
$ sudo service hive-server2 start
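After editing hive-site.xml, it can help to confirm that the property is actually present before restarting. The following is a minimal sketch, not part of the documented workaround; the function name `allocation_file_from_hive_site` and the sample configuration are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "yarn.scheduler.fair.allocation.file"


def allocation_file_from_hive_site(hive_site_xml):
    """Return the configured allocation file path, or None if the property is absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == PROPERTY_NAME:
            return (prop.findtext("value") or "").strip()
    return None


# Hypothetical hive-site.xml content, mirroring the property shown in the steps above.
sample = """<configuration>
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/etc/hive/conf/fair-scheduler.xml</value>
  </property>
</configuration>"""

print(allocation_file_from_hive_site(sample))
```

To check a real file, read its contents and pass the text to the function. A result of None means the property still needs to be added.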
Placement Rule Requirement
After HiveServer2 determines into which queue the job can be placed, it submits the job with a queue specified in the submission request. This placement request is honored only if the placement rules in the fair-scheduler.xml file have the following rule defined as the first rule in the fair-scheduler policy:
<rule name="specified" />
Make sure that this is the first rule defined in the fair-scheduler.xml file. If another rule is defined as the first rule, placement behavior cannot be determined.
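The rule order can be checked with a short script. This is a sketch under the assumption that the placement rules sit in a queuePlacementPolicy element directly under the root of the allocation file, as in the standard Fair Scheduler format; the function name `first_placement_rule` and both sample policies are illustrative.

```python
import xml.etree.ElementTree as ET


def first_placement_rule(fair_scheduler_xml):
    """Return the name of the first placement rule, or None if no rules are defined."""
    root = ET.fromstring(fair_scheduler_xml)
    policy = root.find("queuePlacementPolicy")
    if policy is None:
        return None
    rules = policy.findall("rule")
    return rules[0].get("name") if rules else None


# Hypothetical allocation files: one compliant, one with the rules in the wrong order.
compliant = """<allocations>
  <queuePlacementPolicy>
    <rule name="specified" />
    <rule name="user" />
  </queuePlacementPolicy>
</allocations>"""

non_compliant = """<allocations>
  <queuePlacementPolicy>
    <rule name="user" />
    <rule name="specified" />
  </queuePlacementPolicy>
</allocations>"""

print(first_placement_rule(compliant) == "specified")
print(first_placement_rule(non_compliant) == "specified")
```

Only the first file satisfies the requirement, because the second places another rule ahead of "specified".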
Job Submission Access Requirement
The hive user must be in the list of users allowed to submit jobs to the pool. The submitting user is used to determine pool placement, but the hive user is still the user that is expected to run the job in the pool.
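A similar check can be sketched for submission access. It assumes that each pool is a queue element whose optional aclSubmitApps child lists comma-separated users, then a space, then comma-separated groups, with "*" meaning everyone. The function name `user_may_submit` and the sample pools are hypothetical. The sketch looks only at the user list of the named queue; it does not resolve group membership or ACLs inherited from parent queues.

```python
import xml.etree.ElementTree as ET


def user_may_submit(fair_scheduler_xml, queue_name, user="hive"):
    """Return True/False from the queue's own user ACL, or None if it defines none."""
    root = ET.fromstring(fair_scheduler_xml)
    for queue in root.iter("queue"):
        if queue.get("name") != queue_name:
            continue
        acl = queue.findtext("aclSubmitApps")
        if acl is None:
            return None  # no ACL on this queue itself; inherited settings apply
        if acl.strip() == "*":
            return True
        # Users come before the first space; groups after it are ignored here.
        users_part = acl.partition(" ")[0]
        users = [u.strip() for u in users_part.split(",") if u.strip()]
        return user in users
    return None  # queue not found


# Hypothetical pools for illustration only.
sample = """<allocations>
  <queue name="etl">
    <aclSubmitApps>hive,alice analysts</aclSubmitApps>
  </queue>
  <queue name="adhoc">
    <aclSubmitApps>bob</aclSubmitApps>
  </queue>
  <queue name="open" />
</allocations>"""

for name in ("etl", "adhoc", "open"):
    print(name, user_may_submit(sample, name))
```

In this sample, only the etl pool explicitly allows the hive user. The adhoc pool would need hive added to its list.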
Sentry does not manage permissions of Hive directories after certain commands
Sentry stops managing HDFS permissions of Hive directories that were the targets of certain DDL/DML commands:
- ALTER TABLE commands other than rename or set location. For example, "alter table set property".
- INSERT on an unpartitioned table.
Products affected: Sentry, HDFS, Hive
Affected Versions: All versions of CDH 5, except for those indicated in the ‘Addressed in release/refresh/patch’ section below.
Users affected: Users running Sentry with Hive or Impala using the HDFS ACL synchronization feature.
The table/partition directory ACLs and permissions will be reverted to the underlying HDFS values after either of these workflows:
- Insert on unpartitioned tables
- Alter table property on any table
Immediate action required: Upgrade to a fixed version as soon as possible. If you are affected before you can upgrade, restart the Hive Metastore (HMS); the restart acts as a reset until the DDL/DML commands are run again.
Addressed in release/refresh/patch: CDH 5.5.5 and higher, CDH 5.7.2 and higher, CDH 5.8.0 and higher
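The fixed-version list can be expressed as a small check. This sketch rests on an assumption the text does not state outright: that "5.5.5 and higher" and "5.7.2 and higher" refer to later maintenance releases within the 5.5.x and 5.7.x lines, so that other lines below 5.8.0 (for example, 5.6.x) remain affected. The function name `has_hdfs_acl_fix` is hypothetical, and the input is expected to be a CDH 5 version string of the form major.minor.patch.

```python
def has_hdfs_acl_fix(version):
    """Return True if a CDH 5 version string falls in a release listed as fixed."""
    major, minor, patch = (int(part) for part in version.split("."))
    if (major, minor) >= (5, 8):
        return True
    if (major, minor) == (5, 5):
        return patch >= 5
    if (major, minor) == (5, 7):
        return patch >= 2
    # Assumption: every other CDH 5 line below 5.8.0 is still affected.
    return False


for v in ("5.5.4", "5.5.5", "5.6.1", "5.7.2", "5.8.0"):
    print(v, has_hdfs_acl_fix(v))
```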
CREATE FUNCTION ... USING JAR does not work on Sentry-secured clusters
In a cluster without Sentry, a user is able to create a UDF using the CREATE FUNCTION ... USING <hdfs location> command in Hive, with a JAR located on HDFS. However, once Sentry is enabled, this command does not work even if the user is granted the ALL privilege to the URI on HDFS.
Affected Versions: CDH 5.7, 5.6, 5.5, 5.4
With Sentry enabled, only Hive admin users have access to YARN job logs
As a prerequisite of enabling Sentry, Hive impersonation is turned off, which means all YARN jobs are submitted to the Hive job queue, and are run as the hive user. This is an issue because the YARN History Server now has to block users from accessing logs for their own jobs, since their own usernames are not associated with the jobs. As a result, end users cannot access any job logs unless they can get sudo access to the cluster as the hdfs, hive or other admin users.
In CDH 5.8 (and higher), Hive overrides the default configuration, mapred.job.queuename, and places incoming jobs into the connected user's job queue, even though the submitting user remains hive. Hive obtains the relevant queue/username information for each job by using YARN's fair-scheduler.xml file.
Affected Versions: CDH 5.7 and lower
Fixed Versions: CDH 5.8
Moving a partitioned table to a new location on the filesystem does not affect ACLs set on the previous location
With HDFS/Sentry sync enabled, if you move a partitioned table to a new location on the filesystem using the ALTER TABLE ... SET LOCATION command, ACLs set on the previous location remain unchanged. This occurs irrespective of whether the table is managed by Sentry.
Column-level privileges are not supported on Hive Metastore views
GRANT and REVOKE for column-level privileges are not supported on Hive Metastore views.
SELECT privilege on all columns does not equate to SELECT privilege on table
Users who have been explicitly granted the SELECT privilege on all columns of a table do not have permission to perform table-level operations. For example, operations such as SELECT COUNT (1) or SELECT COUNT (*) will not work even if you have the SELECT privilege on all columns.
There is one exception to this. The SELECT * FROM TABLE command will work even if you do not have explicit table-level access.
The EXPLAIN SELECT operation works without table or column-level privileges
Users are able to run the EXPLAIN SELECT operation, exposing metadata for all columns, even for tables/columns to which they weren't explicitly granted access.
With HDFS sync enabled, unexpected directory permissions are set when the NameNode plugin cannot communicate with the Sentry Server
With HDFS-Sentry sync enabled, if the NameNode plugin cannot communicate with the Sentry Service for a certain period of time (configurable with the sentry.authorization-provider.cache-stale-threshold.ms property), permissions for all directories under Sentry-managed path prefixes are set to the Hive System User and the Hive System Group. This happens irrespective of whether those file paths correspond to Hive warehouse objects.
Hive authorization (Grant/Revoke/Show) statements do not support fully qualified table names (default.tab1)
Workaround: Switch to the database before granting privileges on the table.
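For scripts that start from a fully qualified name, the workaround amounts to splitting the name and issuing a USE statement before the GRANT. The helper below is a sketch only: the function name `grant_statements` and the role name analyst are hypothetical, and the generated statements assume the usual GRANT ... ON TABLE ... TO ROLE form.

```python
def grant_statements(qualified_table, privilege, role):
    """Split 'db.table' and return a USE statement followed by an unqualified GRANT."""
    database, _, table = qualified_table.partition(".")
    if not database or not table:
        raise ValueError("expected a name of the form database.table")
    return [
        f"USE {database};",
        f"GRANT {privilege} ON TABLE {table} TO ROLE {role};",
    ]


for statement in grant_statements("default.tab1", "SELECT", "analyst"):
    print(statement)
```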
Object types Server and URI are not supported in SHOW GRANT ROLE roleName on OBJECT objectName
Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.
Relative URI paths not supported by Sentry
Sentry supports only absolute (not relative) URI paths in permission grants. Although some early releases (for example, CDH 5.7.0) may not have raised explicit errors when relative paths were set, upgrading a system that uses relative paths causes the system to lose Sentry permissions.
Affected Versions: All versions. Relative paths are not supported in Sentry for permission grants.
Resolution: Revoke privileges that have been set using relative paths, and grant permissions using absolute paths before upgrading.
|Absolute (Use this form)||Relative (Do not use this form)|