Apache Sentry Known Issues
After upgrade, HDFS ACLs are not synched due to Sentry taking too many snapshots
The schema change that was introduced in CDH 5.13.0 requires Sentry to take an HMS snapshot when you upgrade. Sentry might take too many snapshots, resulting in a period of time in which ACLs are not synched with HDFS. This can cause job failures.
Note that this issue is similar to CDH-67922, but has a different root cause.
Cloudera JIRA: CDH-70480
Affected Versions:
- CDH 5.13.0, 5.13.1, 5.13.2
- CDH 5.14.0, 5.14.1
Who is Affected: Users that currently have any CDH version below 5.13.0 and are upgrading to any of the affected versions. For example, users upgrading from CDH 5.12.2 to CDH 5.13.0 are affected, but users upgrading from 5.12.2 to 5.13.3 are not affected.
Action Required: Cloudera recommends that if you have Sentry installed, you do not upgrade to any of the affected versions. Instead, upgrade to one of the fixed versions.
Fixed Versions:
- CDH 5.13.3
- CDH 5.14.2, 5.14.3
- CDH 5.15.0 and above
After upgrade, HDFS ACLs are not synched due to Sentry taking too long to retrieve the snapshot from the database
The schema change that was introduced in 5.13.0 requires Sentry to take an HMS snapshot and save it to the Sentry database when you upgrade. When the NameNode service starts after the upgrade and asks for a path update, Sentry must retrieve a full snapshot from the Sentry database and send it to the NameNode. In the Sentry database, AUTHZ_PATH does not have an index on the foreign key AUTHZ_OBJ_ID. This can cause the snapshot retrieval from the Sentry database to take a long time, resulting in ACLs not being synched with HDFS until the snapshot is complete. After Sentry retrieves the full snapshot and sends it to the NameNode, Sentry only needs to retrieve and send delta changes.
Note that this issue is similar to CDH-70480, but has a different root cause.
Cloudera JIRA: CDH-67922
Affected Versions:
- CDH 5.13.0, 5.13.1, 5.13.2, 5.13.3
- CDH 5.14.0, 5.14.1, 5.14.2, 5.14.3
- CDH 5.15.0
Who is Affected: Users that currently have any CDH version below 5.13.0 and are upgrading to any of the affected versions. For example, users upgrading from CDH 5.12.2 to CDH 5.13.3 are affected. MySQL users are not affected because MySQL automatically creates the index for the foreign key.
Action Required: After you upgrade, manually add the index to your schema with the appropriate command for the database:
- Oracle:
CREATE INDEX "AUTHZ_PATH_FK_IDX" ON "AUTHZ_PATH" ("AUTHZ_OBJ_ID");
- PostgreSQL:
CREATE INDEX "AUTHZ_PATH_FK_IDX" ON "AUTHZ_PATH" USING btree ("AUTHZ_OBJ_ID");
- Apache Derby:
CREATE INDEX AUTHZ_PATH_FK_IDX ON AUTHZ_PATH (AUTHZ_OBJ_ID);
- IBM Db2:
CREATE INDEX AUTHZ_PATH_FK_IDX ON AUTHZ_PATH (AUTHZ_OBJ_ID);
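After creating the index, you can confirm that it exists. For example, on PostgreSQL (a hedged sketch; because the index above is created with quoted identifiers, the catalog stores the names in uppercase):

```sql
-- List indexes on the AUTHZ_PATH table via the PostgreSQL system catalog
SELECT indexname
FROM pg_indexes
WHERE tablename = 'AUTHZ_PATH';
```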
Fixed Versions: This will be fixed in a future release.
You must use a relational database with Sentry HA
A flat file cannot scale meaningfully for search, replace, add, and delete operations, but a relational database can.
Affected Versions: CDH 5.13.0
Workaround: Use a relational database.
GRANT/REVOKE operations could fail if there are too many concurrent requests
Under a significant concurrent workload, GRANT and REVOKE operations can fail.
Affected Versions: CDH 5.13.0
Workaround: Because grant and revoke actions are largely an administrative concern, plan bulk privilege changes for periods of low activity.
Creating a large set of Sentry roles results in performance problems
Using more than a thousand roles or permissions can cause significant performance problems.
Affected Versions: All CDH 5 versions
Workaround: In general, associate each user with as few roles as possible, and instead give administrator roles very broad privileges.
Sentry takes a long time to finish HMS sync on larger object count (>2M)
When there are many Hive objects (millions of tables and partitions), it may take Sentry a few minutes, or even tens of minutes, to fully synchronize table information between HMS and Sentry, and during this time HDFS sync is not functional. This delay is usually a one-time cost during the upgrade to CDH 5.13 and should not be a problem during a Sentry server restart.
Affected Versions: CDH 5.13.0
Workaround: Allow for the time needed to process many objects. Schedule the installation or upgrade for a low-usage period when the system can process all of the objects.
Sentry process crashes with OOM upon reaching array length limit
To prevent Sentry from crashing with an OutOfMemoryError (OOM) upon reaching the Java array length limit, use a Java version that has JDK-8055949 fixed. This also affects HMS when Sentry is enabled and there are a large number of partitions.
Affected Versions: All CDH 5 versions
Workaround: Use a Java version that has JDK-8055949 fixed.
YARN Dynamic Resource Pools Do Not Work with Hive When Sentry Is Enabled
Hive jobs are not submitted into the correct YARN queue when Hive is using Sentry because Hive does not use the YARN API to resolve the user or group of the job's original submitter. This causes the job to be placed in a queue using the placement rules based on the Hive user. The HiveServer2 fair scheduler queue mapping used for "non-impersonation" mode does not handle the primary-secondary queue mappings correctly.
Products Affected: Hive, YARN, Sentry
This issue is fixed by HIVE-8634, which is available in CDH 5.3 and higher. However, in CDH 5.3, 5.4, 5.5, 5.6, and 5.7, you must configure Hive to load the scheduler configuration into HiveServer2. In CDH 5.8 or higher managed clusters, Cloudera Manager automatically deploys the fair scheduler configuration to HiveServer2, but you must make sure that your configuration complies with the Placement Rule and Submission Access requirements described below.
Resolution: Use the workaround for your version of CDH.
Workaround for CDH 5.3, 5.4, 5.5, 5.6, 5.7:
- In the Cloudera Manager Admin Console, click Cluster n, select the YARN service, and click the Instances tab.
- In the YARN Instances page, click the ResourceManager(Active) role, and then click the Processes tab.
- In the ResourceManager Processes page, click fair-scheduler.xml, which opens in a new browser tab. Copy the contents of the fair-scheduler.xml page.
- Create a new file named fair-scheduler.xml and paste the contents of the fair-scheduler.xml browser page into it. Save the file.
- In a terminal window, create a new directory on the HiveServer2 host where you can store the fair-scheduler.xml file you created in the previous step. For example, /etc/hive/fsxml.
- Upload the fair-scheduler.xml file to the directory you created in the previous step.
- In the Cloudera Manager Admin Console, click Cloudera Manager to return to the home page, and then click the Hive service.
- On the Hive service Configuration page, select Service-Wide scope and Advanced category.
- In the panel on the right, scroll down until you find the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, and add the yarn.scheduler.fair.allocation.file property with the path to the fair-scheduler.xml file you created in the previous steps. For example, if you put the fair-scheduler.xml file in the /etc/hive/fsxml directory, add the following:
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hive/fsxml/fair-scheduler.xml</value>
</property>
- Click Save Changes.
- Restart the Hive service.
Workaround for unmanaged clusters:
- Log in to a node that runs the ResourceManager.
- Navigate to the $HADOOP_HOME/conf/ directory and locate the fair-scheduler.xml file.
- Copy the fair-scheduler.xml file to the /etc/hive/conf directory on the HiveServer2 node.
- Open the /etc/hive/conf/hive-site.xml file and add the following property:
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hive/conf/fair-scheduler.xml</value>
</property>
- Save the hive-site.xml file.
- Restart the Hive service:
sudo service hive-server2 stop
sudo service hive-server2 start
Placement Rule Requirement
After HiveServer2 determines into which queue the job can be placed, it submits the job with a queue specified in the submission request. This placement request is honored only if the placement rules in the fair-scheduler.xml file have the following rule defined as the first rule in the fair-scheduler policy:
<rule name="specified" />
Make sure that this is the first rule defined in the fair-scheduler.xml file. If another rule is defined as the first rule, placement behavior cannot be determined.
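A minimal sketch of what the top of the placement policy might look like (the rules after the first are illustrative; only the position of the specified rule matters here):

```xml
<queuePlacementPolicy>
  <!-- Must be first: honor the queue named in the submission request -->
  <rule name="specified" />
  <!-- Subsequent rules apply only when no queue is specified -->
  <rule name="user" />
  <rule name="default" queue="root.default" />
</queuePlacementPolicy>
```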
Job Submission Access Requirement
The hive user must be in the list of users allowed to submit jobs to the pool. The submitting user is used to determine pool placement, but the hive user is still the user that is expected to run the job in the pool.
Sentry does not manage permissions of Hive directories after certain commands
Sentry stops managing HDFS permissions of Hive directories that were the targets of the following DDL/DML commands:
- ALTER TABLE commands that are not rename or set location. For example, "alter table set property".
- INSERT on an unpartitioned table.
Products affected: Sentry, HDFS, Hive
Affected Versions: All versions of CDH 5, except for those indicated in the ‘Addressed in release/refresh/patch’ section below.
Users affected: Users running Sentry with Hive or Impala using the HDFS ACL synchronization feature.
The table/partition directory ACLs and permissions are reverted to the underlying HDFS values after either of these workflows:
- Insert on unpartitioned tables
- Alter table property on any table
Immediate action required: Upgrade to a fixed version as soon as possible. If you are affected before you upgrade, restart the HMS to reset the permissions until the DDL/DML commands are rerun.
Addressed in release/refresh/patch: CDH 5.5.5 and higher, CDH 5.7.2 and higher, CDH 5.8.0 and higher
CREATE FUNCTION ... USING JAR does not work on Sentry-secured clusters
In a cluster without Sentry, a user is able to create a UDF using the CREATE FUNCTION ... USING <hdfs location> command in Hive, with a JAR located on HDFS. However, once Sentry is enabled, this command does not work even if the user is granted the ALL privilege to the URI on HDFS.
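For reference, the command that fails on a Sentry-secured cluster looks like the following (the function name, class name, and JAR path are hypothetical):

```sql
-- Fails under Sentry even when the user holds ALL on the URI
CREATE FUNCTION my_udf AS 'com.example.MyUDF'
USING JAR 'hdfs:///path/to/udf.jar';
```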
Affected Versions: CDH 5.7, 5.6, 5.5, 5.4
With Sentry enabled, only Hive admin users have access to YARN job logs
As a prerequisite of enabling Sentry, Hive impersonation is turned off, which means all YARN jobs are submitted to the Hive job queue, and are run as the hive user. This is an issue because the YARN History Server now has to block users from accessing logs for their own jobs, since their own usernames are not associated with the jobs. As a result, end users cannot access any job logs unless they can get sudo access to the cluster as the hdfs, hive or other admin users.
In CDH 5.8 (and higher), Hive overrides the default configuration, mapred.job.queuename, and places incoming jobs into the connected user's job queue, even though the submitting user remains hive. Hive obtains the relevant queue/username information for each job by using YARN's fair-scheduler.xml file.
Moving a partitioned table to a new location on the filesystem does not affect ACLs set on the previous location
With HDFS/Sentry sync enabled, if you move a partitioned table to a new location on the filesystem using the ALTER TABLE .. SET LOCATION command, ACLs set on the previous location remain unchanged. This occurs irrespective of whether the table is managed by Sentry.
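For example (a hedged sketch; the table name and path are hypothetical), after a command such as the following, the ACLs set on the table's old location are left in place rather than being removed or moved:

```sql
-- ACLs on the previous location remain unchanged after the move
ALTER TABLE sales SET LOCATION 'hdfs:///data/warehouse/sales_new';
```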
Column-level privileges are not supported on Hive Metastore views
GRANT and REVOKE for column level privileges is not supported on Hive Metastore views.
SELECT privilege on all columns does not equate to SELECT privilege on table
Users who have been explicitly granted the SELECT privilege on all columns of a table will not have permission to perform table-level operations. For example, operations such as SELECT COUNT(1) or SELECT COUNT(*) will not work even if you have the SELECT privilege on all columns.
There is one exception: the SELECT * FROM TABLE command works even if you do not have explicit table-level access.
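A hedged illustration of the behavior (the role, table, and column names are hypothetical):

```sql
-- Grant SELECT on every column of the table individually
GRANT SELECT(id) ON TABLE t TO ROLE analyst;
GRANT SELECT(name) ON TABLE t TO ROLE analyst;

-- Fails: table-level SELECT is still missing
SELECT COUNT(*) FROM t;

-- Works: the documented exception
SELECT * FROM t;
```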
The EXPLAIN SELECT operation works without table or column-level privileges
Users are able to run the EXPLAIN SELECT operation, exposing metadata for all columns, even for tables/columns to which they weren't explicitly granted access.
With HDFS sync enabled, unexpected directory permissions are set when the NameNode plugin cannot communicate with the Sentry Server
With HDFS-Sentry sync enabled, if the NameNode plugin is unable to communicate with the Sentry Service, affected HDFS files will continue to use a cached copy of the synchronized ACLs for a configurable period of time, after which they will fall back to the Hive System User and the Hive System Group (for example, hive:hive). The timeout value can be modified by adding the sentry.authorization-provider.cache-stale-threshold.ms parameter to the hdfs-site.xml Safety Valve in Cloudera Manager. The default timeout value is 60 seconds, but you can increase this value from several minutes to a few hours, as needed to accommodate large clusters.
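For example, to raise the timeout to 10 minutes, add the property to the hdfs-site.xml Safety Valve (the value is in milliseconds; 600000 is an illustrative choice, not a recommendation):

```xml
<property>
  <name>sentry.authorization-provider.cache-stale-threshold.ms</name>
  <value>600000</value>
</property>
```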
Hive authorization (Grant/Revoke/Show) statements do not support fully qualified table names (default.tab1)
Workaround: Switch to the database before granting privileges on the table.
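For example (the database, table, and role names are hypothetical):

```sql
-- Does not work: fully qualified table name
-- GRANT SELECT ON TABLE default.tab1 TO ROLE analyst;

-- Works: switch to the database first
USE default;
GRANT SELECT ON TABLE tab1 TO ROLE analyst;
```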
Object types Server and URI are not supported in SHOW GRANT ROLE roleName on OBJECT objectName
Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.
Relative URI paths not supported by Sentry
Sentry supports only absolute (not relative) URI paths in permission grants. Although some early releases (for example, CDH 5.7.0) may not have raised explicit errors when relative paths were set, upgrading a system that uses relative paths causes the system to lose Sentry permissions.
Affected Versions: All versions. Relative paths are not supported in Sentry for permission grants.
Resolution: Revoke privileges that have been set using relative paths, and grant permissions using absolute paths before upgrading.
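For example, grant URI privileges using the full absolute path (a hedged sketch; the host, port, path, and role name are hypothetical):

```sql
-- Absolute URI path: supported
GRANT ALL ON URI 'hdfs://namenode:8020/path/to/dir' TO ROLE etl;
```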