Apache Sentry Known Issues

You must use a relational database with Sentry HA

A flat file cannot scale meaningfully for search, replace, add, and delete operations, but a relational database can.

Bug: None

Affected Versions: CDH 5.13.0

Workaround: Use a relational database.

GRANT/REVOKE operations could fail if there are too many concurrent requests

Under a significant concurrent workload, GRANT and REVOKE operations can fail.

Bug: SENTRY-1855

Affected Versions: CDH 5.13.0

Workaround: Because grant and revoke actions are largely an administrative concern, schedule bulk privilege changes for periods of low concurrent load.

Creating large set of Sentry roles results in performance problems

Using more than a thousand roles/permissions may cause significant performance problems.

Bug: None

Affected Versions: All CDH 5 versions

Workaround: In general, associate as few roles with each user as possible, and instead allow administrator roles to carry very broad privileges.
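For example, rather than granting many privileges directly to individual users, a small number of shared roles can be granted to groups. The role, database, and group names below are hypothetical:

      CREATE ROLE analyst;
      GRANT SELECT ON DATABASE sales TO ROLE analyst;
      GRANT ROLE analyst TO GROUP analysts;

Every member of the analysts group then inherits the privileges of the single analyst role, keeping the total role count low.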

Sentry takes a long time to finish HMS sync on larger object count (>2M)

When there are many Hive objects (tables and partitions), on the order of millions, it may take Sentry a few minutes, or even a few tens of minutes, to fully synchronize table information between HMS and Sentry, and during this time HDFS sync is not functional. This delay is usually a one-time cost during the upgrade to CDH 5.13 and should not be a problem during Sentry server restarts.

Bug: None

Affected Versions: 5.13.0

Workaround: Account for the time needed to process a large number of objects. Schedule the installation or upgrade during a low-usage window so that the system can process all of the objects without interruption.

Sentry process crashes with OOM upon reaching array length limit

Sentry can crash with an OutOfMemoryError (OOM) upon reaching the JVM array length limit. To prevent this, use a Java version in which JDK-8055949 is fixed. This issue also affects HMS when Sentry is enabled and there are many partitions.

Bug: None

Affected Versions: All CDH 5 versions

Workaround: Use a Java version in which JDK-8055949 is fixed.

YARN Dynamic Resource Pools Do Not Work with Hive When Sentry Is Enabled

Hive jobs are not submitted into the correct YARN queue when Hive is using Sentry because Hive does not use the YARN API to resolve the user or group of the job's original submitter. This causes the job to be placed in a queue using the placement rules based on the Hive user. The HiveServer2 fair scheduler queue mapping used for "non-impersonation" mode does not handle the primary-secondary queue mappings correctly.

Products Affected: Hive, YARN, Sentry

Affected Versions:

This issue is fixed by HIVE-8634, which is available in CDH 5.3 and higher. However, in CDH 5.3, 5.4, 5.5, 5.6, and 5.7, you must configure Hive to load the scheduler configuration into HiveServer2. In CDH 5.8 or higher managed clusters, Cloudera Manager automatically deploys the fair scheduler configuration to HiveServer2, but you must make sure that your configuration complies with the Placement Rule and Submission Access requirements described below.

Bug: HIVE-8634

Resolution: Use the workaround for your version of CDH.

Workaround for CDH 5.3, 5.4, 5.5, 5.6, 5.7:

Also see Placement Rule Requirement and Job Submission Access Requirement for additional configuration information.

  • Managed clusters:

    1. In the Cloudera Manager Admin Console, click Cluster n, select the YARN service, and click the Instances tab.
    2. In the YARN Instances page, click the ResourceManager(Active) role, and then click the Processes tab.
    3. In the ResourceManager Processes page, click fair-scheduler.xml, which opens in a new browser tab. Copy the contents of the fair-scheduler.xml page.
    4. Create a new file named fair-scheduler.xml and paste the contents of the fair-scheduler.xml browser page into it. Save the file.
    5. In a terminal window, create a new directory on the HiveServer2 host where you can store the fair-scheduler.xml file you created in the previous step. For example, /etc/hive/fsxml.
    6. Upload the fair-scheduler.xml file to the directory you created in the previous step.
    7. In the Cloudera Manager Admin Console, click Cloudera Manager to return to the home page, and click Hive > Configuration.
    8. On the Hive service Configuration page, select Service-Wide scope and Advanced category.
    9. In the panel on the right, scroll down until you find the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, and add the yarn.scheduler.fair.allocation.file property with the path to the fair-scheduler.xml file you created in the previous steps. For example, if you put the fair-scheduler.xml file in the /etc/hive/fsxml directory, add the following:
      <property>
          <name>yarn.scheduler.fair.allocation.file</name>
          <value>/etc/hive/fsxml/fair-scheduler.xml</value>
      </property>
                    
    10. Click Save Changes.
    11. Restart the Hive service.
  • Unmanaged clusters:

    1. Log in to a node that runs the ResourceManager.
    2. Navigate to the $HADOOP_HOME/conf/ directory and locate the fair-scheduler.xml file.
    3. Copy the fair-scheduler.xml file to the HiveServer2 node in the /etc/hive/conf directory.
    4. Open the /etc/hive/conf/hive-site.xml file and add the following property:

      <property>
         <name>yarn.scheduler.fair.allocation.file</name>
         <value>/etc/hive/conf/fair-scheduler.xml</value>
      </property>
                    
    5. Save the hive-site.xml file.
    6. Restart the Hive service:

      $ sudo service hive-server2 stop
      $ sudo service hive-server2 start
                    

Placement Rule Requirement

After HiveServer2 determines into which queue the job can be placed, it submits the job with that queue specified in the submission request. This placement request is honored only if the placement rules in the fair-scheduler.xml file have the following rule defined as the first rule in the fair-scheduler policy:

<rule name="specified" />

Make sure that this is the first rule defined in the fair-scheduler.xml file. If another rule is defined as the first rule, placement behavior cannot be determined.
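A minimal queuePlacementPolicy sketch illustrating this ordering; the rules after specified are only examples, so adapt them to your own policy:

      <queuePlacementPolicy>
        <rule name="specified" />
        <rule name="user" />
        <rule name="default" />
      </queuePlacementPolicy>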

Job Submission Access Requirement

The hive user must be in the list of users allowed to submit jobs to the pool. The submitting user is used to determine pool placement, but the hive user is still the user that is expected to run the job in the pool.
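As a sketch, a queue definition that allows the hive user to submit jobs might look like the following. The queue name and the other user are hypothetical; in the ACL format, users come before the space and groups after it:

      <queue name="analytics">
        <aclSubmitApps>hive,alice analysts</aclSubmitApps>
      </queue>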

Sentry does not manage permissions of Hive directories after certain commands

Sentry stops managing HDFS permissions of Hive directories that were the targets of certain DDL/DML commands.

Examples:

  • ALTER TABLE commands that are not RENAME or SET LOCATION; for example, ALTER TABLE ... SET TBLPROPERTIES.
  • INSERT on an unpartitioned table.

Products affected: Sentry, HDFS, Hive

Affected Versions: All versions of CDH 5, except for those indicated in the ‘Addressed in release/refresh/patch’ section below.

Users affected: Users running Sentry with Hive or Impala using the HDFS ACL synchronization feature.

Impact:

The table/partition directory ACLs and permissions revert to the underlying HDFS values after either of these workflows:

  • INSERT on an unpartitioned table
  • ALTER TABLE ... SET TBLPROPERTIES on any table

Immediate action required: Upgrade to a fixed version as soon as possible. If you are affected before you can upgrade, restart the HMS to reset permissions; the issue recurs when the affected DDL/DML commands are rerun.

Addressed in release/refresh/patch: CDH 5.5.5 and higher, CDH 5.7.2 and higher, CDH 5.8.0 and higher

CREATE FUNCTION ... USING JAR does not work on Sentry-secured clusters

In a cluster without Sentry, a user is able to create a UDF using the CREATE FUNCTION ... USING <hdfs location> command in Hive, with a JAR located on HDFS. However, once Sentry is enabled, this command does not work even if the user is granted the ALL privilege to the URI on HDFS.

Affected Versions: CDH 5.7, 5.6, 5.5, 5.4

With Sentry enabled, only Hive admin users have access to YARN job logs

As a prerequisite of enabling Sentry, Hive impersonation is turned off, which means all YARN jobs are submitted to the Hive job queue, and are run as the hive user. This is an issue because the YARN History Server now has to block users from accessing logs for their own jobs, since their own usernames are not associated with the jobs. As a result, end users cannot access any job logs unless they can get sudo access to the cluster as the hdfs, hive or other admin users.

In CDH 5.8 (and higher), Hive overrides the default configuration, mapred.job.queuename, and places incoming jobs into the connected user's job queue, even though the submitting user remains hive. Hive obtains the relevant queue/username information for each job by using YARN's fair-scheduler.xml file.

Moving a partitioned table to a new location on the filesystem does not affect ACLs set on the previous location

With HDFS/Sentry sync enabled, if you move a partitioned table to a new location on the filesystem using the ALTER TABLE .. SET LOCATION command, ACLs set on the previous location remain unchanged. This occurs irrespective of whether the table is managed by Sentry.
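As a sketch (the path is hypothetical), the stale ACLs left at the previous location can be removed manually with the HDFS setfacl command, which strips all extended ACL entries from the directory:

      $ hdfs dfs -setfacl -b /data/sales_old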

Bug: SENTRY-1373

Column-level privileges are not supported on Hive Metastore views

GRANT and REVOKE for column level privileges is not supported on Hive Metastore views.

Bug: SENTRY-754

SELECT privilege on all columns does not equate to SELECT privilege on table

Users who have been explicitly granted the SELECT privilege on all columns of a table will not have permission to perform table-level operations. For example, operations such as SELECT COUNT(1) or SELECT COUNT(*) will not work even if you have the SELECT privilege on all columns.

There is one exception to this. The SELECT * FROM TABLE command will work even if you do not have explicit table-level access.
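A sketch of the behavior, assuming a hypothetical table sales with exactly two columns, id and amount, and a hypothetical role analyst:

      GRANT SELECT(id, amount) ON TABLE sales TO ROLE analyst;  -- covers all columns
      SELECT * FROM sales;          -- works (the documented exception)
      SELECT COUNT(*) FROM sales;   -- fails: requires table-level SELECT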

Bug: SENTRY-838

The EXPLAIN SELECT operation works without table or column-level privileges

Users are able to run the EXPLAIN SELECT operation, exposing metadata for all columns, even for tables/columns to which they weren't explicitly granted access.

Bug: SENTRY-849

With HDFS sync enabled, unexpected directory permissions are set when the NameNode plugin cannot communicate with the Sentry Server

With HDFS-Sentry sync enabled, if the NameNode plugin is unable to communicate with the Sentry Service for a certain period of time (configurable by the sentry.authorization-provider.cache-stale-threshold.ms property), permissions for all directories under Sentry-managed path prefixes are set to the Hive system user and the Hive system group, regardless of whether those file paths correspond to Hive warehouse objects.

Hive authorization (Grant/Revoke/Show) statements do not support fully qualified table names (default.tab1)

Bug: None

Workaround: Switch to the database before granting privileges on the table.
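For example, instead of granting on default.tab1 directly, switch databases first (the role name is hypothetical):

      USE default;
      GRANT SELECT ON TABLE tab1 TO ROLE analyst;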

Object types Server and URI are not supported in SHOW GRANT ROLE roleName on OBJECT objectName

Bug: None

Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.

Relative URI paths not supported by Sentry

Sentry supports only absolute (not relative) URI paths in permission grants. Although some early releases (for example, CDH 5.7.0) may not have raised explicit errors when relative paths were set, upgrading a system that uses relative paths causes the system to lose Sentry permissions.

Affected Versions: All versions. Relative paths are not supported in Sentry for permission grants.

Resolution: Revoke privileges that have been set using relative paths, and grant permissions using absolute paths before upgrading.

  Absolute (use this form)    Relative (do not use this form)
  hdfs://absolute/path/       hdfs://relative/path
  s3a://bucketname/           s3a://bucketname
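A sketch of the resolution, using a hypothetical role name; revoke the relative-path grant, then re-grant with the absolute form before upgrading:

      REVOKE ALL ON URI 's3a://bucketname' FROM ROLE etl;
      GRANT ALL ON URI 's3a://bucketname/' TO ROLE etl;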