Apache Sentry Known Issues

Sentry doesn't support Kafka topic name with more than 64 characters

A Kafka topic name can have 249 characters, but Sentry only supports topic names up to 64 characters.

Workaround: Keep Kafka topic names to 64 characters or fewer.
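
As a minimal sketch of the workaround, the following Python helper flags topic names that exceed the 64-character limit described above. The function name and the sample topic names are hypothetical; only the limit itself comes from this issue description.

```python
# Limit taken from the issue description; Kafka itself allows up to 249 characters.
SENTRY_TOPIC_NAME_LIMIT = 64


def topics_too_long_for_sentry(topic_names, limit=SENTRY_TOPIC_NAME_LIMIT):
    """Return the topic names whose length exceeds the given limit."""
    return [name for name in topic_names if len(name) > limit]


if __name__ == "__main__":
    # Hypothetical topic names, used only for illustration.
    candidates = ["orders", "x" * 64, "y" * 65]
    for name in topics_too_long_for_sentry(candidates):
        print(f"too long for Sentry ({len(name)} characters): {name}")
```

A check like this could run before topics are created, so that names Sentry cannot handle are caught early.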

Affected Versions: All CDH 5.x and 6.x versions

Cloudera Issue: CDH-64317

When granting privileges, a single transaction per grant causes long delays

Sentry takes a long time to grant or revoke a large number of column-level privileges that are requested in a single statement. For example, if you execute the following command:

GRANT SELECT(col1, col2, …) ON TABLE table1;

Sentry applies the grants to each column separately and the refresh process causes long delays.

Workaround: Split the grant statement up into smaller chunks. This prevents the refresh process from causing delays.
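
One way to apply this workaround is to generate several smaller GRANT statements from a long column list. The sketch below is an illustration under stated assumptions: the function name, the `TO ROLE` clause with a role called `analyst`, and the default chunk size of 50 are all hypothetical, because this issue description does not state a recommended chunk size.

```python
def chunked_column_grants(privilege, columns, table, role, chunk_size=50):
    """Build one GRANT statement per chunk of at most chunk_size columns."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    statements = []
    for start in range(0, len(columns), chunk_size):
        chunk = columns[start:start + chunk_size]
        # The TO ROLE clause and the role name are assumptions for illustration.
        statements.append(
            f"GRANT {privilege}({', '.join(chunk)}) ON TABLE {table} TO ROLE {role};"
        )
    return statements


if __name__ == "__main__":
    columns = [f"col{i}" for i in range(1, 6)]
    for statement in chunked_column_grants("SELECT", columns, "table1", "analyst", chunk_size=2):
        print(statement)
```

Each generated statement can then be run separately, for example from Beeline, so that no single statement covers a very large number of columns.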

Affected Versions:

CDH: 5.14.4
CDH: 5.15.1
CDH: 5.16.0
CDH: 6.1.0

Fixed Versions:

CDH 5.16.1 and above
CDH 6.2.0 and above

Cloudera Issue: CDH-74982

Snapshot doesn't complete with null SDS entries

If you have a partition with null SDS entries, a full snapshot will never complete.

Workaround: Delete the null SDS entries from HMS and restart Sentry.

Affected Versions: CDH 5.16.1

Fixed Versions: CDH 5.16.1 and above

Cloudera Issue: CDH-75158

SHOW ROLE GRANT GROUP raises exception for a group that was never granted a role

If you run the command SHOW ROLE GRANT GROUP for a group that has never been granted a role, beeline raises an exception. However, if you run the same command for a group that does not have any roles, but has at one time been granted a role, you do not get an exception, but instead get an empty list of roles granted to the group.

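Workaround: Grant a role to the group. After a group has been granted a role at least once, the command returns a list of roles (possibly empty) instead of raising an exception.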

Affected Versions: CDH 5.16.1

Cloudera Issue: CDH-71694

Sentry allows the ALTER TABLE EXCHANGE PARTITION operation on a restricted database

If a user has ALL permissions on a database, the ALTER TABLE EXCHANGE PARTITION command allows the user to move partitions from a table that the user does not have access to. For example, if a user has ALL permissions on database A, but no permissions on database B, the user can create a table with a schema in database A that is identical to a table in database B. The user can then move partitions from database B into database A, which allows the user to view restricted data and remove that data from the source database.

After you upgrade to a version of CDH listed in the "Addressed in release" section below, a user that tries to use the EXCHANGE PARTITION command to move a partition from a restricted database will receive a "No valid privileges" error.

Products affected: Hive services running Sentry

Affected Versions:

CDH 5.13.x and below
CDH 5.14.0, 5.14.2, 5.14.3
CDH 5.15.0

Users affected: Hive users running Sentry

Date/time of detection: May 10, 2018

Severity (Low/Medium/High): 8.1 High (CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:N)

Impact: Sensitive data exposure

CVE: CVE-2018-8028

Immediate action required: Upgrade to a version of CDH with the fix.

Addressed in release/refresh/patch:

CDH 5.14.4
CDH 5.15.1
CDH 5.16.1
CDH 6.0.0

For the latest update on this issue see the corresponding Knowledge article:

TSB 2018-312: Sentry allows the "ALTER TABLE EXCHANGE PARTITION" operation on a restricted database

Performance Issue: MPath is queried for each MAuthzPathsMapping in full snapshot

MAuthzPathsMapping contains a list of MPath instances. Log messages show that when Sentry gets the full path snapshot in SentryStore.retrieveFullPathsImageCore(), DataNucleus issues a separate query for the MPath instances associated with each MAuthzPathsMapping. As a result, getting the full path image can take a very long time.

Workaround: The fix is to retrieve the MPath instances in a batch when getting the full path image.

Affected Versions: CDH 5.13 and below

Apache Issue: SENTRY-2184

Cloudera Issue: None

The REVOKE GRANT OPTION also revokes privileges with no grant option

When you use the REVOKE GRANT OPTION command to revoke the GRANT privilege from a role, Hive revokes the privilege that has the GRANT OPTION. If the role also has the same privilege without the grant option, that privilege is revoked as well.

For example, if a role has SELECT and SELECT WITH GRANT OPTION privileges, and you run the REVOKE GRANT OPTION command, the role will have no SELECT privileges on that object. Both the SELECT and SELECT WITH GRANT OPTION privileges will be revoked.

Workaround: Revoke the GRANT OPTION and the privilege, and then grant the privilege.

Affected Versions: All CDH 5.x versions

Apache Issue: SENTRY-2402

Cloudera Issue: CDH-72979

After upgrade, HDFS ACLs are not synched due to Sentry taking too many snapshots

The schema change that was introduced in CDH 5.13.0 requires Sentry to take an HMS snapshot when you upgrade. Sentry might take too many snapshots, resulting in a period of time in which ACLs are not synched with HDFS. This can cause job failures.

Note that this issue is similar to CDH-67922, but has a different root cause.

Who is Affected: Users that currently have any CDH version below 5.13.0 and are upgrading to any of the affected versions. For example, users upgrading from CDH 5.12.2 to CDH 5.13.0 are affected, but users upgrading from 5.12.2 to 5.13.3 are not affected.

Action Required: Cloudera recommends that if you have Sentry installed, you do not upgrade to any of the affected versions. Instead, upgrade to one of the fixed versions.

Affected Versions:

CDH 5.13.0, 5.13.1, 5.13.2
CDH 5.14.0, 5.14.1

Fixed Versions:

CDH 5.13.3
CDH 5.14.2, 5.14.3
CDH 5.15.0 and above

Apache Issue: SENTRY-2210

Cloudera Issue: CDH-70480

After upgrade, HDFS ACLs are not synched due to Sentry taking too long to retrieve the snapshot from the database

The schema change that was introduced in 5.13.0 requires Sentry to take an HMS snapshot and save it to the Sentry database when you upgrade. When the NameNode service starts after the upgrade and asks for a path update, Sentry must retrieve a full snapshot from the Sentry database and send it to the NameNode. In the Sentry database, AUTHZ_PATH does not have an index on the foreign key AUTHZ_OBJ_ID. This can cause the snapshot retrieval from the Sentry database to take a long time, resulting in ACLs not being synched with HDFS until the snapshot is complete. After Sentry retrieves the full snapshot and sends it to the NameNode, Sentry only needs to retrieve and send delta changes.

Note that this issue is similar to CDH-70480, but has a different root cause.

Who is Affected: Users that currently have any CDH version below 5.13.0 and are upgrading to any of the affected versions. For example, users upgrading from CDH 5.12.2 to CDH 5.13.3 are affected. MySQL users are not affected because MySQL automatically creates the index for the foreign key.

Action Required: After you upgrade, manually add the index to your schema with the appropriate command for the database:

Oracle:

CREATE INDEX "AUTHZ_PATH_FK_IDX" ON "AUTHZ_PATH" ("AUTHZ_OBJ_ID");

PostgreSQL:

CREATE INDEX "AUTHZ_PATH_FK_IDX" ON "AUTHZ_PATH" USING btree ("AUTHZ_OBJ_ID");

Apache Derby:

CREATE INDEX AUTHZ_PATH_FK_IDX ON AUTHZ_PATH (AUTHZ_OBJ_ID);

IBM Db2:

CREATE INDEX AUTHZ_PATH_FK_IDX ON AUTHZ_PATH (AUTHZ_OBJ_ID);

Affected Versions:

CDH 5.13.0, 5.13.1, 5.13.2, 5.13.3
CDH 5.14.0, 5.14.1, 5.14.2, 5.14.3
CDH 5.15.0

Fixed Versions: CDH 5.16.1

Cloudera Issue: CDH-67922

You must use a relational database with Sentry HA

A flat file cannot scale in a meaningful way with regard to search, replace, add, and delete operations, but a relational database can.

Workaround: Use a relational database.

Affected Versions: CDH 5.13.0 and above

Cloudera Issue: None

GRANT/REVOKE operations could fail if there are too many concurrent requests

Under a significant workload, GRANT and REVOKE operations can fail.

Workaround: Because grant and revoke actions are largely an administrative concern, plan bulk privilege changes accordingly.

Affected Versions: 5.13.0 and above

Apache Issue: SENTRY-1855

Cloudera Issue: CDH-56553

Creating large set of Sentry roles results in performance problems

Using more than a thousand roles or permissions may cause significant performance problems.

Workaround: In general, associate as few roles as possible with each user, and give administrator roles very broad privileges.

Affected Versions: All CDH 5.x versions

Cloudera Issue: CDH-59010

Sentry takes a long time to finish HMS sync on larger object count (>2M)

When there are millions of Hive objects (tables and partitions), Sentry may take a few minutes, or even a few tens of minutes, to fully synchronize table information between HMS and Sentry. During this time, HDFS sync is not functional. This delay is usually a one-time cost during the upgrade to CDH 5.13 and should not be a problem during a Sentry server restart.

Workaround: Be aware of the time needed to process many objects. Schedule the installation or upgrade for a low-usage period, when the system can process all of the objects without competing load.

Affected Versions: 5.13.0 and above

Cloudera Issue: CDH-59310

Sentry process crashes with OOM upon reaching array length limit

To prevent Sentry from crashing with OOM upon reaching the array length limit, use a Java version in which JDK-8055949 is fixed. This issue also affects HMS when Sentry is enabled and there are a lot of partitions.

Workaround: Use a Java version in which JDK-8055949 is fixed.

Affected Versions: All CDH 5 versions

Cloudera Issue: None

YARN Dynamic Resource Pools Do Not Work with Hive When Sentry Is Enabled

Hive jobs are not submitted into the correct YARN queue when Hive is using Sentry because Hive does not use the YARN API to resolve the user or group of the job's original submitter. This causes the job to be placed in a queue using the placement rules based on the Hive user. The HiveServer2 fair scheduler queue mapping used for "non-impersonation" mode does not handle the primary-secondary queue mappings correctly.

Products Affected: Hive, YARN, Sentry

Affected Versions:

This issue is fixed by HIVE-8634, which is available in CDH 5.3 and higher. However, in CDH 5.3, 5.4, 5.5, 5.6, and 5.7, you must configure Hive to load the scheduler configuration into HiveServer2. In CDH 5.8 or higher managed clusters, Cloudera Manager automatically deploys the fair scheduler configuration to HiveServer2, but you must make sure that your configuration complies with the Placement Rule and Submission Access requirements described below.

Apache Issue: HIVE-8634

Resolution: Use the workaround for your version of CDH.

Workaround for CDH 5.3, 5.4, 5.5, 5.6, 5.7:

Also see Placement Rule Requirement and Job Submission Access Requirement for additional configuration information.

Managed clusters:
1. In the Cloudera Manager Admin Console, click Cluster n, select the YARN service, and click the Instances tab.
2. In the YARN Instances page, click the ResourceManager(Active) role, and then click the Processes tab.
3. In the ResourceManager Processes page, click fair-scheduler.xml, which opens in a new browser tab. Copy the contents of the fair-scheduler.xml page.
4. Create a new file named fair-scheduler.xml and paste the contents of the fair-scheduler.xml browser page into it. Save the file.
5. In a terminal window, create a new directory on the HiveServer2 host where you can store the fair-scheduler.xml file you created in the previous step. For example, /etc/hive/fsxml.
  Important: Do not place the fair-scheduler.xml file into the standard Hive configuration directory. That directory is managed by Cloudera Manager and the file might be removed when you change other configuration settings.
6. Upload the fair-scheduler.xml file to the directory you created in the previous step.
7. In the Cloudera Manager Admin Console, click Cloudera Manager to return to the home page, and click Hive > Configuration.
8. On the Hive service Configuration page, select Service-Wide scope and Advanced category.
9. In the panel on the right, scroll down until you find the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, and add the yarn.scheduler.fair.allocation.file property with the path to the fair-scheduler.xml file you created in the previous steps. For example, if you put the fair-scheduler.xml file in the /etc/hive/fsxml directory, add the following:
```
<property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/etc/hive/fsxml/fair-scheduler.xml</value>
</property>
```
10. Click Save Changes.
11. Restart the Hive service.
  Note: An updated configuration file must be deployed each time a change is made in the fair scheduler configuration. The new file is read automatically by HiveServer2.
Unmanaged clusters:
1. Log in to a node that runs the ResourceManager.
2. Navigate to the $HADOOP_HOME/conf/ directory and locate the fair-scheduler.xml file.
3. Copy the fair-scheduler.xml file to the HiveServer2 node in the /etc/hive/conf directory.
4. Open the /etc/hive/conf/hive-site.xml file and add the following property:
```
<property>
   <name>yarn.scheduler.fair.allocation.file</name>
   <value>/etc/hive/conf/fair-scheduler.xml</value>
</property>
```
5. Save the hive-site.xml file.
6. Restart the Hive service:
```
$ sudo service hive-server2 stop
$ sudo service hive-server2 start
```
  Note: An updated configuration file must be deployed each time a change is made in the fair scheduler configuration. The new file is read automatically by HiveServer2.

Placement Rule Requirement

After HiveServer2 determines which queue the job can be placed in, it submits the job with that queue specified in the submission request. This placement request is honored only if the following rule is defined as the first placement rule in the fair-scheduler.xml file:

<rule name="specified" />

Make sure that this is the first rule defined in the fair-scheduler.xml file. If another rule is defined as the first rule, placement behavior cannot be determined.
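
The requirement above can be checked with a short script. This is a minimal sketch, assuming the file uses the standard Fair Scheduler layout in which a queuePlacementPolicy element sits directly under the root allocations element. The function names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def first_placement_rule(fair_scheduler_xml):
    """Return the name of the first placement rule, or None if none is defined."""
    root = ET.fromstring(fair_scheduler_xml)
    # Assumes queuePlacementPolicy is a direct child of the root element.
    policy = root.find("queuePlacementPolicy")
    if policy is None:
        return None
    rules = policy.findall("rule")
    return rules[0].get("name") if rules else None


def specified_rule_is_first(fair_scheduler_xml):
    """Return True only if the first placement rule is named 'specified'."""
    return first_placement_rule(fair_scheduler_xml) == "specified"


if __name__ == "__main__":
    # Hypothetical allocation file content, used only for illustration.
    sample = """
    <allocations>
      <queuePlacementPolicy>
        <rule name="specified" />
        <rule name="default" />
      </queuePlacementPolicy>
    </allocations>
    """
    print(specified_rule_is_first(sample))
```

To check a deployed file, read its contents into a string and pass that string to `specified_rule_is_first`.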

Job Submission Access Requirement

The hive user must be in the list of users allowed to submit jobs to the pool. The submitting user is used to determine pool placement, but the hive user is still the user that is expected to run the job in the pool.

Sentry does not manage permissions of Hive directories after certain commands

Sentry stops managing HDFS permissions of Hive directories that were the targets of certain DDL/DML commands.

Examples:

ALTER TABLE operations other than rename or set location. For example, "alter table set property".

Insert on an unpartitioned table.

Products affected: Sentry, HDFS, Hive

Affected Versions: All versions of CDH 5, except for those indicated in the ‘Addressed in release/refresh/patch’ section below.

Users affected: Users running Sentry with Hive or Impala using the HDFS ACL synchronization feature.

Impact:

The table/partition directory ACLs and permissions are reverted to the underlying HDFS values after either of these workflows:

Insert on unpartitioned tables

Alter table property on any table

Immediate action required: Upgrade to a fixed version as soon as possible. If you are affected before you upgrade, restart the HMS to act as a reset until the DDL/DML commands are rerun.

Addressed in release/refresh/patch: CDH 5.5.5 and higher, CDH 5.7.2 and higher, CDH 5.8.0 and higher

`CREATE FUNCTION ... USING JAR` does not work on Sentry-secured clusters

In a cluster without Sentry, a user is able to create a UDF using the CREATE FUNCTION ... USING <hdfs location> command in Hive, with a JAR located on HDFS. However, once Sentry is enabled, this command does not work even if the user is granted the ALL privilege to the URI on HDFS.

Affected Versions: CDH 5.7, 5.6, 5.5, 5.4

Fixed Versions: CDH 5.9.0 and above

Cloudera Issue: CDH-33816

With Sentry enabled, only Hive admin users have access to YARN job logs

As a prerequisite of enabling Sentry, Hive impersonation is turned off, which means all YARN jobs are submitted to the Hive job queue, and are run as the hive user. This is an issue because the YARN History Server now has to block users from accessing logs for their own jobs, since their own usernames are not associated with the jobs. As a result, end users cannot access any job logs unless they can get sudo access to the cluster as the hdfs, hive or other admin users.

In CDH 5.8 (and higher), Hive overrides the default configuration, mapred.job.queuename, and places incoming jobs into the connected user's job queue, even though the submitting user remains hive. Hive obtains the relevant queue/username information for each job by using YARN's fair-scheduler.xml file.

Affected Versions: All CDH 5.x versions

Cloudera Issue: CDH-22890

Moving a partitioned table to a new location on the filesystem does not affect ACLs set on the previous location

With HDFS/Sentry sync enabled, if you move a partitioned table to a new location on the filesystem using the ALTER TABLE .. SET LOCATION command, ACLs set on the previous location remain unchanged. This occurs irrespective of whether the table is managed by Sentry.

Affected Versions: All CDH 5.x versions

Apache Issue: SENTRY-1373

Cloudera Issue: CDH-41784

Column-level privileges are not supported on Hive Metastore views

GRANT and REVOKE for column-level privileges are not supported on Hive Metastore views.

Apache Issue: SENTRY-754

SELECT privilege on all columns does not equate to SELECT privilege on table

Users who have been explicitly granted the SELECT privilege on all columns of a table do not have permission to perform table-level operations. For example, operations such as SELECT COUNT (1) or SELECT COUNT (*) do not work even if you have the SELECT privilege on all columns.

There is one exception to this. The SELECT * FROM TABLE command will work even if you do not have explicit table-level access.

Apache Issue: SENTRY-838

The EXPLAIN SELECT operation works without table or column-level privileges

Users are able to run the EXPLAIN SELECT operation, exposing metadata for all columns, even for tables/columns to which they weren't explicitly granted access.

Apache Issue: SENTRY-849

With HDFS sync enabled, unexpected directory permissions are set when the NameNode plugin cannot communicate with the Sentry Server

With HDFS-Sentry sync enabled, if the NameNode plugin is unable to communicate with the Sentry Service, affected HDFS files continue to use a cached copy of the synchronized ACLs for a configurable period of time, after which they fall back to the Hive System User and the Hive System Group (for example, hive:hive). You can modify the timeout value by adding the sentry.authorization-provider.cache-stale-threshold.ms parameter to the hdfs-site.xml Safety Valve in Cloudera Manager. The default timeout value is 60 seconds, but you can increase it to anywhere from several minutes to a few hours, as needed to accommodate large clusters.

Affected Versions: CDH 5.14.7, 5.5.2

Fixed Versions: CDH 5.5.0 and above

Cloudera Issue: CDH-33623

Hive authorization (Grant/Revoke/Show) statements do not support fully qualified table names (`default.tab1`)

Workaround: Switch to the database before granting privileges on the table.

Affected Versions: CDH 5.12.x and below

Fixed Versions: CDH 5.13.0 and above

Cloudera Issue: CDH-19530

Object types Server and URI are not supported in `SHOW GRANT ROLE roleName on OBJECT objectName`

Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.

Affected Versions: All CDH 5.x versions

Cloudera Issue: CDH-19430

Relative URI paths not supported by Sentry

Sentry supports only absolute (not relative) URI paths in permission grants. Although some early releases (for example, CDH 5.7.0) may not have raised explicit errors when relative paths were set, upgrading a system that uses relative paths causes the system to lose Sentry permissions.

Affected Versions: All versions. Relative paths are not supported in Sentry for permission grants.

Resolution: Revoke privileges that have been set using relative paths, and grant permissions using absolute paths before upgrading.

Absolute (Use this form)	Relative (Do not use this form)
hdfs://absolute/path/	hdfs://relative/path
s3a://bucketname/	s3a://bucketname
