Known Issues and Limitations in CDH 6.0.1

Operating System Known Issues

Known issues and workarounds related to operating systems are listed below.

Linux kernel security patch and CDH services crashes: CVE-2017-1000364

After applying a recent Linux kernel security patch for CVE-2017-1000364, CDH services that use the JSVC set of libraries crash with a Java Virtual Machine (JVM) error such as:
A fatal error has been detected by the Java Runtime Environment:
SIGBUS (0x7) at pc=0x00007fe91ef6cebc, pid=30321, tid=0x00007fe930c67700

Cloudera services for HDFS and Impala cannot start after applying the patch.

Commonly used Linux distributions are shown in the table below. However, the issue affects any CDH release that runs on RHEL, CentOS, Oracle Linux, SUSE Linux, or Ubuntu and that has had the Linux kernel security patch for CVE-2017-1000364 applied.

If you have already applied the patch for your OS according to the advisories for CVE-2017-1000364, apply the kernel update that contains the fix for your operating system (some of these updates are listed in the table). If you cannot apply the kernel update, you can work around the issue by increasing the Java thread stack size as detailed in the steps below.

Distribution | Advisories for CVE-2017-1000364 | Advisory updates
Oracle Linux 6 | ELSA-2017-1486 | Oracle has fixed this problem in ELSA-2017-1723.
Oracle Linux 7 | ELSA-2017-1484 | Oracle has also added the fix for Oracle Linux 7 in ELBA-2017-1674.
RHEL 6 | RHSA-2017-1486 | Red Hat has fixed this problem for RHEL 6 and has marked this patch as outdated and superseded by RHSA-2017-1723.
RHEL 7 | RHSA-2017-1484 | Red Hat has fixed this problem for RHEL 7 and has marked this patch as outdated and superseded by RHBA-2017-1674.
SLES | CVE-2017-1000364 | SUSE has also fixed this problem; the patch names are included in the same advisory.
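
To confirm which kernel a host is running before and after applying an update, you can check the kernel release and the installed kernel packages. This is a generic sketch; the exact fixed kernel version depends on your distribution and the advisory listed above.

uname -r                 # kernel release currently running
rpm -q --last kernel     # installed kernel packages on RHEL-compatible systems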

Workaround

If you cannot apply the kernel update, you can set the Java thread stack size to -Xss1280k for the affected services using the appropriate Java configuration option or the environment advanced configuration snippet, as detailed below.

For role instances that have specific Java configuration options properties:

  1. Log in to Cloudera Manager Admin Console.
  2. Select Clusters > Impala, and then click the Configuration tab.
  3. Type java in the search field to display the Java-related configuration parameters. The Java Configuration Options for Catalog Server property field displays. Enter -Xss1280k in the field, adding it to any existing settings.
  4. Click Save Changes.
  5. Navigate to the HDFS service by selecting Clusters > HDFS.
  6. Click the Configuration tab.
  7. Click the Scope filter DataNode. The Java Configuration Options for DataNode field displays among the properties listed. Enter -Xss1280k into the field, adding to any existing properties.
  8. Click Save Changes.
  9. Select the Scope filter NFS Gateway. The Java Configuration Options for NFS Gateway field displays among the properties listed. Enter -Xss1280k into the field, adding to any existing properties.
  10. Click Save Changes.
  11. Restart the affected roles (or configure the safety valves in the next section and restart when finished with all configurations).

For role instances that do not have specific Java configuration options:

  1. Log in to Cloudera Manager Admin Console.
  2. Select Clusters > Impala, and then click the Configuration tab.
  3. Click the Scope filter Impala Daemon and Category filter Advanced.
  4. Type impala daemon environment in the search field to find the safety valve entry field.
  5. In the Impala Daemon Environment Advanced Configuration Snippet (Safety Valve), enter:
    JAVA_TOOL_OPTIONS=-Xss1280K
  6. Click Save Changes.
  7. Click the Scope filter Impala StateStore and Category filter Advanced.
  8. In the Impala StateStore Environment Advanced Configuration Snippet (Safety Valve), enter:
    JAVA_TOOL_OPTIONS=-Xss1280K
  9. Click Save Changes.
  10. Restart the affected roles.

The table below summarizes the parameters that can be set for the affected services:

Service | Settable Java Configuration Option
HDFS DataNode | Java Configuration Options for DataNode
HDFS NFS Gateway | Java Configuration Options for NFS Gateway
Impala Catalog Server | Java Configuration Options for Catalog Server
Impala Daemon | Impala Daemon Environment Advanced Configuration Snippet (Safety Valve): JAVA_TOOL_OPTIONS=-Xss1280K
Impala StateStore | Impala StateStore Environment Advanced Configuration Snippet (Safety Valve): JAVA_TOOL_OPTIONS=-Xss1280K

Cloudera Issue: CDH-55771

Leap-Second Events

Impact: After a leap-second event, Java applications (including CDH services) using older Java and Linux kernel versions, may consume almost 100% CPU. See https://access.redhat.com/articles/15145.

Leap-second events are tied to the time synchronization methods of the Linux kernel, the Linux distribution and version, and the Java version used by applications running on affected kernels.

Although Java is increasingly agnostic to system clock progression (and less susceptible to a kernel's mishandling of a leap-second event), using JDK 7 or 8 should prevent issues at the CDH level (for CDH components that use the Java Virtual Machine).

Immediate action required:

(1) Ensure that the kernel is up to date.

  • RHEL6/7, CentOS 6/7 - 2.6.32-298 or higher

  • Oracle Enterprise Linux (OEL) - Kernels built in 2013 or later

  • SLES12 - No action required.

(2) Ensure that your Java JDKs are current (especially if the kernel is not up to date and cannot be upgraded).
  • Java 8 - No action required.

(3) Ensure that your systems use either NTP or PTP synchronization.

For systems not using time synchronization, update both the OS tzdata and Java tzdata packages to the tzdata-2016g version, at a minimum. For OS tzdata package updates, contact OS support or check updated OS repositories. For Java tzdata package updates, see Oracle's Timezone Updater Tool.

Cloudera Issue: CDH-44788, TSB-189

Apache Accumulo Known Issues

Running Apache Accumulo on top of a CDH 6.0.x cluster is not currently supported. If you try to upgrade to CDH 6.0.x, you will be asked to remove the Accumulo service from your cluster. Running Accumulo on top of CDH 6 will be supported in a future release.

Affected Versions: CDH 6.0.x

Cloudera Data Science Workbench

Cloudera Data Science Workbench is not supported with CDH 6.0.x. Cloudera Data Science Workbench 1.5.0 (and higher) is supported with CDH 6.1.x (and higher).

Cloudera Issue: DSE-2769

Apache Crunch Known Issues

Apache Flume Known Issues

Fast Replay does not work with encrypted File Channel

If an encrypted file channel is set to use fast replay, the replay will fail and the channel will fail to start.

Workaround: Disable fast replay for the encrypted channel by setting use-fast-replay to false.
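
For example, in the Flume agent configuration file, the property would look like the following (agent1 and encryptedChannel are illustrative names; substitute your own agent and channel names):

agent1.channels.encryptedChannel.use-fast-replay = false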

Apache Issue: FLUME-1885

Apache Hadoop Known Issues

This page includes known issues and related topics, including:

Deprecated Properties

Several Hadoop and HDFS properties have been deprecated as of Hadoop 3.0 and later. For details, see Deprecated Properties.

Hadoop Common

Hadoop LdapGroupsMapping does not support LDAPS for self-signed LDAP server

Hadoop LdapGroupsMapping does not work with LDAP over SSL (LDAPS) if the LDAP server certificate is self-signed. This use case is currently not supported even if Hadoop User Group Mapping LDAP TLS/SSL Enabled, Hadoop User Group Mapping LDAP TLS/SSL Truststore, and Hadoop User Group Mapping LDAP TLS/SSL Truststore Password are filled properly.

Affected Versions: All CDH versions

Apache Issue: HADOOP-12862

Cloudera Issue: CDH-37926

HDFS

OIV ReverseXML processor fails

The HDFS OIV ReverseXML processor fails if the XML file contains escaped characters.

Affected Versions: CDH 6.x

Apache Issue: HDFS-12828

Cannot move encrypted files to trash

With HDFS encryption enabled, you cannot move encrypted files or directories to the trash directory.

Workaround: To remove encrypted files or directories, use the following command with the -skipTrash flag specified to bypass trash:
hdfs dfs -rm -r -skipTrash /testdir

Affected Versions: All CDH versions

Apache Issue: HADOOP-10902

HDFS NFS gateway and CDH installation (using packages) limitation

HDFS NFS gateway works as shipped ("out of the box") only on RHEL-compatible systems, but not on SLES or Ubuntu. Because of a bug in native versions of portmap/rpcbind, the HDFS NFS gateway does not work out of the box on SLES or Ubuntu systems when CDH has been installed from the command-line, using packages. It does work on supported versions of RHEL-compatible systems on which rpcbind-0.2.0-10.el6 or later is installed, and it does work if you use Cloudera Manager to install CDH, or if you start the gateway as root. For more information, see CDH and Cloudera Manager Supported Operating Systems.

Workarounds and caveats:
  • On Red Hat and similar systems, make sure rpcbind-0.2.0-10.el6 or later is installed.
  • On SLES and Ubuntu systems, do one of the following:
    • Install CDH using Cloudera Manager; or
    • Start the NFS gateway as root; or
    • Start the NFS gateway without using packages; or
    • You can use the gateway by running rpcbind in insecure mode, using the -i option, but keep in mind that this allows anyone from a remote host to bind to the portmap.

Upstream Issue: 731542 (Red Hat), 823364 (SLES)

No error when changing permission to 777 on .snapshot directory

Snapshots are read-only; running chmod 777 on the .snapshot directory does not change this, but it also does not produce an error (though other illegal operations do).

Affected Versions: All CDH versions

Apache Issue: HDFS-4981

Snapshot operations are not supported by ViewFileSystem

Affected Versions: All CDH versions

Snapshots do not retain directories' quota settings

Affected Versions: All CDH versions

Apache Issue: HDFS-4897

Permissions for dfs.namenode.name.dir incorrectly set

Hadoop daemons should set permissions for the dfs.namenode.name.dir (or dfs.name.dir) directories to drwx------ (700), but in fact these permissions are set to the file-system default, usually drwxr-xr-x (755).

Workaround: Use chmod to set permissions to 700.
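
For example, assuming the NameNode metadata directory configured in dfs.namenode.name.dir is /data/dfs/nn (a placeholder; check the actual value on your cluster first):

hdfs getconf -confKey dfs.namenode.name.dir   # shows the configured directory
chmod 700 /data/dfs/nn                        # run as the directory owner, typically the hdfs user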

Affected Versions: All CDH versions

Apache Issue: HDFS-2470

hadoop fsck -move does not work in a cluster with host-based Kerberos

Workaround: Use hadoop fsck -delete

Affected Versions: All CDH versions

Apache Issue: None

Block report can exceed maximum RPC buffer size on some DataNodes

On a DataNode with a large number of blocks, the block report may exceed the maximum RPC buffer size.

Workaround: Increase the value ipc.maximum.data.length in hdfs-site.xml:
<property>
  <name>ipc.maximum.data.length</name>
  <value>268435456</value>
</property>

Affected Versions: All CDH versions

Apache Issue: None

MapReduce2 and YARN

The Standby Resource Manager redirects /jmx and /metrics requests to the Active Resource Manager.

When ResourceManager high availability is enabled, the Standby Resource Manager redirects /jmx and /metrics requests to the Active Resource Manager. This causes the following issues in Cloudera Manager:
  • If Enable Kerberos Authentication for HTTP Web-Console is disabled: Cloudera Manager shows statistics for the wrong server.
  • If Enable Kerberos Authentication for HTTP Web-Console is enabled: connection from the agent to the standby fails with the HTTPError: HTTP Error 401: Authentication required error message. As a result, the health of the Standby Resource Manager will become bad.

Workaround: N/A

Affected Versions: CDH 6.0.x, CDH 6.1.0

Fixed Version: CDH 6.1.1

Cloudera Issue: CDH-76040

YARN's Continuous Scheduling can cause slowness in Oozie

When Continuous Scheduling is enabled in YARN, Oozie can become slow due to long delays in communicating with YARN. In Cloudera Manager 5.9.0 and higher, Enable Fair Scheduler Continuous Scheduling is turned off by default.

Workaround: Turn off Enable Fair Scheduler Continuous Scheduling in Cloudera Manager YARN Configuration. To keep equivalent benefits of this feature, turn on Fair Scheduler Assign Multiple Tasks.

Affected Versions: All CDH versions

Cloudera Issue: CDH-60788

JobHistory URL mismatch after server relocation

After moving the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only. New jobs started after the move are not affected.

Workaround: For any existing jobs that have the incorrect JobHistory Server URL, there is no option other than to allow the jobs to roll off the history over time. For new jobs, make sure that all clients have the updated mapred-site.xml that references the correct JobHistory Server.

Affected Versions: All CDH versions

Apache Issue: None

History link in ResourceManager web UI broken for killed Spark applications

When a Spark application is killed, the history link in the ResourceManager web UI does not work.

Workaround: To view the history for a killed Spark application, see the Spark HistoryServer web UI instead.

Affected Versions: All CDH versions

Apache Issue: None

Cloudera Issue: CDH-49165

Routable IP address required by ResourceManager

ResourceManager requires routable host:port addresses for yarn.resourcemanager.scheduler.address, and does not support using the wildcard 0.0.0.0 address.

Workaround: Set the address, in the form host:port, either in the client-side configuration, or on the command line when you submit the job.
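
For example, a sketch of the client-side yarn-site.xml entry (the host name is a placeholder; 8030 is the default scheduler port):

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>resourcemanager.example.com:8030</value>
</property>

Alternatively, pass -Dyarn.resourcemanager.scheduler.address=resourcemanager.example.com:8030 on the command line when submitting the job.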

Affected Versions: All CDH versions

Apache Issue: None

Cloudera Issue: CDH-6808

Amazon S3 copy may time out

The Amazon S3 filesystem does not support renaming files, and performs a copy operation instead. If the file to be moved is very large, the operation can time out because S3 does not report progress during the operation.

Workaround: Use -Dmapred.task.timeout=15000000 to increase the MR task timeout.
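
For example, a sketch of passing the increased timeout to a DistCp copy to S3 (the source path and bucket name are placeholders):

hadoop distcp -Dmapred.task.timeout=15000000 /user/example/data s3a://example-bucket/data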

Affected Versions: All CDH versions

Apache Issue: MAPREDUCE-972

Cloudera Issue: CDH-17955

Apache HBase Known Issues

IOException from Timeouts

CDH 5.12.0 includes the fix HBASE-16604, where the internal scanner that retries in case of an IOException from timeouts could potentially miss data. Java clients were properly updated to account for the new behavior, but Thrift clients will now see exceptions in cases where data would previously have been silently missing.

Workaround: Create a new scanner and retry the operation when encountering this issue.

IntegrationTestReplication fails if replication does not finish before the verify phase begins

During IntegrationTestReplication, if the verify phase starts before the replication phase finishes, the test will fail because the target cluster does not contain all of the data. If the HBase services in the target cluster do not have enough memory, long garbage-collection pauses might occur.

Workaround: Use the -t flag to set the timeout value before starting verification.

Cloudera Issue: None.

HDFS encryption with HBase

Cloudera has tested the performance impact of using HDFS encryption with HBase. The overall overhead of HDFS encryption on HBase performance is in the range of 3 to 4% for both read and update workloads. Scan performance has not been thoroughly tested.

ExportSnapshot or DistCp operations may fail on the Amazon s3a:// protocol

ExportSnapshot or DistCp operations may fail on AWS when using certain JDK 8 versions, due to an incompatibility between the AWS Java SDK 1.9.x and the joda-time date-parsing module.

Workaround: Use joda-time 2.8.1 or higher, which is included in AWS Java SDK 1.10.1 or higher.

Cloudera Issue: None.

An operating-system level tuning issue in RHEL7 causes 30% latency regressions

Workaround: Avoid using RHEL 7 if you have a latency-critical workload. For a cached workload, consider tuning the C-state (power-saving) behavior of your CPUs.

Cloudera Issue: CDH-32642

Severity: Medium

A RegionServer under extreme duress due to back-to-back garbage collection combined with heavy load on HDFS can lock up while attempting to append to the WAL

The RegionServer appears operational to ZooKeeper, and continues to host regions, but cannot complete any writes. The most obvious symptom is that all writes to regions on this RegionServer time out, and the RegionServer log shows no progress other than queuing of flushes that never complete. Log messages such as the following may occur:
124028 2015-11-14 05:54:48,659 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 42911ms instead of 3000ms,
  this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.#
124029 2015-11-14 05:54:48,659 WARN org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or
  host machine (eg GC): pause of approximately 41110ms
1806 2015-11-14 04:58:09,952 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog:
  Slow sync cost: 2734 ms, current pipeline: [DatanodeInfoWithStorage[10.17.198.17:20002,DS-56e2cf88-f267-43a8-b964-b29858#
1807 2015-11-14 04:58:09,952 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog:
  Slow sync cost: 2963 ms, current pipeline: [DatanodeInfoWithStorage[10.17.198.17:20002,DS-56e2cf88-f267-43a8-b964-b29858#

Workaround: Restart the RegionServer. To avoid the problem, adjust garbage-collection settings, give the RegionServer more RAM, and reduce the load on HDFS.

Apache Issue: HBASE-14374

Export to Azure Blob Storage (the wasb:// or wasbs:// protocol) is not supported

CDH 5.3 and higher supports Azure Blob Storage for some applications. However, a null pointer exception occurs when you specify a wasb:// or wasbs:// location in the --copy-to option of the ExportSnapshot command or as the output directory (the second positional argument) of the Export command.

Workaround: None.

Apache Issue: HADOOP-12717

AccessController postOperation problems in asynchronous operations

When security and Access Control are enabled, the following problems occur:

  • If a Delete Table fails for a reason other than missing permissions, the access rights are removed but the table may still exist and may be used again.
  • If hbaseAdmin.modifyTable() is used to delete column families, the rights are not removed from the Access Control List (ACL) table. The postOperation is implemented only for postDeleteColumn().
  • If Create Table fails, full rights for that table persist for the user who attempted to create it. If another user later succeeds in creating the table, the user who made the failed attempt still has the full rights.

Workaround: None

Apache Issue: HBASE-6992

Apache Hive / HCatalog / Hive on Spark Known Issues

When vectorization is enabled on any file type (ORC, Parquet), queries that divide by zero using the modulo operator (%) return an error

When vectorization is enabled for Hive on any file type, including ORC and Parquet, a query that divides by zero using the modulo operator (%) returns the following error: Arithmetic exception [divide by] 0. For example, the query SELECT 100 % column_c1 FROM table_t1; triggers this issue when the value in column_c1 is zero. The divide operator (/) is not affected by this issue.

Workaround: Disable vectorization for the query that is triggering this at either the session level by using the SET statement or at the server level by disabling the property with Cloudera Manager. For information about how to enable or disable query vectorization, see Enabling Hive Query Vectorization.
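
For example, to disable vectorization at the session level before running the affected query (hive.vectorized.execution.enabled is the Hive property that controls query vectorization):

SET hive.vectorized.execution.enabled=false;
SELECT 100 % column_c1 FROM table_t1;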

Affected Versions: When query vectorization is enabled for Hive, this issue affects Hive ORC tables in all versions of CDH and affects Hive Parquet tables in CDH 6.0 and later

Apache Issue: HIVE-19564

Cloudera Issue: CDH-71211

When vectorization is enabled for Hive on any file type (ORC, Parquet), queries that perform comparisons in the SELECT clause on large values in columns with the data type of BIGINT might return wrong results

When vectorization is enabled for Hive on any file type, including ORC and Parquet, if the query performs a comparison operation between very large values in columns that are BIGINT data types in the SELECT clause of the query, incorrect results might be returned. Comparison operators include ==, !=, <, <=, >, and >=. This issue does not occur when the comparison operation is performed in the filtering clause of the query. This issue can also occur when the difference of values in such columns is out of range for a LONG (64-bit) data type. For example, if column_c1 stores 8976171455044006767 and column_c2 stores -7272907770454997143, a query such as SELECT column_c1 < column_c2 FROM table_test returns true instead of false because the difference (8976171455044006767 - (-7272907770454997143)) is 1.6249079225499E19, which exceeds 9.22337203685478E18, the maximum value that a LONG (64-bit) data type can hold.

Workaround: Use a DECIMAL type instead of BIGINT for columns that might contain very large values. Another option is to disable vectorization for the query that is triggering this at either the session level by using the SET statement or at the server level by disabling the property with Cloudera Manager. For information about how to enable or disable query vectorization, see Enabling Hive Query Vectorization.
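
For example, a sketch of defining the columns from the example above as DECIMAL instead of BIGINT (table and column names are illustrative; verify that the chosen precision fits your data):

CREATE TABLE table_test_decimal (
  column_c1 DECIMAL(19,0),
  column_c2 DECIMAL(19,0)
);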

Affected Versions: When query vectorization is enabled for Hive, this issue affects Hive ORC tables in all versions of CDH and affects Hive Parquet tables in CDH 6.0 and later

Apache Issue: HIVE-20207

Cloudera Issue: CDH-70996

Specified column position in the ORDER BY clause is not supported for SELECT * queries

When column positions are specified in ORDER BY clauses, they are not honored for SELECT * queries and an error is returned as shown in the following example:
CREATE TABLE decimal_1 (id decimal(5,0));
SELECT * FROM decimal_1 ORDER BY 1 limit 100;
Error while compiling statement: FAILED: SemanticException [Error 10219]: Position in ORDER BY is not supported when using SELECT *
          
Instead, the query must explicitly list the columns it selects.
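
For example, the query succeeds once the selected column is listed explicitly:

SELECT id FROM decimal_1 ORDER BY 1 LIMIT 100;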

Affected Versions: CDH 6.0.0 and higher

Cloudera Issue: CDH-68550

DirectSQL with PostgreSQL

Hive does not support direct SQL queries with a PostgreSQL database; this feature is supported only with MySQL, MariaDB, and Oracle. With PostgreSQL, direct SQL is disabled as a precaution, because issues have been reported upstream where it is not possible to fall back on DataNucleus after some failures, along with other non-standard behaviors. For more information, see Hive Configuration Properties.

Affected Versions: All CDH versions

Cloudera Issue: CDH-49017

ALTER PARTITION … SET LOCATION does not work on Amazon S3 or between S3 and HDFS

Cloudera recommends that you do not use ALTER PARTITION … SET LOCATION on S3 or between S3 and HDFS. The rest of the ALTER PARTITION commands work as expected.

Affected Versions: All CDH versions

Cloudera Issue: CDH-42420

Commands run against an Oracle-backed metastore might fail

Commands run against an Oracle-backed Metastore fail with error:
javax.jdo.JDODataStoreException Incompatible data type for column TBLS.VIEW_EXPANDED_TEXT : was CLOB (datastore),
but type expected was LONGVARCHAR (metadata). Please check that the type in the datastore and the type specified in the MetaData are consistent.

This error might occur if the metastore is run on top of an Oracle database with the configuration property datanucleus.validateColumns set to true.

Workaround: Set datanucleus.validateColumns=false in the hive-site.xml configuration file.
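
For example, in hive-site.xml:

<property>
  <name>datanucleus.validateColumns</name>
  <value>false</value>
</property>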

Affected Versions: All CDH versions

Cannot create archive partitions with external HAR (Hadoop Archive) tables

ALTER TABLE ... ARCHIVE PARTITION is not supported on external tables.

Affected Versions: All CDH versions

Cloudera Issue: CDH-9638

Object types Server and URI are not supported in "SHOW GRANT ROLE roleName on OBJECT objectName" statements

Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.

Affected Versions: All CDH versions

Cloudera Issue: CDH-19430

HCatalog Known Issues

There are no notable known issues in this release of HCatalog.

Hive on Spark (HoS) Known Issues

Hive on Spark queries fail with "Timed out waiting for client to connect" for an unknown reason

If this exception is preceded by logs of the form "client.RpcRetryingCaller: Call exception...", then this failure is due to an unavailable HBase service. On a secure cluster, spark-submit will try to obtain delegation tokens from HBase, even though Hive on Spark might not need them. So if HBase is unavailable, spark-submit throws an exception.

Workaround: Fix the HBase service, or set spark.yarn.security.tokens.hbase.enabled to false.
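
A hedged sketch of the second option: the property can be set in the Hive session before the first query that starts the Spark session. Depending on how your environment launches the Spark session, it may instead need to be set in the Hive service configuration (for example, through a HiveServer2 advanced configuration snippet).

SET spark.yarn.security.tokens.hbase.enabled=false;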

Affected Versions: CDH 5.7.0 and higher

Cloudera Issues: CDH-59591, CDH-59599

Hue Known Issues

Hue does not support the Spark App

Hue does not currently support the Spark application.

Connecting to PostgreSQL Database Fails with Error "No module named psycopg2"

When configuring Hue to use a PostgreSQL database, the connection fails with the following error:

Error loading psycopg2 module: No module named psycopg2

Workaround: Install the psycopg2 Python package as documented in Installing the psycopg2 Python Package.

Affected Versions: All CDH 6 versions

Fixed Versions: None

Apache Issue: N/A

Cloudera Issue: CDH-65804

Apache Impala Known Issues

The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and whether a fix is in the pipeline.

Continue reading:

Impala Known Issues: Startup

These issues can prevent one or more Impala-related daemons from starting properly.

Impala requires FQDN from hostname command on kerberized clusters

The method Impala uses to retrieve the host name while constructing the Kerberos principal is the gethostname() system call. This function might not always return the fully qualified domain name, depending on the network configuration. If the daemons cannot determine the FQDN, Impala does not start on a kerberized cluster.

Workaround: Test if a host is affected by checking whether the output of the hostname command includes the FQDN. On hosts where hostname only returns the short name, pass the command-line flag --hostname=fully_qualified_domain_name in the startup options of all Impala-related daemons.
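
For example (the FQDN below is a placeholder):

hostname          # prints only the short name on affected hosts
hostname --fqdn   # prints the fully qualified domain name, if it can be resolved

Then add --hostname=impalad01.example.com (substituting the actual FQDN of each host) to the startup options of all Impala-related daemons.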

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-4978

Impala Known Issues: Crashes and Hangs

These issues can cause Impala to quit or become unresponsive.

Unable to view large catalog objects in catalogd Web UI

In the catalogd Web UI, you can list metadata objects and view their details. These details are accessed via a link and printed to a string formatted using Thrift's DebugProtocol. Printing large objects (> 1 GB) in the Web UI can crash catalogd.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-6841

Impala Known Issues: Performance

These issues involve the performance of operations such as queries or DDL statements.

Metadata operations block read-only operations on unrelated tables

Metadata operations that change the state of a table, like COMPUTE STATS or ALTER RECOVER PARTITIONS, may delay metadata propagation for unrelated, not-yet-loaded tables that is triggered by statements like DESCRIBE or SELECT queries.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-6671

Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true

The configuration setting convert_legacy_hive_parquet_utc_timestamps=true uses an underlying function that can be a bottleneck on high volume, highly concurrent queries due to the use of a global lock while loading time zone information. This bottleneck can cause slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of slowdown depends on factors such as the number of cores and number of threads involved in the query.

Workaround: Store the TIMESTAMP values as strings in one of the following formats:
  • yyyy-MM-dd
  • yyyy-MM-dd HH:mm:ss
  • yyyy-MM-dd HH:mm:ss.SSSSSSSSS

    The value can have 1 to 9 digits in the fractional part.

Impala implicitly converts such string values to TIMESTAMP in calls to date/time functions.

Affected Versions: CDH 6.0.x versions

Fixed Versions: CDH 6.1.0

Apache Issue: IMPALA-3316

Impala Known Issues: Security

These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and redaction.

In Impala with Sentry enabled, REVOKE ALL ON SERVER does not remove the privileges granted with the GRANT option

If you grant a role the ALL privilege at the SERVER scope with the WITH GRANT OPTION clause, you cannot revoke the privilege. Although the SHOW GRANT ROLE command will show that the privilege has been revoked immediately after you run the command, the ALL privilege will reappear when you run the SHOW GRANT ROLE command after Sentry refreshes.

Immediate Action Required: Once the privilege has been granted, the only way to remove it is to delete the role.

Affected Versions: CDH 6.0.0, CDH 6.0.1, CDH 5.15.0, CDH 5.15.1, CDH 5.14.x and all prior releases

Fixed Versions: CDH 6.1.0, CDH 6.0.2, CDH 5.16.0, CDH 5.15.2

Cloudera Issue: TSB-341

Impala does not support Heimdal Kerberos

Heimdal Kerberos is not supported in Impala.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-7072

Impala Known Issues: Resources

These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management features.

Handling large rows during upgrade to CDH 5.13 / Impala 2.10 or higher

After an upgrade to CDH 5.13 / Impala 2.10 or higher, users who process very large column values (long strings), or have increased the --read_size configuration setting from its default of 8 MB, might encounter capacity errors for some queries that previously worked.

Resolution: After the upgrade, follow the instructions in Handling Large Rows During Upgrade to CDH 5.13 / Impala 2.10 or Higher to check if your queries are affected by these changes and to modify your configuration settings if so.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-6028

Configuration to prevent crashes caused by thread resource limits

Impala could encounter a serious error due to resource usage under very high concurrency. The error message is similar to:

F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'

Workaround:

In CDH 6.0 and lower versions of CDH, configure each host running an impalad daemon with the following settings:

echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count

In CDH 6.1 and higher versions, it is unlikely that you will hit the thread resource limit. Configure each host running an impalad daemon with the following setting:

echo 8000000 > /proc/sys/vm/max_map_count

To make the above settings durable, refer to your OS documentation. For example, on RHEL 6.x:
  1. Add the following line to /etc/sysctl.conf:
    vm.max_map_count=8000000
  2. Run the following command:
    sysctl -p

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-5605

Breakpad minidumps can be very large when the thread count is high

The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.

Workaround: Add --minidump_size_limit_hint_kb=size to set a soft upper limit on the size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump file can still grow larger than the "hinted" size. For example, if you have 10,000 threads, the minidump file can be more than 20 MB.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-3509

Process mem limit does not account for the JVM's memory usage

Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.

Workaround: To monitor overall memory usage, use the top command, or add the memory figures in the Impala web UI /memz tab to JVM memory usage shown on the /metrics tab.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-691

Impala Known Issues: Correctness

These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.

Incorrect result due to constant evaluation in query with outer join

An OUTER JOIN query could omit some expected result rows due to a constant such as FALSE in another join clause. For example:
explain SELECT 1 FROM alltypestiny a1
  INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
  RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
|                                                         |
| 00:EMPTYSET                                             |
+---------------------------------------------------------+

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-3094

BST between 1972 and 1995

The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such as:

select
  extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
  extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;

Affected Versions: All CDH 6 versions

Fixed Versions: CDH 6.1

Apache Issue: IMPALA-3082

% escaping does not work correctly in a LIKE clause

If the final character in the RHS argument of a LIKE operator is an escaped \% character, it does not match a % final character of the LHS argument.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-2422

Crash: impala::Coordinator::ValidateCollectionSlots

A query could encounter a serious error if it includes multiple nested levels of INNER JOIN clauses involving subqueries.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-2603

Impala Known Issues: Metadata

These issues affect how Impala interacts with metadata. They cover areas such as the metastore database and the Impala Catalog Server daemon.

Concurrent catalog operations with heavy DDL workloads can cause queries with SYNC_DDL to fail fast

When Catalog Server is under a heavy load with concurrent catalog operations of long running DDLs, queries running with the SYNC_DDL query option can fail with the following message:
ERROR: CatalogException: Couldn't retrieve the catalog topic
version for the SYNC_DDL operation after 3 attempts. The operation has
been successfully executed but its effects may have not been
broadcast to all the coordinators.

The catalog operation is actually successful as the change has been committed to HMS and Catalog Server cache, but when Catalog Server notices a longer than expected time for it to broadcast the changes, it fails fast.

The coordinator daemons eventually sync up in the background.

Affected Versions: CDH versions 6.0 and 6.1

Apache Issue: IMPALA-7961

Cloudera Issue: CDH-76345

Impala Known Issues: Interoperability

These issues affect the ability to interchange data between Impala and other systems. They cover areas such as data types and file formats.

Queries Stuck on Failed HDFS Calls and not Timing out

If the following error appears multiple times within a short duration while running a query, it means that the connection between the impalad and the HDFS NameNode is in a bad state, and the impalad must be restarted:

"hdfsOpenFile() for <filename> at backend <hostname:port> failed to finish before the <hdfs_operation_timeout_sec> second timeout " 

Workaround: Restart the impalad in the bad state.

Affected Versions: All versions of Impala

Apache Issue: HADOOP-15720

Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)

Impala behavior differs from Hive with respect to out-of-range float/double values. Out-of-range values are returned as the maximum allowed value of the type (Hive returns NULL).

Workaround: None

Affected Versions: All CDH 6 versions

Configuration needed for Flume to be compatible with Impala

For compatibility with Impala, the value for the Flume HDFS Sink hdfs.writeFormat must be set to Text, rather than its default value of Writable. The hdfs.writeFormat setting must be changed to Text before creating data files with Flume; otherwise, those files cannot be read by either Impala or Hive.
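
For example, in the Flume agent configuration file (agent1 and hdfsSink are illustrative names):

agent1.sinks.hdfsSink.hdfs.writeFormat = Text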

Resolution: This information has been requested to be added to the upstream Flume documentation.

Affected Versions: All CDH 6 versions

Cloudera Issue: CDH-13199

Avro Scanner fails to parse some schemas

The default value in Avro schema must match the first union type. For example, if the default value is null, then the first type in the UNION must be "null".

Workaround: Swap the order of the fields in the schema specification. For example, use ["null", "string"] instead of ["string", "null"]. Note that the files written with the problematic schema must be rewritten with the new schema because Avro files have embedded schemas.
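
For example, a field whose default value is null should list "null" first in the union (the field name is illustrative):

{"name": "example_field", "type": ["null", "string"], "default": null}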

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-635

Impala BE cannot parse Avro schema that contains a trailing semi-colon

If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.

Workaround: Remove trailing semicolon from the Avro schema.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-1024

Incorrect results with basic predicate on CHAR typed column

When comparing a CHAR column value to a string literal, the literal value is not blank-padded and so the comparison might fail when it should match.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-1652

Impala Known Issues: Limitations

These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management workflow.

Set limits on size of expression trees

Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage.

Workaround: Avoid queries with extremely large expression trees. Setting the query option disable_codegen=true may reduce the impact, at a cost of longer query runtime.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-4551

Impala does not support running on clusters with federated namespaces

Impala does not support running on clusters with federated namespaces. The impalad process will not start on a node running such a filesystem, which is based on the org.apache.hadoop.fs.viewfs.ViewFs class.

Workaround: Use standard HDFS on all Impala nodes.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-77

Hue and BDR require separate parameters for Impala Load Balancer

Cloudera Manager supports a single parameter for specifying the Impala Daemon Load Balancer. However, because BDR and Hue need to use different ports when connecting to the load balancer, it is not possible to configure the load balancer value so that BDR and Hue will work correctly in the same cluster.

Workaround: To configure BDR with Impala, use the load balancer configuration either without a port specification or with the Beeswax port.

To configure Hue, use the Hue Server Advanced Configuration Snippet (Safety Valve) for impalad_flags to specify the load balancer address with the HiveServer2 port.

Affected Versions: CDH versions from 5.11 to 6.0.1

Cloudera Issue: OPSAPS-46641

Impala Known Issues: Miscellaneous / Older Issues

These issues do not fall into one of the above categories or have not been categorized yet.

A failed CTAS does not drop the table if the insert fails

If a CREATE TABLE AS SELECT operation successfully creates the target table but an error occurs while querying the source table or copying the data, the new table is left behind rather than being dropped.

Workaround: Drop the new table manually after a failed CREATE TABLE AS SELECT.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-2005

Casting scenarios with invalid/inconsistent results

Using a CAST function to convert large literal values to smaller types, or to convert special values such as NaN or Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-1821

Impala Parser issue when using fully qualified table names that start with a number

A fully qualified table name starting with a number could cause a parsing error. In a name such as db.571_market, the decimal point followed by digits is interpreted as a floating-point number.

Workaround: Surround each part of the fully qualified name with backticks (``).
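
For example:

SELECT * FROM `db`.`571_market`;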

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-941

Impala should tolerate bad locale settings

If the LC_* environment variables specify an unsupported locale, Impala does not start.

Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon. See Modifying Impala Startup Options for details about modifying these environment settings.

Resolution: Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-532

EMC Isilon Known Issues

CDH 6.0 is not currently supported on EMC Isilon.

Affected Versions: CDH 6.0.x

Apache Kafka Known Issues

Topics Created with the "kafka-topics" Tool Might Not Be Secured

Topics that are created and deleted via Kafka are secured (for example, auto-created topics). However, most topic creation and deletion is done via the kafka-topics tool, which talks directly to ZooKeeper, or via some other third-party tool that talks directly to ZooKeeper. Because security is the responsibility of ZooKeeper authorization and authentication, Kafka cannot prevent users from making ZooKeeper changes. Anyone with access to ZooKeeper can create and delete topics. However, such users cannot describe, read, or write the topics even if they can create them.

The following commands talk directly to ZooKeeper and therefore are not secured via Kafka:
  • kafka-topics.sh
  • kafka-configs.sh
  • kafka-preferred-replica-election.sh
  • kafka-reassign-partitions.sh

"offsets.topic.replication.factor" Must Be Less Than or Equal to the Number of Live Brokers

The offsets.topic.replication.factor broker configuration is now enforced upon auto topic creation. Internal auto topic creation will fail with a GROUP_COORDINATOR_NOT_AVAILABLE error until the cluster size meets this replication factor requirement.

Kafka May Be Stuck with Under-replicated Partitions after ZooKeeper Session Expires

This problem can occur when your Kafka cluster includes a large number of under-replicated Kafka partitions. One or more broker logs include messages such as the following:

[2016-01-17 03:36:00,888] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 3: Shrinking ISR for partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 (kafka.cluster.Partition)
[2016-01-17 03:36:00,891] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 3: Cached zkVersion [66] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
        

There will also be an indication of the ZooKeeper session expiring in one or more Kafka broker logs around the same time as the previous errors:

INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)

The log is typically in /var/log/kafka on each host where a Kafka broker is running. The location is set by the property kafka.log4j.dir in Cloudera Manager. The log name is kafka-broker-hostname.log. In diagnostic bundles, the log is under logs/hostname-ip-address/.

Workaround: To move forward after seeing this problem, restart the affected Kafka brokers. You can restart individual brokers from the Instances tab in the Kafka service page in Cloudera Manager. To reduce the likelihood of recurrence, consider the following:
  • Reduce the potential for long garbage collection pauses by brokers:
    • Use a better garbage collection mechanism in the JVM, such as G1GC. You can do this by adding -XX:+UseG1GC in the broker_java_opts.
    • Increase broker heap size if it is too small (broker_max_heap_size). Be careful that you don’t choose a heap size that can cause out-of-memory problems given all the services running on the node.
  • Increase the ZooKeeper session timeout configuration on brokers (zookeeper.session.timeout.ms), to reduce the likelihood that sessions expire.
  • Ensure ZooKeeper itself is well resourced and not overwhelmed so it can respond. For example, it is highly recommended to locate the ZooKeeper log directory on its own disk.

Affected Versions: CDK 1.4.x, 2.0.x, 2.1.x, 2.2.x

Fixed Versions:
  • Full Fix: CDH 6.1.0
  • Partial Fix: CDH 6.0.0. Kafka implementations with CDH 6.0.0 are less likely to encounter this issue.

Apache Issue: KAFKA-2729

Cloudera Issue: CDH-42514

Requests Fail When Sending to a Nonexistent Topic with "auto.create.topics.enable" Set to True

The first few produce requests fail when sending to a nonexistent topic with auto.create.topics.enable set to true.

Workaround: Increase the number of retries in the Producer configuration setting retries.
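
For example, in the producer configuration (the values shown are illustrative; tune them for your workload):

retries=10
retry.backoff.ms=1000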

Custom Kerberos Principal Names Cannot Be Used for Kerberized ZooKeeper and Kafka instances

When using ZooKeeper authentication and a custom Kerberos principal, Kerberos-enabled Kafka does not start.

Workaround: None. You must disable ZooKeeper authentication for Kafka or use the default Kerberos principals for ZooKeeper and Kafka.

Performance Degradation When SSL Is Enabled

Significant performance degradation can occur when SSL is enabled. The impact varies depending on your CPU, JVM version, and message size. Consumers are typically more affected than producers.

Workaround: Configure brokers and clients with ssl.secure.random.implementation = SHA1PRNG. It often reduces this degradation drastically, but its effect is CPU and JVM dependent.

Affected Versions: CDK 2.x and later

Fixed Versions: None

Apache Issue: KAFKA-2561

Cloudera Issue: None

Apache Kudu Known Issues

The following are known bugs and issues in Kudu. Note that this list is not exhaustive, and is meant to communicate only the most important known issues.

CFile Checksum Failure Causes Queries to Fail

When a CFile checksum fails, for example due to underlying disk corruption, queries against the replica fail with an error message such as:
Unable to advance iterator: Corruption: checksum error on CFile block

Workaround: Remove the corrupted replica from the tablet's Raft configuration. See Kudu Troubleshooting Guide for the detailed steps.

Affected Versions: CDH 6.0.x and lower

Apache Issue: KUDU-2469

C++ Client Fails to Re-acquire Authentication Token in Multi-master Clusters

A security-related issue can cause Impala queries to start failing on busy clusters in the following scenario:

  • The cluster runs with the --rpc_authentication set as optional or required. The default is optional. Secure clusters use required.
  • The cluster is using multiple masters.
  • Impala queries happen frequently enough that the leader master connection to some impalad isn't idle-closed (more than 1 query per 65 seconds).
  • The connection stays alive for longer than the authentication token timeout (1 week by default).
  • A master leadership change occurs after the authentication token expiration.

Impala queries will start failing with errors in the impalad logs like:

I0904 13:53:08.748968 95857 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to 10.164.44.13:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
I0904 13:53:10.389009 95861 status.cc:125] Unable to open Kudu table: Timed out: GetTableSchema timed out after deadline expired
 @ 0x95b1e9 impala::Status::Status()
 @ 0xff22d4 impala::KuduScanNodeBase::Open()
 @ 0xff101e impala::KuduScanNode::Open()
 @ 0xb73ced impala::FragmentInstanceState::Open()
 @ 0xb7532b impala::FragmentInstanceState::Exec()
 @ 0xb64ae8 impala::QueryState::ExecFInstance()
 @ 0xd15193 impala::Thread::SuperviseThread()
 @ 0xd158d4 boost::detail::thread_data<>::run()
 @ 0x129188a (unknown)
 @ 0x7f717ceade25 start_thread
 @ 0x7f717cbdb34d __clone

Impala shell queries will fail with a message like:

Unable to open Kudu table: Timed out: GetTableSchema timed out after deadline expired

Workaround:
  • Restart the affected Impala Daemons. Restarting a daemon ensures the problem will not reoccur for at least the authentication token lifetime, which defaults to one week.
  • Increase the authentication token lifetime (--authn_token_validity_seconds). Beware that raising this lifetime increases the window of vulnerability of the cluster if a client is compromised. It is recommended that you keep the token lifetime at one month maximum for a secure cluster. For unsecured clusters, a longer token lifetime is acceptable, and a 3 month lifetime is recommended.

Affected Versions: From CDH 5.11 through CDH 6.0.1

Apache Issue: KUDU-2580

Timeout Possible with Log Force Synchronization Option

If the Kudu master is configured with the -log_force_fsync_all option, tablet servers and clients will experience frequent timeouts, and the cluster may become unusable.

Affected Versions: All CDH 6 versions

Longer Startup Times with a Large Number of Tablets

If a tablet server has a very large number of tablets, it may take several minutes to start up. It is recommended to limit the number of tablets per server to 1000 or fewer. The maximum allowed number of tablets is 2000 per server. Consider this limitation when pre-splitting your tables. If you notice slow start-up times, you can monitor the number of tablets per server in the web UI.

Affected Versions: All CDH 6 versions

Fault Tolerant Scan Memory Issue

Unlike regular scans, fault tolerant scans will allocate all required memory when the scan begins rather than as it progresses. This can be significant for big tablets. Moreover, this memory usage isn't counted towards the tablet server's overall memory limit, raising the likelihood of the tablet server being out-of-memory killed by the kernel.

Affected Versions: CDH 6.2 / Kudu 1.9 and lower

Apache Issue: KUDU-2466

Descriptions for Kudu TLS/SSL Settings in Cloudera Manager

Use the descriptions in the following table to better understand the TLS/SSL settings in the Cloudera Manager Admin Console.

Field | Usage Notes
Kerberos Principal | Set to the default principal, kudu.
Enable Secure Authentication And Encryption | Select this checkbox to enable authentication and RPC encryption between all Kudu clients and servers, as well as between individual servers. Only enable this property after you have configured Kerberos.
Master TLS/SSL Server Private Key File (PEM Format) | Set to the path containing the Kudu master host's private key (PEM-format). This is used to enable TLS/SSL encryption (over HTTPS) for browser-based connections to the Kudu master web UI.
Tablet Server TLS/SSL Server Private Key File (PEM Format) | Set to the path containing the Kudu tablet server host's private key (PEM-format). This is used to enable TLS/SSL encryption (over HTTPS) for browser-based connections to Kudu tablet server web UIs.
Master TLS/SSL Server Certificate File (PEM Format) | Set to the path containing the signed certificate (PEM-format) for the Kudu master host's private key (set in Master TLS/SSL Server Private Key File). The certificate file can be created by concatenating all the appropriate root and intermediate certificates required to verify trust.
Tablet Server TLS/SSL Server Certificate File (PEM Format) | Set to the path containing the signed certificate (PEM-format) for the Kudu tablet server host's private key (set in Tablet Server TLS/SSL Server Private Key File). The certificate file can be created by concatenating all the appropriate root and intermediate certificates required to verify trust.
Master TLS/SSL Server CA Certificate (PEM Format) | Disregard this field.
Tablet Server TLS/SSL Server CA Certificate (PEM Format) | Disregard this field.
Enable TLS/SSL for Master Server | Enables HTTPS encryption on the Kudu master web UI.
Enable TLS/SSL for Tablet Server | Enables HTTPS encryption on the Kudu tablet server web UIs.

Affected Versions: All CDH 6 versions

Apache Oozie Known Issues

External ID of MapReduce action not filled properly and failing MR job treated as SUCCEEDED

When a MapReduce action is launched from Oozie, the external ID field is not filled properly. It gets populated with the YARN ID of the LauncherAM, not with the ID of the actual MR job. If the MR job is submitted successfully and then fails, it will be treated as a successfully executed action.

Affected Versions: CDH 6.0.0 and higher

Fixed Version: CDH 6.1.0 and higher

Apache Issue: OOZIE-3298

Oozie jobs fail (gracefully) on secure YARN clusters when JobHistory server is down

If the JobHistory server is down on a YARN (MRv2) cluster, Oozie attempts to submit a job, by default, three times. If the job fails, Oozie automatically puts the workflow in a SUSPEND state.

Workaround: When the JobHistory server is running again, use the resume command to tell Oozie to continue the workflow from the point at which it left off.

Affected Versions: CDH 5 and higher

Cloudera Issue: CDH-14623

Oozie works with MapReduce or YARN, but not both

The Oozie server works with a MapReduce (MRv1) cluster or a YARN (MRv2) cluster, but not both at the same time.

Workaround: Use two different Oozie servers.

Affected Versions: All

Apache Parquet Known Issues

There are no known issues in Parquet.

Apache Pig Known Issues

There are no known issues in this release.

Cloudera Search Known Issues

The current release includes the following known limitations:

Solr SQL, Graph, and Stream Handlers are Disabled if Collection Uses Document-Level Security

The Solr SQL, Graph, and Stream handlers do not support document-level security, and are disabled if document-level security is enabled on the collection. If necessary, these handlers can be re-enabled by setting the following Java system properties, but document-level security is not enforced for these handlers:

  • SQL: solr.sentry.enableSqlQuery=true
  • Graph: solr.sentry.enableGraphQuery=true
  • Stream: solr.sentry.enableStreams=true
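
A sketch of passing these as JVM system properties (exactly where Java options are set depends on how your Solr service is managed, for example through a Cloudera Manager Java options or environment setting):

-Dsolr.sentry.enableSqlQuery=true -Dsolr.sentry.enableGraphQuery=true -Dsolr.sentry.enableStreams=true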

Workaround: None

Affected Versions: All CDH 6 releases

Cloudera Issue: CDH-66345

Collection Creation No Longer Supports Automatically Selecting A Configuration If Only One Exists

Before CDH 5.5.0, a collection could be created without specifying a configuration. If no -c value was specified, then:

  • If there was only one configuration, that configuration was chosen.
  • If the collection name matched a configuration name, that configuration was chosen.

Search for CDH 5.5.0 includes multiple built-in configurations. As a result, there is no longer a case in which only one configuration can be chosen by default.

Workaround: Explicitly specify the collection configuration to use by passing -c <configName> to solrctl collection --create.
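
For example (collection and configuration names are placeholders):

solrctl collection --create example_collection -c example_config -s 1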

Affected Versions: CDH 5.5.0 and higher

Cloudera Issue: CDH-34050

CrunchIndexerTool which includes Spark indexer requires specific input file format specifications

If the --input-file-format option is specified with CrunchIndexerTool, then its argument must be text, avro, or avroParquet, rather than a fully qualified class name.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-22190

The quickstart.sh file does not validate ZooKeeper and the NameNode on some operating systems

The quickstart.sh file uses the timeout function to determine if ZooKeeper and the NameNode are available. To ensure this check completes as intended, quickstart.sh first determines whether the operating system on which it is running supports timeout. If the script detects that the operating system does not support timeout, it continues without checking whether the NameNode and ZooKeeper are available. If your environment is configured properly or you are using an operating system that supports timeout, this issue does not apply.
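
A minimal sketch of the kind of check described above (illustrative only, not the actual quickstart.sh code; the host name is a placeholder):

if command -v timeout >/dev/null 2>&1; then
  timeout 10 bash -c 'echo ruok | nc <zookeeper-host> 2181'
  timeout 10 hdfs dfsadmin -safemode get
else
  echo "timeout not available; skipping ZooKeeper and NameNode availability checks"
fi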

Workaround: This issue only occurs in some operating systems. If timeout is not available, the quickstart continues and final validation is always done by the MapReduce jobs and Solr commands that are run by the quickstart.

Affected Versions: All

Cloudera Issue: CDH-19923

Field value class guessing and Automatic schema field addition are not supported with the MapReduceIndexerTool nor the HBaseMapReduceIndexerTool

The MapReduceIndexerTool and the HBaseMapReduceIndexerTool can be used with a Managed Schema created via NRT indexing of documents or via the Solr Schema API. However, neither tool supports adding fields automatically to the schema during ingest.

Workaround: Define the schema before running the MapReduceIndexerTool or HBaseMapReduceIndexerTool. In non-schemaless mode, define in the schema using the schema.xml file. In schemaless mode, either define the schema using the Solr Schema API or index sample documents using NRT indexing before invoking the tools. In either case, Cloudera recommends that you verify that the schema is what you expect using the List Fields API command.
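
For example, the fields currently defined in a collection's schema can be listed with the Schema API (the host, port, and collection name are placeholders; add the authentication options your cluster requires):

curl "http://<solr-host>:8983/solr/<collection>/schema/fields"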

Affected Versions: All

Cloudera Issue: CDH-26856

The Browse and Spell Request Handlers are not enabled in schemaless mode

The Browse and Spell Request Handlers require certain fields be present in the schema. Since those fields cannot be guaranteed to exist in a Schemaless setup, the Browse and Spell Request Handlers are not enabled by default.

Workaround: If you require the “Browse” and “Spell” Request Handlers, add them to the solrconfig.xml configuration file. Generate a non-schemaless configuration to see the usual settings and modify the required fields to fit your schema.
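
For example, a non-schemaless configuration can be generated for reference with solrctl (the local path is a placeholder):

solrctl instancedir --generate /tmp/<config-dir>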

Affected Versions: All

Cloudera Issue: CDH-19407

Enabling blockcache writing may result in unusable indexes

It is possible to create indexes with solr.hdfs.blockcache.write.enabled set to true. Such indexes may appear corrupt to readers, and reading these indexes may irrecoverably corrupt indexes. Blockcache writing is disabled by default.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-17978

Users with insufficient Solr permissions may receive a "Page Loading" message from the Solr Web Admin UI

Users who are not authorized to use the Solr Admin UI are not shown a page explaining that access is denied; instead, they receive a web page that never finishes loading.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-58276

Using MapReduceIndexerTool or HBaseMapReduceIndexerTool multiple times may produce duplicate entries in a collection.

Repeatedly running the MapReduceIndexerTool on the same set of input files can result in duplicate entries in the Solr collection. This occurs because the tool can only insert documents and cannot update or delete existing Solr documents. This issue does not apply to the HBaseMapReduceIndexerTool unless it is run with more than zero reducers.

Workaround: To avoid this issue, use the HBaseMapReduceIndexerTool with zero reducers. This must be done on a cluster without Kerberos.

Affected Versions: All

Cloudera Issue: CDH-15441

Deleting collections might fail if hosts are unavailable

It is possible to delete a collection while hosts that store part of the collection are unavailable. After such a deletion, if the previously unavailable hosts are brought back online, the deleted collection may be restored.

Workaround: Ensure all hosts are online before deleting collections.

Affected Versions: All

Cloudera Issue: CDH-58694

Saving search results is not supported

Cloudera Search does not support the ability to save search results.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-21162

HDFS Federation is not supported

Cloudera Search does not support HDFS Federation.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-11357

Apache Sentry Known Issues

Sentry does not support Kafka topic name with more than 64 characters

A Kafka topic name can be up to 249 characters long, but Sentry only supports topic names of up to 64 characters.

Workaround: Keep Kafka topic names to 64 characters or less.

Affected Versions: All CDH 5.x and 6.x versions

Cloudera Issue: CDH-64317

When granting privileges, a single transaction per grant causes long delays

Sentry takes a long time to grant or revoke a large number of column-level privileges that are requested in a single statement. For example, if you execute the following command:

GRANT SELECT(col1, col2, …) ON TABLE table1;

Sentry applies the grants to each column separately and the refresh process causes long delays.

Workaround: Split the grant statement up into smaller chunks. This prevents the refresh process from causing delays.
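
For example, rather than a single statement naming every column, the same privileges could be granted in smaller batches (column names are placeholders, following the abbreviated form shown above):

GRANT SELECT(col1, col2) ON TABLE table1;
GRANT SELECT(col3, col4) ON TABLE table1;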

Affected Versions:
  • CDH: 5.14.4
  • CDH: 5.15.1
  • CDH: 5.16.0
  • CDH: 6.1.0
Fixed Versions:
  • CDH 5.16.1 and above
  • CDH 6.2.0 and above

Cloudera Issue: CDH-74982

SHOW ROLE GRANT GROUP raises exception for a group that was never granted a role

If you run the command SHOW ROLE GRANT GROUP for a group that has never been granted a role, beeline raises an exception. However, if you run the same command for a group that does not have any roles, but has at one time been granted a role, you do not get an exception, but instead get an empty list of roles granted to the group.

Workaround: Granting a role to the group prevents the exception.
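
For example (run as a user with Sentry administrative privileges; the role and group names are placeholders):

CREATE ROLE <role_name>;
GRANT ROLE <role_name> TO GROUP <group_name>;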

Affected Versions:

  • CDH 5.16.0
  • CDH 6.0.0

Cloudera Issue: CDH-71694

GRANT/REVOKE operations could fail if there are too many concurrent requests

Under a significant workload with many concurrent requests, GRANT and REVOKE operations can fail.

Workaround: If you need to make many privilege changes, spread them out so that you are not issuing a large number of GRANT/REVOKE requests at once.

Affected Versions: CDH 5.13.0 and above

Apache Issue: SENTRY-1855

Cloudera Issue: CDH-56553

Creating large set of Sentry roles results in performance problems

Using more than a thousand roles/permissions might cause significant performance problems.

Workaround: Plan your roles so that groups have as few roles as possible and roles have as few permissions as possible.

Affected Versions: CDH 5.13.0 and above

Cloudera Issue: CDH-59010

Users can't track jobs with Hive and Sentry

As a prerequisite of enabling Sentry, Hive impersonation is turned off, which means all YARN jobs are submitted to the Hive job queue, and are run as the hive user. This is an issue because the YARN History Server now has to block users from accessing logs for their own jobs, since their own usernames are not associated with the jobs. As a result, end users cannot access any job logs unless they can get sudo access to the cluster as the hdfs, hive or other admin users.

In CDH 5.8 (and higher), Hive overrides the default configuration, mapred.job.queuename, and places incoming jobs into the connected user's job queue, even though the submitting user remains hive. Hive obtains the relevant queue/username information for each job by using YARN's fair-scheduler.xml file.

Affected Versions: CDH 5.2.0 and above

Cloudera Issue: CDH-22890

Column-level privileges are not supported on Hive Metastore views

GRANT and REVOKE for column level privileges is not supported on Hive Metastore views.

Affected Versions: All CDH versions

Apache Issue: SENTRY-754

SELECT privilege on all columns does not equate to SELECT privilege on table

Users who have been explicitly granted the SELECT privilege on all columns of a table will not have permission to perform table-level operations. For example, operations such as SELECT COUNT(1) or SELECT COUNT(*) will not work even if you have the SELECT privilege on all columns.

There is one exception to this. The SELECT * FROM TABLE command will work even if you do not have explicit table-level access.

Affected Versions: All CDH versions

Apache Issue: SENTRY-838

The EXPLAIN SELECT operation works without table or column-level privileges

Users are able to run the EXPLAIN SELECT operation, exposing metadata for all columns, even for tables/columns to which they weren't explicitly granted access.

Affected Versions: All CDH versions

Apache Issue: SENTRY-849

Object types Server and URI are not supported in SHOW GRANT ROLE roleName on OBJECT objectName

Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.

Affected Versions: All CDH versions

Apache Issue: N/A

Cloudera Issue: CDH-19430

Relative URI paths not supported by Sentry

Sentry supports only absolute (not relative) URI paths in permission grants. Although some early releases (for example, CDH 5.7.0) might not have raised explicit errors when relative paths were set, upgrading a system that uses relative paths causes the system to lose Sentry permissions.

Resolution: Revoke privileges that have been set using relative paths, and grant permissions using absolute paths before upgrading.

Affected Versions: All versions. Relative paths are not supported in Sentry for permission grants.

Absolute (Use this form) Relative (Do not use this form)
hdfs://absolute/path/ hdfs://relative/path
s3a://bucketname/ s3a://bucketname

Apache Spark Known Issues

PySpark broadcast variables fail when disk encryption is enabled

When disk encryption is enabled, PySpark broadcast variables fail with the following stack trace:

Traceback (most recent call last):
  File "broadcast.py", line 37, in <module>
    words_new.value
  File "/pyspark.zip/pyspark/broadcast.py", line 137, in value
  File "pyspark.zip/pyspark/broadcast.py", line 122, in load_from_path
  File "pyspark.zip/pyspark/broadcast.py", line 128, in load
EOFError: Ran out of input

Workaround: None

Affected Versions: CDH 6.0.1

Apache Issue: SPARK-26201

Cloudera Issue: CDH-76116

Spark Streaming jobs loop if missing Kafka topic

Spark jobs can loop endlessly if the Kafka topic is deleted while a Kafka streaming job (which uses KafkaSource) is in progress.

Workaround: Stop a job before deleting a Kafka topic.

Affected Versions: All

Cloudera Issue: CDH-57903, CDH-64513

Spark SQL does not respect size limit for the varchar type

Spark SQL treats varchar as a string (that is, there is no size limit). The observed behavior is that Spark reads and writes these columns as regular strings; if inserted values exceed the size limit, no error will occur. The data will be truncated when read from Hive, but not when read from Spark.

Workaround: None

Affected Versions: CDH 5.5.0 and higher

Apache Issue: SPARK-5918

Cloudera Issue: CDH-33642

Spark SQL does not prevent you from writing key types not supported by Avro tables

Spark allows you to declare DataFrames with any key type. Avro supports only string keys, so trying to write any other key type to an Avro table fails.

Workaround: None

Affected Versions: CDH 5.5.0 and higher

Cloudera Issue: CDH-33648

Spark SQL does not support timestamp in Avro tables

Workaround: None

Affected Versions: CDH 5.5.0 and higher

Cloudera Issue: CDH-33649

Spark SQL does not respect Sentry ACLs when communicating with Hive metastore

Even if a user is configured via Sentry to not have read permission to a Hive table, a Spark SQL job running as that user can still read the table's metadata directly from the Hive metastore.

Cloudera Issue: CDH-33658

Dynamic allocation and Spark Streaming

If you are using Spark Streaming, Cloudera recommends that you disable dynamic allocation by setting spark.dynamicAllocation.enabled to false when running streaming applications.
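
For example, the setting can be passed on the command line when submitting the streaming application (the class and application JAR are placeholders):

spark-submit --conf spark.dynamicAllocation.enabled=false --class <main-class> <application-jar>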

Limitation with Region Pruning for HBase Tables

When SparkSQL accesses an HBase table through the HiveContext, region pruning is not performed. This limitation can result in slower performance for some SparkSQL queries against tables that use the HBase SerDes than when the same table is accessed through Impala or Hive.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-56330

Running spark-submit with --principal and --keytab arguments does not work in client mode

The spark-submit script's --principal and --keytab arguments do not work with Spark-on-YARN's client mode.

Workaround: Use cluster mode instead.
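
For example (the principal, keytab path, class, and application JAR are placeholders):

spark-submit --master yarn --deploy-mode cluster --principal <user>@<REALM> --keytab /path/to/<user>.keytab --class <main-class> <application-jar>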

Affected Versions: All

Long-running apps on a secure cluster might fail if driver is restarted

If you submit a long-running app on a secure cluster using the --principal and --keytab options in cluster mode, and a failure causes the driver to restart after 7 days (the default maximum HDFS delegation token lifetime), the new driver fails with an error similar to the following:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token <token_info> can't be found in cache

Workaround: None

Affected Versions: CDH 6.0

Apache Issue: SPARK-23361

Cloudera Issue: CDH-64865

History link in ResourceManager web UI broken for killed Spark applications

When a Spark application is killed, the history link in the ResourceManager web UI does not work.

Workaround: To view the history for a killed Spark application, see the Spark HistoryServer web UI instead.

Affected Versions: All CDH versions

Apache Issue: None

Cloudera Issue: CDH-49165

Apache Sqoop Known Issues

Column names cannot start with a number when importing data with the --as-parquetfile option.

Sqoop currently uses an Avro schema when writing data as Parquet files. Because the Avro schema requires that column names not start with a number, Sqoop renames such columns, prepending an underscore character. This can lead to issues when the data is reused in other tools, such as Impala.

Workaround: Rename the columns to comply with Avro naming rules (names must start with a letter or underscore, as specified in the Avro documentation).
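
One possible way to rename columns during the import, sketched below, is a free-form query with column aliases (the connection details, table, and column names are placeholders; quote the numeric column name as your source database requires):

sqoop import --connect <jdbc-url> --username <user> -P --query "SELECT <numeric-column> AS col_1, <other-columns> FROM <table> WHERE \$CONDITIONS" --target-dir /user/<user>/<table> --as-parquetfile -m 1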

Cloudera Issue: None

MySQL JDBC driver shipped with CentOS 6 systems does not work with Sqoop

CentOS 6 systems currently ship with version 5.1.17 of the MySQL JDBC driver. This version does not work correctly with Sqoop.

Workaround: Install version 5.1.31 of the JDBC driver as detailed in Installing the JDBC Drivers for Sqoop 1.

Affected Versions: MySQL JDBC 5.1.17, 5.1.4, 5.3.0

Cloudera Issue: CDH-23180

MS SQL Server "integratedSecurity" option unavailable in Sqoop

The integratedSecurity option is not available in the Sqoop CLI.

Workaround: None

Cloudera Issue: None

Sqoop1 (doc import + --as-parquetfile) limitation with KMS/KTS Encryption at Rest

Due to a limitation in the Kite SDK, it is not possible to use sqoop import --as-parquetfile with KMS/KTS encryption zones. See the following example.
sqoop import --connect jdbc:db2://djaxludb1001:61035/DDBAT003 --username=dh810202 --P --target-dir /data/hive_scratch/ASDISBURSEMENT --delete-target-dir -m1 --query "select disbursementnumber,disbursementdate,xmldata FROM DB2dba.ASDISBURSEMENT where DISBURSEMENTNUMBER = 2011113210000115311 AND \$CONDITIONS" -hive-import --hive-database adminserver -hive-table asdisbursement_dave --map-column-java XMLDATA=String --as-parquetfile

16/12/05 12:23:46 INFO mapreduce.Job: map 100% reduce 0%
16/12/05 12:23:46 INFO mapreduce.Job: Job job_1480530522947_0096 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Could not move contents of hdfs://AJAX01-ns/tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096 to hdfs://AJAX01-ns/data/RetiredApps/INS/AdminServer/asdisbursement_dave
<SNIP>
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): /tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096/5ddcac42-5d69-4e46-88c2-17bbedac4858.parquet can't be moved into an encryption zone.

Workaround: If you use the Parquet Hadoop API based implementation for importing into Parquet, specify a --target-dir which is the same encryption zone as the Hive warehouse directory.

If you use the Kite Dataset API based implementation, use an alternate data file type, for example text or avro.
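
A minimal sketch of the first approach, assuming the Hive warehouse directory /user/hive/warehouse is inside the encryption zone (the connection details and names are placeholders):

sqoop import --connect <jdbc-url> --username <user> -P --table <table> --as-parquetfile --hive-import --hive-database <database> --hive-table <table> --target-dir /user/hive/warehouse/<staging-dir> -m 1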

Apache Issue: SQOOP-2943

Cloudera Issue: CDH-40826

Doc import as Parquet files may result in out-of-memory errors

Out-of-memory (OOM) errors can be caused in the following two cases:
  • When there are many very large rows (multiple megabytes per row) before the initial-page-run check in ColumnWriter
  • When row sizes vary significantly, so that the next-page-size check is calibrated on small rows (and therefore set very high) and is then followed by many large rows

Workaround: None, other than restructuring the data.

Apache Issue: PARQUET-99

Apache ZooKeeper Known Issues

ZooKeeper JMX did not support TLS when managed by Cloudera Manager

Technical Service Bulletin 2019-310 (TSB)

The ZooKeeper service optionally exposes a JMX port used for reporting and metrics. By default, Cloudera Manager enables this port, but prior to Cloudera Manager 6.1.0, it did not support mutual TLS authentication on this connection. While JMX has a password-based authentication mechanism that Cloudera Manager enables by default, weaknesses have been found in the authentication mechanism, and Oracle now advises JMX connections to enable mutual TLS authentication in addition to password-based authentication. A successful attack may leak data, cause denial of service, or even allow arbitrary code execution on the Java process that exposes a JMX port. Beginning in Cloudera Manager 6.1.0, it is possible to configure mutual TLS authentication on ZooKeeper’s JMX port.

Products affected: ZooKeeper

Releases affected: Cloudera Manager 6.1.0 and lower, Cloudera Manager 5.16 and lower

Users affected: All

Date/time of detection: June 7, 2018

Severity (Low/Medium/High): 9.8 High (CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)

Impact: Remote code execution

CVE: CVE-2018-11744

Immediate action required: Upgrade to Cloudera Manager 6.1.0 and enable TLS for the ZooKeeper JMX port by turning on the configuration settings “Enable TLS/SSL for ZooKeeper JMX” and “Enable TLS client authentication for JMX port” on the ZooKeeper service and configuring the appropriate TLS settings. Alternatively, disable the ZooKeeper JMX port via the configuration setting “Enable JMX Agent” on the ZooKeeper service.

Addressed in release/refresh/patch: Cloudera Manager 6.1.0