Known Issues and Limitations in CDH 6.0.1

Operating System Known Issues

Known issues and workarounds related to operating systems are listed below.

Linux kernel security patch and CDH services crashes: CVE-2017-1000364

After applying a recent Linux kernel security patch for CVE-2017-1000364, CDH services that use the JSVC set of libraries crash with a Java Virtual Machine (JVM) error such as:
A fatal error has been detected by the Java Runtime Environment:
SIGBUS (0x7) at pc=0x00007fe91ef6cebc, pid=30321, tid=0x00007fe930c67700

Cloudera services for HDFS and Impala cannot start after applying the patch.

Commonly used Linux distributions are shown in the table below. However, the issue affects any CDH release that runs on RHEL, CentOS, Oracle Linux, SUSE Linux, or Ubuntu and that has had the Linux kernel security patch for CVE-2017-1000364 applied.

If you have already applied the patch for your OS according to the advisories for CVE-2017-1000364, apply the kernel update that contains the fix for your operating system (some of these updates are listed in the table). If you cannot apply the kernel update, you can work around the issue by increasing the Java thread stack size as detailed in the steps below.

Distribution | Advisories for CVE-2017-1000364 | Advisory updates
Oracle Linux 6 | ELSA-2017-1486 | Oracle has fixed this problem in ELSA-2017-1723.
Oracle Linux 7 | ELSA-2017-1484 | Oracle has also added the fix for Oracle Linux 7 in ELBA-2017-1674.
RHEL 6 | RHSA-2017-1486 | Red Hat has fixed this problem for RHEL 6 and has marked this patch as outdated and superseded by RHSA-2017-1723.
RHEL 7 | RHSA-2017-1484 | Red Hat has fixed this problem for RHEL 7 and has marked this patch as outdated and superseded by RHBA-2017-1674.
SLES | CVE-2017-1000364 | SUSE has also fixed this problem; the patch names are included in the same advisory.
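
To confirm which kernel a host is running before and after applying an update, you can check the kernel release and the installed kernel packages. This is a generic sketch; the exact fixed kernel version depends on your distribution and the advisory listed above.

uname -r                 # kernel release currently running
rpm -q --last kernel     # installed kernel packages on RHEL-compatible systems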

Workaround

If you cannot apply the kernel update, you can set the Java thread stack size to -Xss1280k for the affected services using the appropriate Java configuration option or the environment advanced configuration snippet, as detailed below.

For role instances that have specific Java configuration options properties:

  1. Log in to Cloudera Manager Admin Console.
  2. Select Clusters > Impala, and then click the Configuration tab.
  3. Type java in the search field to display the Java-related configuration parameters. The Java Configuration Options for Catalog Server property field displays. Enter -Xss1280k in the field, adding it to any existing settings.
  4. Click Save Changes.
  5. Navigate to the HDFS service by selecting Clusters > HDFS.
  6. Click the Configuration tab.
  7. Click the Scope filter DataNode. The Java Configuration Options for DataNode field displays among the properties listed. Enter -Xss1280k into the field, adding to any existing properties.
  8. Click Save Changes.
  9. Select the Scope filter NFS Gateway. The Java Configuration Options for NFS Gateway field displays among the properties listed. Enter -Xss1280k into the field, adding to any existing properties.
  10. Click Save Changes.
  11. Restart the affected roles (or configure the safety valves in the next section and restart when finished with all configurations).

For role instances that do not have specific Java configuration options:

  1. Log in to Cloudera Manager Admin Console.
  2. Select Clusters > Impala, and then click the Configuration tab.
  3. Click the Scope filter Impala Daemon and Category filter Advanced.
  4. Type impala daemon environment in the search field to find the safety valve entry field.
  5. In the Impala Daemon Environment Advanced Configuration Snippet (Safety Valve), enter:
    JAVA_TOOL_OPTIONS=-Xss1280K
  6. Click Save Changes.
  7. Click the Scope filter Impala StateStore and Category filter Advanced.
  8. In the Impala StateStore Environment Advanced Configuration Snippet (Safety Valve), enter:
    JAVA_TOOL_OPTIONS=-Xss1280K
  9. Click Save Changes.
  10. Restart the affected roles.

The table below summarizes the parameters that can be set for the affected services:

Service | Settable Java Configuration Option
HDFS DataNode | Java Configuration Options for DataNode
HDFS NFS Gateway | Java Configuration Options for NFS Gateway
Impala Catalog Server | Java Configuration Options for Catalog Server
Impala Daemon | Impala Daemon Environment Advanced Configuration Snippet (Safety Valve): JAVA_TOOL_OPTIONS=-Xss1280K
Impala StateStore | Impala StateStore Environment Advanced Configuration Snippet (Safety Valve): JAVA_TOOL_OPTIONS=-Xss1280K

Cloudera Issue: CDH-55771

Leap-Second Events

Impact: After a leap-second event, Java applications (including CDH services) using older Java and Linux kernel versions, may consume almost 100% CPU. See https://access.redhat.com/articles/15145.

Leap-second events are tied to the time synchronization methods of the Linux kernel, the Linux distribution and version, and the Java version used by applications running on affected kernels.

Although Java is increasingly agnostic to system clock progression (and less susceptible to a kernel's mishandling of a leap-second event), using JDK 7 or 8 should prevent issues at the CDH level (for CDH components that use the Java Virtual Machine).

Immediate action required:

(1) Ensure that the kernel is up to date.

  • RHEL6/7, CentOS 6/7 - 2.6.32-298 or higher

  • Oracle Enterprise Linux (OEL) - Kernels built in 2013 or later

  • SLES12 - No action required.

(2) Ensure that your Java JDKs are current (especially if the kernel is not up to date and cannot be upgraded).
  • Java 8 - No action required.

(3) Ensure that your systems use either NTP or PTP synchronization.

For systems not using time synchronization, update both the OS tzdata and Java tzdata packages to the tzdata-2016g version, at a minimum. For OS tzdata package updates, contact OS support or check updated OS repositories. For Java tzdata package updates, see Oracle's Timezone Updater Tool.

Cloudera Issue: CDH-44788, TSB-189

Apache Accumulo Known Issues

Running Apache Accumulo on top of a CDH 6.0.x cluster is not currently supported. If you try to upgrade to CDH 6.0.x, you will be asked to remove the Accumulo service from your cluster. Running Accumulo on top of CDH 6 will be supported in a future release.

Affected Versions: CDH 6.0.x

Cloudera Data Science Workbench

Cloudera Data Science Workbench is not supported with CDH 6.0.x. Cloudera Data Science Workbench 1.5.0 (and higher) is supported with CDH 6.1.x (and higher).

Cloudera Issue: DSE-2769

Apache Crunch Known Issues

Apache Flume Known Issues

Fast Replay does not work with encrypted File Channel

If an encrypted file channel is set to use fast replay, the replay will fail and the channel will fail to start.

Workaround: Disable fast replay for the encrypted channel by setting use-fast-replay to false.
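
For example, in the Flume agent configuration file, the property would look like the following (agent1 and encryptedChannel are illustrative names; substitute your own agent and channel names):

agent1.channels.encryptedChannel.use-fast-replay = false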

Apache Issue: FLUME-1885

Apache Hadoop Known Issues

This page includes known issues and related topics, including:

Deprecated Properties

Several Hadoop and HDFS properties have been deprecated as of Hadoop 3.0 and later. For details, see Deprecated Properties.

Hadoop Common

Hadoop LdapGroupsMapping does not support LDAPS for self-signed LDAP server

Hadoop LdapGroupsMapping does not work with LDAP over SSL (LDAPS) if the LDAP server certificate is self-signed. This use case is currently not supported even if Hadoop User Group Mapping LDAP TLS/SSL Enabled, Hadoop User Group Mapping LDAP TLS/SSL Truststore, and Hadoop User Group Mapping LDAP TLS/SSL Truststore Password are filled properly.

Affected Versions: All CDH versions

Apache Issue: HADOOP-12862

Cloudera Issue: CDH-37926

HDFS

OIV ReverseXML processor fails

The HDFS OIV ReverseXML processor fails if the XML file contains escaped characters.

Affected Versions: CDH 6.x

Apache Issue: HDFS-12828

Cannot move encrypted files to trash

With HDFS encryption enabled, you cannot move encrypted files or directories to the trash directory.

Workaround: To remove encrypted files or directories, use the following command with the -skipTrash flag specified to bypass trash:
hdfs dfs -rm -r -skipTrash /testdir

Affected Versions: All CDH versions

Apache Issue: HADOOP-10902

HDFS NFS gateway and CDH installation (using packages) limitation

HDFS NFS gateway works as shipped ("out of the box") only on RHEL-compatible systems, but not on SLES or Ubuntu. Because of a bug in native versions of portmap/rpcbind, the HDFS NFS gateway does not work out of the box on SLES or Ubuntu systems when CDH has been installed from the command-line, using packages. It does work on supported versions of RHEL-compatible systems on which rpcbind-0.2.0-10.el6 or later is installed, and it does work if you use Cloudera Manager to install CDH, or if you start the gateway as root. For more information, see CDH and Cloudera Manager Supported Operating Systems.

Workarounds and caveats:
  • On Red Hat and similar systems, make sure rpcbind-0.2.0-10.el6 or later is installed.
  • On SLES and Ubuntu systems, do one of the following:
    • Install CDH using Cloudera Manager; or
    • Start the NFS gateway as root; or
    • Start the NFS gateway without using packages; or
    • You can use the gateway by running rpcbind in insecure mode, using the -i option, but keep in mind that this allows anyone from a remote host to bind to the portmap.

Upstream Issue: 731542 (Red Hat), 823364 (SLES)

No error when changing permission to 777 on .snapshot directory

Snapshots are read-only; running chmod 777 on the .snapshot directory does not change this, but it also does not produce an error (though other illegal operations do).

Affected Versions: All CDH versions

Apache Issue: HDFS-4981

Snapshot operations are not supported by ViewFileSystem

Affected Versions: All CDH versions

Snapshots do not retain directories' quota settings

Affected Versions: All CDH versions

Apache Issue: HDFS-4897

Permissions for dfs.namenode.name.dir incorrectly set

Hadoop daemons should set permissions for the dfs.namenode.name.dir (or dfs.name.dir) directories to drwx------ (700), but in fact these permissions are set to the file-system default, usually drwxr-xr-x (755).

Workaround: Use chmod to set permissions to 700.
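
For example, assuming the NameNode metadata directory configured in dfs.namenode.name.dir is /data/dfs/nn (a placeholder; check the actual value on your cluster first):

hdfs getconf -confKey dfs.namenode.name.dir   # shows the configured directory
chmod 700 /data/dfs/nn                        # run as the directory owner, typically the hdfs user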

Affected Versions: All CDH versions

Apache Issue: HDFS-2470

hadoop fsck -move does not work in a cluster with host-based Kerberos

Workaround: Use hadoop fsck -delete

Affected Versions: All CDH versions

Apache Issue: None

Block report can exceed maximum RPC buffer size on some DataNodes

On a DataNode with a large number of blocks, the block report may exceed the maximum RPC buffer size.

Workaround: Increase the value ipc.maximum.data.length in hdfs-site.xml:
<property>
  <name>ipc.maximum.data.length</name>
  <value>268435456</value>
</property>

Affected Versions: All CDH versions

Apache Issue: None

MapReduce2 and YARN

The Standby Resource Manager redirects /jmx and /metrics requests to the Active Resource Manager.

When ResourceManager high availability is enabled, the Standby Resource Manager redirects /jmx and /metrics requests to the Active Resource Manager. This causes the following issues in Cloudera Manager:
  • If Enable Kerberos Authentication for HTTP Web-Console is disabled: Cloudera Manager shows statistics for the wrong server.
  • If Enable Kerberos Authentication for HTTP Web-Console is enabled: connection from the agent to the standby fails with the HTTPError: HTTP Error 401: Authentication required error message. As a result, the health of the Standby Resource Manager will become bad.

Workaround: N/A

Affected Versions: CDH 6.0.x, CDH 6.1.0

Fixed Version: CDH 6.1.1

Cloudera Issue: CDH-76040

YARN's Continuous Scheduling can cause slowness in Oozie

When Continuous Scheduling is enabled in YARN, Oozie can become slow due to long delays in communicating with YARN. In Cloudera Manager 5.9.0 and higher, Enable Fair Scheduler Continuous Scheduling is turned off by default.

Workaround: Turn off Enable Fair Scheduler Continuous Scheduling in Cloudera Manager YARN Configuration. To keep equivalent benefits of this feature, turn on Fair Scheduler Assign Multiple Tasks.

Affected Versions: All CDH versions

Cloudera Issue: CDH-60788

JobHistory URL mismatch after server relocation

After moving the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only. New jobs started after the move are not affected.

Workaround: For any existing jobs that have the incorrect JobHistory Server URL, there is no option other than to allow the jobs to roll off the history over time. For new jobs, make sure that all clients have the updated mapred-site.xml that references the correct JobHistory Server.

Affected Versions: All CDH versions

Apache Issue: None

History link in ResourceManager web UI broken for killed Spark applications

When a Spark application is killed, the history link in the ResourceManager web UI does not work.

Workaround: To view the history for a killed Spark application, see the Spark HistoryServer web UI instead.

Affected Versions: All CDH versions

Apache Issue: None

Cloudera Issue: CDH-49165

Routable IP address required by ResourceManager

ResourceManager requires routable host:port addresses for yarn.resourcemanager.scheduler.address, and does not support using the wildcard 0.0.0.0 address.

Workaround: Set the address, in the form host:port, either in the client-side configuration, or on the command line when you submit the job.
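
For example, a sketch of the client-side yarn-site.xml entry (the host name is a placeholder; 8030 is the default scheduler port):

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>resourcemanager.example.com:8030</value>
</property>

Alternatively, pass -Dyarn.resourcemanager.scheduler.address=resourcemanager.example.com:8030 on the command line when submitting the job.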

Affected Versions: All CDH versions

Apache Issue: None

Cloudera Issue: CDH-6808

Amazon S3 copy may time out

The Amazon S3 filesystem does not support renaming files, and performs a copy operation instead. If the file to be moved is very large, the operation can time out because S3 does not report progress during the operation.

Workaround: Use -Dmapred.task.timeout=15000000 to increase the MR task timeout.
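
For example, a sketch of passing the increased timeout to a DistCp copy to S3 (the source path and bucket name are placeholders):

hadoop distcp -Dmapred.task.timeout=15000000 /user/example/data s3a://example-bucket/data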

Affected Versions: All CDH versions

Apache Issue: MAPREDUCE-972

Cloudera Issue: CDH-17955

Apache HBase Known Issues

IOException from Timeouts

CDH 5.12.0 includes the fix HBASE-16604, where the internal scanner that retries in case of an IOException from timeouts could potentially miss data. Java clients were properly updated to account for the new behavior, but Thrift clients will now see exceptions in cases where data would previously have been silently missing.

Workaround: Create a new scanner and retry the operation when encountering this issue.

IntegrationTestReplication fails if replication does not finish before the verify phase begins

During IntegrationTestReplication, if the verify phase starts before the replication phase finishes, the test will fail because the target cluster does not contain all of the data. If the HBase services in the target cluster do not have enough memory, long garbage-collection pauses might occur.

Workaround: Use the -t flag to set the timeout value before starting verification.

Cloudera Issue: None.

HDFS encryption with HBase

Cloudera has tested the performance impact of using HDFS encryption with HBase. The overall overhead of HDFS encryption on HBase performance is in the range of 3 to 4% for both read and update workloads. Scan performance has not been thoroughly tested.

ExportSnapshot or DistCp operations may fail on the Amazon s3a:// protocol

ExportSnapshot or DistCp operations may fail on AWS when using certain JDK 8 versions, due to an incompatibility between the AWS Java SDK 1.9.x and the joda-time date-parsing module.

Workaround: Use joda-time 2.8.1 or higher, which is included in AWS Java SDK 1.10.1 or higher.

Cloudera Issue: None.

An operating-system level tuning issue in RHEL7 causes 30% latency regressions

Workaround: Avoid using RHEL 7 if you have a latency-critical workload. For a cached workload, consider tuning the C-state (power-saving) behavior of your CPUs.

Cloudera Issue: CDH-32642

Severity: Medium

A RegionServer under extreme duress due to back-to-back garbage collection combined with heavy load on HDFS can lock up while attempting to append to the WAL

The RegionServer appears operational to ZooKeeper, and continues to host regions, but cannot complete any writes. The most obvious symptom is that all writes to regions on this RegionServer time out, and the RegionServer log shows no progress other than queuing of flushes that never complete. Log messages such as the following may occur:
124028 2015-11-14 05:54:48,659 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 42911ms instead of 3000ms,
  this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.#
124029 2015-11-14 05:54:48,659 WARN org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or
  host machine (eg GC): pause of approximately 41110ms
1806 2015-11-14 04:58:09,952 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog:
  Slow sync cost: 2734 ms, current pipeline: [DatanodeInfoWithStorage[10.17.198.17:20002,DS-56e2cf88-f267-43a8-b964-b29858#
1807 2015-11-14 04:58:09,952 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog:
  Slow sync cost: 2963 ms, current pipeline: [DatanodeInfoWithStorage[10.17.198.17:20002,DS-56e2cf88-f267-43a8-b964-b29858#

Workaround: Restart the RegionServer. To avoid the problem, adjust garbage-collection settings, give the RegionServer more RAM, and reduce the load on HDFS.

Apache Issue: HBASE-14374

Export to Azure Blob Storage (the wasb:// or wasbs:// protocol) is not supported

CDH 5.3 and higher supports Azure Blob Storage for some applications. However, a null pointer exception occurs when you specify a wasb:// or wasbs:// location in the --copy-to option of the ExportSnapshot command or as the output directory (the second positional argument) of the Export command.

Workaround: None.

Apache Issue: HADOOP-12717

AccessController postOperation problems in asynchronous operations

When security and Access Control are enabled, the following problems occur:

  • If a Delete Table fails for a reason other than missing permissions, the access rights are removed but the table may still exist and may be used again.
  • If hbaseAdmin.modifyTable() is used to delete column families, the rights are not removed from the Access Control List (ACL) table. The postOperation is implemented only for postDeleteColumn().
  • If Create Table fails, full rights for that table persist for the user who attempted to create it. If another user later succeeds in creating the table, the user who made the failed attempt still has the full rights.

Workaround: None

Apache Issue: HBASE-6992

Apache Hive / HCatalog / Hive on Spark Known Issues

When vectorization is enabled on any file type (ORC, Parquet), queries that divide by zero using the modulo operator (%) return an error

When vectorization is enabled for Hive on any file type, including ORC and Parquet, a query that divides by zero using the modulo operator (%) returns the following error: Arithmetic exception [divide by] 0. For example, the query SELECT 100 % column_c1 FROM table_t1; triggers this issue when the value in column_c1 is zero. The divide operator (/) is not affected by this issue.

Workaround: Disable vectorization for the query that is triggering this at either the session level by using the SET statement or at the server level by disabling the property with Cloudera Manager. For information about how to enable or disable query vectorization, see Enabling Hive Query Vectorization.
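
For example, to disable vectorization at the session level before running the affected query (hive.vectorized.execution.enabled is the Hive property that controls query vectorization):

SET hive.vectorized.execution.enabled=false;
SELECT 100 % column_c1 FROM table_t1;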

Affected Versions: When query vectorization is enabled for Hive, this issue affects Hive ORC tables in all versions of CDH and affects Hive Parquet tables in CDH 6.0 and later

Apache Issue: HIVE-19564

Cloudera Issue: CDH-71211

When vectorization is enabled for Hive on any file type (ORC, Parquet), queries that perform comparisons in the SELECT clause on large values in columns with the data type of BIGINT might return wrong results

When vectorization is enabled for Hive on any file type, including ORC and Parquet, if the query performs a comparison operation between very large values in columns that are BIGINT data types in the SELECT clause of the query, incorrect results might be returned. Comparison operators include ==, !=, <, <=, >, and >=. This issue does not occur when the comparison operation is performed in the filtering clause of the query. This issue can also occur when the difference of values in such columns is out of range for a LONG (64-bit) data type. For example, if column_c1 stores 8976171455044006767 and column_c2 stores -7272907770454997143, a query such as SELECT column_c1 < column_c2 FROM table_test returns true instead of false because the difference (8976171455044006767 - (-7272907770454997143)) is 1.6249079225499E19, which exceeds 9.22337203685478E18, the maximum value that a LONG (64-bit) data type can hold.

Workaround: Use a DECIMAL type instead of BIGINT for columns that might contain very large values. Another option is to disable vectorization for the query that is triggering this at either the session level by using the SET statement or at the server level by disabling the property with Cloudera Manager. For information about how to enable or disable query vectorization, see Enabling Hive Query Vectorization.
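
For example, a sketch of defining the columns from the example above as DECIMAL instead of BIGINT (table and column names are illustrative; verify that the chosen precision fits your data):

CREATE TABLE table_test_decimal (
  column_c1 DECIMAL(19,0),
  column_c2 DECIMAL(19,0)
);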

Affected Versions: When query vectorization is enabled for Hive, this issue affects Hive ORC tables in all versions of CDH and affects Hive Parquet tables in CDH 6.0 and later

Apache Issue: HIVE-20207

Cloudera Issue: CDH-70996

Specified column position in the ORDER BY clause is not supported for SELECT * queries

When column positions are specified in ORDER BY clauses, they are not honored for SELECT * queries and an error is returned as shown in the following example:
CREATE TABLE decimal_1 (id decimal(5,0));
SELECT * FROM decimal_1 ORDER BY 1 limit 100;
Error while compiling statement: FAILED: SemanticException [Error 10219]: Position in ORDER BY is not supported when using SELECT *
          
Instead, the query must explicitly list the columns it selects.
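
For example, the query succeeds once the selected column is listed explicitly:

SELECT id FROM decimal_1 ORDER BY 1 LIMIT 100;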

Affected Versions: CDH 6.0.0 and higher

Cloudera Issue: CDH-68550

DirectSQL with PostgreSQL

Hive does not support direct SQL queries with a PostgreSQL database; this feature is supported only with MySQL, MariaDB, and Oracle. With PostgreSQL, direct SQL is disabled as a precaution, because issues have been reported upstream where it is not possible to fall back on DataNucleus after some failures, along with other non-standard behaviors. For more information, see Hive Configuration Properties.

Affected Versions: All CDH versions

Cloudera Issue: CDH-49017

ALTER PARTITION … SET LOCATION does not work on Amazon S3 or between S3 and HDFS

Cloudera recommends that you do not use ALTER PARTITION … SET LOCATION on S3 or between S3 and HDFS. The rest of the ALTER PARTITION commands work as expected.

Affected Versions: All CDH versions

Cloudera Issue: CDH-42420

Commands run against an Oracle-backed metastore might fail

Commands run against an Oracle-backed Metastore fail with error:
javax.jdo.JDODataStoreException Incompatible data type for column TBLS.VIEW_EXPANDED_TEXT : was CLOB (datastore),
but type expected was LONGVARCHAR (metadata). Please check that the type in the datastore and the type specified in the MetaData are consistent.

This error might occur if the metastore is run on top of an Oracle database with the configuration property datanucleus.validateColumns set to true.

Workaround: Set datanucleus.validateColumns=false in the hive-site.xml configuration file.
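
For example, in hive-site.xml:

<property>
  <name>datanucleus.validateColumns</name>
  <value>false</value>
</property>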

Affected Versions: All CDH versions

Cannot create archive partitions with external HAR (Hadoop Archive) tables

ALTER TABLE ... ARCHIVE PARTITION is not supported on external tables.

Affected Versions: All CDH versions

Cloudera Issue: CDH-9638

Object types Server and URI are not supported in "SHOW GRANT ROLE roleName on OBJECT objectName" statements

Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.

Affected Versions: All CDH versions

Cloudera Issue: CDH-19430

HCatalog Known Issues

There are no notable known issues in this release of HCatalog.

Hive on Spark (HoS) Known Issues

Hive on Spark queries fail with "Timed out waiting for client to connect" for an unknown reason

If this exception is preceded by logs of the form "client.RpcRetryingCaller: Call exception...", then this failure is due to an unavailable HBase service. On a secure cluster, spark-submit will try to obtain delegation tokens from HBase, even though Hive on Spark might not need them. So if HBase is unavailable, spark-submit throws an exception.

Workaround: Fix the HBase service, or set spark.yarn.security.tokens.hbase.enabled to false.
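
A hedged sketch of the second option: the property can be set in the Hive session before the first query that starts the Spark session. Depending on how your environment launches the Spark session, it may instead need to be set in the Hive service configuration (for example, through a HiveServer2 advanced configuration snippet).

SET spark.yarn.security.tokens.hbase.enabled=false;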

Affected Versions: CDH 5.7.0 and higher

Cloudera Issues: CDH-59591, CDH-59599

Hue Known Issues

Hue does not support the Spark App

Hue does not currently support the Spark application.

Connecting to PostgreSQL Database Fails with Error "No module named psycopg2"

When configuring Hue to use a PostgreSQL database, the connection fails with the following error:

Error loading psycopg2 module: No module named psycopg2

Workaround: Install the psycopg2 Python package as documented in Installing the psycopg2 Python Package.

Affected Versions: All CDH 6 versions

Fixed Versions: None

Apache Issue: N/A

Cloudera Issue: CDH-65804

Apache Impala Known Issues

The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and whether a fix is in the pipeline.

Continue reading:

Impala Known Issues: Startup

These issues can prevent one or more Impala-related daemons from starting properly.

Impala requires FQDN from hostname command on kerberized clusters

The method Impala uses to retrieve the host name while constructing the Kerberos principal is the gethostname() system call. This function might not always return the fully qualified domain name, depending on the network configuration. If the daemons cannot determine the FQDN, Impala does not start on a kerberized cluster.

Workaround: Test if a host is affected by checking whether the output of the hostname command includes the FQDN. On hosts where hostname only returns the short name, pass the command-line flag --hostname=fully_qualified_domain_name in the startup options of all Impala-related daemons.
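
For example (the FQDN below is a placeholder):

hostname          # prints only the short name on affected hosts
hostname --fqdn   # prints the fully qualified domain name, if it can be resolved

Then add --hostname=impalad01.example.com (substituting the actual FQDN of each host) to the startup options of all Impala-related daemons.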

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-4978

Impala Known Issues: Crashes and Hangs

These issues can cause Impala to quit or become unresponsive.

Unable to view large catalog objects in catalogd Web UI

In the catalogd Web UI, you can list metadata objects and view their details. These details are accessed via a link and printed to a string formatted using Thrift's DebugProtocol. Printing large objects (> 1 GB) in the Web UI can crash catalogd.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-6841

Impala Known Issues: Performance

These issues involve the performance of operations such as queries or DDL statements.

Metadata operations block read-only operations on unrelated tables

Metadata operations that change the state of a table, like COMPUTE STATS or ALTER RECOVER PARTITIONS, may delay metadata propagation for unrelated, not-yet-loaded tables that is triggered by statements like DESCRIBE or SELECT queries.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-6671

Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true

The configuration setting convert_legacy_hive_parquet_utc_timestamps=true uses an underlying function that can be a bottleneck on high volume, highly concurrent queries due to the use of a global lock while loading time zone information. This bottleneck can cause slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of slowdown depends on factors such as the number of cores and number of threads involved in the query.

Workaround: Store the TIMESTAMP values as strings in one of the following formats:
  • yyyy-MM-dd
  • yyyy-MM-dd HH:mm:ss
  • yyyy-MM-dd HH:mm:ss.SSSSSSSSS

    The value can have 1 to 9 digits in the fractional part.

Impala implicitly converts such string values to TIMESTAMP in calls to date/time functions.

Affected Versions: CDH 6.0.x versions

Fixed Versions: CDH 6.1.0

Apache Issue: IMPALA-3316

Impala Known Issues: Security

These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and redaction.

In Impala with Sentry enabled, REVOKE ALL ON SERVER does not remove the privileges granted with the GRANT option

If you grant a role the ALL privilege at the SERVER scope with the WITH GRANT OPTION clause, you cannot revoke the privilege. Although the SHOW GRANT ROLE command will show that the privilege has been revoked immediately after you run the command, the ALL privilege will reappear when you run the SHOW GRANT ROLE command after Sentry refreshes.

Immediate Action Required: Once the privilege has been granted, the only way to remove it is to delete the role.

Affected Versions: CDH 6.0.0, CDH 6.0.1, CDH 5.15.0, CDH 5.15.1, CDH 5.14.x and all prior releases

Fixed Versions: CDH 6.1.0, CDH 6.0.2, CDH 5.16.0, CDH 5.15.2

Cloudera Issue: TSB-341

Impala does not support Heimdal Kerberos

Heimdal Kerberos is not supported in Impala.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-7072

Impala Known Issues: Resources

These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management features.

Handling large rows during upgrade to CDH 5.13 / Impala 2.10 or higher

After an upgrade to CDH 5.13 / Impala 2.10 or higher, users who process very large column values (long strings), or have increased the --read_size configuration setting from its default of 8 MB, might encounter capacity errors for some queries that previously worked.

Resolution: After the upgrade, follow the instructions in Handling Large Rows During Upgrade to CDH 5.13 / Impala 2.10 or Higher to check if your queries are affected by these changes and to modify your configuration settings if so.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-6028

Configuration to prevent crashes caused by thread resource limits

Impala could encounter a serious error due to resource usage under very high concurrency. The error message is similar to:

F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'

Workaround:

In CDH 6.0 and lower versions of CDH, configure each host running an impalad daemon with the following settings:

echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count

In CDH 6.1 and higher versions, it is unlikely that you will hit the thread resource limit. Configure each host running an impalad daemon with the following setting:

echo 8000000 > /proc/sys/vm/max_map_count

To make the above settings durable, refer to your OS documentation. For example, on RHEL 6.x:
  1. Add the following line to /etc/sysctl.conf:
    vm.max_map_count=8000000
  2. Run the following command:
    sysctl -p

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-5605

Breakpad minidumps can be very large when the thread count is high

The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.

Workaround: Add --minidump_size_limit_hint_kb=size to set a soft upper limit on the size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump file can still grow larger than the "hinted" size. For example, if you have 10,000 threads, the minidump file can be more than 20 MB.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-3509

Process mem limit does not account for the JVM's memory usage

Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.

Workaround: To monitor overall memory usage, use the top command, or add the memory figures in the Impala web UI /memz tab to JVM memory usage shown on the /metrics tab.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-691

Impala Known Issues: Correctness

These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.

Incorrect result due to constant evaluation in query with outer join

An OUTER JOIN query could omit some expected result rows due to a constant such as FALSE in another join clause. For example:
explain SELECT 1 FROM alltypestiny a1
  INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
  RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
|                                                         |
| 00:EMPTYSET                                             |
+---------------------------------------------------------+

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-3094

BST between 1972 and 1995

The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such as:

select
  extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
  extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;

Affected Versions: All CDH 6 versions

Fixed Versions: CDH 6.1

Apache Issue: IMPALA-3082

% escaping does not work correctly in a LIKE clause

If the final character in the RHS argument of a LIKE operator is an escaped \% character, it does not match a % final character of the LHS argument.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-2422

Crash: impala::Coordinator::ValidateCollectionSlots

A query could encounter a serious error if it includes multiple nested levels of INNER JOIN clauses involving subqueries.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-2603

Impala Known Issues: Metadata

These issues affect how Impala interacts with metadata. They cover areas such as the metastore database and the Impala Catalog Server daemon.

Concurrent catalog operations with heavy DDL workloads can cause queries with SYNC_DDL to fail fast

When Catalog Server is under a heavy load with concurrent catalog operations of long running DDLs, queries running with the SYNC_DDL query option can fail with the following message:
ERROR: CatalogException: Couldn't retrieve the catalog topic
version for the SYNC_DDL operation after 3 attempts. The operation has
been successfully executed but its effects may have not been
broadcast to all the coordinators.

The catalog operation is actually successful as the change has been committed to HMS and Catalog Server cache, but when Catalog Server notices a longer than expected time for it to broadcast the changes, it fails fast.

The coordinator daemons eventually sync up in the background.

Affected Versions: CDH versions 6.0 and 6.1

Apache Issue: IMPALA-7961

Cloudera Issue: CDH-76345

Impala Known Issues: Interoperability

These issues affect the ability to interchange data between Impala and other systems. They cover areas such as data types and file formats.

Queries Stuck on Failed HDFS Calls and not Timing out

If the following error appears multiple times within a short duration while running a query, it means that the connection between the impalad and the HDFS NameNode is in a bad state, and the impalad must be restarted:

"hdfsOpenFile() for <filename> at backend <hostname:port> failed to finish before the <hdfs_operation_timeout_sec> second timeout " 

Workaround: Restart the impalad in the bad state.

Affected Versions: All versions of Impala

Apache Issue: HADOOP-15720

Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)

Impala behavior differs from Hive with respect to out-of-range float/double values. Out-of-range values are returned as the maximum allowed value of the type (Hive returns NULL).

Workaround: None

Affected Versions: All CDH 6 versions

Configuration needed for Flume to be compatible with Impala

For compatibility with Impala, the value for the Flume HDFS Sink hdfs.writeFormat must be set to Text, rather than its default value of Writable. The hdfs.writeFormat setting must be changed to Text before creating data files with Flume; otherwise, those files cannot be read by either Impala or Hive.
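
For example, in the Flume agent configuration file (agent1 and hdfsSink are illustrative names):

agent1.sinks.hdfsSink.hdfs.writeFormat = Text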

Resolution: This information has been requested to be added to the upstream Flume documentation.

Affected Versions: All CDH 6 versions

Cloudera Issue: CDH-13199

Avro Scanner fails to parse some schemas

The default value in Avro schema must match the first union type. For example, if the default value is null, then the first type in the UNION must be "null".

Workaround: Swap the order of the fields in the schema specification. For example, use ["null", "string"] instead of ["string", "null"]. Note that the files written with the problematic schema must be rewritten with the new schema because Avro files have embedded schemas.
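
For example, a field whose default value is null should list "null" first in the union (the field name is illustrative):

{"name": "example_field", "type": ["null", "string"], "default": null}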

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-635

Impala BE cannot parse Avro schema that contains a trailing semi-colon

If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.

Workaround: Remove trailing semicolon from the Avro schema.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-1024

Incorrect results with basic predicate on CHAR typed column

When comparing a CHAR column value to a string literal, the literal value is not blank-padded and so the comparison might fail when it should match.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-1652

Impala Known Issues: Limitations

These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management workflow.

Set limits on size of expression trees

Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage.

Workaround: Avoid queries with extremely large expression trees. Setting the query option disable_codegen=true may reduce the impact, at a cost of longer query runtime.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-4551

Impala does not support running on clusters with federated namespaces

Impala does not support running on clusters with federated namespaces. The impalad process will not start on a node running such a filesystem, which is based on the org.apache.hadoop.fs.viewfs.ViewFs class.

Workaround: Use standard HDFS on all Impala nodes.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-77

Hue and BDR require separate parameters for Impala Load Balancer

Cloudera Manager supports a single parameter for specifying the Impala Daemon Load Balancer. However, because BDR and Hue need to use different ports when connecting to the load balancer, it is not possible to configure the load balancer value so that BDR and Hue will work correctly in the same cluster.

Workaround: To configure BDR with Impala, use the load balancer configuration either without a port specification or with the Beeswax port.

To configure Hue, use the Hue Server Advanced Configuration Snippet (Safety Valve) for impalad_flags to specify the load balancer address with the HiveServer2 port.

Affected Versions: CDH versions from 5.11 to 6.0.1

Cloudera Issue: OPSAPS-46641

Impala Known Issues: Miscellaneous / Older Issues

These issues do not fall into one of the above categories or have not been categorized yet.

A failed CTAS does not drop the table if the insert fails

If a CREATE TABLE AS SELECT operation successfully creates the target table but an error occurs while querying the source table or copying the data, the new table is left behind rather than being dropped.

Workaround: Drop the new table manually after a failed CREATE TABLE AS SELECT.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-2005

Casting scenarios with invalid/inconsistent results

Using a CAST function to convert large literal values to smaller types, or to convert special values such as NaN or Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-1821

Impala Parser issue when using fully qualified table names that start with a number

A fully qualified table name starting with a number could cause a parsing error. In a name such as db.571_market, the decimal point followed by digits is interpreted as a floating-point number.

Workaround: Surround each part of the fully qualified name with backticks (``).
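
For example:

SELECT * FROM `db`.`571_market`;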

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-941

Impala should tolerate bad locale settings

If the LC_* environment variables specify an unsupported locale, Impala does not start.

Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon. See Modifying Impala Startup Options for details about modifying these environment settings.

Resolution: Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.

Affected Versions: All CDH 6 versions

Apache Issue: IMPALA-532

EMC Isilon Known Issues

CDH 6.0 is not currently supported on EMC Isilon.

Affected Versions: CDH 6.0.x

Apache Kafka Known Issues

Topics Created with the "kafka-topics" Tool Might Not Be Secured

Topics that are created and deleted via Kafka are secured (for example, auto-created topics). However, most topic creation and deletion is done via the kafka-topics tool, which talks directly to ZooKeeper, or via some other third-party tool that talks directly to ZooKeeper. Because security is the responsibility of ZooKeeper authorization and authentication, Kafka cannot prevent users from making ZooKeeper changes. Anyone with access to ZooKeeper can create and delete topics. However, such users cannot describe, read, or write the topics even if they can create them.

The following commands talk directly to ZooKeeper and therefore are not secured via Kafka:
  • kafka-topics.sh
  • kafka-configs.sh
  • kafka-preferred-replica-election.sh
  • kafka-reassign-partitions.sh

"offsets.topic.replication.factor" Must Be Less Than or Equal to the Number of Live Brokers

The offsets.topic.replication.factor broker configuration is now enforced upon auto topic creation. Internal auto topic creation will fail with a GROUP_COORDINATOR_NOT_AVAILABLE error until the cluster size meets this replication factor requirement.

Kafka May Be Stuck with Under-replicated Partitions after ZooKeeper Session Expires

This problem can occur when your Kafka cluster includes a large number of under-replicated Kafka partitions. One or more broker logs include messages such as the following:

[2016-01-17 03:36:00,888] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 3: Shrinking ISR for partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 (kafka.cluster.Partition)
[2016-01-17 03:36:00,891] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 3: Cached zkVersion [66] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
        

There will also be an indication of the ZooKeeper session expiring in one or more Kafka broker logs around the same time as the previous errors:

INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)

The log is typically in /var/log/kafka on each host where a Kafka broker is running. The location is set by the property kafka.log4j.dir in Cloudera Manager. The log name is kafka-broker-hostname.log. In diagnostic bundles, the log is under logs/hostname-ip-address/.

Workaround: To move forward after seeing this problem, restart the affected Kafka brokers. You can restart individual brokers from the Instances tab in the Kafka service page in Cloudera Manager. To reduce the likelihood of recurrence, consider the following:
  • Reduce the potential for long garbage collection pauses by brokers:
    • Use a better garbage collection mechanism in the JVM, such as G1GC. You can do this by adding -XX:+UseG1GC in the broker_java_opts.
    • Increase broker heap size if it is too small (broker_max_heap_size). Be careful that you don’t choose a heap size that can cause out-of-memory problems given all the services running on the node.
  • Increase the ZooKeeper session timeout configuration on brokers (zookeeper.session.timeout.ms), to reduce the likelihood that sessions expire.
  • Ensure ZooKeeper itself is well resourced and not overwhelmed so it can respond. For example, it is highly recommended to locate the ZooKeeper log directory on its own disk.

Affected Versions: CDK 1.4.x, 2.0.x, 2.1.x, 2.2.x

Fixed Versions:
  • Full Fix: CDH 6.1.0
  • Partial Fix: CDH 6.0.0. Kafka implementations with CDH 6.0.0 are less likely to encounter this issue.

Apache Issue: KAFKA-2729

Cloudera Issue: CDH-42514

Requests Fail When Sending to a Nonexistent Topic with "auto.create.topics.enable" Set to True

The first few produce requests fail when sending to a nonexistent topic with auto.create.topics.enable set to true.

Workaround: Increase the number of retries in the Producer configuration setting retries.
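
For example, in the producer configuration (the values shown are illustrative; tune them for your workload):

retries=10
retry.backoff.ms=1000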

Custom Kerberos Principal Names Cannot Be Used for Kerberized ZooKeeper and Kafka instances

When using ZooKeeper authentication and a custom Kerberos principal, Kerberos-enabled Kafka does not start.

Workaround: None. You must disable ZooKeeper authentication for Kafka or use the default Kerberos principals for ZooKeeper and Kafka.

Performance Degradation When SSL Is Enabled

Significant performance degradation can occur when SSL is enabled. The impact varies depending on your CPU, JVM version, and message size. Consumers are typically more affected than producers.

Workaround: Configure brokers and clients with ssl.secure.random.implementation = SHA1PRNG. It often reduces this degradation drastically, but its effect is CPU and JVM dependent.

Affected Versions: CDK 2.x and later

Fixed Versions: None

Apache Issue: KAFKA-2561

Cloudera Issue: None

Apache Kudu Known Issues

The following are known bugs and issues in Kudu. Note that this list is not exhaustive, and is meant to communicate only the most important known issues.

CFile Checksum Failure Causes Queries to Fail

When a CFile checksum fails, for example due to underlying disk corruption, queries against the replica fail with an error message such as:
Unable to advance iterator: Corruption: checksum error on CFile block

Workaround: Remove the corrupted replica from the tablet's Raft configuration. See Kudu Troubleshooting Guide for the detailed steps.

Affected Versions: CDH 6.0.x and lower

Apache Issue: KUDU-2469

C++ Client Fails to Re-acquire Authentication Token in Multi-master Clusters

A security-related issue can cause Impala queries to start failing on busy clusters in the following scenario:

  • The cluster runs with the --rpc_authentication set as optional or required. The default is optional. Secure clusters use required.
  • The cluster is using multiple masters.
  • Impala queries happen frequently enough that the leader master connection to some impalad isn't idle-closed (more than 1 query per 65 seconds).
  • The connection stays alive for longer than the authentication token timeout (1 week by default).
  • A master leadership change occurs after the authentication token expiration.

Impala queries will start failing with errors in the impalad logs like:

I0904 13:53:08.748968 95857 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to 10.164.44.13:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
I0904 13:53:10.389009 95861 status.cc:125] Unable to open Kudu table: Timed out: GetTableSchema timed out after deadline expired
 @ 0x95b1e9 impala::Status::Status()
 @ 0xff22d4 impala::KuduScanNodeBase::Open()
 @ 0xff101e impala::KuduScanNode::Open()
 @ 0xb73ced impala::FragmentInstanceState::Open()
 @ 0xb7532b impala::FragmentInstanceState::Exec()
 @ 0xb64ae8 impala::QueryState::ExecFInstance()
 @ 0xd15193 impala::Thread::SuperviseThread()
 @ 0xd158d4 boost::detail::thread_data<>::run()
 @ 0x129188a (unknown)
 @ 0x7f717ceade25 start_thread
 @ 0x7f717cbdb34d __clone

Impala shell queries will fail with a message like:

Unable to open Kudu table: Timed out: GetTableSchema timed out after deadline expired

Workaround:
  • Restart the affected Impala Daemons. Restarting a daemon ensures the problem will not reoccur for at least the authentication token lifetime, which defaults to one week.
  • Increase the authentication token lifetime (--authn_token_validity_seconds). Beware that raising this lifetime increases the window of vulnerability of the cluster if a client is compromised. It is recommended that you keep the token lifetime at one month maximum for a secure cluster. For unsecured clusters, a longer token lifetime is acceptable, and a 3 month lifetime is recommended.

Affected Versions: From CDH 5.11 through CDH 6.0.1

Apache Issue: KUDU-2580

Timeout Possible with Log Force Synchronization Option

If the Kudu master is configured with the -log_force_fsync_all option, tablet servers and clients will experience frequent timeouts, and the cluster may become unusable.

Affected Versions: All CDH 6 versions

Longer Startup Times with a Large Number of Tablets

If a tablet server has a very large number of tablets, it may take several minutes to start up. It is recommended to limit the number of tablets per server to 1000 or fewer. The maximum allowed number of tablets is 2000 per server. Consider this limitation when pre-splitting your tables. If you notice slow start-up times, you can monitor the number of tablets per server in the web UI.

Affected Versions: All CDH 6 versions

Fault Tolerant Scan Memory Issue

Unlike regular scans, fault tolerant scans will allocate all required memory when the scan begins rather than as it progresses. This can be significant for big tablets. Moreover, this memory usage isn't counted towards the tablet server's overall memory limit, raising the likelihood of the tablet server being out-of-memory killed by the kernel.

Affected Versions: CDH 6.2 / Kudu 1.9 and lower

Apache Issue: KUDU-2466

Descriptions for Kudu TLS/SSL Settings in Cloudera Manager

Use the descriptions in the following table to better understand the TLS/SSL settings in the Cloudera Manager Admin Console.

Field | Usage Notes
Kerberos Principal | Set to the default principal, kudu.
Enable Secure Authentication And Encryption | Select this checkbox to enable authentication and RPC encryption between all Kudu clients and servers, as well as between individual servers. Only enable this property after you have configured Kerberos.
Master TLS/SSL Server Private Key File (PEM Format) | Set to the path containing the Kudu master host's private key (PEM-format). This is used to enable TLS/SSL encryption (over HTTPS) for browser-based connections to the Kudu master web UI.
Tablet Server TLS/SSL Server Private Key File (PEM Format) | Set to the path containing the Kudu tablet server host's private key (PEM-format). This is used to enable TLS/SSL encryption (over HTTPS) for browser-based connections to Kudu tablet server web UIs.
Master TLS/SSL Server Certificate File (PEM Format) | Set to the path containing the signed certificate (PEM-format) for the Kudu master host's private key (set in Master TLS/SSL Server Private Key File). The certificate file can be created by concatenating all the appropriate root and intermediate certificates required to verify trust.
Tablet Server TLS/SSL Server Certificate File (PEM Format) | Set to the path containing the signed certificate (PEM-format) for the Kudu tablet server host's private key (set in Tablet Server TLS/SSL Server Private Key File). The certificate file can be created by concatenating all the appropriate root and intermediate certificates required to verify trust.
Master TLS/SSL Server CA Certificate (PEM Format) | Disregard this field.
Tablet Server TLS/SSL Server CA Certificate (PEM Format) | Disregard this field.
Enable TLS/SSL for Master Server | Enables HTTPS encryption on the Kudu master web UI.
Enable TLS/SSL for Tablet Server | Enables HTTPS encryption on the Kudu tablet server web UIs.

Affected Versions: All CDH 6 versions

Apache Oozie Known Issues

External ID of MapReduce action not filled properly and failing MR job treated as SUCCEEDED

When a MapReduce action is launched from Oozie, the external ID field is not filled properly. It gets populated with the YARN ID of the LauncherAM, not with the ID of the actual MR job. If the MR job is submitted successfully and then fails, it will be treated as a successfully executed action.

Affected Versions: CDH 6.0.0 and higher

Fixed Version: CDH 6.1.0 and higher

Apache Issue: OOZIE-3298

Oozie jobs fail (gracefully) on secure YARN clusters when JobHistory server is down

If the JobHistory server is down on a YARN (MRv2) cluster, Oozie attempts to submit a job, by default, three times. If the job fails, Oozie automatically puts the workflow in a SUSPEND state.

Workaround: When the JobHistory server is running again, use the resume command to tell Oozie to continue the workflow from the point at which it left off.

Affected Versions: CDH 5 and higher

Cloudera Issue: CDH-14623

Oozie works with MapReduce or YARN, but not both

The Oozie server works with a MapReduce (MRv1) cluster or a YARN (MRv2) cluster, but not both at the same time.

Workaround: Use two different Oozie servers.

Affected Versions: All

Apache Parquet Known Issues

There are no known issues in Parquet.

Apache Pig Known Issues

There are no known issues in this release.

Cloudera Search Known Issues

The current release includes the following known limitations:

Solr SQL, Graph, and Stream Handlers are Disabled if Collection Uses Document-Level Security

The Solr SQL, Graph, and Stream handlers do not support document-level security, and are disabled if document-level security is enabled on the collection. If necessary, these handlers can be re-enabled by setting the following Java system properties, but document-level security is not enforced for these handlers:

  • SQL: solr.sentry.enableSqlQuery=true
  • Graph: solr.sentry.enableGraphQuery=true
  • Stream: solr.sentry.enableStreams=true
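
A sketch of passing these as JVM system properties (exactly where Java options are set depends on how your Solr service is managed, for example through a Cloudera Manager Java options or environment setting):

-Dsolr.sentry.enableSqlQuery=true -Dsolr.sentry.enableGraphQuery=true -Dsolr.sentry.enableStreams=true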

Workaround: None

Affected Versions: All CDH 6 releases

Cloudera Issue: CDH-66345

Collection Creation No Longer Supports Automatically Selecting A Configuration If Only One Exists

Before CDH 5.5.0, a collection could be created without specifying a configuration. If no -c value was specified, then:

  • If there was only one configuration, that configuration was chosen.
  • If the collection name matched a configuration name, that configuration was chosen.

Search for CDH 5.5.0 includes multiple built-in configurations. As a result, there is no longer a case in which only one configuration can be chosen by default.

Workaround: Explicitly specify the collection configuration to use by passing -c <configName> to solrctl collection --create.
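
For example (collection and configuration names are placeholders):

solrctl collection --create example_collection -c example_config -s 1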

Affected Versions: CDH 5.5.0 and higher

Cloudera Issue: CDH-34050

CrunchIndexerTool which includes Spark indexer requires specific input file format specifications

If the --input-file-format option is specified with CrunchIndexerTool, then its argument must be text, avro, or avroParquet, rather than a fully qualified class name.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-22190

The quickstart.sh file does not validate ZooKeeper and the NameNode on some operating systems

The quickstart.sh file uses the timeout function to determine if ZooKeeper and the NameNode are available. To ensure this check completes as intended, quickstart.sh first determines whether the operating system on which it is running supports timeout. If the script detects that the operating system does not support timeout, it continues without checking whether the NameNode and ZooKeeper are available. If your environment is configured properly or you are using an operating system that supports timeout, this issue does not apply.
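
A minimal sketch of the kind of check described above (illustrative only, not the actual quickstart.sh code; the host name is a placeholder):

if command -v timeout >/dev/null 2>&1; then
  timeout 10 bash -c 'echo ruok | nc <zookeeper-host> 2181'
  timeout 10 hdfs dfsadmin -safemode get
else
  echo "timeout not available; skipping ZooKeeper and NameNode availability checks"
fi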

Workaround: This issue only occurs in some operating systems. If timeout is not available, the quickstart continues and final validation is always done by the MapReduce jobs and Solr commands that are run by the quickstart.

Affected Versions: All

Cloudera Issue: CDH-19923

Field value class guessing and Automatic schema field addition are not supported with the MapReduceIndexerTool nor the HBaseMapReduceIndexerTool

The MapReduceIndexerTool and the HBaseMapReduceIndexerTool can be used with a Managed Schema created via NRT indexing of documents or via the Solr Schema API. However, neither tool supports adding fields automatically to the schema during ingest.

Workaround: Define the schema before running the MapReduceIndexerTool or HBaseMapReduceIndexerTool. In non-schemaless mode, define in the schema using the schema.xml file. In schemaless mode, either define the schema using the Solr Schema API or index sample documents using NRT indexing before invoking the tools. In either case, Cloudera recommends that you verify that the schema is what you expect using the List Fields API command.
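
For example, the fields currently defined in a collection's schema can be listed with the Schema API (the host, port, and collection name are placeholders; add the authentication options your cluster requires):

curl "http://<solr-host>:8983/solr/<collection>/schema/fields"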

Affected Versions: All

Cloudera Issue: CDH-26856

The Browse and Spell Request Handlers are not enabled in schemaless mode

The Browse and Spell Request Handlers require certain fields be present in the schema. Since those fields cannot be guaranteed to exist in a Schemaless setup, the Browse and Spell Request Handlers are not enabled by default.

Workaround: If you require the “Browse” and “Spell” Request Handlers, add them to the solrconfig.xml configuration file. Generate a non-schemaless configuration to see the usual settings and modify the required fields to fit your schema.
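
For example, a non-schemaless configuration can be generated for reference with solrctl (the local path is a placeholder):

solrctl instancedir --generate /tmp/<config-dir>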

Affected Versions: All

Cloudera Issue: CDH-19407

Enabling blockcache writing may result in unusable indexes

It is possible to create indexes with solr.hdfs.blockcache.write.enabled set to true. Such indexes may appear corrupt to readers, and reading these indexes may irrecoverably corrupt indexes. Blockcache writing is disabled by default.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-17978

Users with insufficient Solr permissions may receive a "Page Loading" message from the Solr Web Admin UI

Users who are not authorized to use the Solr Admin UI are not shown a page explaining that access is denied; instead, they receive a web page that never finishes loading.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-58276

Using MapReduceIndexerTool or HBaseMapReduceIndexerTool multiple times may produce duplicate entries in a collection.

Repeatedly running the MapReduceIndexerTool on the same set of input files can result in duplicate entries in the Solr collection. This occurs because the tool can only insert documents and cannot update or delete existing Solr documents. This issue does not apply to the HBaseMapReduceIndexerTool unless it is run with more than zero reducers.

Workaround: To avoid this issue, use the HBaseMapReduceIndexerTool with zero reducers. This must be done on a cluster without Kerberos.

Affected Versions: All

Cloudera Issue: CDH-15441

Deleting collections might fail if hosts are unavailable

It is possible to delete a collection while hosts that store part of the collection are unavailable. After such a deletion, if the previously unavailable hosts are brought back online, the deleted collection may be restored.

Workaround: Ensure all hosts are online before deleting collections.

Affected Versions: All

Cloudera Issue: CDH-58694

Saving search results is not supported

Cloudera Search does not support the ability to save search results.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-21162

HDFS Federation is not supported

Cloudera Search does not support HDFS Federation.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-11357

Apache Sentry Known Issues

Sentry does not support Kafka topic name with more than 64 characters

A Kafka topic name can be up to 249 characters long, but Sentry only supports topic names of up to 64 characters.

Workaround: Keep Kafka topic names to 64 characters or less.

Affected Versions: All CDH 5.x and 6.x versions

Cloudera Issue: CDH-64317

When granting privileges, a single transaction per grant causes long delays

Sentry takes a long time to grant or revoke a large number of column-level privileges that are requested in a single statement. For example, if you execute the following command:

GRANT SELECT(col1, col2, …) ON TABLE table1;

Sentry applies the grants to each column separately and the refresh process causes long delays.

Workaround: Split the grant statement up into smaller chunks. This prevents the refresh process from causing delays.
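
For example, rather than a single statement naming every column, the same privileges could be granted in smaller batches (column names are placeholders, following the abbreviated form shown above):

GRANT SELECT(col1, col2) ON TABLE table1;
GRANT SELECT(col3, col4) ON TABLE table1;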

Affected Versions:
  • CDH: 5.14.4
  • CDH: 5.15.1
  • CDH: 5.16.0
  • CDH: 6.1.0
Fixed Versions:
  • CDH 5.16.1 and above
  • CDH 6.2.0 and above

Cloudera Issue: CDH-74982

SHOW ROLE GRANT GROUP raises exception for a group that was never granted a role

If you run the command SHOW ROLE GRANT GROUP for a group that has never been granted a role, beeline raises an exception. However, if you run the same command for a group that does not have any roles, but has at one time been granted a role, you do not get an exception, but instead get an empty list of roles granted to the group.

Workaround: Granting a role to the group prevents the exception.
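
For example (run as a user with Sentry administrative privileges; the role and group names are placeholders):

CREATE ROLE <role_name>;
GRANT ROLE <role_name> TO GROUP <group_name>;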

Affected Versions:

  • CDH 5.16.0
  • CDH 6.0.0

Cloudera Issue: CDH-71694

GRANT/REVOKE operations could fail if there are too many concurrent requests

Under a significant workload with many concurrent requests, GRANT and REVOKE operations can fail.

Workaround: If you need to make many privilege changes, spread them out so that you are not issuing a large number of GRANT/REVOKE requests at once.

Affected Versions: CDH 5.13.0 and above

Apache Issue: SENTRY-1855

Cloudera Issue: CDH-56553

Creating large set of Sentry roles results in performance problems

Using more than a thousand roles/permissions might cause significant performance problems.

Workaround: Plan your roles so that groups have as few roles as possible and roles have as few permissions as possible.

Affected Versions: CDH 5.13.0 and above

Cloudera Issue: CDH-59010

Users can't track jobs with Hive and Sentry

As a prerequisite of enabling Sentry, Hive impersonation is turned off, which means all YARN jobs are submitted to the Hive job queue, and are run as the hive user. This is an issue because the YARN History Server now has to block users from accessing logs for their own jobs, since their own usernames are not associated with the jobs. As a result, end users cannot access any job logs unless they can get sudo access to the cluster as the hdfs, hive or other admin users.

In CDH 5.8 (and higher), Hive overrides the default configuration, mapred.job.queuename, and places incoming jobs into the connected user's job queue, even though the submitting user remains hive. Hive obtains the relevant queue/username information for each job by using YARN's fair-scheduler.xml file.

Affected Versions: CDH 5.2.0 and above

Cloudera Issue: CDH-22890

Column-level privileges are not supported on Hive Metastore views

GRANT and REVOKE for column level privileges is not supported on Hive Metastore views.

Affected Versions: All CDH versions

Apache Issue: SENTRY-754

SELECT privilege on all columns does not equate to SELECT privilege on table

Users who have been explicitly granted the SELECT privilege on all columns of a table will not have permission to perform table-level operations. For example, operations such as SELECT COUNT(1) or SELECT COUNT(*) will not work even if you have the SELECT privilege on all columns.

There is one exception to this. The SELECT * FROM TABLE command will work even if you do not have explicit table-level access.

Affected Versions: All CDH versions

Apache Issue: SENTRY-838

The EXPLAIN SELECT operation works without table or column-level privileges

Users are able to run the EXPLAIN SELECT operation, exposing metadata for all columns, even for tables/columns to which they weren't explicitly granted access.

Affected Versions: All CDH versions

Apache Issue: SENTRY-849

Object types Server and URI are not supported in SHOW GRANT ROLE roleName on OBJECT objectName

Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.

Affected Versions: All CDH versions

Apache Issue: N/A

Cloudera Issue: CDH-19430

Relative URI paths not supported by Sentry

Sentry supports only absolute (not relative) URI paths in permission grants. Although some early releases (for example, CDH 5.7.0) might not have raised explicit errors when relative paths were set, upgrading a system that uses relative paths causes the system to lose Sentry permissions.

Resolution: Revoke privileges that have been set using relative paths, and grant permissions using absolute paths before upgrading.

Affected Versions: All versions. Relative paths are not supported in Sentry for permission grants.

Absolute (Use this form) Relative (Do not use this form)
hdfs://absolute/path/ hdfs://relative/path
s3a://bucketname/ s3a://bucketname

Apache Spark Known Issues

PySpark broadcast variables fail when disk encryption is enabled

When disk encryption is enabled, PySpark broadcast variables fail with the following stack trace:

Traceback (most recent call last):
  File "broadcast.py", line 37, in <module>
    words_new.value
  File "/pyspark.zip/pyspark/broadcast.py", line 137, in value
  File "pyspark.zip/pyspark/broadcast.py", line 122, in load_from_path
  File "pyspark.zip/pyspark/broadcast.py", line 128, in load
EOFError: Ran out of input

Workaround: None

Affected Versions: CDH 6.0.1

Apache Issue: SPARK-26201

Cloudera Issue: CDH-76116

Spark Streaming jobs loop if missing Kafka topic

Spark jobs can loop endlessly if the Kafka topic is deleted while a Kafka streaming job (which uses KafkaSource) is in progress.

Workaround: Stop a job before deleting a Kafka topic.

Affected Versions: All

Cloudera Issue: CDH-57903, CDH-64513

Spark SQL does not respect size limit for the varchar type

Spark SQL treats varchar as a string (that is, there is no size limit). The observed behavior is that Spark reads and writes these columns as regular strings; if inserted values exceed the size limit, no error will occur. The data will be truncated when read from Hive, but not when read from Spark.

Workaround: None

Affected Versions: CDH 5.5.0 and higher

Apache Issue: SPARK-5918

Cloudera Issue: CDH-33642

Spark SQL does not prevent you from writing key types not supported by Avro tables

Spark allows you to declare DataFrames with any key type. Avro supports only string keys, so trying to write any other key type to an Avro table fails.

Workaround: None

Affected Versions: CDH 5.5.0 and higher

Cloudera Issue: CDH-33648

Spark SQL does not support timestamp in Avro tables

Workaround: None

Affected Versions: CDH 5.5.0 and higher

Cloudera Issue: CDH-33649

Spark SQL does not respect Sentry ACLs when communicating with Hive metastore

Even if a user is configured via Sentry to not have read permission to a Hive table, a Spark SQL job running as that user can still read the table's metadata directly from the Hive metastore.

Cloudera Issue: CDH-33658

Dynamic allocation and Spark Streaming

If you are using Spark Streaming, Cloudera recommends that you disable dynamic allocation by setting spark.dynamicAllocation.enabled to false when running streaming applications.
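
For example, the setting can be passed on the command line when submitting the streaming application (the class and application JAR are placeholders):

spark-submit --conf spark.dynamicAllocation.enabled=false --class <main-class> <application-jar>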

Limitation with Region Pruning for HBase Tables

When SparkSQL accesses an HBase table through the HiveContext, region pruning is not performed. This limitation can result in slower performance for some SparkSQL queries against tables that use the HBase SerDes than when the same table is accessed through Impala or Hive.

Workaround: None

Affected Versions: All

Cloudera Issue: CDH-56330

Running spark-submit with --principal and --keytab arguments does not work in client mode

The spark-submit script's --principal and --keytab arguments do not work with Spark-on-YARN's client mode.

Workaround: Use cluster mode instead.
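
For example (the principal, keytab path, class, and application JAR are placeholders):

spark-submit --master yarn --deploy-mode cluster --principal <user>@<REALM> --keytab /path/to/<user>.keytab --class <main-class> <application-jar>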

Affected Versions: All

Long-running apps on a secure cluster might fail if driver is restarted

If you submit a long-running app on a secure cluster using the --principal and --keytab options in cluster mode, and a failure causes the driver to restart after 7 days (the default maximum HDFS delegation token lifetime), the new driver fails with an error similar to the following:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token <token_info> can't be found in cache

Workaround: None

Affected Versions: CDH 6.0

Apache Issue: SPARK-23361

Cloudera Issue: CDH-64865

History link in ResourceManager web UI broken for killed Spark applications

When a Spark application is killed, the history link in the ResourceManager web UI does not work.

Workaround: To view the history for a killed Spark application, see the Spark HistoryServer web UI instead.

Affected Versions: All CDH versions

Apache Issue: None

Cloudera Issue: CDH-49165

Apache Sqoop Known Issues

Column names cannot start with a number when importing data with the --as-parquetfile option.

Sqoop currently uses an Avro schema when writing data as Parquet files. Because the Avro schema requires that column names not start with a number, Sqoop renames such columns, prepending an underscore character. This can lead to issues when the data is reused in other tools, such as Impala.

Workaround: Rename the columns to comply with Avro naming rules (names must start with a letter or underscore, as specified in the Avro documentation).
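
One possible way to rename columns during the import, sketched below, is a free-form query with column aliases (the connection details, table, and column names are placeholders; quote the numeric column name as your source database requires):

sqoop import --connect <jdbc-url> --username <user> -P --query "SELECT <numeric-column> AS col_1, <other-columns> FROM <table> WHERE \$CONDITIONS" --target-dir /user/<user>/<table> --as-parquetfile -m 1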

Cloudera Issue: None

MySQL JDBC driver shipped with CentOS 6 systems does not work with Sqoop

CentOS 6 systems currently ship with version 5.1.17 of the MySQL JDBC driver. This version does not work correctly with Sqoop.

Workaround: Install version 5.1.31 of the JDBC driver as detailed in Installing the JDBC Drivers for Sqoop 1.

Affected Versions: MySQL JDBC 5.1.17, 5.1.4, 5.3.0

Cloudera Issue: CDH-23180

MS SQL Server "integratedSecurity" option unavailable in Sqoop

The integratedSecurity option is not available in the Sqoop CLI.

Workaround: None

Cloudera Issue: None

Sqoop1 (doc import + --as-parquetfile) limitation with KMS/KTS Encryption at Rest

Due to a limitation in the Kite SDK, it is not possible to use sqoop import --as-parquetfile with KMS/KTS encryption zones. See the following example.
sqoop import --connect jdbc:db2://djaxludb1001:61035/DDBAT003 --username=dh810202 --P --target-dir /data/hive_scratch/ASDISBURSEMENT --delete-target-dir -m1 --query "select disbursementnumber,disbursementdate,xmldata FROM DB2dba.ASDISBURSEMENT where DISBURSEMENTNUMBER = 2011113210000115311 AND \$CONDITIONS" -hive-import --hive-database adminserver -hive-table asdisbursement_dave --map-column-java XMLDATA=String --as-parquetfile

16/12/05 12:23:46 INFO mapreduce.Job: map 100% reduce 0%
16/12/05 12:23:46 INFO mapreduce.Job: Job job_1480530522947_0096 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Could not move contents of hdfs://AJAX01-ns/tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096 to hdfs://AJAX01-ns/data/RetiredApps/INS/AdminServer/asdisbursement_dave
<SNIP>
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): /tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096/5ddcac42-5d69-4e46-88c2-17bbedac4858.parquet can't be moved into an encryption zone.

Workaround: If you use the Parquet Hadoop API based implementation for importing into Parquet, specify a --target-dir which is the same encryption zone as the Hive warehouse directory.

If you use the Kite Dataset API based implementation, use an alternate data file type, for example text or avro.
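
A minimal sketch of the first approach, assuming the Hive warehouse directory /user/hive/warehouse is inside the encryption zone (the connection details and names are placeholders):

sqoop import --connect <jdbc-url> --username <user> -P --table <table> --as-parquetfile --hive-import --hive-database <database> --hive-table <table> --target-dir /user/hive/warehouse/<staging-dir> -m 1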

Apache Issue: SQOOP-2943

Cloudera Issue: CDH-40826

Doc import as Parquet files may result in out-of-memory errors

Out-of-memory (OOM) errors can be caused in the following two cases:
  • When there are many very large rows (multiple megabytes per row) before the initial-page-run check in ColumnWriter
  • When row sizes vary significantly, so that the next-page-size check is calibrated on small rows (and therefore set very high) and is then followed by many large rows

Workaround: None, other than restructuring the data.

Apache Issue: PARQUET-99

Apache ZooKeeper Known Issues

ZooKeeper JMX did not support TLS when managed by Cloudera Manager

Technical Service Bulletin 2019-310 (TSB)

The ZooKeeper service optionally exposes a JMX port used for reporting and metrics. By default, Cloudera Manager enables this port, but prior to Cloudera Manager 6.1.0, it did not support mutual TLS authentication on this connection. While JMX has a password-based authentication mechanism that Cloudera Manager enables by default, weaknesses have been found in the authentication mechanism, and Oracle now advises JMX connections to enable mutual TLS authentication in addition to password-based authentication. A successful attack may leak data, cause denial of service, or even allow arbitrary code execution on the Java process that exposes a JMX port. Beginning in Cloudera Manager 6.1.0, it is possible to configure mutual TLS authentication on ZooKeeper’s JMX port.

Products affected: ZooKeeper

Releases affected: Cloudera Manager 6.1.0 and lower, Cloudera Manager 5.16 and lower

Users affected: All

Date/time of detection: June 7, 2018

Severity (Low/Medium/High): 9.8 High (CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)

Impact: Remote code execution

CVE: CVE-2018-11744

Immediate action required: Upgrade to Cloudera Manager 6.1.0 and enable TLS for the ZooKeeper JMX port by turning on the configuration settings “Enable TLS/SSL for ZooKeeper JMX” and “Enable TLS client authentication for JMX port” on the ZooKeeper service and configuring the appropriate TLS settings. Alternatively, disable the ZooKeeper JMX port via the configuration setting “Enable JMX Agent” on the ZooKeeper service.

Addressed in release/refresh/patch: Cloudera Manager 6.1.0