This is the documentation for Cloudera 5.5.x. Documentation for other versions is available at Cloudera Documentation.

Impala Known Issues

The following sections describe known issues and workarounds in Impala, as of the current production release (Impala 2.3.x / CDH 5.5.x). This page summarizes the most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and whether a fix is in the pipeline.

  Note: The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue you are experiencing has already been reported, or which release an issue is fixed in, search on the issues.cloudera.org JIRA tracker.

For issues fixed in various Impala releases, see Impala Fixed Issues.

Impala Known Issues: Crashes and Hangs

These issues can cause Impala to quit or become unresponsive.

Queries may hang on server-to-server exchange errors

The DataStreamSender::Channel::CloseInternal() function does not close the channel when an error occurs. As a result, the node on the other side of the channel waits indefinitely and the query hangs.

Bug: IMPALA-2592

Severity: Low. This issue does not occur frequently.

Workaround: None.

Impalad is crashing if udf jar is not available in hdfs location for first time

If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala CREATE FUNCTION statement is issued, the impalad daemon crashes.

Bug: IMPALA-2365

Severity: High

Impala Known Issues: Performance

These issues involve the performance of operations such as queries or DDL statements.

Slow DDL statements for tables with large number of partitions

DDL statements for tables with a large number of partitions might be slow.

Bug: IMPALA-1480

Severity: High

Workaround: Run the DDL statement in Hive if the slowness is an issue.

Impala Known Issues: Usability

These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.

Less than 100% progress on completed simple SELECT queries

Simple SELECT queries show less than 100% progress even though they are already completed.

Bug: IMPALA-1776

Severity: Low

Impala Known Issues: JDBC and ODBC Drivers

These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications in languages such as Java or C++.

ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)

If the ODBC SQLGetData is called on a series of columns, the function calls must follow the same order as the columns. For example, if data is fetched from column 2 then column 1, the SQLGetData call for column 1 returns NULL.

Bug: IMPALA-1792

Severity: High

Workaround: Fetch columns in the same order they are defined in the table.

Impala Known Issues: Security

These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and redaction.

Kerberos tickets must be renewable

In a Kerberos environment, the impalad daemon might not start if Kerberos tickets are not renewable.

Workaround: Configure your KDC to allow tickets to be renewed, and configure krb5.conf to request renewable tickets.

Server-to-server SSL and Kerberos do not work together

If SSL is enabled between internal Impala components (with ssl_client_ca_certificate), and Kerberos authentication is used between servers, the cluster fails to start.

Bug: IMPALA-2598

Severity: Medium; the ssl_client_ca_certificate setting is a new feature, so the issue does not affect existing cluster configurations

Workaround: Do not use the new ssl_client_ca_certificate setting on Kerberos-enabled clusters until this issue is resolved.

Impala Known Issues: Resources

These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management features.

Process mem limit does not account for the JVM's memory usage

Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.

Bug: IMPALA-691

Severity: High

Workaround: To monitor overall memory usage, use the top command, or add the memory figures on the /memz tab of the Impala web UI to the JVM memory usage shown on the /metrics tab.

Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false

Bug: IMPALA-2375

Severity: High

Workaround: Transition away from the "old-style" join and aggregation mechanism if practical.

Impala Known Issues: Correctness

These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.

parse_url() returns incorrect result if @ character in URL

If a URL contains an @ character, the parse_url() function could return an incorrect value for the hostname field.

Bug: IMPALA-1170

Severity: High

% escaping does not work correctly when it occurs at the end of a LIKE clause

If the final character in the RHS argument of a LIKE operator is an escaped \% character, it does not match a final % character in the LHS argument.

Bug: IMPALA-2422

Severity: High

ORDER BY rand() does not work.

Because the value for rand() is computed early in a query, using an ORDER BY expression involving a call to rand() does not actually randomize the results.

Bug: IMPALA-397

Severity: High

Duplicated column in inline view causes dropping null slots during scan

If the same column is queried twice within a view, NULL values for that column are omitted. For example, the result of COUNT(*) on the view could be less than expected.

Bug: IMPALA-2643

Severity: High

Workaround: Avoid selecting the same column twice within an inline view.

Incorrect assignment of predicates through an outer join in an inline view.

A query involving an OUTER JOIN clause where one of the table references is an inline view might apply predicates from the ON clause incorrectly.

Bug: IMPALA-1459

Severity: High

Crash: impala::Coordinator::ValidateCollectionSlots

A query could encounter a serious error if it includes multiple nested levels of INNER JOIN clauses involving subqueries.

Bug: IMPALA-2603

Severity: High

Incorrect assignment of On-clause predicate inside inline view with an outer join.

A query might return incorrect results due to wrong predicate assignment in the following scenario:

  1. There is an inline view that contains an outer join
  2. That inline view is joined with another table in the enclosing query block
  3. That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view

Bug: IMPALA-2665

Severity: High

Wrong assignment of having clause predicate across outer join

In an OUTER JOIN query with a HAVING clause, the comparison from the HAVING clause might be applied at the wrong stage of query processing, leading to incorrect results.

Bug: IMPALA-2144

Severity: High

Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate

A NOT IN operator with a subquery that calls an aggregate function, such as NOT IN (SELECT SUM(...)), could return incorrect results.

Bug: IMPALA-2093

Severity: High

Impala Known Issues: Metadata

These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the COMPUTE STATS statement, and the Impala catalogd daemon.

Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats

Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100 columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network, this metadata exceeds the 2 GB Java array size limit and leads to a catalogd crash.

Bugs: IMPALA-2647, IMPALA-2648, IMPALA-2649

Severity: Low. This does not occur frequently.

Workaround: If feasible, compute full stats periodically and avoid computing incremental stats for that table. The scalability of incremental stats computation is a continuing work item.
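
As a rough sanity check, the memory estimate quoted for this issue can be reproduced with simple arithmetic. The following Python sketch is illustrative only: the function name is hypothetical, and the default of 400 bytes per partition per column is the approximate figure from the description, not a value read from Impala itself.

```python
# Approximate per-partition, per-column overhead quoted in the description.
# This is an assumption for estimation purposes, not a measured constant.
BYTES_PER_PARTITION_COLUMN = 400


def incremental_stats_bytes(partitions, columns,
                            bytes_per_cell=BYTES_PER_PARTITION_COLUMN):
    """Estimate the memory used by incremental stats for one table."""
    return partitions * columns * bytes_per_cell


# The example from the description: 20K partitions and 100 columns.
estimate = incremental_stats_bytes(20_000, 100)
print(f"{estimate / 1e6:.0f} MB")  # prints: 800 MB
```

Running the same calculation for your own partition and column counts gives a first indication of whether a table is large enough for this issue to be a concern.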

Can't update stats manually via alter table after upgrading to CDH 5.2

Bug: IMPALA-1420

Severity: High

Workaround: On CDH 5.2, when adjusting table statistics manually by setting the numRows property, you must also enable the Boolean property STATS_GENERATED_VIA_STATS_TASK. For example, use a statement like the following to set both properties with a single ALTER TABLE statement:

ALTER TABLE table_name SET TBLPROPERTIES('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');

Resolution: The underlying cause is the issue HIVE-8648 that affects the metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into a CDH release.

Impala Known Issues: Interoperability

These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types and file formats.

Deviation from Hive behavior: Impala does not do implicit casts between string and numeric and boolean types.

Severity: Low

Anticipated Resolution: None

Workaround: Use explicit casts.

Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)

Impala behavior differs from Hive with respect to out-of-range float/double values. Impala returns such values as the maximum allowed value of the type, whereas Hive returns NULL.

Severity: Low

Workaround: None

Configuration needed for Flume to be compatible with Impala

For compatibility with Impala, the value for the Flume HDFS Sink hdfs.writeFormat must be set to Text, rather than its default value of Writable. The hdfs.writeFormat setting must be changed to Text before creating data files with Flume; otherwise, those files cannot be read by either Impala or Hive.

Severity: High

Resolution: A request has been made to add this information to the upstream Flume documentation.

Avro Scanner fails to parse some schemas

Querying certain Avro tables could cause a crash or return no rows, even though Impala could DESCRIBE the table.

Bug: IMPALA-635

Severity: High

Workaround: Swap the order of the fields in the schema specification. For example, ["null", "string"] instead of ["string", "null"].

Resolution: Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the crashing issue is resolved.

Impala BE cannot parse Avro schema that contains a trailing semi-colon

If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.

Bug: IMPALA-1024

Severity: High

Workaround: Remove the trailing semicolon from the Avro schema.

Fix decompressor to allow parsing gzips with multiple streams

Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated streams, the Impala query only processes the data from the first stream.

Bug: IMPALA-2154

Severity: High

Workaround: Use a different gzip tool to recompress the file so that it contains a single stream.
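
To see why only part of the data is processed, it can help to reproduce the file layout outside Impala. The Python sketch below uses only the standard library. The helper names are hypothetical: first_stream_only merely imitates a reader that stops after the first gzip stream and is not Impala's decompressor, and recompress_single_stream shows one possible way to apply the workaround for data small enough to fit in memory.

```python
import gzip
import zlib


def first_stream_only(data: bytes) -> bytes:
    """Decompress only the first gzip stream, ignoring any that follow."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # Bytes after the end of the first stream are left in
    # decoder.unused_data and are not decompressed here.
    return decoder.decompress(data)


def recompress_single_stream(data: bytes) -> bytes:
    """Rewrite possibly multi-stream gzip data as one gzip stream."""
    # gzip.decompress() reads every concatenated stream; gzip.compress()
    # writes the combined payload back out as a single stream.
    return gzip.compress(gzip.decompress(data))


# Two independently compressed pieces concatenated into one file image,
# similar to what "cat a.gz b.gz > combined.gz" would produce.
multi = gzip.compress(b"row1\nrow2\n") + gzip.compress(b"row3\n")

print(first_stream_only(multi))                            # rows 1 and 2 only
print(first_stream_only(recompress_single_stream(multi)))  # all three rows
```

For large files, a streaming rewrite would be preferable to holding the whole payload in memory as this sketch does.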

Impala incorrectly handles text data when the new line character \n\r is split between different HDFS blocks

If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes the row following the \n\r pair twice.

Bug: IMPALA-1578

Severity: High

Workaround: Use the Parquet format for large volumes of data where practical.

Invalid bool value not reported as a scanner error

In some cases, an invalid BOOLEAN value read from a table does not produce a warning message about the bad value. The result is still NULL as expected. Therefore, this is not a query correctness issue, but it could lead to overlooking the presence of invalid data.

Bug: IMPALA-1862

Severity: High

Incorrect results with basic predicate on CHAR typed column.

When comparing a CHAR column value to a string literal, the literal value is not blank-padded and so the comparison might fail when it should match.

Bug: IMPALA-1652

Severity: High

Workaround: Use the RPAD() function to blank-pad literals compared with CHAR columns to the expected length.

Impala Known Issues: Limitations

These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management workflow.

Impala does not support running on clusters with federated namespaces

Impala does not support running on clusters with federated namespaces. The impalad process will not start on a node running such a filesystem based on the org.apache.hadoop.fs.viewfs.ViewFs class.

Bug: IMPALA-77

Anticipated Resolution: Limitation

Workaround: Use standard HDFS on all Impala nodes.

Impala Known Issues: Miscellaneous / Older Issues

These issues do not fall into one of the above categories or have not been categorized yet.

A failed CTAS does not drop the table if the insert fails.

If a CREATE TABLE AS SELECT operation successfully creates the target table but an error occurs while querying the source table or copying the data, the new table is left behind rather than being dropped.

Bug: IMPALA-2005

Severity: High

Workaround: Drop the new table manually after a failed CREATE TABLE AS SELECT.

Casting scenarios with invalid/inconsistent results

Using a CAST() function to convert large literal values to smaller types, or to convert special values such as NaN or Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.

Bug: IMPALA-1821

Severity: High

Support individual memory allocations larger than 1 GB

The largest single block of memory that Impala can allocate during a query is 1 GiB. Therefore, a query could fail or Impala could crash if a compressed text file resulted in more than 1 GiB of data in uncompressed form, or if a string function such as group_concat() returned a value greater than 1 GiB.

Bug: IMPALA-1619

Severity: High
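
One way to spot files at risk before querying them is to measure their uncompressed size ahead of time. The sketch below is a hypothetical helper, not part of Impala. It assumes the text file is gzip-compressed and available as a local binary file object, and it streams the data in chunks so that the check itself does not need 1 GiB of memory.

```python
import gzip
import io

ONE_GIB = 1 << 30  # the allocation limit described above


def uncompressed_size_exceeds(fileobj, limit=ONE_GIB, chunk_size=1 << 20):
    """Return True if the gzip data in fileobj expands to more than limit bytes."""
    total = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                # Reached the end of the data without passing the limit.
                return False
            total += len(chunk)
            if total > limit:
                # Stop early; the exact size is not needed.
                return True


# Small in-memory demonstration with an artificially low limit.
sample = gzip.compress(b"x" * 1000)
print(uncompressed_size_exceeds(io.BytesIO(sample), limit=999))   # True
print(uncompressed_size_exceeds(io.BytesIO(sample), limit=1000))  # False
```

In practice the file object would come from opening the data file in binary mode, and the default limit of 1 GiB would be left in place.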

Impala Parser issue when using fully qualified table names that start with a number.

A fully qualified table name starting with a number could cause a parsing error. In a name such as db.571_market, the decimal point followed by digits is interpreted as a floating-point number.

Bug: IMPALA-941

Severity: Low

Workaround: Surround each part of the fully qualified name with backticks (``).
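
For client code that builds table names dynamically, the workaround can be applied mechanically. The following Python sketch uses a hypothetical helper, quote_qualified_name, that wraps each dot-separated part of a name in backticks. It assumes the individual parts contain no dots or backticks of their own.

```python
def quote_qualified_name(name: str) -> str:
    """Wrap each part of a dot-separated table name in backticks."""
    parts = name.split(".")
    for part in parts:
        # Reject empty parts and embedded backticks rather than guess
        # how they should be escaped.
        if not part or "`" in part:
            raise ValueError(f"cannot quote identifier part: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


print(quote_qualified_name("db.571_market"))  # prints: `db`.`571_market`
```

The quoted form can then be substituted wherever the table name appears in a statement, so that the part after the dot is no longer read as the start of a floating-point number.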

Impala should tolerate bad locale settings

If the LC_* environment variables specify an unsupported locale, Impala does not start.

Bug: IMPALA-532

Severity: Low

Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon. See Modifying Impala Startup Options for details about modifying these environment settings.

Resolution: Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.

Log Level 3 Not Recommended for Impala

The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.

Severity: Low

Workaround: Reduce the log level to its default value of 1, that is, GLOG_v=1. See Setting Logging Levels for details about the effects of setting different logging levels.

Page generated January 14, 2016.