This is the documentation for Impala 2.1.x, included as part of CDH 5.3.x.

Known Issues and Workarounds in Impala

The following sections describe known issues and workarounds in Impala.

For issues fixed in various Impala releases, see Fixed Issues in Impala.

This page summarizes the most serious or frequently encountered issues in the current release, to help you make decisions about installing and upgrading. The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue you are experiencing has already been reported, or which release an issue is fixed in, search on the JIRA tracker.

Further Information Available in Standalone CDH Release Notes

  Note: Starting in April 2016, future release note updates are being consolidated in a single location to avoid duplication of stale or incomplete information. You can view online the Impala New Features, Incompatible Changes, Known Issues, and Fixed Issues. You can view or print all of these by downloading the latest Impala PDF.

Known Issues in the Current Production Release (Impala 2.1.x)

These known issues affect the current release. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and whether a fix is in the pipeline.

Impala requires Parquet column metadata in same order as the schema definition

Impala could read columns in incorrect order from Parquet files created by other components. Some files created using external Parquet libraries could contain column metadata written in a different order than the actual columns within the file.

Severity: High

Resolution: The Parquet libraries used by other components, and the Parquet spec itself, are being updated to match Impala behavior as part of the issue PARQUET-188.

Fix decompressor to allow parsing gzips with multiple streams

Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated streams, the Impala query only processes the data from the first stream.

Bug: IMPALA-2154

Severity: High

Workaround: Use a different gzip tool to recompress the file into a single-stream file.
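The recompression can be done with standard command-line tools; a minimal sketch (the file names here are placeholders, not from the original page):

```shell
# multi_stream.gz contains two concatenated gzip streams, as some tools
# produce; Impala would read only the first stream ("part1").
printf 'part1\n' | gzip >  multi_stream.gz   # first stream
printf 'part2\n' | gzip >> multi_stream.gz   # second stream, appended
# Recompress into a single-stream file that Impala can read fully:
zcat multi_stream.gz | gzip > single_stream.gz
```

zcat itself decompresses all concatenated gzip members, so piping through it and recompressing produces one continuous stream.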

Partitions with TINYINT partition columns will always have 0 estimated rows after compute stats

Declaring a partition key column as a TINYINT caused problems with the COMPUTE STATS statement. The associated partitions would always have zero estimated rows, leading to potentially inefficient query plans.

Bug: IMPALA-2136

Severity: High

Workaround: Temporarily convert any TINYINT partition key columns to larger-width integers such as SMALLINT. Issue an ALTER TABLE statement in Hive:

hive> ALTER TABLE table PARTITION COLUMN (partition_column SMALLINT);

Collect new metadata and statistics in impala-shell:
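The statement itself was omitted from this page; based on the surrounding text, it is presumably a COMPUTE STATS statement such as the following (the table name is a placeholder):

```sql
-- Rebuild table and column statistics after changing the column type
COMPUTE STATS table_name;
```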


Be prepared to change the partition columns back to TINYINT if this issue is fixed in a future release.

invalid tuple_idx when combining INSERT INTO with analytic subquery

An INSERT ... SELECT statement could encounter an error if the SELECT portion included an analytic function call.

Bug: IMPALA-1737

Severity: High

Workaround: Rewrite the statement as a CREATE TABLE AS SELECT statement.

CPU requirement for SSE4.1

Currently, Impala 2.0.x and 2.1.x do not function on CPUs without the SSE4.1 instruction set. This minimum CPU requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check the CPU level of the hosts in your cluster before upgrading to Impala 2.0.x or 2.1.x, or CDH 5.2.x or CDH 5.3.x.
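One quick way to check a host, sketched here using /proc/cpuinfo (available on Linux):

```shell
# The "flags" line of /proc/cpuinfo lists the CPU's instruction sets;
# the sse4_1 flag must be present for Impala 2.0.x / 2.1.x.
if grep -q sse4_1 /proc/cpuinfo; then
    echo "SSE4.1 present: this host can run Impala 2.0.x / 2.1.x"
else
    echo "SSE4.1 missing: this host cannot run Impala 2.0.x / 2.1.x"
fi
```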

impala processes should cleanup their own old log files

Because Impala log files are not automatically deleted, you could potentially encounter disk space issues due to log file growth.

Bug: IMPALA-377

Severity: High

Workaround: Set up manual log rotation using your Linux tool or technique of choice. See Rotating Impala Logs for details.
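For example, a logrotate rule can handle this. The following is a sketch only; the log directory and retention values are assumptions to adapt to your deployment:

```
# /etc/logrotate.d/impala -- illustrative; adjust path and counts
/var/log/impala/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
```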

Can't update stats manually via alter table after upgrading to CDH 5.2

Bug: IMPALA-1420

Severity: High

Workaround: On CDH 5.2, when adjusting table statistics manually by setting the numRows, you must also enable the Boolean property STATS_GENERATED_VIA_STATS_TASK. For example, use a statement like the following to set both properties with a single ALTER TABLE statement:

ALTER TABLE table_name SET TBLPROPERTIES('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');

Resolution: The underlying cause is the issue HIVE-8648 that affects the metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into a CDH release.

Memory leak using zlib on CentOS6 (and possibly other platforms)

Unreleased memory could accumulate as more and more queries are run. The cause is thought to be a bug in version 1.2.3 of the zlib library, which is used in CentOS 6.4 and possibly other Linux releases. Impala uses this library internally to compress query profiles.

Bug: IMPALA-1194

Severity: High

Resolution: Under investigation

Memory Limit Exceeded Error when running with multiple clients

Out-of-memory errors could occur when multiple concurrent queries use the "spill to disk" mechanism, because memory pressure from the other queries leaves too little headroom for each one.

Bug: IMPALA-1385

Severity: High

Workaround: Either limit the number of such queries running concurrently, using a mechanism such as admission control, or reduce the memory limit for each query so that the spilling operation is triggered sooner. For example, if two queries encounter this issue when running with MEM_LIMIT=4g, reduce the memory limit for each query by half, to 2 GB.
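In impala-shell, the per-query limit can be lowered with the MEM_LIMIT query option; for instance, to apply the halved 2 GB figure from the example above:

```sql
-- Lower the per-query memory limit so spilling is triggered sooner
SET MEM_LIMIT=2g;
```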

ORDER BY rand() does not work

Because the value for rand() is computed early in a query, using an ORDER BY expression involving a call to rand() does not actually randomize the results.

Bug: IMPALA-397

Severity: High

Impala BE cannot parse Avro schema that contains a trailing semi-colon

If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.

Bug: IMPALA-1024

Severity: High

Process mem limit does not account for the JVM's memory usage

Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.

Bug: IMPALA-691

Severity: High

Workaround: To monitor overall memory usage, use the top command, or add the memory figures in the Impala web UI /memz tab to JVM memory usage shown on the /metrics tab.

Impala Parser issue when using fully qualified table names that start with a number

A fully qualified table name starting with a number could cause a parsing error. In a name such as db.571_market, the decimal point followed by digits is interpreted as a floating-point number.

Bug: IMPALA-941

Severity: High

Workaround: Surround each part of the fully qualified name with backticks (``).
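For example, using the db.571_market name from above, quote each part of the name separately rather than the name as a whole:

```sql
SELECT * FROM `db`.`571_market`;
```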

CatalogServer should not require HBase to be up to reload its metadata

If HBase is unavailable during Impala startup or after an INVALIDATE METADATA statement, the catalogd daemon could go into an error loop, making Impala unresponsive.

Bug: IMPALA-788

Severity: High

Workaround: For systems not managed by Cloudera Manager, add the following settings to /etc/impala/conf/hbase-site.xml:
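The property block itself is missing from this page. As an illustration only, settings that bound HBase client retries and RPC timeouts take the following shape; the property values shown are assumptions, not Cloudera's documented recommendation:

```xml
<!-- Illustrative values only; confirm against the official documentation -->
<property>
  <name>hbase.client.retries.number</name>
  <value>3</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>3000</value>
</property>
```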


Currently, Cloudera Manager does not have an Impala-only override for HBase settings, so any HBase configuration change you make through Cloudera Manager takes effect for all HBase applications. Therefore, this change is not recommended on systems managed by Cloudera Manager.

Kerberos tickets must be renewable

In a Kerberos environment, the impalad daemon might not start if Kerberos tickets are not renewable.

Workaround: Configure your KDC to allow tickets to be renewed, and configure krb5.conf to request renewable tickets.

Avro Scanner fails to parse some schemas

Querying certain Avro tables could cause a crash or return no rows, even though Impala could DESCRIBE the table.

Bug: IMPALA-635

Severity: High

Workaround: Swap the order of the fields in the schema specification. For example, ["null", "string"] instead of ["string", "null"].
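For instance, in a record schema, a nullable string field would be declared with "null" first; the record and field names here are made up for illustration:

```json
{
  "type": "record",
  "name": "ExampleRecord",
  "fields": [
    {"name": "nullable_field", "type": ["null", "string"]}
  ]
}
```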

Resolution: Rejecting the ["string", "null"] ordering is consistent with the Avro specification, so such schemas may still produce an error even after the crash itself is fixed.

Configuration needed for Flume to be compatible with Impala

For compatibility with Impala, the value for the Flume HDFS Sink hdfs.writeFormat must be set to Text, rather than its default value of Writable. The hdfs.writeFormat setting must be changed to Text before creating data files with Flume; otherwise, those files cannot be read by either Impala or Hive.

Severity: High

Resolution: This information has been requested to be added to the upstream Flume documentation.

Impala does not support running on clusters with federated namespaces

Impala does not support running on clusters with federated namespaces. The impalad process will not start on a node that uses a federated filesystem, that is, one based on the org.apache.hadoop.fs.viewfs.ViewFs class.

Bug: IMPALA-77

Severity: Undetermined

Anticipated Resolution: Limitation

Workaround: Use standard HDFS on all Impala nodes.

Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)

Impala differs from Hive in its handling of out-of-range float and double values: Impala returns the maximum allowed value of the type, while Hive returns NULL.

Severity: Low

Workaround: None

Deviation from Hive behavior: Impala does not do implicit casts between string, numeric, and Boolean types

Severity: Low

Anticipated Resolution: None

Workaround: Use explicit casts.

If Hue and Impala are installed on the same host, and if you configure Hue Beeswax in CDH 4.1 to execute Impala queries, Beeswax cannot list Hive tables and shows an error on Beeswax startup.

Hue requires Beeswaxd to be running in order to list the Hive tables. Because of a port conflict bug in Hue in CDH 4.1, when Hue and Impala are installed on the same host, an error page is displayed when you start the Beeswax application, and when you open the Tables page in Beeswax.

Severity: High

Anticipated Resolution: Fixed in an upcoming CDH4 release

Workaround: Choose exactly one of the following:

  • Install Hue and Impala on different hosts. OR
  • Upgrade to CDH4.1.2 and add the following property in the beeswax section of the /etc/hue/hue.ini configuration file:


  • If you are using CDH4.1.1 and you want to install Hue and Impala on the same host, change the code in this file:

    Replace line 66:


    With this line:


    Beeswaxd will then use port 8004.


If you used Cloudera Manager to install Impala, refer to the Cloudera Manager release notes for information about using an equivalent workaround by specifying the beeswax_meta_server_only=9004 configuration value in the options field for Hue. In Cloudera Manager 4, these fields are labeled Safety Valve; in Cloudera Manager 5, they are called Advanced Configuration Snippet.

Impala should tolerate bad locale settings

If the LC_* environment variables specify an unsupported locale, Impala does not start.

Bug: IMPALA-532

Severity: Low

Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon. See Modifying Impala Startup Options for details about modifying these environment settings.

Resolution: Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.

Log Level 3 Not Recommended for Impala

The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.

Severity: Low

Workaround: Reduce the log level to its default value of 1, that is, GLOG_v=1. See Setting Logging Levels for details about the effects of setting different logging levels.

Page generated September 15, 2016.