This is the documentation for Cloudera 5.4.x. Documentation for other versions is available at Cloudera Documentation.

Cloudera Impala Fixed Issues

The following sections describe the major issues fixed in each Impala release.

For known issues that are currently unresolved, see Cloudera Impala Known Issues.

Issues Fixed in Impala for CDH 5.4.5

This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.5.

  Note: The Impala 2.2.x maintenance releases now use the CDH 5.4.x numbering system rather than increasing the Impala version numbers. Impala 2.2 and higher are not available under CDH 4.

For the full list of fixed issues, see Issues Fixed in CDH 5.4.5.

Impala DML/DDL operations corrupt table metadata leading to Hive query failures

When the Impala COMPUTE STATS statement was run on a partitioned Parquet table that was created in Hive, the table subsequently became inaccessible in Hive. The table was still accessible to Impala. Regaining access in Hive required a workaround of creating a new table. The error displayed in Hive was:

Error: Error while compiling statement: FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat (state=42000,code=40000)

Bug: IMPALA-2048

Severity: High

Avoiding a DCHECK of NULL hash table in spilled right joins

A query could encounter a serious error if it contained a RIGHT OUTER, RIGHT ANTI, or FULL OUTER join clause and approached the memory limit on a host so that the "spill to disk" mechanism was activated.

Bug: IMPALA-1929

Severity: High

Bug in PrintTColumnValue caused wrong stats for TINYINT partition cols

Declaring a partition key column as a TINYINT caused problems with the COMPUTE STATS statement. The associated partitions would always have zero estimated rows, leading to potentially inefficient query plans.

Bug: IMPALA-2136

Severity: High

Where clause does not propagate to joins inside nested views

A query that referred to a view whose query referred to another view containing a join could return incorrect results. WHERE clauses for the outermost query were not always applied, causing the result set to include additional rows that should have been filtered out.
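
As a rough illustration of the expected behavior, the following Python sketch uses the standard-library sqlite3 module (not Impala) to query a view that refers to another view containing a join. The table, view, and column names are hypothetical. The WHERE clause of the outermost query must filter the final result regardless of how deeply the join is nested.

```python
import sqlite3

def outer_filter_through_nested_views():
    """Return rows from a nested-view query that has an outer WHERE clause."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER, customer_id INTEGER);
        CREATE TABLE customers (id INTEGER, region TEXT);
        INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10);
        INSERT INTO customers VALUES (10, 'east'), (20, 'west');
        -- Inner view contains the join.
        CREATE VIEW v_inner AS
            SELECT o.id AS order_id, c.region AS region
            FROM orders o JOIN customers c ON o.customer_id = c.id;
        -- Outer view refers to the inner view.
        CREATE VIEW v_outer AS SELECT order_id, region FROM v_inner;
    """)
    rows = conn.execute(
        "SELECT order_id, region FROM v_outer "
        "WHERE region = 'east' ORDER BY order_id"
    ).fetchall()
    conn.close()
    return rows
```

Only the rows for the hypothetical 'east' region should come back; a row for 'west' in the result would correspond to the symptom described above.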

Bug: IMPALA-2018

Severity: High

Add effective_user() builtin

The user() function returned the name of the logged-in user, which might not be the same as the user name being checked for authorization if, for example, delegation was enabled.

Bug: IMPALA-2064

Severity: High

Resolution: Rather than change the behavior of the user() function, the fix introduces an additional function effective_user() that returns the user name that is checked during authorization.

Make UTC to local TimestampValue conversion faster.

Query performance was improved substantially for Parquet files containing TIMESTAMP data written by Hive, when the -convert_legacy_hive_parquet_utc_timestamps=true setting is in effect.

Bug: IMPALA-2125

Severity: High

Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()

A join query could encounter a serious error if the query approached the memory limit on a host so that the "spill to disk" mechanism was activated, and data volume in the join was large enough that an internal memory buffer exceeded 1 GB in size on a particular host. (Exceeding this limit would only happen for huge join queries, because Impala could split this intermediate data into 16 parts during the join query, and the buffer only contains compact bookkeeping data rather than the actual join column data.)

Bug: IMPALA-2065

Severity: High

Issues Fixed in Impala for CDH 5.4.3

This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.3.

  Note: The Impala 2.2.x maintenance releases now use the CDH 5.4.x numbering system rather than increasing the Impala version numbers. Impala 2.2 and higher are not available under CDH 4.

For the full list of fixed issues, see Issues Fixed in CDH 5.4.3.

Enable using Isilon as the underlying filesystem.

The certification of CDH and Impala with the Isilon filesystem involves a number of fixes to performance and flexibility for dealing with I/O using remote reads. See Using Impala with Isilon Storage for details on using Impala and Isilon together.

Bug: IMPALA-1968, IMPALA-1730

Severity: High

Expand set of supported timezones.

The set of timezones recognized by Impala was expanded. You can always find the latest list of supported timezones in the Impala source code, in the file timezone_db.cc.

Bug: IMPALA-1381

Severity: High

Impala Timestamp ISO-8601 Support.

Impala can now process TIMESTAMP literals including a trailing z, signifying "Zulu" time, a synonym for UTC.
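
To show what the trailing designator means, here is a minimal Python sketch that treats a timestamp string ending in Z (or z) as UTC. It is not Impala's parser; the function name and the exact input layout (YYYY-MM-DDTHH:MM:SS) are assumptions made for the example.

```python
from datetime import datetime, timezone

def parse_zulu_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS' followed by Z or z, returning a UTC datetime."""
    if not text or text[-1] not in "Zz":
        raise ValueError("expected a trailing Z (Zulu/UTC) designator")
    naive = datetime.strptime(text[:-1], "%Y-%m-%dT%H:%M:%S")
    # The Z suffix means the wall-clock value is already expressed in UTC.
    return naive.replace(tzinfo=timezone.utc)
```

For example, parse_zulu_timestamp("2015-06-01T12:30:00Z") and parse_zulu_timestamp("2015-06-01T12:30:00z") denote the same instant.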

Bug: IMPALA-1963

Severity: High

Fix wrong warning when insert overwrite to empty table

An INSERT OVERWRITE operation would encounter an error if the SELECT portion of the statement returned zero rows, such as with a LIMIT 0 clause.

Bug: IMPALA-2008

Severity: High

Expand parsing of decimals to include scientific notation

DECIMAL literals can now include e scientific notation. For example, now CAST(1e3 AS DECIMAL(5,3)) is a valid expression. Formerly it returned NULL. Some scientific expressions might have worked before in DECIMAL context, but only when the scale was 0.
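
For readers unfamiliar with e notation in an exact-decimal context, the sketch below uses Python's decimal module (not Impala) to show which values such literals denote. The helper name is hypothetical, and the sketch does not model Impala's precision and scale rules.

```python
from decimal import Decimal

def parse_decimal_literal(text):
    """Parse a decimal literal that may use e notation, exactly (no binary floats)."""
    return Decimal(text)

# 1e3 denotes one thousand; 2.5e-2 denotes twenty-five thousandths.
thousand = parse_decimal_literal("1e3")
small = parse_decimal_literal("2.5e-2")
```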

Bug: https://issues.cloudera.org/browse/

Severity: High

Issues Fixed in Impala for CDH 5.4.1

This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.1.

  Note: The Impala 2.2.x maintenance releases now use the CDH 5.4.x numbering system rather than increasing the Impala version numbers. Impala 2.2 and higher are not available under CDH 4.

For the full list of fixed issues, see Issues Fixed in CDH 5.4.1.

Issues Fixed in the 2.2.0 Release / CDH 5.4.0

This section lists the most frequently encountered customer issues fixed in Impala 2.2.0.

For the full list of fixed issues in Impala 2.2.0, including over 40 critical issues, see this report in the JIRA system.

  Note: Impala 2.2.0 is available as part of CDH 5.4.0 and is not available for CDH 4. Cloudera does not intend to release future versions of Impala for CDH 4 outside patch and maintenance releases if required. Given the upcoming end-of-maintenance for CDH 4, Cloudera recommends that all customers migrate to a recent CDH 5 release.

Altering a column's type causes column stats to stop sticking for that column

When the type of a column was changed in either Hive or Impala through ALTER TABLE CHANGE COLUMN, the metastore database did not correctly propagate that change to the table that contains the column statistics. The statistics (particularly the NDV) for that column were permanently reset and could not be changed by Impala's COMPUTE STATS command. The underlying cause is a Hive bug (HIVE-9866).

Bug: IMPALA-1607

Severity: Major

Resolution: Resolved by incorporating the fix for HIVE-9866.

Workaround: On systems without the corresponding Hive fix, change the column back to its original type. The stats reappear and you can recompute or drop them.

Impala may leak or use too many file descriptors

If a file was truncated in HDFS without a corresponding REFRESH in Impala, Impala could allocate memory for file descriptors and not free that memory.

Bug: IMPALA-1854

Severity: High

Spurious stale block locality messages

Impala could issue messages stating the block locality metadata was stale, when the metadata was actually fine. The internal "remote bytes read" counter was not being reset properly. This issue did not cause an actual slowdown in query execution, but the spurious error could result in unnecessary debugging work and unnecessary use of the INVALIDATE METADATA statement.

Bug: IMPALA-1712

Severity: High

DROP TABLE fails after COMPUTE STATS and ALTER TABLE RENAME to a different database.

When a table was moved from one database to another, the column statistics were not pointed to the new database. This could result in lower performance for queries due to unavailable statistics, and also an inability to drop the table.

Bug: IMPALA-1711

Severity: High

IMPALA-1556 causes memory leak with secure connections

impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.

Bug: IMPALA-1674

Severity: High

unix_timestamp() does not return correct time

The unix_timestamp() function could return an incorrect value (a constant value of 1).

Bug: IMPALA-1623

Severity: High

Impala incorrectly handles text data missing a newline on the last line

Some queries did not recognize the final line of a text data file if the line did not end with a newline character. This could lead to inconsistent results, such as a different number of rows for SELECT COUNT(*) as opposed to SELECT *.
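
The expected rule is that a final line counts as a row whether or not it ends with a newline. A minimal Python sketch of that rule follows; it is an illustration only, not Impala's text scanner, and the function name is hypothetical.

```python
def split_rows(data):
    """Split text file contents into rows, keeping a final unterminated row."""
    rows = data.split("\n")
    # A trailing newline leaves one empty string at the end; drop only that.
    if rows and rows[-1] == "":
        rows.pop()
    return rows
```

With this rule, "a\nb\nc" and "a\nb\nc\n" both yield three rows, so a row count and a full scan agree.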

Bug: IMPALA-1476

Severity: High

Impala's ACLs check do not consider all group ACLs, only checked first one.

If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.
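
The difference between checking every group and checking only the first can be modeled in a few lines of Python. This is a toy model with hypothetical group names, not the HDFS or Impala permission code.

```python
def has_group_access(user_groups, acl_groups):
    """Expected behavior: access is granted if ANY of the user's groups is in the ACL."""
    return any(group in acl_groups for group in user_groups)

def first_group_only(user_groups, acl_groups):
    """Model of the symptom described above: only the first listed group is checked."""
    return bool(user_groups) and user_groups[0] in acl_groups

# Hypothetical example: the grant is on the user's second group.
groups = ["impala", "analysts"]
acl = {"analysts"}
```

Here has_group_access(groups, acl) grants access while first_group_only(groups, acl) denies it, which mirrors the permission errors reported for users whose qualifying group was not listed first.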

Bug: IMPALA-1805

Severity: High

Fix infinite loop opening or closing file with invalid metadata

Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.

Bug: IMPALA-1794

Severity: High

Cannot write Parquet files when values are larger than 64KB

Impala could sometimes fail to INSERT into a Parquet table if a column value such as a STRING was larger than 64 KB.

Bug: IMPALA-1705

Severity: High

Impala Will Not Run on Certain Intel CPUs

This fix relaxes the CPU requirement for Impala. Now only the SSSE3 instruction set is required. Formerly, SSE4.1 instructions were generated, making Impala refuse to start on some older CPUs.

Bug: IMPALA-1646

Severity: High

Issues Fixed in the 2.1.5 Release / CDH 5.3.6

This section lists the most significant Impala issues fixed in Impala 2.1.5 for CDH 5.3.6.

For the full list of Impala fixed issues in this release, see Issues Fixed in CDH 5.3.6.

  Note: This Impala maintenance release is only available as part of CDH 5, not under CDH 4.

Issues Fixed in the 2.1.4 Release / CDH 5.3.4

This section lists the most significant Impala issues fixed in Impala 2.1.4 for CDH 5.3.4. Because CDH 5.3.5 does not include any code changes for Impala, Impala 2.1.4 is included with both CDH 5.3.4 and 5.3.5.

For the full list of Impala fixed issues in Impala 2.1.4 for CDH 5.3.4, see Issues Fixed in CDH 5.3.4.

  Note: This Impala maintenance release is only available as part of CDH 5, not under CDH 4.

Crash: impala::TupleIsNullPredicate::Prepare

When expressions that tested for NULL were used in combination with analytic functions, an error could occur. (The original crash issue was fixed by an earlier patch.)

Bug: IMPALA-1519

Severity: High

Expand parsing of decimals to include scientific notation

DECIMAL literals can now include e scientific notation. For example, now CAST(1e3 AS DECIMAL(5,3)) is a valid expression. Formerly it returned NULL. Some scientific expressions might have worked before in DECIMAL context, but only when the scale was 0.

Bug: IMPALA-1952

Severity: High

INSERT/CTAS evaluates and applies constant predicates.

An INSERT OVERWRITE statement would write new data, even if a constant clause such as WHERE 1 = 0 should have prevented it from writing any rows.
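
The sketch below shows the expected effect of a constant false predicate using Python's sqlite3 module. SQLite has no INSERT OVERWRITE, so a plain INSERT ... SELECT stands in for it; the table names are hypothetical and the example says nothing about Impala's internals.

```python
import sqlite3

def rows_written_with_false_predicate():
    """Count rows in the target after INSERT ... SELECT with WHERE 1 = 0."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src (x INTEGER);
        CREATE TABLE dst (x INTEGER);
        INSERT INTO src VALUES (1), (2), (3);
        -- The constant predicate is always false, so no rows qualify.
        INSERT INTO dst SELECT x FROM src WHERE 1 = 0;
    """)
    (count,) = conn.execute("SELECT COUNT(*) FROM dst").fetchone()
    conn.close()
    return count
```

The SELECT portion produces no rows, so nothing from the source table should reach the target.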

Bug: IMPALA-1860

Severity: High

Assign predicates below analytic functions with a compatible partition by clause

If the PARTITION BY clause in an analytic function refers to partition key columns in a partitioned table, now Impala can perform partition pruning during the analytic query.

Bug: IMPALA-1900

Severity: High

FIRST_VALUE may produce incorrect results with preceding windows

A query using the FIRST_VALUE analytic function and a window defined with the PRECEDING keyword could produce wrong results.
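
As a reference for what such a query should return, the following Python sketch computes FIRST_VALUE over a window of the form ROWS BETWEEN n PRECEDING AND CURRENT ROW for a single partition whose rows are already in ORDER BY order. It is a hand-written model for checking results, not Impala code, and the function name is hypothetical.

```python
def first_value_rows_preceding(values, n):
    """FIRST_VALUE over ROWS BETWEEN n PRECEDING AND CURRENT ROW.

    `values` holds one partition's column values in ORDER BY order.
    """
    # The window for row i starts n rows back, clipped at the partition start.
    return [values[max(0, i - n)] for i in range(len(values))]
```

For the values 10, 20, 30, 40 and n = 1, the window for each row starts one row earlier (or at the first row), so the first row of each window is 10, 10, 20, 30.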

Bug: IMPALA-1888

Severity: High

FIRST_VALUE rewrite fn type might not match slot type

A query referencing a DECIMAL column with the FIRST_VALUE analytic function could encounter an error.

Bug: IMPALA-1559

Severity: High

AnalyticEvalNode cannot handle partition/order by exprs with NaN

A query using an analytic function could encounter an error if the evaluation of an analytic ORDER BY or PARTITION expression resulted in a NaN value, for example if the ORDER BY or PARTITION contained a division operation where both operands were zero.
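
As background on why NaN needs special care: under IEEE 754 floating-point arithmetic, 0/0 yields NaN, and NaN compares unequal to every value including itself. Code that groups or orders rows by plain equality therefore needs an explicit NaN rule. The Python sketch below shows one such rule; it is not a description of Impala's fix. (Python raises ZeroDivisionError for 0.0 / 0.0, so the sketch constructs NaN directly.)

```python
import math

NAN = float("nan")

def same_partition_key(a, b):
    """Equality for float partition keys that treats NaN as equal to NaN."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b

# Plain equality never matches NaN, even against itself.
plain_equal = (NAN == NAN)
```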

Bug: IMPALA-1808

Severity: High

AnalyticEvalNode not properly handling nullable tuples

An analytic function containing only an OVER clause could encounter an error if another part of the query (specifically an outer join) produced all-NULL tuples.

Bug: IMPALA-1562

Severity: High

Issues Fixed in the 2.1.3 Release / CDH 5.3.3

Add compatibility flag for Hive-Parquet-Timestamps

When Hive writes TIMESTAMP values, it represents them in the local time zone of the server. Impala expects TIMESTAMP values to always be in the UTC time zone, possibly leading to inconsistent results depending on which component created the data files. This patch introduces a new startup flag, -convert_legacy_hive_parquet_utc_timestamps for the impalad daemon. Specify -convert_legacy_hive_parquet_utc_timestamps=true to make Impala recognize Parquet data files written by Hive and automatically adjust TIMESTAMP values read from those files into the UTC time zone for compatibility with other Impala TIMESTAMP processing. Although this setting is currently turned off by default, consider enabling it if practical in your environment, for maximum interoperability with Hive-created Parquet files.

Bug: IMPALA-1658

Severity: High

Use snprintf() instead of lexical_cast() in float-to-string casts

Converting a floating-point value to a STRING could be slower than necessary.

Bug: IMPALA-1738

Severity: High

Fix partition spilling cleanup when new stream OOMs

Certain calls to aggregate functions with STRING arguments could encounter a serious error when the system ran low on memory and attempted to activate the spill-to-disk mechanism. The error message referenced the function impala::AggregateFunctions::StringValGetValue.

Bug: IMPALA-1865

Severity: High

Impala's ACLs check do not consider all group ACLs, only checked first one.

If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.

Bug: IMPALA-1805

Severity: High

Fix infinite loop opening or closing file with invalid metadata

Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.

Bug: IMPALA-1794

Severity: High

external-data-source-executor leaking global jni refs

Successive calls to the data source API could result in excessive memory consumption, with memory allocated but never freed.

Bug: IMPALA-1801

Severity: High

Spurious stale block locality messages

Impala could issue messages stating the block locality metadata was stale, when the metadata was actually fine. The internal "remote bytes read" counter was not being reset properly. This issue did not cause an actual slowdown in query execution, but the spurious error could result in unnecessary debugging work and unnecessary use of the INVALIDATE METADATA statement.

Bug: IMPALA-1712

Severity: High

Issues Fixed in the 2.1.2 Release / CDH 5.3.2

This section lists the most significant issues fixed in Impala 2.1.2.

For the full list of fixed issues in Impala 2.1.2, see this report in the JIRA system.

  Note: Impala 2.1.2 is available as part of CDH 5.3.2, not under CDH 4.

Impala incorrectly handles double numbers with more than 19 significant decimal digits

When a floating-point value was read from a text file and interpreted as a FLOAT or DOUBLE value, it could be incorrectly interpreted if it included more than 19 significant digits.

Bug: IMPALA-1622

Severity: High

unix_timestamp() does not return correct time

The unix_timestamp() function could return an incorrect value (a constant value of 1).

Bug: IMPALA-1623

Severity: High

Row Count Mismatch: Partition pruning with NULL

A query against a partitioned table could return incorrect results if the WHERE clause compared the partition key to NULL using operators such as = or !=.
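
Under standard SQL three-valued logic, comparing any value to NULL with = or != evaluates to NULL rather than true, so such predicates select no rows; IS NULL and IS NOT NULL are the tests that match. The sketch below demonstrates this with Python's sqlite3 module on a hypothetical table. It illustrates the general SQL rule, not Impala's partition pruning.

```python
import sqlite3

def count_rows(predicate):
    """Count rows of a small table that satisfy the given WHERE predicate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t (part_key INTEGER, v TEXT);
        INSERT INTO t VALUES (1, 'a'), (NULL, 'b'), (2, 'c');
    """)
    # The predicate is interpolated only because this is a fixed demo string.
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM t WHERE {predicate}"
    ).fetchone()
    conn.close()
    return count
```

Calling count_rows with "part_key = NULL" or "part_key != NULL" selects nothing, while "part_key IS NULL" matches the one row whose key is NULL.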

Bug: IMPALA-1535

Severity: High

Fetch column stats in bulk using new (Hive .13) HMS APIs

The performance of the COMPUTE STATS statement and queries was improved, particularly for wide tables.

Bug: IMPALA-1120

Severity: High

Issues Fixed in the 2.1.1 Release / CDH 5.3.1

This section lists the most significant issues fixed in Impala 2.1.1.

For the full list of fixed issues in Impala 2.1.1, see this report in the JIRA system.

IMPALA-1556 causes memory leak with secure connections

impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.

Bug: IMPALA-1674

Severity: High

TSaslServerTransport::Factory::getTransport() leaks transport map entries

impalad daemons in clusters secured by Kerberos or LDAP could experience a slight memory leak on each connection. The accumulation of unreleased memory could cause problems on long-running clusters.

Bug: IMPALA-1668

Severity: High

Issues Fixed in the 2.1.0 Release / CDH 5.3.0

This section lists the most significant issues fixed in Impala 2.1.0.

For the full list of fixed issues in Impala 2.1.0, see this report in the JIRA system.

Kerberos fetches 3x slower

Transferring large result sets back to the client application was slow on clusters using Kerberos authentication.

Bug: IMPALA-1455

Severity: High

Compressed file needs to be hold on entirely in Memory

Queries on gzipped text files required holding the entire data file and its uncompressed representation in memory at the same time. SELECT and COMPUTE STATS statements could fail or perform inefficiently as a result. The fix enables streaming reads for gzipped text, so that the data is uncompressed as it is read.

Bug: IMPALA-1556

Severity: High

Cannot read hbase metadata with NullPointerException: null

Impala might not be able to access HBase tables, depending on the associated levels of Impala and HBase on the system.

Bug: IMPALA-1611

Severity: High

Serious errors / crashes

Improved code coverage in Impala testing uncovered a number of potentially serious errors that could occur with specific query syntax. These errors are resolved in Impala 2.1.

Bug: IMPALA-1553 , IMPALA-1528 , IMPALA-1526 , IMPALA-1524 , IMPALA-1508 , IMPALA-1493 , IMPALA-1501 , IMPALA-1483

Severity: High

Issues Fixed in the 2.0.5 Release / CDH 5.2.6

For the full list of fixed issues in Impala 2.0.5, see this report in the JIRA system.

  Note: Impala 2.0.5 is available as part of CDH 5.2.6, not under CDH 4.

Issues Fixed in the 2.0.4 Release / CDH 5.2.5

This section lists the most significant issues fixed in Impala 2.0.4.

For the full list of fixed issues in Impala 2.0.4, see this report in the JIRA system.

  Note: Impala 2.0.4 is available as part of CDH 5.2.5, not under CDH 4.

Add compatibility flag for Hive-Parquet-Timestamps

When Hive writes TIMESTAMP values, it represents them in the local time zone of the server. Impala expects TIMESTAMP values to always be in the UTC time zone, possibly leading to inconsistent results depending on which component created the data files. This patch introduces a new startup flag, -convert_legacy_hive_parquet_utc_timestamps for the impalad daemon. Specify -convert_legacy_hive_parquet_utc_timestamps=true to make Impala recognize Parquet data files written by Hive and automatically adjust TIMESTAMP values read from those files into the UTC time zone for compatibility with other Impala TIMESTAMP processing. Although this setting is currently turned off by default, consider enabling it if practical in your environment, for maximum interoperability with Hive-created Parquet files.

Bug: IMPALA-1658

Severity: High

IoMgr infinite loop opening/closing file when shorter than cached metadata size

If a table data file was replaced by a shorter file outside of Impala, such as with INSERT OVERWRITE in Hive producing an empty output file, subsequent Impala queries could hang.

Bug: IMPALA-1794

Severity: High

Issues Fixed in the 2.0.3 Release / CDH 5.2.4

This section lists the most significant issues fixed in Impala 2.0.3.

For the full list of fixed issues in Impala 2.0.3, see this report in the JIRA system.

  Note: Impala 2.0.3 is available as part of CDH 5.2.4, not under CDH 4.

Anti join could produce incorrect results when spilling

An anti-join query (or a NOT EXISTS operation that was rewritten internally into an anti-join) could produce incorrect results if Impala reached its memory limit, causing the query to write temporary results to disk.
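
To make the relationship between NOT EXISTS and an anti-join concrete, the following Python sketch runs both forms with the sqlite3 module on two hypothetical tables. It shows that the two formulations select the same rows; it does not model Impala's internal rewrite or its spill-to-disk behavior.

```python
import sqlite3

def anti_join_both_ways():
    """Return (NOT EXISTS result, LEFT JOIN ... IS NULL result) for the same data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE l (id INTEGER);
        CREATE TABLE r (id INTEGER);
        INSERT INTO l VALUES (1), (2), (3);
        INSERT INTO r VALUES (2);
    """)
    not_exists = conn.execute(
        "SELECT id FROM l "
        "WHERE NOT EXISTS (SELECT 1 FROM r WHERE r.id = l.id) ORDER BY id"
    ).fetchall()
    # Anti-join written as an outer join that keeps only unmatched left rows.
    left_join = conn.execute(
        "SELECT l.id FROM l LEFT JOIN r ON r.id = l.id "
        "WHERE r.id IS NULL ORDER BY l.id"
    ).fetchall()
    conn.close()
    return not_exists, left_join
```

Both queries keep the rows of the left table that have no match on the right; the result must not depend on whether the engine had to spill intermediate data.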

Bug: IMPALA-1471

Severity: High

Row Count Mismatch: Partition pruning with NULL

A query against a partitioned table could return incorrect results if the WHERE clause compared the partition key to NULL using operators such as = or !=.

Bug: IMPALA-1535

Severity: High

Fetch column stats in bulk using new (Hive .13) HMS APIs

The performance of the COMPUTE STATS statement and queries was improved, particularly for wide tables.

Bug: IMPALA-1120

Severity: High

Issues Fixed in the 2.0.2 Release / CDH 5.2.3

This section lists the most significant issues fixed in Impala 2.0.2.

For the full list of fixed issues in Impala 2.0.2, see this report in the JIRA system.

  Note: Impala 2.0.2 is available as part of CDH 5.2.3, not under CDH 4.

GROUP BY on STRING column produces inconsistent results

Some operations in queries submitted through Hue or other HiveServer2 clients could produce inconsistent results.

Bug: IMPALA-1453

Severity: High

Fix leaked file descriptor and excessive file descriptor use

Impala could encounter an error from running out of file descriptors. The fix reduces the amount of time file descriptors are kept open, and avoids leaking file descriptors when read operations encounter errors.

Severity: High

unix_timestamp() does not return correct time

The unix_timestamp() function could return a constant value 1 instead of a representation of the time.

Bug: IMPALA-1623

Severity: High

Impala should randomly select cached replica

To avoid putting too heavy a load on any one node, Impala now randomizes which scan node processes each HDFS data block rather than choosing the first cached block replica.
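
The effect of randomizing the choice can be sketched in Python as follows. The host names, block count, and function names are hypothetical, and the sketch is a simplified model of the idea rather than Impala's scheduler.

```python
import random
from collections import Counter

def pick_replica(replicas, rng):
    """Pick one cached replica uniformly at random instead of always the first."""
    return rng.choice(replicas)

def load_per_host(num_blocks, replicas, seed=0):
    """Count how many blocks each host would process under random selection."""
    rng = random.Random(seed)  # seeded so the demo is reproducible
    return Counter(pick_replica(replicas, rng) for _ in range(num_blocks))

load = load_per_host(3000, ["host1", "host2", "host3"])
```

Always choosing the first replica would send all 3000 blocks to host1; with random selection each host receives roughly a third of them.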

Bug: IMPALA-1586

Severity: High

Impala does not always give short name to Llama.

In clusters secured by Kerberos or LDAP, a discrepancy in internal transmission of user names could cause a communication error with Llama.

Bug: IMPALA-1606

Severity: High

accept unmangled native UDF symbols

The CREATE FUNCTION statement could report that it could not find a function entry point within the .so file for a UDF written in C++, even if the corresponding function was present.

Bug: IMPALA-1475

Severity: High

Issues Fixed in the 2.0.1 Release / CDH 5.2.1

This section lists the most significant issues fixed in Impala 2.0.1.

For the full list of fixed issues in Impala 2.0.1, see this report in the JIRA system.

Queries fail with metastore exception after upgrade and compute stats

After running the COMPUTE STATS statement on an Impala table, subsequent queries on that table could fail with the exception message Failed to load metadata for table: default.stats_test.

Bug: IMPALA-1416

Severity: High

Workaround: Upgrading to CDH 5.2.1, or another level of CDH that includes the fix for HIVE-8627, prevents the problem from affecting future COMPUTE STATS statements. On affected levels of CDH, or for Impala tables that have become inaccessible, the workaround is to disable the hive.metastore.try.direct.sql setting in the Hive metastore hive-site.xml file and issue the INVALIDATE METADATA statement for the affected table. You do not need to rerun the COMPUTE STATS statement for the table.

Issues Fixed in the 2.0.0 Release / CDH 5.2.0

Join Hint is dropped when used inside a view

Hints specified within a view query did not take effect when the view was queried, leading to slow performance. As part of this fix, Impala now supports hints embedded within comments.

Bug: IMPALA-995

Severity: High

WHERE condition ignored in simple query with RIGHT JOIN

Potential wrong results for some types of queries.

Bug: IMPALA-1101

Severity: High

Query with self joined table may produce incorrect results

Potential wrong results for some types of queries.

Bug: IMPALA-1102

Severity: High

Incorrect plan after reordering predicates (inner join following outer join)

Potential wrong results for some types of queries.

Bug: IMPALA-1118

Severity: High

Combining fragments with compatible data partitions can lead to incorrect results due to type incompatibilities (missing casts).

Potential wrong results for some types of queries.

Bug: IMPALA-1123

Severity: High

Predicate dropped: Inline view + DISTINCT aggregate in outer query

Potential wrong results for some types of queries.

Bug: IMPALA-1165

Severity: High

Reuse of a column in JOIN predicate may lead to incorrect results

Potential wrong results for some types of queries.

Bug: IMPALA-1353

Severity: High

Usage of TRUNC with string timestamp reliably crashes node

Serious error for certain combinations of function calls and data types.

Bug: IMPALA-1105

Severity: High

Timestamp Cast Returns invalid TIMESTAMP

Serious error for certain combinations of function calls and data types.

Bug: IMPALA-1109

Severity: High

IllegalStateException upon JOIN of DECIMAL columns with different precision

DECIMAL columns with different precision could not be compared in join predicates.

Bug: IMPALA-1121

Severity: High

Allow creating Avro tables without column definitions. Allow COMPUTE STATS to always work on Impala-created Avro tables.

Hive-created Avro tables with columns specified by a JSON file or literal could produce errors when queried in Impala, and could not be used with the COMPUTE STATS statement. Now you can create such tables in Impala to avoid such errors.

Bug: IMPALA-1104

Severity: High

Ensure all webserver output is escaped

The Impala debug web UI did not properly encode all output.

Bug: IMPALA-1133

Severity: High

Queries with union in inline view have empty resource requests

Certain queries could run without obeying the limits imposed by resource management.

Bug: IMPALA-1236

Severity: High

Impala does not employ ACLs when checking path permissions for LOAD and INSERT

Certain INSERT and LOAD DATA statements could fail unnecessarily, if the target directories in HDFS had restrictive HDFS permissions, but those permissions were overridden by HDFS extended ACLs.

Bug: IMPALA-1279

Severity: High

Impala does not map principals to lowercase, affecting Sentry authorisation

In a Kerberos environment, the principal name was not mapped to lowercase, causing issues when a user logged in with an uppercase principal name and Sentry authorization was enabled.

Bug: IMPALA-1334

Severity: High

Issues Fixed in the 1.4.4 Release / CDH 5.1.5

For the list of fixed issues, see Issues Fixed in CDH 5.1.5 in the CDH 5 Release Notes.

  Note: Impala 1.4.4 is available as part of CDH 5.1.5, not under CDH 4.

Issues Fixed in the 1.4.3 Release / CDH 5.1.4

Impala 1.4.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.

  Note: Impala 1.4.3 is available as part of CDH 5.1.4, and under CDH 4.

Issues Fixed in the 1.4.2 Release / CDH 5.1.3

This section lists the most significant issues fixed in Impala 1.4.2.

For the full list of fixed issues in Impala 1.4.2, see this report in the JIRA system.

  Note: Impala 1.4.3 is available as part of CDH 5.1.4, and under CDH 4.

    Issues Fixed in the 1.4.1 Release / CDH 5.1.2

    impalad terminating with Boost exception

    Occasionally, a non-trivial query run through Llama could encounter a serious error. The detailed error in the log was:

    boost::exception_detail::clone_impl
      <boost::exception_detail::error_info_injector<boost::lock_error> >
    

    Severity: High

    Impalad uses wrong string format when writing logs

    Impala log files could contain internal error messages due to a problem formatting certain strings. The messages consisted of a Java call stack starting with:

    jni-util.cc:177] java.util.MissingFormatArgumentException: Format specifier 's'
    

    Severity: High

    Update HS2 client API.

    A downlevel version of the HiveServer2 API could cause difficulty retrieving the precision and scale of a DECIMAL value.

    Bug: IMPALA-1107

    Severity: High

    Impalad catalog updates can fail with error: "IllegalArgumentException: fromKey out of range" at com.cloudera.impala.catalog.CatalogDeltaLog

    The error in the title could occur following a DDL statement. This issue was discovered during internal testing and has not been reported in customer environments.

    Bug: IMPALA-1093

    Severity: High

    "Total" time counter does not capture all the network transmit time

    The time for some network operations was not counted in the report of total time for a query, making it difficult to diagnose network-related performance issues.

    Bug: IMPALA-1131

    Severity: High

    Impala will crash when reading certain Avro files containing bytes data

    Certain Avro fields for byte data could cause Impala to be unable to read an Avro data file, even if the field was not part of the Impala table definition. With this fix, Impala can now read these Avro data files, although Impala queries cannot refer to the "bytes" fields.

    Bug: IMPALA-1149

    Severity: High

    Support specifying a custom AuthorizationProvider in Impala

    The --authorization_policy_provider_class option for impalad was added back. This option specifies a custom AuthorizationProvider class rather than the default HadoopGroupAuthorizationProvider. It had been used for internal testing, then removed in Impala 1.4.0, but it was considered useful by some customers.

    Bug: IMPALA-1142

    Severity: High

    Issues Fixed in the 1.4.0 Release / CDH 5.1.0

    Failed DCHECK in disk-io-mgr-reader-context.cc:174

    The serious error in the title could occur, with the supplemental message:

    num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data

    The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0 for CDH 5.1.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2 for CDH 5.0.4.

    Bug: IMPALA-1019

    Severity: High

    Workaround: On CDH 5.0.x, upgrade to CDH 5.0.4 with Impala 1.3.2, where this issue is fixed. In Impala 1.3.0 or 1.3.1 on CDH 5.0.x, do not use HDFS caching for Impala data files in Impala internal or external tables. If some of these data files are cached (for example because they are used by other components that take advantage of HDFS caching), set the query option DISABLE_CACHED_READS=true. To set that option for all Impala queries across all sessions, start impalad with the -default_query_options option and include this setting in the option argument, or on a cluster managed by Cloudera Manager, fill in this option setting on the Impala Daemon options page.

    Resolution: This issue is fixed in Impala 1.3.2 for CDH 5.0.4. The addition of HDFS caching support in Impala 1.4 means that this issue does not apply to any later release of Impala on CDH 5.
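
    As a minimal sketch of the session-level workaround described above, the query option can be set in impala-shell before querying tables whose data files might be in the HDFS cache. The table name cached_tbl is hypothetical.

    ```sql
    -- Turn off reads from the HDFS cache for the current session only.
    SET DISABLE_CACHED_READS=true;

    -- cached_tbl is a hypothetical table whose data files might be cached.
    SELECT COUNT(*) FROM cached_tbl;
    ```

    To apply the setting to all sessions, the same name=value pair goes in the argument of the impalad -default_query_options option, as the workaround describes.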

    impala-shell only works with ASCII characters

    The impala-shell interpreter could encounter errors processing SQL statements containing non-ASCII characters.

    Bug: IMPALA-489

    Severity: High

    The extended view definition SQL text in Views created by Impala should always have fully-qualified table names

    When a view was accessed while inside a different database, references to tables were not resolved unless the names were fully qualified when the view was created.

    Bug: IMPALA-962

    Severity: High

    Impala forgets about partitions with non-existant locations

    If an ALTER TABLE specified a non-existent HDFS location for a partition, afterwards Impala would not be able to access the partition at all.

    Bug: IMPALA-741

    Severity: High

    CREATE TABLE LIKE fails if source is a view

    The CREATE TABLE LIKE clause was enhanced to be able to create a table with the same column definitions as a view. The resulting table is a text table unless the STORED AS clause is specified, because a view does not have an associated file format to inherit.

    Bug: IMPALA-834

    Severity: High
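
    For illustration, assuming an existing view named sales_view (a hypothetical name), the enhanced statement might be used with or without an explicit file format:

    ```sql
    -- No STORED AS clause: the new table is a text table.
    CREATE TABLE sales_copy_text LIKE sales_view;

    -- Explicit file format, because a view has no file format to inherit.
    CREATE TABLE sales_copy_parquet LIKE sales_view STORED AS PARQUET;
    ```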

    Improve partition pruning time

    Operations on tables with many partitions could be slow due to the time needed to evaluate which partitions were affected. The partition pruning code was sped up substantially.

    Bug: IMPALA-887

    Severity: High

    Improve compute stats performance

    The performance of the COMPUTE STATS statement was improved substantially. The efficiency of its internal operations was improved, and some statistics are no longer gathered because they are not currently used for planning Impala queries.

    Bug: IMPALA-1003

    Severity: High

    When I run CREATE TABLE new_table LIKE avro_table, the schema does not get mapped properly from an avro schema to a hive schema

    After a CREATE TABLE LIKE statement using an Avro table as the source, the new table could have incorrect metadata and be inaccessible, depending on how the original Avro table was created.

    Bug: IMPALA-185

    Severity: High

    Race condition in IoMgr. Blocked ranges enqueued after cancel.

    Impala could encounter a serious error after a query was cancelled.

    Bug: IMPALA-1046

    Severity: High

    Deadlock in scan node

    A deadlock condition could make all impalad daemons hang, making the cluster unresponsive for Impala queries.

    Bug: IMPALA-1083

    Severity: High

    Issues Fixed in the 1.3.3 Release / CDH 5.0.5

    Impala 1.3.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.

      Note: Impala 1.3.3 is only available as part of CDH 5.0.5, not under CDH 4.

      Issues Fixed in the 1.3.2 Release / CDH 5.0.4

      This backported bug fix is the only change between Impala 1.3.1 and Impala 1.3.2.

        Note: Impala 1.3.2 is only available as part of CDH 5.0.4, not under CDH 4.

      Failed DCHECK in disk-io-mgr-reader-context.cc:174

      The serious error in the title could occur, with the supplemental message:

      num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data

      The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0 for CDH 5.1.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2 for CDH 5.0.4.

      Bug: IMPALA-1019

      Severity: High

      Workaround: On CDH 5.0.x, upgrade to CDH 5.0.4 with Impala 1.3.2, where this issue is fixed. In Impala 1.3.0 or 1.3.1 on CDH 5.0.x, do not use HDFS caching for Impala data files in Impala internal or external tables. If some of these data files are cached (for example because they are used by other components that take advantage of HDFS caching), set the query option DISABLE_CACHED_READS=true. To set that option for all Impala queries across all sessions, start impalad with the -default_query_options option and include this setting in the option argument, or on a cluster managed by Cloudera Manager, fill in this option setting on the Impala Daemon options page.

      Resolution: This issue is fixed in Impala 1.3.2 for CDH 5.0.4. The addition of HDFS caching support in Impala 1.4 means that this issue does not apply to any later release of Impala on CDH 5.

      Issues Fixed in the 1.3.1 Release / CDH 5.0.3

      Impalad crashes when left joining inline view that has aggregate using distinct

      Impala could encounter a severe error in a query combining a left outer join with an inline view containing a COUNT(DISTINCT) operation.

      Bug: IMPALA-904

      Severity: High

      Incorrect result with group by query with null value in group by data

      If the result of a GROUP BY operation is NULL, the resulting row might be omitted from the result set. This issue depends on the data values and data types in the table.

      Bug: IMPALA-901

      Severity: High

      Drop Function does not clear local library cache

      When a UDF was dropped through the DROP FUNCTION statement, and then the UDF was re-created with a new .so library or JAR file, the original version of the UDF was still used when the UDF was called from queries.

      Bug: IMPALA-786

      Severity: High

      Workaround: Restart the impalad daemon on all nodes.

      Compute stats doesn't propagate underlying error correctly

      If a COMPUTE STATS statement encountered an error, the error message was "Query aborted" with no further detail. Common reasons why a COMPUTE STATS statement might fail include network errors causing the coordinator node to lose contact with other impalad instances, and column names that match Impala reserved words. (Currently, if a column name is an Impala reserved word, COMPUTE STATS always returns an error.)

      Bug: IMPALA-762

      Severity: High

      Inserts should respect changes in partition location

      After an ALTER TABLE statement that changes the LOCATION property of a partition, a subsequent INSERT statement would always use a path derived from the base data directory for the table.

      Bug: IMPALA-624

      Severity: High
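
      A sketch of the sequence involved, using hypothetical table names, column names, and an illustrative HDFS path:

      ```sql
      -- Point one partition of a hypothetical table at a different HDFS directory.
      ALTER TABLE logs PARTITION (year=2014) SET LOCATION '/data/archive/logs_2014';

      -- With the fix, the new data files go under the LOCATION of the partition
      -- rather than under a path derived from the base data directory of the table.
      INSERT INTO logs PARTITION (year=2014) SELECT msg FROM staging_logs;
      ```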

      Text data with carriage returns generates wrong results for count(*)

      A COUNT(*) operation could return the wrong result for text tables using nul characters (ASCII value 0) as delimiters.

      Bug: IMPALA-13

      Severity: High

      Workaround: Impala adds support for ASCII 0 characters as delimiters through the clause FIELDS TERMINATED BY '\0'.
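
      For example, a table for such data files might be declared as follows. The table and column names are hypothetical.

      ```sql
      -- Text table whose data files use ASCII 0 (nul) as the field delimiter.
      CREATE TABLE nul_delimited (id INT, name STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\0';

      SELECT COUNT(*) FROM nul_delimited;
      ```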

      IO Mgr should take instance memory limit into account when creating io buffers

      Impala could allocate more memory than necessary during certain operations.

      Bug: IMPALA-488

      Severity: High

      Workaround: Before issuing a COMPUTE STATS statement for a Parquet table, reduce the number of threads used in that operation by issuing SET NUM_SCANNER_THREADS=2 in impala-shell. Then issue UNSET NUM_SCANNER_THREADS before continuing with queries.
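
      The workaround above corresponds to a sequence like the following in impala-shell, where wide_parquet_tbl is a hypothetical table name:

      ```sql
      -- Temporarily reduce the number of scanner threads.
      SET NUM_SCANNER_THREADS=2;

      COMPUTE STATS wide_parquet_tbl;

      -- Restore the default setting before continuing with queries.
      UNSET NUM_SCANNER_THREADS;
      ```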

      Impala should provide an option for new sub directories to automatically inherit the permissions of the parent directory

      Previously, when an INSERT statement created new subdirectories underneath a partitioned table, the new subdirectories always used the default HDFS permissions for the impala user, which might not be suitable for directories that other components also need to read and write.

      Bug: IMPALA-827

      Severity: High

      Resolution: In Impala 1.3.1 and higher, you can specify the --insert_inherit_permissions configuration when starting the impalad daemon.

      Illegal state exception (or crash) in query with UNION in inline view

      Impala could encounter a severe error in a query where the FROM list contains an inline view that includes a UNION. The exact type of the error varies.

      Bug: IMPALA-888

      Severity: High

      INSERT column reordering doesn't work with SELECT clause

      The ability to specify a subset of columns in an INSERT statement, with order different than in the target table, was not working as intended.

      Bug: IMPALA-945

      Severity: High
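
      As an illustration of the feature this fix restores, assume a hypothetical table target with columns (c1, c2, c3) and a hypothetical table src with columns (x, y):

      ```sql
      -- Only c3 and c1 are listed, in a different order than in the target table.
      -- The value of x goes into c3 and the value of y goes into c1.
      -- The unlisted column c2 is set to NULL.
      INSERT INTO target (c3, c1) SELECT x, y FROM src;
      ```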

      Issues Fixed in the 1.3.0 Release / CDH 5.0.0

      Inner join after right join may produce wrong results

      The automatic join reordering optimization could incorrectly reorder queries with an outer join or semi join followed by an inner join, producing incorrect results.

      Bug: IMPALA-860

      Severity: High

      Workaround: Including the STRAIGHT_JOIN keyword in the query prevented the issue from occurring.
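
      A sketch of the workaround for a query of the affected shape, with hypothetical table and column names:

      ```sql
      -- STRAIGHT_JOIN immediately follows SELECT and turns off automatic join
      -- reordering, so the tables are joined in the order listed.
      SELECT STRAIGHT_JOIN t1.id, t3.val
        FROM t1
        RIGHT OUTER JOIN t2 ON t1.id = t2.id
        JOIN t3 ON t2.id = t3.id;
      ```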

      Incorrect results with codegen on multi-column group by with NULLs.

      A query with a GROUP BY clause referencing multiple columns could introduce incorrect NULL values in some columns of the result set. The incorrect NULL values could appear in rows where a different GROUP BY column actually did return NULL.

      Bug: IMPALA-850

      Severity: High

      Using distinct inside aggregate function may cause incorrect result when using having clause

      A query could return incorrect results if it combined an aggregate function call, a DISTINCT operator, and a HAVING clause, without a GROUP BY clause.

      Bug: IMPALA-845

      Severity: High

      Aggregation on union inside (inline) view not distributed properly.

      An aggregation query or a query with ORDER BY and LIMIT could be executed on a single node in some cases, rather than distributed across the cluster. This issue affected queries whose FROM clause referenced an inline view containing a UNION.

      Bug: IMPALA-831

      Severity: High

      Wrong expression may be used in aggregate query if there are multiple similar expressions

      If a GROUP BY query referenced the same columns multiple times using different operators, result rows could contain multiple copies of the same expression.

      Bug: IMPALA-817

      Severity: High

      Incorrect results when changing the order of aggregates in the select list with codegen enabled

      Referencing the same columns in both a COUNT() and a SUM() call in the same query, or some other combinations of aggregate function calls, could incorrectly return a result of 0 from one of the aggregate functions. This issue affected references to TINYINT and SMALLINT columns, but not INT or BIGINT columns.

      Bug: IMPALA-765

      Severity: High

      Workaround: Setting the query option DISABLE_CODEGEN=TRUE prevented the incorrect results. Switching the order of the function calls could also prevent the issue from occurring.
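
      For example, in impala-shell the first workaround might look like this, where t is a hypothetical table and small_col is a hypothetical SMALLINT column:

      ```sql
      -- Turn off native code generation for the current session.
      SET DISABLE_CODEGEN=true;

      -- Query of the affected shape: two aggregate functions over the same column.
      SELECT COUNT(small_col), SUM(small_col) FROM t;
      ```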

      Union queries give Wrong result in a UNION followed by SIGSEGV in another union

      A UNION query could produce a wrong result, followed by a serious error for a subsequent UNION query.

      Bug: IMPALA-723

      Severity: High

      String data in MR-produced parquet files may be read incorrectly

      Impala could return incorrect string results when reading uncompressed Parquet data files containing multiple row groups. This issue only affected Parquet data files produced by MapReduce jobs.

      Bug: IMPALA-729

      Severity: High

      Compute stats need to use quotes with identifiers that are Impala keywords

      Using a column or table name that conflicted with Impala keywords could prevent running the COMPUTE STATS statement for the table.

      Bug: IMPALA-777

      Severity: High

      COMPUTE STATS child queries do not inherit parent query options.

      The COMPUTE STATS statement did not use the setting of the MEM_LIMIT query option in impala-shell, potentially causing problems gathering statistics for wide Parquet tables.

      Bug: IMPALA-903

      Severity: High

      COMPUTE STATS should update partitions in batches

      The COMPUTE STATS statement could be slow or encounter a timeout while analyzing a table with many partitions.

      Bug: IMPALA-880

      Severity: High

      Fail early (in analysis) when COMPUTE STATS is run against Avro table with no columns

      If the columns for an Avro table were all defined in the TBLPROPERTIES or SERDEPROPERTIES clauses, the COMPUTE STATS statement would fail after completely analyzing the table, potentially causing a long delay. Although the COMPUTE STATS statement still does not work for such tables, now the problem is detected and reported immediately.

      Bug: IMPALA-867

      Severity: High

      Workaround: Re-create the Avro table with columns defined in SQL style, using the output of SHOW CREATE TABLE. (See the JIRA page for detailed steps.)

      Issues Fixed in the 1.2.4 Release

      The Catalog Server exits with an OOM error after a certain number of CREATE statements

      A large number of concurrent CREATE TABLE statements can cause the catalogd process to consume excessive memory, and potentially be killed due to an out-of-memory condition.

      Bug: IMPALA-818

      Severity: High

      Workaround: Restart the catalogd service and re-try the DDL operations that failed.

      Catalog Server consumes excessive cpu cycle

      A large number of tables and partitions could result in unnecessary CPU overhead during Impala idle time and background operations.

      Bug: IMPALA-821

      Severity: High

      Resolution: Catalog server processing was optimized in several ways.

      Query against Avro table crashes Impala with codegen enabled

      A query against a TIMESTAMP column in an Avro table could encounter a serious issue.

      Bug: IMPALA-828

      Severity: High

      Workaround: Set the query option DISABLE_CODEGEN=TRUE

      Statestore seems to send concurrent heartbeats to the same subscriber leading to repeated "Subscriber 'hostname' is registering with statestore, ignoring update" messages

      Impala nodes could produce repeated error messages after recovering from a communication error with the statestore service.

      Bug: IMPALA-809

      Severity: High

      Join predicate incorrectly ignored

      A join query could produce wrong results if multiple equality comparisons between the same tables referred to the same column.

      Bug: IMPALA-805

      Severity: High

      Query result differing between Impala and Hive

      Certain outer join queries could return wrong results. If one of the tables involved in the join was an inline view, some tests from the WHERE clauses could be applied to the wrong phase of the query.

      Severity: High

      ArrayIndexOutOfBoundsException / Invalid query handle when reading large HBase cell

      An HBase cell could contain a value larger than 32 KB, leading to a serious error when Impala queries that table. The error could occur even if the applicable row is not part of the result set.

      Bug: IMPALA-715

      Severity: High

      Workaround: Use smaller values in the HBase table, or exclude the column containing the large value from the result set.

      select with distinct and full outer join, impalad coredump

      A query involving a DISTINCT operator combined with a FULL OUTER JOIN could encounter a serious error.

      Bug: IMPALA-735

      Severity: High

      Workaround: Set the query option DISABLE_CODEGEN=TRUE

      Impala cannot load tables with more than Short.MAX_VALUE number of partitions

      If a table had more than 32,767 partitions, Impala would not recognize the partitions above the 32K limit and query results could be incomplete.

      Bug: IMPALA-749

      Severity: High

      Various issues with HBase row key specification

      Queries against HBase tables could fail with an error if the row key was compared to a function return value rather than a string constant. Also, queries against HBase tables could fail if the WHERE clause contained combinations of comparisons that could not possibly match any row key.

      Severity: High

      Resolution: Queries now return appropriate results when function calls are used in the row key comparison. For queries involving non-existent row keys, such as WHERE row_key IS NULL or where the lower bound is greater than the upper bound, the query succeeds and returns an empty result set.
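
      The following queries illustrate the cases described above. They assume a hypothetical HBase-backed table hbase_tbl whose row key is mapped to a STRING column named row_key.

      ```sql
      -- Row key compared to a function return value rather than a string constant.
      SELECT * FROM hbase_tbl WHERE row_key = CONCAT('user', '42');

      -- Comparisons that cannot match any row key: with the fix, each query
      -- succeeds and returns an empty result set.
      SELECT * FROM hbase_tbl WHERE row_key IS NULL;
      SELECT * FROM hbase_tbl WHERE row_key > 'z' AND row_key < 'a';
      ```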

      Issues Fixed in the 1.2.3 Release

      This release is a fix release that supersedes Impala 1.2.2, with the same features and fixes as 1.2.2 plus one additional fix for compatibility with Parquet files generated outside of Impala by components such as Hive, Pig, or MapReduce.

      Impala cannot read Parquet files with multiple row groups

      The parquet-mr library included with CDH4.5 writes files that are not readable by Impala, due to the presence of multiple row groups. Queries involving these data files might result in a crash or a failure with an error such as "Column chunk should not contain two dictionary pages".

      This issue does not occur for Parquet files produced by Impala INSERT statements, because Impala only produces files with a single row group.

      Bug: IMPALA-720

      Severity: High

      Issues Fixed in the 1.2.2 Release

      Order of table references in FROM clause is critical for optimal performance

      Before this fix, Impala did not optimize the join order of queries; instead, it joined tables in the order in which they were listed in the FROM clause. Queries that contained one or more large tables on the right-hand side of joins (either an explicit join expressed with a JOIN clause or a join implicit in the list of table references in the FROM clause) could run slowly or crash Impala due to out-of-memory errors. For example:

      SELECT ... FROM small_table JOIN large_table

      Severity: Medium

      Resolution: Fixed in Impala 1.2.2.

      Workaround: In Impala 1.2.2 and higher, use the COMPUTE STATS statement to gather statistics for each table involved in the join query, after data is loaded. Prior to Impala 1.2.2, modify the query, if possible, to join the largest table first. For example:

      SELECT ... FROM small_table JOIN large_table

      should be modified to:

      SELECT ... FROM large_table JOIN small_table
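
      In Impala 1.2.2 and higher, the statistics-based approach might look like the following sketch. The join column id is a hypothetical name.

      ```sql
      -- Gather statistics for each table in the join, after the data is loaded.
      COMPUTE STATS small_table;
      COMPUTE STATS large_table;

      -- The table order in the FROM clause no longer needs manual tuning.
      SELECT COUNT(*)
        FROM small_table JOIN large_table ON small_table.id = large_table.id;
      ```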

      Parquet in CDH4.5 writes data files that are sometimes unreadable by Impala

      Some Parquet files could be generated by other components that Impala could not read.

      Bug: IMPALA-694

      Severity: High

      Resolution: The underlying issue is being addressed by a fix in the CDH Parquet libraries. Impala 1.2.2 works around the problem and reads the existing data files.

      Deadlock in statestore when unregistering a subscriber and building a topic update

      The statestore service could experience an internal error leading to a hang.

      Bug: IMPALA-699

      Severity: High

      IllegalStateException when doing a union involving a group by

      A UNION query where one side involved a GROUP BY operation could cause a serious error.

      Bug: IMPALA-687

      Severity: High

      Impala Parquet Writer hit DCHECK in RleEncoder

      A serious error could occur when doing an INSERT into a Parquet table.

      Bug: IMPALA-689

      Severity: High

      Hive UDF jars cannot be loaded by the FE

      If the JAR file for a Java-based Hive UDF was not in the CLASSPATH, the UDF could not be called during a query.

      Bug: IMPALA-695

      Severity: High

      Issues Fixed in the 1.2.1 Release

      Scanners use too much memory when reading past scan range

      While querying a table with long column values, Impala could over-allocate memory leading to an out-of-memory error. This problem was observed most frequently with tables using uncompressed RCFile or text data files.

      Bug: IMPALA-525

      Severity: High

      Resolution: Fixed in 1.2.1

      Join node consumes memory way beyond mem-limit

      A join query could allocate a temporary work area that was larger than needed, leading to an out-of-memory error. The fix makes Impala return unused memory to the system when the memory limit is reached, avoiding unnecessary memory errors.

      Bug: IMPALA-657

      Severity: High

      Resolution: Fixed in 1.2.1

      Excessive memory consumption when query tables with 1k columns (Parquet file)

      Impala could encounter an out-of-memory condition setting up work areas for Parquet tables with many columns. The fix reduces the size of the allocated memory when not actually needed to hold table data.

      Bug: IMPALA-652

      Severity: High

      Resolution: Fixed in 1.2.1

      Issues Fixed in the 1.2.0 Beta Release

      This section lists the most significant issues fixed in Impala 1.2 (beta). For the full list of fixed issues, see this report in the JIRA system.

      Issues Fixed in the 1.1.1 Release

      Unexpected LLVM Crash When Querying Doubles on CentOS 5.x

      Certain queries involving DOUBLE columns could fail with a serious error. The fix improves the generation of native machine instructions for certain chipsets.

      Bug: IMPALA-477

      Severity: High

      "block size is too big" error with Snappy-compressed RCFile containing null

      Queries could fail with a "block size is too big" error, due to NULL values in RCFile tables using Snappy compression.

      Bug: IMPALA-482

      Severity: High

      Cannot query RC file for table that has more columns than the data file

      Queries could fail if an Impala RCFile table was defined with more columns than in the corresponding RCFile data files.

      Bug: IMPALA-510

      Severity: High

      Views Sometimes Not Utilizing Partition Pruning

      Certain combinations of clauses in a view definition for a partitioned table could result in inefficient performance and incorrect results.

      Bug: IMPALA-495

      Severity: High

      Update the serde name we write into the metastore for Parquet tables

      The SerDes class string written into Parquet data files created by Impala was updated for compatibility with Parquet support in Hive. See Incompatible Changes Introduced in Impala 1.1.1 for the steps to update older Parquet data files for Hive compatibility.

      Bug: IMPALA-485

      Severity: High

      Selective queries over large tables produce unnecessary memory consumption

      A query returning a small result set from a large table could tie up memory unnecessarily for the duration of the query.

      Bug: IMPALA-534

      Severity: High

      Impala stopped to query AVRO tables

      Queries against Avro tables could fail depending on whether the Avro schema URL was specified in the TBLPROPERTIES or SERDEPROPERTIES field. The fix causes Impala to check both fields for the schema URL.

      Bug: IMPALA-538

      Severity: High

      Impala continues to allocate more memory even though it has exceed its mem-limit

      Queries could allocate substantially more memory than specified in the impalad -mem_limit startup option. The fix causes more frequent checking of the limit during query execution.

      Bug: IMPALA-520

      Severity: High

      Issues Fixed in the 1.1.0 Release

      10-20% perf regression for most queries across all table formats

      This issue is due to a performance tradeoff between systems running many queries concurrently, and systems running a single query. Systems running only a single query could experience lower performance than in early beta releases. Systems running many queries simultaneously should experience higher performance than in the beta releases.

      Severity: High

      planner fails with "Join requires at least one equality predicate between the two tables" when "from" table order does not match "where" join order

      A query could fail if it involved 3 or more tables and the last join table was specified as a subquery.

      Bug: IMPALA-85

      Severity: High

      Parquet writer uses excessive memory with partitions

      INSERT statements against partitioned tables using the Parquet format could use excessive amounts of memory as the number of partitions grew large.

      Bug: IMPALA-257

      Severity: High

      Comments in impala-shell in interactive mode are not handled properly causing syntax errors or wrong results

      The impala-shell interpreter did not accept comments entered at the command line, making it problematic to copy and paste from scripts or other code examples.

      Bug: IMPALA-192

      Severity: Low

      Cancelled queries sometimes aren't removed from the inflight query list

      The Impala web UI would sometimes display a query as if it were still running, after the query was cancelled.

      Bug: IMPALA-364

      Severity: High

      Impala's 1.0.1 Shell Broke Python 2.4 Compatibility (AttributeError: 'module' object has no attribute 'field_size_limit')

      The impala-shell command in Impala 1.0.1 does not work with Python 2.4, which is the default on Red Hat 5.

      For the impala-shell command in Impala 1.0, the -o option (pipe output to a file) does not work with Python 2.4.

      Bug: IMPALA-396

      Severity: High

      Issues Fixed in the 1.0.1 Release

      Impala parquet scanner can not read all data files generated by other frameworks

      Impala might issue an erroneous error message when processing a Parquet data file produced by a non-Impala Hadoop component.

      Bug: IMPALA-333

      Severity: High

      Resolution: Fixed

      Impala is unable to query RCFile tables which describe fewer columns than the file's header.

      If an RCFile table definition had fewer columns than the fields actually in the data files, queries would fail.

      Bug: IMPALA-293

      Severity: High

      Resolution: Fixed

      Impala does not correctly substitute _HOST with hostname in --principal

      The _HOST placeholder in the --principal startup option was not substituted with the correct hostname, potentially leading to a startup error in setups using Kerberos authentication.

      Bug: IMPALA-351

      Severity: High

      Resolution: Fixed

      HBase query missed the last region

      A query for an HBase table could omit data from the last region.

      Bug: IMPALA-356

      Severity: High

      Resolution: Fixed

      Hbase region changes are not handled correctly

      After a region in an HBase table was split or moved, an Impala query might return incomplete or out-of-date results.

      Bug: IMPALA-300

      Severity: High

      Resolution: Fixed

      Query state for successful create table is EXCEPTION

      After a successful CREATE TABLE statement, the corresponding query state would be incorrectly reported as EXCEPTION.

      Bug: IMPALA-349

      Severity: High

      Resolution: Fixed

      Double check release of JNI-allocated byte-strings

      Operations involving calls to the Java JNI subsystem (for example, queries on HBase tables) could allocate memory but not release it.

      Bug: IMPALA-358

      Severity: High

      Resolution: Fixed

      Impala returns 0 for bad time values in UNIX_TIMESTAMP, Hive returns NULL

      For invalid time values, the UNIX_TIMESTAMP function in Impala returned 0, whereas Hive returned NULL.

      Impala:

      impala> select UNIX_TIMESTAMP('10:02:01') ;
      impala> 0

      Hive:

      hive> select UNIX_TIMESTAMP('10:02:01') FROM tmp;
      hive> NULL

      Bug: IMPALA-16

      Severity: Low

      Resolution: Fixed

      INSERT INTO TABLE SELECT <constant> does not work.

      An INSERT INTO TABLE ... SELECT <constant> statement did not insert any data and could return an error.

      Severity: Low

      Resolution: Fixed
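
      A statement of the affected form, using a hypothetical two-column table:

      ```sql
      CREATE TABLE const_demo (id INT, label STRING);

      -- The SELECT has no FROM clause and returns only constant values.
      INSERT INTO TABLE const_demo SELECT 1, 'one';
      ```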

      Issues Fixed in the 1.0 GA Release

      Undeterministically receive "ERROR: unknown row bach destination..." and "ERROR: Invalid query handle" from impala shell when running union query

      A query containing both UNION and LIMIT clauses could intermittently cause the impalad process to halt with a segmentation fault.

      Bug: IMPALA-183

      Severity: High

      Resolution: Fixed

      Insert with NULL partition keys results in SIGSEGV.

      An INSERT statement specifying a NULL value for one of the partitioning columns could cause the impalad process to halt with a segmentation fault.

      Bug: IMPALA-190

      Severity: High

      Resolution: Fixed

      INSERT queries don't show completed profiles on the debug webpage

      In the Impala web user interface, the profile page for an INSERT statement showed obsolete information for the statement once it was complete.

      Bug: IMPALA-217

      Severity: High

      Resolution: Fixed

      Impala HBase scan is very slow

      Queries involving an HBase table could be slower than expected, due to excessive memory usage on the Impala nodes.

      Bug: IMPALA-231

      Severity: High

      Resolution: Fixed

      Add some library version validation logic to impalad when loading impala-lzo shared library

      No validation was done to check that the impala-lzo shared library was compatible with the version of Impala, possibly leading to a crash when using LZO-compressed text files.

      Bug: IMPALA-234

      Severity: High

      Resolution: Fixed

      Workaround: Always upgrade the impala-lzo library at the same time as you upgrade Impala itself.

      Problems inserting into tables with TIMESTAMP partition columns leading table metadata loading failures and failed dchecks

      INSERT statements for tables partitioned on columns involving datetime types could appear to succeed, but cause errors for subsequent queries on those tables. The problem was especially serious if an improperly formatted timestamp value was specified for the partition key.

      Bug: IMPALA-238

      Severity: Critical

      Resolution: Fixed

      Ctrl-C sometimes interrupts shell in system call, rather than cancelling query

      Pressing Ctrl-C in the impala-shell interpreter could sometimes display an error and return control to the shell, making it impossible to cancel the query.

      Bug: IMPALA-243

      Severity: Critical

      Resolution: Fixed

      Empty string partition value causes metastore update failure

      Specifying an empty string or NULL for a partition key in an INSERT statement would fail.

      Bug: IMPALA-252

      Severity: High

      Resolution: Fixed. The behavior for empty partition keys was made more compatible with the corresponding Hive behavior.

      Round() does not output the right precision

      The round() function did not always return the correct number of significant digits.

      Bug: IMPALA-266

      Severity: High

      Resolution: Fixed

      Cannot cast string literal to string

      Casting from a string literal back to the same type would cause an "invalid type cast" error rather than leaving the original value unchanged.

      Bug: IMPALA-267

      Severity: High

      Resolution: Fixed
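
      A query of the affected form, with an arbitrary literal value:

      ```sql
      -- With the fix, the cast leaves the original value unchanged.
      SELECT CAST('hello' AS STRING);
      ```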

      Excessive mem usage for certain queries which are very selective

      Some queries that returned very few rows experienced unnecessary memory usage.

      Bug: IMPALA-288

      Severity: High

      Resolution: Fixed

      HdfsScanNode crashes in UpdateCounters

      A serious error could occur for relatively small and inexpensive queries.

      Bug: IMPALA-289

      Severity: High

      Resolution: Fixed

      Parquet performance issues on large dataset

      Certain aggregation queries against Parquet tables were inefficient due to lower than required thread utilization.

      Bug: IMPALA-292

      Severity: High

      Resolution: Fixed

      impala not populating hive metadata correctly for create table

      The Impala CREATE TABLE command did not fill in the owner and tbl_type columns in the Hive metastore database.

      Bug: IMPALA-295

      Severity: High

      Resolution: Fixed. The metadata was made more Hive-compatible.

      impala daemons die if statestore goes down

      The impalad instances in a cluster could halt when the statestored process became unavailable.

      Bug: IMPALA-312

      Severity: High

      Resolution: Fixed

      Constant SELECT clauses do not work in subqueries

      A subquery would fail if the SELECT statement inside it returned a constant value rather than querying a table.

      Bug: IMPALA-67

      Severity: High

      Resolution: Fixed

      Right outer Join includes NULLs as well and hence wrong result count

      The result set from a right outer join query could include erroneous rows containing NULL values.

      Bug: IMPALA-90

      Severity: High

      Resolution: Fixed

      Parquet scanner hangs for some queries

      The Parquet scanner non-deterministically hangs when executing some queries.

      Bug: IMPALA-204

      Severity: Medium

      Resolution: Fixed

      Issues Fixed in Version 0.7 of the Beta Release

      Impala does not gracefully handle unsupported Hive table types (INDEX and VIEW tables)

      When attempting to load metadata from an unsupported Hive table type (INDEX and VIEW tables), Impala fails with an unclear error message.

      Bug: IMPALA-167

      Severity: Low

      Resolution: Fixed in 0.7

      DDL statements (CREATE/ALTER/DROP TABLE) are not supported in the Impala Beta Release

      Severity: Medium

      Resolution: Fixed in 0.7

      Avro is not supported in the Impala Beta Release

      Severity: Medium

      Resolution: Fixed in 0.7

      Workaround: None

      Impala does not currently allow limiting the memory consumption of a single query

      It is currently not possible to limit the memory consumption of a single query. All tables on the right-hand side of JOIN statements need to be able to fit in memory. If they do not, Impala may crash due to out-of-memory errors.

      Severity: High

      Resolution: Fixed in 0.7

      Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' and data is distributed across multiple nodes

      An aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' clause and data is distributed across multiple nodes. The query plan suggests that the results from each node are simply summed.

      Bug: IMPALA-20

      Severity: Low

      Resolution: Fixed in 0.7
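
The symptom described for IMPALA-20 can be illustrated with a small model, for a query shaped like SELECT count(*) FROM (SELECT * FROM t LIMIT 5) v. This is only a sketch of the effect, not Impala's planner or execution code; the function names and the three-node data layout are assumptions made up for illustration.

```python
def count_with_global_limit(node_rows, limit):
    """Intended semantics: merge rows from all nodes, apply LIMIT once, then count."""
    merged = [row for rows in node_rows for row in rows]
    return len(merged[:limit])


def count_with_per_node_limit(node_rows, limit):
    """Symptom: each node applies LIMIT locally and the partial counts are summed."""
    return sum(len(rows[:limit]) for rows in node_rows)


# Hypothetical data: three nodes, each holding four rows of the same table.
node_rows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

correct = count_with_global_limit(node_rows, limit=5)
wrong = count_with_per_node_limit(node_rows, limit=5)
print(correct, wrong)
```

In this model the per-node variant counts every row that survives a local limit, so the aggregate exceeds the limit whenever more than one node contributes rows.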

      Partition pruning for arbitrary predicates that are fully bound by a particular partition column

      We currently can't utilize a predicate like "country_code in ('DE', 'FR', 'US')" to do partition pruning, because that requires an equality predicate or a binary comparison.

      We should create a superclass of planner.ValueRange, ValueSet, that can be constructed with an arbitrary predicate, and whose isInRange(analyzer, valueExpr) constructs a literal predicate by substitution of the valueExpr into the predicate.

      Bug: IMPALA-144

      Severity: Medium

      Resolution: Fixed in 0.7
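
The idea behind IMPALA-144 can be sketched as follows: a predicate that refers only to the partition column can be evaluated against each partition's key value, so partitions that fail it never need to be scanned. This is a hypothetical illustration, not the planner.ValueRange code mentioned above; prune_partitions and the sample key values are assumptions.

```python
def prune_partitions(partition_values, predicate):
    """Keep only the partitions whose key value satisfies the predicate.

    The predicate depends on the partition column alone, so it can be
    evaluated per partition without reading any table data.
    """
    return [value for value in partition_values if predicate(value)]


# Hypothetical partition key values for a table partitioned by country_code.
partition_values = ["DE", "FR", "GB", "JP", "US"]

# Equivalent of: country_code IN ('DE', 'FR', 'US')
selected = prune_partitions(
    partition_values, lambda code: code in {"DE", "FR", "US"}
)
print(selected)
```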

      Issues Fixed in Version 0.6 of the Beta Release

      Impala reads the NameNode address and port as command line parameters

      Impala reads the NameNode address and port as command line parameters rather than reading them from core-site.xml. Updating the NameNode address in the core-site.xml file does not propagate to Impala.

      Severity: Low

      Resolution: Fixed in 0.6 - Impala reads the NameNode location and port from the Hadoop configuration files, though setting -nn and -nn_port overrides this. Users are advised not to set -nn or -nn_port.

      Queries may fail in a secure environment due to impalad Kerberos ticket expiration

      Queries may fail in a secure environment due to impalad Kerberos tickets expiring. This can happen if the Impala -kerberos_reinit_interval flag is set to a value of ten minutes or less. This may lead to an impalad requesting a ticket with a lifetime that is less than the time to the next ticket renewal.

      Bug: IMPALA-64

      Severity: Medium

      Resolution: Fixed in 0.6

      Concurrent queries may fail when Impala uses Thrift to communicate with the Hive Metastore

      Concurrent queries may fail when Impala is using Thrift to communicate with part of the Hive Metastore such as the Hive Metastore Service. In such a case, the error "get_fields failed: out of sequence response" may occur because Impala shared a single Hive Metastore Client connection across threads. With Impala 0.6, a separate connection is used for each metadata request.

      Bug: IMPALA-48

      Severity: Low

      Resolution: Fixed in 0.6

      impalad fails to start if unable to connect to the Hive Metastore

      Impala fails to start if it is unable to establish a connection with the Hive Metastore. This behavior was fixed, allowing Impala to start, even when no Metastore is available.

      Bug: IMPALA-58

      Severity: Low

      Resolution: Fixed in 0.6

      Impala treats database names as case-sensitive in some contexts

      In some queries (including "USE database" statements), database names are treated as case-sensitive. This may lead queries to fail with an IllegalStateException.

      Bug: IMPALA-44

      Severity: Medium

      Resolution: Fixed in 0.6
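
One common way to get case-insensitive behavior is to normalize names before storing and looking them up. The sketch below shows that approach under stated assumptions: DatabaseCatalog and its methods are hypothetical and do not represent Impala's catalog implementation.

```python
class DatabaseCatalog:
    """Hypothetical catalog that stores and resolves names case-insensitively."""

    def __init__(self):
        self._databases = {}

    def add(self, name):
        # Normalize on insert so every later lookup uses the same key form.
        self._databases[name.lower()] = name

    def use(self, name):
        key = name.lower()
        if key not in self._databases:
            raise KeyError(f"Database does not exist: {name}")
        return key


catalog = DatabaseCatalog()
catalog.add("Sales")
current_db = catalog.use("SALES")
print(current_db)
```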

      Impala does not ignore hidden HDFS files

      Impala does not ignore hidden HDFS files, meaning those files prefixed with a period '.' or underscore '_'. This diverges from Hive/MapReduce, which skips these files.

      Bug: IMPALA-18

      Severity: Low

      Resolution: Fixed in 0.6
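
The naming rule described for IMPALA-18 can be expressed as a short filter. This is a minimal sketch of the rule only, not Impala's file-listing code; is_hidden, visible_files, and the sample listing are assumptions for illustration.

```python
def is_hidden(file_name):
    """Return True for names that start with a period or an underscore."""
    return file_name.startswith((".", "_"))


def visible_files(file_names):
    """Drop hidden files, keeping the original order of the listing."""
    return [name for name in file_names if not is_hidden(name)]


# Hypothetical directory listing for one table or partition directory.
listing = ["part-00000", "_SUCCESS", ".part-00001.crc", "part-00001", "_logs"]
data_files = visible_files(listing)
print(data_files)
```

The check applies to the file name itself, so a caller working with full paths would need to take the final path component first.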

      Issues Fixed in Version 0.5 of the Beta Release

      Impala may have reduced performance on tables that contain a large number of partitions

      Impala may have reduced performance on tables that contain a large number of partitions. This is due to extra overhead reading/parsing the partition metadata.

      Severity: High

      Resolution: Fixed in 0.5

      Backend client connections not getting cached causes an observable latency in secure clusters

      Backend impalads do not cache connections to the coordinator. On a secure cluster, this introduces a latency proportional to the number of backend clients involved in query execution, as the cost of establishing a secure connection is much higher than in the non-secure case.

      Bug: IMPALA-38

      Severity: Medium

      Resolution: Fixed in 0.5

      Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`"

      Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`". This was due to a lack of locking in the Impala table/database metadata cache.

      Bug: IMPALA-30

      Severity: Medium

      Resolution: Fixed in 0.5

      UNIX_TIMESTAMP format behaviour deviates from Hive when format matches a prefix of the time value

      The Impala UNIX_TIMESTAMP(val, format) operation compares the length of format and val and returns NULL if they do not match. Hive instead effectively truncates val to the length of the format parameter.

      Bug: IMPALA-15

      Severity: Medium

      Resolution: Fixed in 0.5
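
The difference between the two behaviors described for IMPALA-15 can be modeled as below. This is a sketch under stated assumptions, not the Impala or Hive implementation: the function names are hypothetical, only two format patterns are mapped, and values are interpreted as UTC.

```python
from datetime import datetime, timezone

# Hypothetical mapping from two Java-style patterns to strptime patterns.
_PATTERNS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
}


def _parse(val, fmt):
    """Parse val with the mapped pattern and return seconds since the epoch (UTC)."""
    parsed = datetime.strptime(val, _PATTERNS[fmt]).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def unix_timestamp_strict(val, fmt):
    """Behavior described above as the bug: NULL (None) when the lengths differ."""
    if len(val) != len(fmt):
        return None
    return _parse(val, fmt)


def unix_timestamp_prefix(val, fmt):
    """Hive-like behavior: truncate val to the length of fmt, then parse."""
    return _parse(val[:len(fmt)], fmt)


strict_result = unix_timestamp_strict("2012-01-01 10:30:00", "yyyy-MM-dd")
prefix_result = unix_timestamp_prefix("2012-01-01 10:30:00", "yyyy-MM-dd")
print(strict_result, prefix_result)
```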

      Issues Fixed in Version 0.4 of the Beta Release

      Impala fails to refresh the Hive metastore if a Hive temporary configuration file is removed

      Impala is impacted by Hive bug HIVE-3596 which may cause metastore refreshes to fail if a Hive temporary configuration file is deleted (normally located at /tmp/hive-<user>-<tmp_number>.xml). Additionally, the impala-shell will incorrectly report that the failed metadata refresh completed successfully.

      Severity: Medium

      Anticipated Resolution: To be fixed in a future release

      Workaround: Restart the impalad service. Use the impalad log to check for metadata refresh errors.

      lpad/rpad builtin functions are not correct.

      The lpad/rpad builtin functions generate the wrong results.

      Severity: Mild

      Resolution: Fixed in 0.4

      Files with .gz extension reported as 'not supported'

      Compressed files with a .gz extension incorrectly generate a 'not supported' exception.

      Bug: IMPALA-14

      Severity: High

      Resolution: Fixed in 0.4

      Queries with large limits would hang.

      Some queries with large limits were hanging.

      Severity: High

      Resolution: Fixed in 0.4

      Order by on a string column produces incorrect results if there are empty strings

      Severity: Low

      Resolution: Fixed in 0.4

      Issues Fixed in Version 0.3 of the Beta Release

      All table loading errors show as unknown table

      If Impala is unable to load the metadata for a table for any reason, a subsequent query referring to that table will return an unknown table error message, even if the table is known.

      Severity: Mild

      Resolution: Fixed in 0.3

      A table that cannot be loaded will disappear from SHOW TABLES

      After failing to load metadata for a table, Impala removes that table from the list of known tables returned in SHOW TABLES. Subsequent attempts to query the table return 'unknown table', even if the metadata for that table is fixed.

      Severity: Mild

      Resolution: Fixed in 0.3

      Impala cannot read from HBase tables that are not created as external tables in the Hive metastore.

      Attempting to select from these tables fails.

      Severity: Medium

      Resolution: Fixed in 0.3

      Certain queries that contain OUTER JOINs may return incorrect results

      Queries that contain OUTER JOINs may not return the correct results if there are predicates referencing any of the joined tables in the WHERE clause.

      Severity: Medium

      Resolution: Fixed in 0.3.

      Issues Fixed in Version 0.2 of the Beta Release

      Subqueries which contain aggregates cannot be joined with other tables or Impala may crash

      Subqueries that contain an aggregate cannot be joined with another table or Impala may crash. For example:

      SELECT * FROM (SELECT sum(col1) FROM some_table GROUP BY col1) t1 JOIN other_table ON (...);

      Severity: Medium

      Resolution: Fixed in 0.2

      An insert with a limit that runs as more than one query fragment inserts more rows than the limit.

      For example:

      INSERT OVERWRITE TABLE test SELECT * FROM test2 LIMIT 1;

      Severity: Medium

      Resolution: Fixed in 0.2

      Query with limit clause might fail.

      For example:

      SELECT * FROM test2 LIMIT 1;

      Severity: Medium

      Resolution: Fixed in 0.2

      Files in unsupported compression formats are read as plain text.

      Attempting to read such files does not generate a diagnostic.

      Severity: Medium

      Resolution: Fixed in 0.2

      Impala server raises a null pointer exception when running an HBase query.

      When querying an HBase table whose row-key is string type, the Impala server may raise a null pointer exception.

      Severity: Medium

      Resolution: Fixed in 0.2

      Page generated August 17, 2015.