This is the documentation for CDH 5.0.x.
Documentation for other versions is available at Cloudera Documentation.

Cloudera Impala Incompatible Changes

Impala 1.3.x contains the following incompatible changes. These are changes such as file format changes, removed features, or changes to implementation, default configuration, dependencies, or prerequisites that could cause issues during or after an Impala upgrade.

Even added SQL statements or clauses can produce incompatibilities, if you have databases, tables, or columns whose names conflict with the new keywords. See Appendix C - Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.

Incompatible Changes Introduced in Cloudera Impala 1.3.3 / CDH 5.0.5

No incompatible changes. The SSL security fix does not require any change in the way you interact with Impala.

  Note: Impala 1.3.3 is only available as part of CDH 5.0.5, not under CDH 4.

Incompatible Changes Introduced in Cloudera Impala 1.3.2

With the fix for IMPALA-1019, you can use HDFS caching for files that are accessed by Impala.

  Note: Impala 1.3.2 is only available as part of CDH 5.0.4, not under CDH 4.

Incompatible Changes Introduced in Cloudera Impala 1.3.1

  • In Impala 1.3.1 and higher, the REGEXP and RLIKE operators now match a regular expression string that occurs anywhere inside the target string, the same as if the regular expression was enclosed on each side by .*. See REGEXP Operator for examples. Previously, these operators only succeeded when the regular expression matched the entire target string. This change improves compatibility with the regular expression support for popular database systems. There is no change to the behavior of the regexp_extract() and regexp_replace() built-in functions.

  • The result set for the SHOW FUNCTIONS statement includes a new first column, with the data type of the return value. See SHOW Statement for examples.
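
As a rough illustration of the REGEXP and RLIKE change, the following sketch contrasts unanchored, wildcard, and anchored patterns. The table t1 and column c1 are placeholder names, not objects from this documentation.

```sql
-- t1 and c1 are hypothetical names used only for illustration.

-- Impala 1.3.1 and higher: returns rows where 'cd' occurs anywhere in c1.
-- Earlier releases: returned only rows where c1 was exactly 'cd'.
SELECT c1 FROM t1 WHERE c1 REGEXP 'cd';

-- Explicit wildcards on both sides request a substring match
-- in both the old and the new behavior.
SELECT c1 FROM t1 WHERE c1 REGEXP '.*cd.*';

-- Anchors request a whole-string match in 1.3.1 and higher,
-- which corresponds to what the unanchored pattern did before.
SELECT c1 FROM t1 WHERE c1 RLIKE '^cd$';
```

If the same script must run on releases before and after 1.3.1, writing the wildcards or anchors explicitly is one way to keep the results consistent.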

Incompatible Changes Introduced in Cloudera Impala 1.3.0

  • The EXPLAIN_LEVEL query option now accepts numeric options from 0 (most concise) to 3 (most verbose), rather than only 0 or 1. If you formerly used SET EXPLAIN_LEVEL=1 to get detailed explain plans, switch to SET EXPLAIN_LEVEL=3. If you used the mnemonic keyword (SET EXPLAIN_LEVEL=verbose), you do not need to change your code because now level 3 corresponds to verbose. See EXPLAIN_LEVEL for details about the allowed explain levels, and Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles for usage information.

  • The keyword DECIMAL is now a reserved word. If you have any databases, tables, columns, or other objects already named DECIMAL, quote any references to them using backticks (``) to avoid name conflicts with the keyword.

      Note: Although the DECIMAL keyword is a reserved word, currently Impala does not support DECIMAL as a data type for columns.

  • The query option that was named YARN_POOL during the CDH 5 beta period is now named REQUEST_POOL to reflect its broader use with the Impala admission control feature. See REQUEST_POOL for information about the option, and Admission Control and Query Queuing for details about its use with the admission control feature.

  • There are some changes to the list of reserved words; see Appendix C - Impala Reserved Words for the most current list:

    • The names of aggregate functions are no longer reserved words, so you can have databases, tables, columns, or other objects named AVG, MIN, and so on without any name conflicts.

    • The internal function names DISTINCTPC and DISTINCTPCSA are no longer reserved words, although DISTINCT is still a reserved word.

    • The keywords CLOSE_FN and PREPARE_FN are now reserved words. See CREATE FUNCTION Statement for their role in the CREATE FUNCTION statement, and Thread-Safe Work Area for UDFs for usage information.

  • The HDFS property dfs.client.file-block-storage-locations.timeout was renamed to dfs.client.file-block-storage-locations.timeout.millis, to emphasize that the unit of measure is milliseconds, not seconds. Impala requires a timeout of at least 10 seconds, making the minimum value for this setting 10000. On systems not managed by Cloudera Manager, you might need to edit the hdfs-site.xml file in the Impala configuration directory for the new name and minimum value.
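
The query option and reserved word changes in this list might look like the following in an impala-shell session. This is only a sketch: t1, the column name, and pool_name are placeholders.

```sql
-- Run in impala-shell. t1 and pool_name are hypothetical names.

-- Formerly SET EXPLAIN_LEVEL=1 requested detailed plans;
-- level 3 is now the most verbose setting.
SET EXPLAIN_LEVEL=3;
EXPLAIN SELECT COUNT(*) FROM t1;

-- The mnemonic form needs no change, because verbose now maps to level 3.
SET EXPLAIN_LEVEL=verbose;

-- An existing column named DECIMAL must now be quoted with backticks.
SELECT `decimal` FROM t1;

-- Replaces SET YARN_POOL=... from the CDH 5 beta period.
SET REQUEST_POOL=pool_name;
```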

Incompatible Changes Introduced in Cloudera Impala 1.2.4

There are no incompatible changes introduced in Impala 1.2.4.

Previously, after creating a table in Hive, you had to issue the INVALIDATE METADATA statement with no table name, a potentially expensive operation on clusters with many databases, tables, and partitions. Starting in Impala 1.2.4, you can issue the statement INVALIDATE METADATA table_name for a table newly created through Hive. Loading the metadata for only this one table is faster and involves less network overhead. Therefore, you might revisit your setup DDL scripts to add the table name to INVALIDATE METADATA statements, in cases where you create and populate the tables through Hive before querying them through Impala.
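
For example, a setup script might change along these lines, where new_table is a placeholder for a table that was just created and populated through Hive:

```sql
-- new_table is a hypothetical name.

-- Before Impala 1.2.4: reload metadata for the entire catalog.
INVALIDATE METADATA;

-- Impala 1.2.4 and higher: reload metadata for only the new table.
INVALIDATE METADATA new_table;
```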

Incompatible Changes Introduced in Cloudera Impala 1.2.3

Because the feature set of Impala 1.2.3 is identical to Impala 1.2.2, there are no new incompatible changes. See Incompatible Changes Introduced in Cloudera Impala 1.2.2 if you are upgrading from Impala 1.2.1 or 1.1.x.

Incompatible Changes Introduced in Cloudera Impala 1.2.2

The following changes to SQL syntax and semantics in Impala 1.2.2 could require updates to your SQL code, or schema objects such as tables or views:

  • With the addition of the CROSS JOIN keyword, you might need to rewrite any queries that refer to a table named CROSS or use the name CROSS as a table alias:

    -- Formerly, 'cross' in this query was an alias for t1
    -- and it was a normal join query.
    -- In 1.2.2 and higher, CROSS JOIN is a keyword, so 'cross'
    -- is not interpreted as a table alias, and the query
    -- uses the special CROSS JOIN processing rather than a
    -- regular join.
    select * from t1 cross join t2...
    
    -- Now if CROSS is used in other context such as a table or column name,
    -- use backticks to escape it.
    create table `cross` (x int);
    select * from `cross`;

  • Formerly, a DROP DATABASE statement in Impala did not remove the top-level HDFS directory for that database. The DROP DATABASE statement now removes that directory. (You still need to drop all the tables inside the database first; this change only applies to the top-level directory for the entire database.)

  • The keyword PARQUET is introduced as a synonym for PARQUETFILE in the CREATE TABLE and ALTER TABLE statements, because that is the common name for the file format (as opposed to SequenceFile and RCFile, where the "File" suffix is part of the name). Documentation examples have been changed to prefer the new shorter keyword. The PARQUETFILE keyword is still available for backward compatibility with older Impala versions.

  • New overloads are available for several operators and built-in functions, allowing you to insert their result values into smaller numeric columns such as INT, SMALLINT, TINYINT, and FLOAT without using a CAST() call. If you remove the CAST() calls from INSERT statements, those statements might not work with earlier versions of Impala.
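
The DROP DATABASE and PARQUET changes above can be sketched as follows. The database and table names are placeholders chosen for illustration.

```sql
-- db1, t1, t_new, t_old, and t_text are hypothetical names.

-- Drop the tables first, then the database. In 1.2.2 and higher,
-- the top-level HDFS directory for db1 is removed as well.
DROP TABLE db1.t1;
DROP DATABASE db1;

-- PARQUET is the new, shorter keyword (Impala 1.2.2 and higher).
CREATE TABLE t_new (x INT) STORED AS PARQUET;

-- PARQUETFILE still works, and is the form to use in scripts
-- that must also run on older Impala versions.
CREATE TABLE t_old (x INT) STORED AS PARQUETFILE;

-- The synonym also applies to ALTER TABLE.
ALTER TABLE t_text SET FILEFORMAT PARQUET;
```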

Because many users are likely to upgrade straight from Impala 1.x to Impala 1.2.2, also read Incompatible Changes Introduced in Cloudera Impala 1.2.1 for things to note about upgrading to Impala 1.2.x in general.

In a Cloudera Manager environment, the catalog service is not recognized or managed by Cloudera Manager versions prior to 4.8. Cloudera Manager 4.8 and higher require the catalog service to be present for Impala. Therefore, if you upgrade to Cloudera Manager 4.8 or higher, you must also upgrade Impala to 1.2.1 or higher. Likewise, if you upgrade Impala to 1.2.1 or higher, you must also upgrade Cloudera Manager to 4.8 or higher.

Incompatible Changes Introduced in Cloudera Impala 1.2.1

The following changes to SQL syntax and semantics in Impala 1.2.1 could require updates to your SQL code, or schema objects such as tables or views:

  • In Impala 1.2.1 and higher, all NULL values come at the end of the result set for ORDER BY ... ASC queries, and at the beginning of the result set for ORDER BY ... DESC queries. In effect, NULL is considered greater than all other values for sorting purposes. The original Impala behavior always put NULL values at the end, even for ORDER BY ... DESC queries. The new behavior in Impala 1.2.1 makes Impala more compatible with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting behavior for NULL by adding the clause NULLS FIRST or NULLS LAST at the end of the ORDER BY clause.

    See NULL for more information.
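
As a sketch of the sorting change, the queries below show the default placement of NULL values and the clauses that override it. The table t1 and column x are placeholders; the LIMIT clauses are included on the assumption that your release expects a LIMIT with ORDER BY.

```sql
-- t1 and x are hypothetical names.

-- NULL values come last (unchanged from earlier releases).
SELECT x FROM t1 ORDER BY x ASC LIMIT 100;

-- NULL values come first in 1.2.1 and higher; earlier releases put them last.
SELECT x FROM t1 ORDER BY x DESC LIMIT 100;

-- Reproduces the pre-1.2.1 placement for a descending sort.
SELECT x FROM t1 ORDER BY x DESC NULLS LAST LIMIT 100;

-- Puts NULL values first in an ascending sort.
SELECT x FROM t1 ORDER BY x ASC NULLS FIRST LIMIT 100;
```

Stating NULLS FIRST or NULLS LAST explicitly is one way to make a query's ordering independent of the release default.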

Impala 1.2.1 goes along with CDH 4.5 and Cloudera Manager 4.8. If you used the beta version Impala 1.2.0 that came with the beta of CDH 5, Impala 1.2.1 includes all the features of Impala 1.2.0 except for resource management, which relies on the YARN framework from CDH 5.

The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements:

  • See Impala Installation, Upgrading Impala, and Starting Impala for usage information for the catalogd daemon.

  • The REFRESH and INVALIDATE METADATA statements are no longer needed when the CREATE TABLE, INSERT, or other table-changing or data-changing operation is performed through Impala. These statements are still needed if such operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the statements only need to be issued on one Impala node rather than on all nodes. See REFRESH Statement and INVALIDATE METADATA Statement for the latest usage information for those statements.

  • See The Impala Catalog Service for background information on the catalogd service.

In a Cloudera Manager environment, the catalog service is not recognized or managed by Cloudera Manager versions prior to 4.8. Cloudera Manager 4.8 and higher require the catalog service to be present for Impala. Therefore, if you upgrade to Cloudera Manager 4.8 or higher, you must also upgrade Impala to 1.2.1 or higher. Likewise, if you upgrade Impala to 1.2.1 or higher, you must also upgrade Cloudera Manager to 4.8 or higher.

Incompatible Changes Introduced in Cloudera Impala 1.2.0 (Beta)

There are no incompatible changes to SQL syntax in Impala 1.2.0 (beta).

Because Impala 1.2.0 is bundled with the CDH 5 beta download and depends on specific levels of Apache Hadoop components supplied with CDH 5, you can only install it in combination with the CDH 5 beta.

The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements:

  • See Impala Installation, Upgrading Impala, and Starting Impala for usage information for the catalogd daemon.

  • The REFRESH and INVALIDATE METADATA statements are no longer needed when the CREATE TABLE, INSERT, or other table-changing or data-changing operation is performed through Impala. These statements are still needed if such operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the statements only need to be issued on one Impala node rather than on all nodes. See REFRESH Statement and INVALIDATE METADATA Statement for the latest usage information for those statements.

  • See The Impala Catalog Service for background information on the catalogd service.

The new resource management feature interacts with both YARN and Llama services, which are available in CDH 5. These services are set up for you automatically in a Cloudera Manager (CM) environment. For information about setting up the YARN and Llama services, see the instructions for YARN and Llama in the CDH 5 Installation Guide. See Using YARN Resource Management with Impala (CDH 5 Only) for usage information for Impala resource management.

Incompatible Changes Introduced in Cloudera Impala 1.1.1

There are no incompatible changes in Impala 1.1.1.

Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now that Parquet support is available for Hive 0.10, reusing existing Impala Parquet data files in Hive requires updating the table metadata. Use the following command if you are already running Impala 1.1.1:

ALTER TABLE table_name SET FILEFORMAT PARQUETFILE;

If you are running a level of Impala that is older than 1.1.1, do the metadata update through Hive:

ALTER TABLE table_name SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
ALTER TABLE table_name SET FILEFORMAT
  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";

Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.

As usual, make sure to upgrade the impala-lzo-cdh4 package to the latest level at the same time as you upgrade the Impala server.

Incompatible Change Introduced in Cloudera Impala 1.1

  • The REFRESH statement now requires a table name; in Impala 1.0, the table name was optional. This syntax change is part of the internal rework to make REFRESH a true Impala SQL statement so that it can be called through the JDBC and ODBC APIs. REFRESH now reloads the metadata immediately, rather than marking it for update the next time any affected table is accessed. The previous behavior, where omitting the table name caused a refresh of the entire Impala metadata catalog, is available through the new INVALIDATE METADATA statement. INVALIDATE METADATA can be specified with a table name to affect a single table, or without a table name to affect the entire metadata catalog; the relevant metadata is reloaded the next time it is requested during the processing for a SQL statement. See REFRESH Statement and INVALIDATE METADATA Statement for the latest details about these statements.
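
In practice, scripts written for Impala 1.0 might be updated roughly as follows, where t1 is a placeholder table name:

```sql
-- t1 is a hypothetical name.

-- Impala 1.0 accepted REFRESH with no table name.
-- Impala 1.1 and higher: name the table; its metadata is reloaded immediately.
REFRESH t1;

-- Equivalent of the old REFRESH with no table name: marks the entire
-- metadata catalog to be reloaded the next time it is requested.
INVALIDATE METADATA;

-- Same lazy reload, limited to a single table.
INVALIDATE METADATA t1;
```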

Incompatible Changes Introduced in Cloudera Impala 1.0

  • If you use LZO-compressed text files, when you upgrade Impala to version 1.0, also update the impala-lzo-cdh4 package to the latest level. See Using LZO-Compressed Text Files for details.

  • Cloudera Manager 4.5.2 and higher support only Impala 1.0 and higher, and vice versa. If you upgrade to Impala 1.0 or higher managed by Cloudera Manager, you must also upgrade Cloudera Manager to version 4.5.2 or higher. If you upgrade from an earlier version of Cloudera Manager, and were using Impala, you must also upgrade Impala to version 1.0 or higher. The beta versions of Impala are no longer supported as of the release of Impala 1.0.

Incompatible Change Introduced in Version 0.7 of the Cloudera Impala Beta Release

  • The defaults for the -nn and -nn_port flags have changed and are now read from core-site.xml. Impala prints the values of -nn and -nn_port to the log when it starts. The ability to set -nn and -nn_port on the command line is deprecated in 0.7 and may be removed in Impala 0.8.

Incompatible Change Introduced in Version 0.6 of the Cloudera Impala Beta Release

  • Cloudera Manager 4.5 supports only version 0.6 of the Cloudera Impala Beta Release. It does not support the earlier beta versions. If you upgrade your Cloudera Manager installation, you must also upgrade Impala to beta version 0.6. If you upgrade Impala to beta version 0.6, you must upgrade Cloudera Manager to 4.5.

Incompatible Change Introduced in Version 0.4 of the Cloudera Impala Beta Release

  • Cloudera Manager 4.1.3 supports only version 0.4 of the Cloudera Impala Beta Release. It does not support the earlier beta versions. If you upgrade your Cloudera Manager installation, you must also upgrade Impala to beta version 0.4. If you upgrade Impala to beta version 0.4, you must upgrade Cloudera Manager to 4.1.3.

Incompatible Change Introduced in Version 0.3 of the Cloudera Impala Beta Release

  • Cloudera Manager 4.1.2 supports only version 0.3 of the Cloudera Impala Beta Release. It does not support the earlier beta versions. If you upgrade your Cloudera Manager installation, you must also upgrade Impala to beta version 0.3. If you upgrade Impala to beta version 0.3, you must upgrade Cloudera Manager to 4.1.2.