New Features in Impala

Impala contains the following changes and enhancements from previous releases.

New Features in Impala Version 1.1.1

Impala 1.1.1 includes new features for security and stability.

New user-visible features include:

  • Additional security feature: auditing. New startup options for impalad let you capture information about Impala queries that succeed or are blocked due to insufficient privileges. To take full advantage of this feature with Cloudera Manager, upgrade to Cloudera Manager 4.7 or higher. For details, see Impala Security.
  • Parquet data files generated by Impala 1.1.1 are now compatible with the Parquet support in Hive. See Incompatible Changes for the procedure to update older Impala-created Parquet files to be compatible with the Hive Parquet support.
  • Additional improvements to stability and resource utilization for Impala queries.
  • Additional enhancements for compatibility with existing file formats.

New Features in Impala Version 1.1

Impala 1.1 includes new features for security, performance, and usability.

New user-visible features include:

  • Extensive new security features, built on top of the Sentry open source project. Impala now supports fine-grained authorization based on roles. A policy file determines which privileges on which schema objects (servers, databases, tables, and HDFS paths) are available to users based on their membership in groups. By assigning privileges for views, you can control access to table data at the column level. For details, see Impala Security.
  • Impala 1.1 works with Cloudera Manager 4.6 or higher. To use Cloudera Manager to manage authorization for the Impala web UI (the web pages served from port 25000 by default), use Cloudera Manager 4.6.2 or higher.
  • Impala can now create, alter, drop, and query views. Views provide a flexible way to set up simple aliases for complex queries; hide query details from applications and users; and simplify maintenance as you rename or reorganize databases, tables, and columns. See the overview section Views and the statements CREATE VIEW Statement, ALTER VIEW Statement, and DROP VIEW Statement.
  • Performance is improved through a number of automatic optimizations. Resource consumption is also reduced for Impala queries. These improvements apply broadly across all kinds of workloads and file formats. The major areas of performance enhancement include:
    • Improved disk and thread scheduling, which applies to all queries.
    • Improved hash join and aggregation performance, which applies to queries with large build tables or a large number of groups.
    • Dictionary encoding with Parquet, which applies to Parquet tables with short string columns.
    • Improved performance on systems with SSDs, which applies to all queries and file formats.
  • Two new built-in functions are available: translate() substitutes characters within strings, and user() returns the login ID of the connected user.
  • The new WITH clause for SELECT statements lets you simplify complicated queries in a way similar to creating a view. The effects of the WITH clause only last for the duration of one query, unlike views, which are persistent schema objects that can be used by multiple sessions or applications. See WITH Clause.
  • An enhancement to the DESCRIBE statement, DESCRIBE FORMATTED table_name, displays more detailed information about the table. This information includes the file format, location, delimiter, ownership, whether the table is external or internal, creation and access times, and partitions. The information is returned as a result set that can be interpreted and used by a management or monitoring application. See DESCRIBE Statement.
  • You can now insert a subset of columns for a table, with the other columns set to NULL. You can also specify the columns in any order in the destination table, rather than having to match the order of the corresponding columns in the source query or VALUES clause. This feature is known as "column permutation". See INSERT Statement.
  • The new LOAD DATA statement lets you load data into a table directly from an HDFS data file. This technique lets you minimize the number of steps in your ETL process, and provides more flexibility. For example, you can bring data into an Impala table in one step. Formerly, you might have created an external table where the data files are not entirely under your control, or copied the data files to Impala data directories manually, or loaded the original data into one table and then used the INSERT statement to copy it to a new table with a different file format, partitioning scheme, and so on. See LOAD DATA Statement.
  • Improvements to Impala-HBase integration:
  • You can issue REFRESH as a SQL statement through any of the programming interfaces that Impala supports. REFRESH formerly had to be issued as a command through the impala-shell interpreter, and was not available through a JDBC or ODBC API call. As part of this change, the functionality of the REFRESH statement is divided between two statements. In Impala 1.1, REFRESH requires a table name argument and immediately reloads the metadata; the new INVALIDATE METADATA statement works the same as the Impala 1.0 REFRESH did: the table name argument is optional, and the metadata for one or all tables is marked as stale, but not actually reloaded until the table is queried. When you create a new table in the Hive shell or through a different Impala node, you must enter INVALIDATE METADATA with no table parameter before you can see the new table in impala-shell. See REFRESH Statement and INVALIDATE METADATA Statement.
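
The SQL-level features in the list above can be sketched together as follows. This is a minimal illustration, not a tested script: the table, view, and column names (sales, sales_by_region, large_sales) and the HDFS path are hypothetical placeholders chosen for this example.

```sql
-- All object names and the HDFS path below are hypothetical.
CREATE TABLE sales (id INT, region STRING, amount DOUBLE);

-- LOAD DATA: bring an existing HDFS data file into the table in one step.
LOAD DATA INPATH '/user/etl/incoming/sales.txt' INTO TABLE sales;

-- Column permutation: name a subset of columns, in any order.
-- The unnamed column (region) is left NULL for this row.
INSERT INTO sales (amount, id) VALUES (19.99, 1);

-- Views: a persistent alias for a query, usable from any session.
CREATE VIEW sales_by_region AS
  SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
SELECT region, total FROM sales_by_region;
ALTER VIEW sales_by_region AS
  SELECT region, SUM(amount) AS total, COUNT(*) AS num_sales
  FROM sales GROUP BY region;
DROP VIEW sales_by_region;

-- WITH clause: similar to a view, but it lasts only for this one query.
WITH large_sales AS (SELECT region, amount FROM sales WHERE amount > 1000)
SELECT region, COUNT(*) FROM large_sales GROUP BY region;

-- New built-in functions.
SELECT translate('hello world', 'lo', 'xy');
SELECT user();

-- More detailed table information, returned as a result set.
DESCRIBE FORMATTED sales;

-- REFRESH requires a table name and reloads that table's metadata immediately.
REFRESH sales;
-- INVALIDATE METADATA with no table name marks all metadata as stale;
-- it is reloaded when a table is next queried.
INVALIDATE METADATA;
```

Because REFRESH and INVALIDATE METADATA are now ordinary SQL statements, the last two lines could equally be sent through a JDBC or ODBC connection rather than typed into impala-shell.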

New Features in Impala Version 1.0.1

The primary enhancements in Impala 1.0.1 are internal, for compatibility with the new Cloudera Manager 4.6 release. Try out the new Impala Query Monitoring feature in Cloudera Manager 4.6, which requires Impala 1.0.1.

New user-visible features include:

  • The VALUES clause lets you INSERT one or more rows using literals, function return values, or other expressions. For performance and scalability, you should still use INSERT ... SELECT for bringing large quantities of data into an Impala table. The VALUES clause is a convenient way to set up small tables, particularly for initial testing of SQL features that do not require large amounts of data. See VALUES Clause for details.
  • The -B and -o options of the impala-shell command can turn query results into delimited text and store them in an output file. The plain text results are useful with other Hadoop components or Unix tools. In benchmark tests, it is also faster to produce plain rather than pretty-printed results, and to write to a file rather than to the screen, which gives a more accurate picture of the actual query time.
  • Several bug fixes. See Known Issues Fixed in the 1.0.1 Release for details.
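
As a rough sketch of the VALUES clause described above, the statements below set up a small test table. The table and column names (val_test, c1, c2, c3) are hypothetical and exist only for this example.

```sql
-- Hypothetical scratch table for trying out SQL features.
CREATE TABLE val_test (c1 INT, c2 STRING, c3 DOUBLE);

-- Insert several rows of literals in a single statement.
INSERT INTO val_test VALUES (1, 'one', 1.5), (2, 'two', 2.5);

-- Values can also be expressions or function return values.
INSERT INTO val_test VALUES (1 + 2, concat('th', 'ree'), sqrt(9));

SELECT c1, c2, c3 FROM val_test;
```

For anything beyond a handful of rows, INSERT ... SELECT remains the better choice, as the list above notes.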

New Features in Impala Version 1.0

This version has multiple performance improvements and adds the following functionality:

New Features in Version 0.7 of the Cloudera Impala Beta Release

This version has multiple performance improvements and adds the following functionality:

  • Several bug fixes. See Known Issues Fixed in Version 0.7 of the Beta Release.
  • Support for the Parquet file format. For more information on file formats, see Understanding File Formats.
  • Added support for Avro.
  • Support for memory limits. For more information, see the example on modifying memory limits in Modifying Impala Startup Options.
  • Bigger and faster joins through the addition of partitioned joins to the already supported broadcast joins.
  • Fully distributed aggregations.
  • Fully distributed top-n computation.
  • Support for creating and altering tables.
  • Support for GROUP BY with floats and doubles.

This version supports both CDH 4.1 and CDH 4.2. Because of the performance improvements in CDH 4.2, we highly recommend CDH 4.2 or higher to see the full benefit. If you are using Cloudera Manager, version 4.5 is required.

New Features in Version 0.6 of the Cloudera Impala Beta Release

  • Several bug fixes. See Known Issues Fixed in Version 0.6 of the Beta Release.
  • Added support for Impala on SUSE and Debian/Ubuntu. Impala is now supported on:
    • RHEL 5.7/6.2 and CentOS 5.7/6.2
    • SUSE 11 with Service Pack 1 or later
    • Ubuntu 10.04/12.04 and Debian 6.0.3
  • Cloudera Manager 4.5 and CDH 4.2 support Impala 0.6.
  • Support for the RCFile file format. For more information on file formats, see Understanding File Formats.

New Features in Version 0.5 of the Cloudera Impala Beta Release

New Features in Version 0.4 of the Cloudera Impala Beta Release

  • Several bug fixes. See Known Issues Fixed in Version 0.4 of the Beta Release.
  • Added support for Impala on RHEL 5.7/CentOS 5.7. Impala is now supported on RHEL 5.7/6.2 and CentOS 5.7/6.2.
  • Cloudera Manager 4.1.3 supports Impala 0.4.
  • The Impala debug webserver now has the ability to serve static files from ${IMPALA_HOME}/www. This can be disabled by setting --enable_webserver_doc_root=false on the command line. As a result, Impala now uses the Twitter Bootstrap library to style its debug webpages, and the /queries page now tracks the last 25 queries run by each Impala daemon.
  • Additional metrics available on the Impala Debug Webpage.

New Features in Version 0.3 of the Cloudera Impala Beta Release

  • Several bug fixes. See Known Issues Fixed in Version 0.3 of the Beta Release.
  • The state-store-service binary has been renamed statestored.
  • The location of the Impala configuration files has changed from the /usr/lib/impala/conf directory to the /etc/impala/conf directory.

New Features in Version 0.2 of the Cloudera Impala Beta Release