SQL Differences Between Impala and Hive
The current release of Impala does not support the following SQL features that you might be familiar with from HiveQL:
- Non-scalar data types such as maps, arrays, structs
- Extensibility mechanisms such as TRANSFORM, custom file formats, or custom SerDes; user-defined functions are supported starting in Impala 1.2
- XML and JSON functions
- Certain aggregation functions from HiveQL: variance, var_pop, var_samp, stddev_pop, stddev_samp, covar_pop, covar_samp, corr, percentile, percentile_approx, histogram_numeric, collect_set; Impala supports these aggregation functions: MAX(), MIN(), SUM(), AVG(), and COUNT()
- User Defined Aggregate Functions (UDAFs)
- User Defined Table Generating Functions (UDTFs)
- Lateral views
- Authorization features such as roles
- Multiple DISTINCT clauses per query
Impala does not currently support these HiveQL statements:
- ANALYZE TABLE (the Impala equivalent is COMPUTE STATS)
- DESCRIBE COLUMN
- DESCRIBE DATABASE
- EXPORT TABLE
- IMPORT TABLE
- SHOW PARTITIONS
- SHOW TABLE EXTENDED
- SHOW INDEXES
- SHOW COLUMNS
The semantics of Impala SQL statements varies from HiveQL in some cases where they use similar SQL statement and clause names:
- Impala uses different syntax and names for query hints. See Joins for the Impala details.
- Impala does not expose MapReduce specific features of SORT BY, DISTRIBUTE BY, or CLUSTER BY.
- Impala does not require queries to include a FROM clause.
Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected casting
- Impala does not implicitly cast between string and numeric or Boolean types.
- Impala does perform implicit casts among the numeric types or from string to timestamp.
- Impala does not store timestamps using the local timezone, to avoid undesired results from unexpected time zone issues. Timestamps are stored relative to GMT.
- Impala does not return column overflows as NULL, so that customers can distinguish between NULL data and overflow conditions similar to how they do so with traditional database systems. Impala returns the largest or smallest value in the range for the type. For example, valid values for a tinyint range from -128 to 127. In Impala, a tinyint with a value of -200 returns -128 rather than NULL. A tinyint with a value of 200 returns 127.
- Impala does not provide virtual columns.
- Impala does not expose locking.
- Impala does not expose some configuration properties.
|<< Previous: Impala SQL Language Elements||Next: Built-in Function Support >>|