Impala Built-In Functions

Impala supports several categories of built-in functions. These functions let you perform mathematical calculations, string manipulation, date calculations, and other kinds of data transformations directly in SELECT statements. The built-in functions let a SQL query return results with all formatting, calculating, and type conversions applied, rather than performing time-consuming postprocessing in another application. By applying function calls where practical, you can make a SQL query that is as convenient as an expression in a procedural programming language or a formula in a spreadsheet.

The categories of functions supported by Impala are:

You call any of these functions through the SELECT statement. For most functions, you can omit the FROM clause and supply literal values for any required arguments:

select abs(-1);
+---------+
| abs(-1) |
+---------+
| 1       |
+---------+

select concat('The rain ', 'in Spain');
+---------------------------------+
| concat('the rain ', 'in spain') |
+---------------------------------+
| The rain in Spain               |
+---------------------------------+

select power(2,5);
+-------------+
| power(2, 5) |
+-------------+
| 32          |
+-------------+

When you use a FROM clause and specify a column name as a function argument, the function is applied for each item in the result set:

select concat('Country = ',country_code) from all_countries where population > 100000000;
select round(price) as dollar_value from product_catalog where price between 0.0 and 100.0;

Typically, if any argument to a built-in function is NULL, the result value is also NULL:

select cos(null);
+-----------+
| cos(null) |
+-----------+
| NULL      |
+-----------+

select power(2,null);
+----------------+
| power(2, null) |
+----------------+
| NULL           |
+----------------+

select concat('a',null,'b');
+------------------------+
| concat('a', null, 'b') |
+------------------------+
| NULL                   |
+------------------------+

Aggregate functions are a special category with different rules. These functions calculate a return value across all the items in a result set, so they require a FROM clause in the query:

select count(product_id) from product_catalog;
select max(height), avg(height) from census_data where age > 20;

Aggregate functions also ignore NULL values rather than returning a NULL result. For example, if some rows have NULL for a particular column, those rows are ignored when computing the AVG() for that column. Likewise, specifying COUNT(col_name) in a query counts only those rows where col_name contains a non-NULL value.

Aggregate functions are a special category with different rules. These functions calculate a return value across all the items in a result set, so they do require a FROM clause in the query:

select count(product_id) from product_catalog;
select max(height), avg(height) from census_data where age > 20;

Aggregate functions also ignore NULL values rather than returning a NULL result. For example, if some rows have NULL for a particular column, those rows are ignored when computing the AVG() for that column. Likewise, specifying COUNT(col_name) in a query counts only those rows where col_name contains a non-NULL value.

Analytic functions are a variation on aggregate functions. Instead of returning a single value, or an identical value for each group of rows, they can compute values that vary based on a "window" consisting of other rows around them in the result set.