This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Built-in Functions

Impala supports several categories of built-in functions. These functions let you perform mathematical calculations, string manipulation, date calculations, and other kinds of data transformations directly in SELECT statements. The built-in functions let a SQL query return results with all formatting, calculating, and type conversions applied, rather than performing time-consuming postprocessing in another application. By applying function calls where practical, you can make a SQL query that is as convenient as an expression in a procedural programming language or a formula in a spreadsheet.

The categories of functions supported by Impala are:

You call any of these functions through the SELECT statement. For most functions, you can omit the FROM clause and supply literal values for any required arguments:

select abs(-1);
select concat('The rain ', 'in Spain');
select power(2,5);

When you use a FROM clause and specify a column name as a function argument, the function is applied for each item in the result set:

select concat('Country = ',country_code) from all_countries where population > 100000000;
select round(price) as dollar_value from product_catalog where price between 0.0 and 100.0;

Typically, if any argument to a built-in function is NULL, the result value is also NULL:

select cos(null);
select power(2,null);
select concat('a',null,'b');

Aggregate functions are a special category with different rules. These functions calculate a return value across all the items in a result set, so they require a FROM clause in the query:

select count(product_id) from product_catalog;
select max(height), avg(height) from census_data where age > 20;

Aggregate functions also ignore NULL values rather than returning a NULL result. For example, if some rows have NULL for a particular column, those rows are ignored when computing the AVG() for that column. Likewise, specifying COUNT(col_name) in a query counts only those rows where col_name contains a non-NULL value.

Aggregate functions are a special category with different rules. These functions calculate a return value across all the items in a result set, so they do require a FROM clause in the query:

select count(product_id) from product_catalog;
select max(height), avg(height) from census_data where age > 20;

Aggregate functions also ignore NULL values rather than returning a NULL result. For example, if some rows have NULL for a particular column, those rows are ignored when computing the AVG() for that column. Likewise, specifying COUNT(col_name) in a query counts only those rows where col_name contains a non-NULL value.

Continue reading:

Page generated September 3, 2015.