The DISTINCT operator in a SELECT statement filters the result set to remove duplicates:
-- Returns the unique values from one column. -- NULL is included in the set of values if any rows have a NULL in this column. select distinct c_birth_country from customer; -- Returns the unique combinations of values from multiple columns. select distinct c_salutation, c_last_name from customer;
You can use DISTINCT in combination with an aggregation function, typically COUNT(), to find how many different values a column contains:
-- Counts the unique values from one column. -- NULL is not included as a distinct value in the count. select count(distinct c_birth_country) from customer; -- Counts the unique combinations of values from multiple columns. select count(distinct c_salutation, c_last_name) from customer;
One construct that Impala SQL does not support is using DISTINCT in more than one aggregation function in the same query. For example, you could not have a single query with both COUNT(DISTINCT c_first_name) and COUNT(DISTINCT c_last_name) in the SELECT list.
Zero-length strings: For purposes of clauses such as DISTINCT and GROUP BY, Impala considers zero-length strings (""), NULL, and space to all be different values.