This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

NDV Function

An aggregate function that returns an approximate value similar to the result of COUNT(DISTINCT col), the "number of distinct values". It is much faster than the combination of COUNT and DISTINCT. Because it uses a constant amount of memory, it is also less memory-intensive for columns with high cardinality.
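As an illustration, the two queries below ask the same question in the exact and the approximate form. The table name web_logs and the column name user_id are hypothetical, chosen only for this sketch.

```sql
-- Hypothetical table web_logs with a high-cardinality column user_id.

-- Exact answer: uses the combination of COUNT and DISTINCT.
SELECT COUNT(DISTINCT user_id) FROM web_logs;

-- Approximate answer: NDV() uses a constant amount of memory.
SELECT NDV(user_id) FROM web_logs;
```

The two results are expected to be close but not necessarily identical, because NDV() returns an estimate.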

This is the mechanism used internally by the COMPUTE STATS statement for computing the number of distinct values in a column.
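For example, the estimate that COMPUTE STATS records for each column can be inspected with SHOW COLUMN STATS. This is a minimal sketch; the table name web_logs is hypothetical.

```sql
-- Gather table and column statistics for the hypothetical table web_logs.
COMPUTE STATS web_logs;

-- Display the per-column statistics, including the
-- estimated number of distinct values for each column.
SHOW COLUMN STATS web_logs;
```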

Usage notes:

Because this number is an estimate, it might not reflect the precise number of different values in the column, especially if the cardinality is very low or very high. If the estimated number is higher than the number of rows in the table, Impala adjusts the value internally during query planning.

Currently, the return value is always a STRING. The return type is subject to change in future releases. Always use CAST() to convert the result to whichever data type is appropriate for your computations.
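A minimal sketch of the recommended CAST() usage follows. The table web_logs, the column user_id, and the alias approx_users are hypothetical names used only for illustration.

```sql
-- Convert the NDV() result to an integer type before using it
-- in arithmetic or comparisons.
SELECT CAST(NDV(user_id) AS BIGINT) AS approx_users
FROM web_logs;
```

Wrapping the call in an explicit CAST() states the intended type in the query itself, so the query does not depend on the return type of NDV() in a particular release.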

Page generated September 3, 2015.