# VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions

An aggregate function that returns the variance of a set of numbers. This is a mathematical property that signifies how far the values spread apart from the mean. The return value can be zero (if the input is a single value, or a set of identical values), or a positive number otherwise.

**Syntax:**

{ VARIANCE | VAR[IANCE]_SAMP | VAR[IANCE]_POP } ([DISTINCT | ALL]expression)

This function works with any numeric data type.

**Return type:** `DOUBLE` in Impala 2.0 and higher; `STRING` in earlier releases

This function is typically used in mathematical formulas related to probability distributions.

The
`VARIANCE_SAMP()`
and
`VARIANCE_POP()`
functions
compute the
sample variance
and
population variance, respectively,
of the input values.
(`VARIANCE()` is an alias for `VARIANCE_SAMP()`.)
Both functions evaluate all input rows matched by the query.
The difference is that `STDDEV_SAMP()`
is scaled by `1/(N-1)`
while `STDDEV_POP()`
is scaled by `1/N`.

The functions `VAR_SAMP()` and `VAR_POP()` are the same
as `VARIANCE_SAMP()` and `VARIANCE_POP()`, respectively.
These aliases are available in Impala 2.0 and later.

If no input rows match the query, the result of any of these functions is `NULL`.
If a single input row matches the query, the result of any of these functions is `"0.0"`.

**Examples:**

This example demonstrates how `VARIANCE()` and
`VARIANCE_SAMP()` return the same result,
while `VARIANCE_POP()` uses a slightly
different calculation to reflect that the input
data is considered part of a larger

[localhost:21000] > select variance(score) from test_scores; +-----------------+ | variance(score) | +-----------------+ | 812.25 | +-----------------+ [localhost:21000] > select variance_samp(score) from test_scores; +----------------------+ | variance_samp(score) | +----------------------+ | 812.25 | +----------------------+ [localhost:21000] > select variance_pop(score) from test_scores; +---------------------+ | variance_pop(score) | +---------------------+ | 811.438 | +---------------------+

This example demonstrates that, because the return value of these
aggregate functions is a `STRING`,
you convert the result with `CAST`
if you need to do further calculations as a numeric value.

[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores; +-------------------+ | summary | +-------------------+ | Inserted 1 row(s) | +-------------------+ [localhost:21000] > desc score_stats; +--------------------+--------------+---------+ | name | type | comment | +--------------------+--------------+---------+ | standard_deviation | decimal(7,4) | | | variance | decimal(7,4) | | +--------------------+--------------+---------+

**Restrictions:**
This function cannot be used as an analytic function; it does not currently support the `OVER()` clause.

The `STDDEV()`, `STDDEV_POP()`, and `STDDEV_SAMP()`
functions compute the standard deviation (square root of the variance)
based on the results of `VARIANCE()`,
`VARIANCE_POP()`, and `VARIANCE_SAMP()` respectively.
See STDDEV, STDDEV_SAMP, STDDEV_POP Functions for details
about the standard deviation property.

<< | ||

Terms and Conditions Privacy Policy |