DECIMAL Data Type (CDH 5.1 or higher only)

A numeric data type with fixed scale and precision, used in CREATE TABLE and ALTER TABLE statements. Suitable for financial and other arithmetic calculations where the imprecise representation and rounding behavior of FLOAT and DOUBLE make those types impractical.

Syntax:

In the column definition of a CREATE TABLE statement:

column_name DECIMAL[(precision[,scale])]

DECIMAL with no precision or scale values is equivalent to DECIMAL(9,0).

Precision and Scale:

precision represents the total number of digits that can be represented by the column, regardless of the location of the decimal point. This value must be between 1 and 38. For example, representing integer values up to 9999, and floating-point values up to 99.99, both require a precision of 4. You can also represent corresponding negative values, without any change in the precision. For example, the range -9999 to 9999 still only requires a precision of 4.

scale represents the number of fractional digits. This value must be less than or equal to precision. A scale of 0 produces integral values, with no fractional part. If precision and scale are equal, all the digits come after the decimal point, making all the values between 0 and 0.999... or 0 and -0.999...

When precision and scale are omitted, a DECIMAL value is treated as DECIMAL(9,0), that is, an integer value ranging from -999,999,999 to 999,999,999. This is the largest DECIMAL value that can still be represented in 4 bytes. If precision is specified but scale is omitted, Impala uses a value of zero for the scale.

Both precision and scale must be specified as integer literals, not any other kind of constant expressions.

To check the precision or scale for arbitrary values, you can call the precision() and scale() built-in functions. For example, you might use these values to figure out how many characters are required for various fields in a report, or to understand the rounding characteristics of a formula as applied to a particular DECIMAL column.

Range:

The maximum precision value is 38. Thus, the largest integral value is represented by DECIMAL(38,0) (999... with 9 repeated 38 times). The most precise fractional value (between 0 and 1, or 0 and -1) is represented by DECIMAL(38,38), with 38 digits to the right of the decimal point. The value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1 would be .999... (9 repeated 38 times).

For a given precision and scale, the range of DECIMAL values is the same in the positive and negative directions. For example, DECIMAL(4,2) can represent from -99.99 to 99.99. This is different from other integral numeric types where the positive and negative bounds differ slightly.

When you use DECIMAL values in arithmetic expressions, the precision and scale of the result value are determined as follows:

  • For addition and subtraction, the precision and scale are based on the maximum possible result, that is, if all the digits of the input values were 9s and the absolute values were added together.

  • For multiplication, the precision is the sum of the precisions of the input values. The scale is the sum of the scales of the input values.

  • For division, Impala sets the precision and scale to values large enough to represent the whole and fractional parts of the result.

  • For UNION, the scale is the larger of the scales of the input values, and the precision is increased if necessary to accommodate any additional fractional digits. If the same input value has the largest precision and the largest scale, the result value has the same precision and scale. If one value has a larger precision but smaller scale, the scale of the result value is increased. For example, DECIMAL(20,2) UNION DECIMAL(8,6) produces a result of type DECIMAL(24,6). The extra 4 fractional digits of scale (6-2) are accommodated by extending the precision by the same amount (20+4).

  • To doublecheck, you can always call the PRECISION() and SCALE() functions on the results of an arithmetic expression to see the relevant values, or use a CREATE TABLE AS SELECT statement to define a column based on the return type of the expression.

Compatibility:

  • Using the DECIMAL type is only supported under CDH 5.1.0 and higher.
  • Use the DECIMAL data type in Impala for applications where you used the NUMBER data type in Oracle. The Impala DECIMAL type does not support the Oracle idioms of * for scale or negative values for precision.

Conversions and casting:

Casting an integer or floating-point value N to TIMESTAMP produces a value that is N seconds past the start of the epoch date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone. If the setting --use_local_tz_for_unix_timestamp_conversions=true is in effect, the resulting TIMESTAMP represents a date and time in the local time zone.

Impala automatically converts between DECIMAL and other numeric types where possible. A DECIMAL with zero scale is converted to or from the smallest appropriate integral type. A DECIMAL with a fractional part is automatically converted to or from the smallest appropriate floating-point type. If the destination type does not have sufficient precision or scale to hold all possible values of the source type, Impala raises an error and does not convert the value.

For example, these statements show how expressions of DECIMAL and other types are reconciled to the same type in the context of UNION queries and INSERT statements:

[localhost:21000] > select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
+----------------+
| x              |
+----------------+
| 1.5000         |
| 1.0000         |
+----------------+
[localhost:21000] > create table int_vs_decimal as select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
+-------------------+
| summary           |
+-------------------+
| Inserted 2 row(s) |
+-------------------+
[localhost:21000] > desc int_vs_decimal;
+------+---------------+---------+
| name | type          | comment |
+------+---------------+---------+
| x    | decimal(14,4) |         |
+------+---------------+---------+

To avoid potential conversion errors, you can use CAST() to convert DECIMAL values to FLOAT, TINYINT, SMALLINT, INT, BIGINT, STRING, TIMESTAMP, or BOOLEAN. You can use exponential notation in DECIMAL literals or when casting from STRING, for example 1.0e6 to represent one million.

If you cast a value with more fractional digits than the scale of the destination type, any extra fractional digits are truncated (not rounded). Casting a value to a target type with not enough precision produces a result of NULL and displays a runtime warning.

[localhost:21000] > select cast(1.239 as decimal(3,2));
+-----------------------------+
| cast(1.239 as decimal(3,2)) |
+-----------------------------+
| 1.23                        |
+-----------------------------+
[localhost:21000] > select cast(1234 as decimal(3));
+----------------------------+
| cast(1234 as decimal(3,0)) |
+----------------------------+
| NULL                       |
+----------------------------+
WARNINGS: Expression overflowed, returning NULL

When you specify integer literals, for example in INSERT ... VALUES statements or arithmetic expressions, those numbers are interpreted as the smallest applicable integer type. You must use CAST() calls for some combinations of integer literals and DECIMAL precision. For example, INT has a maximum value that is 10 digits long, TINYINT has a maximum value that is 3 digits long, and so on. If you specify a value such as 123456 to go into a DECIMAL column, Impala checks if the column has enough precision to represent the largest value of that integer type, and raises an error if not. Therefore, use an expression like CAST(123456 TO DECIMAL(9,0)) for DECIMAL columns with precision 9 or less, CAST(50 TO DECIMAL(2,0)) for DECIMAL columns with precision 2 or less, and so on. For DECIMAL columns with precision 10 or greater, Impala automatically interprets the value as the correct DECIMAL type; however, because DECIMAL(10) requires 8 bytes of storage while DECIMAL(9) requires only 4 bytes, only use precision of 10 or higher when actually needed.

[localhost:21000] > create table decimals_9_0 (x decimal);
[localhost:21000] > insert into decimals_9_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
ERROR: AnalysisException: Possible loss of precision for target table 'decimal_testing.decimals_9_0'.
Expression '1' (type: INT) would need to be cast to DECIMAL(9,0) for column 'x'
[localhost:21000] > insert into decimals_9_0 values (cast(1 as decimal)), (cast(2 as decimal)), (cast(4 as decimal)), (cast(8 as decimal)), (cast(16 as decimal)), (cast(1024 as decimal)), (cast(32768 as decimal)), (cast(65536 as decimal)), (cast(1000000 as decimal));

[localhost:21000] > create table decimals_10_0 (x decimal(10,0));
[localhost:21000] > insert into decimals_10_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);

Be aware that in memory and for binary file formats such as Parquet or Avro, DECIMAL(10) or higher consumes 8 bytes while DECIMAL(9) (the default for DECIMAL) or lower consumes 4 bytes. Therefore, to conserve space in large tables, use the smallest-precision DECIMAL type that is appropriate and CAST() literal values where necessary, rather than declaring DECIMAL columns with high precision for convenience.

To represent a very large or precise DECIMAL value as a literal, for example one that contains more digits than can be represented by a BIGINT literal, use a quoted string or a floating-point value for the number, and CAST() to the desired DECIMAL type:

insert into decimals_38_5 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000),
  (cast("999999999999999999999999999999" as decimal(38,5))),
  (cast(999999999999999999999999999999. as decimal(38,5)));
  • The result of an aggregate function such as MAX(), SUM(), or AVG() on DECIMAL values is promoted to a scale of 38, with the same precision as the underlying column. Thus, the result can represent the largest possible value at that particular precision.

  • STRING columns, literals, or expressions can be converted to DECIMAL as long as the overall number of digits and digits to the right of the decimal point fit within the specified precision and scale for the declared DECIMAL type. By default, a DECIMAL value with no specified scale or precision can hold a maximum of 9 digits of an integer value. If there are more digits in the string value than are allowed by the DECIMAL scale and precision, the result is NULL.

    The following examples demonstrate how STRING values with integer and fractional parts are represented when converted to DECIMAL. If the scale is 0, the number is treated as an integer value with a maximum of precision digits. If the precision is greater than 0, the scale must be increased to account for the digits both to the left and right of the decimal point. As the precision increases, output values are printed with additional trailing zeros after the decimal point if needed. Any trailing zeros after the decimal point in the STRING value must fit within the number of digits specified by the precision.

    [localhost:21000] > select cast('100' as decimal); -- Small integer value fits within 9 digits of scale.
    +-----------------------------+
    | cast('100' as decimal(9,0)) |
    +-----------------------------+
    | 100                         |
    +-----------------------------+
    [localhost:21000] > select cast('100' as decimal(3,0)); -- Small integer value fits within 3 digits of scale.
    +-----------------------------+
    | cast('100' as decimal(3,0)) |
    +-----------------------------+
    | 100                         |
    +-----------------------------+
    [localhost:21000] > select cast('100' as decimal(2,0)); -- 2 digits of scale is not enough!
    +-----------------------------+
    | cast('100' as decimal(2,0)) |
    +-----------------------------+
    | NULL                        |
    +-----------------------------+
    [localhost:21000] > select cast('100' as decimal(3,1)); -- (3,1) = 2 digits left of the decimal point, 1 to the right. Not enough.
    +-----------------------------+
    | cast('100' as decimal(3,1)) |
    +-----------------------------+
    | NULL                        |
    +-----------------------------+
    [localhost:21000] > select cast('100' as decimal(4,1)); -- 4 digits total, 1 to the right of the decimal point.
    +-----------------------------+
    | cast('100' as decimal(4,1)) |
    +-----------------------------+
    | 100.0                       |
    +-----------------------------+
    [localhost:21000] > select cast('98.6' as decimal(3,1)); -- (3,1) can hold a 3 digit number with 1 fractional digit.
    +------------------------------+
    | cast('98.6' as decimal(3,1)) |
    +------------------------------+
    | 98.6                         |
    +------------------------------+
    [localhost:21000] > select cast('98.6' as decimal(15,1)); -- Larger scale allows bigger numbers but still only 1 fractional digit.
    +-------------------------------+
    | cast('98.6' as decimal(15,1)) |
    +-------------------------------+
    | 98.6                          |
    +-------------------------------+
    [localhost:21000] > select cast('98.6' as decimal(15,5)); -- Larger precision allows more fractional digits, outputs trailing zeros.
    +-------------------------------+
    | cast('98.6' as decimal(15,5)) |
    +-------------------------------+
    | 98.60000                      |
    +-------------------------------+
    [localhost:21000] > select cast('98.60000' as decimal(15,1)); -- Trailing zeros in the string must fit within 'scale' digits (1 in this case).
    +-----------------------------------+
    | cast('98.60000' as decimal(15,1)) |
    +-----------------------------------+
    | NULL                              |
    +-----------------------------------+
    
    
  • Most built-in arithmetic functions such as SIN() and COS() continue to accept only DOUBLE values because they are so commonly used in scientific context for calculations of IEEE 754-compliant values. The built-in functions that accept and return DECIMAL are:
    • ABS()
    • CEIL()
    • COALESCE()
    • FLOOR()
    • FNV_HASH()
    • GREATEST()
    • IF()
    • ISNULL()
    • LEAST()
    • NEGATIVE()
    • NULLIF()
    • POSITIVE()
    • PRECISION()
    • ROUND()
    • SCALE()
    • TRUNCATE()
    • ZEROIFNULL()
    See Impala Built-In Functions for details.
  • BIGINT, INT, SMALLINT, and TINYINT values can all be cast to DECIMAL. The number of digits to the left of the decimal point in the DECIMAL type must be sufficient to hold the largest value of the corresponding integer type. Note that integer literals are treated as the smallest appropriate integer type, meaning there is sometimes a range of values that require one more digit of DECIMAL scale than you might expect. For integer values, the precision of the DECIMAL type can be zero; if the precision is greater than zero, remember to increase the scale value by an equivalent amount to hold the required number of digits to the left of the decimal point.

    The following examples show how different integer types are converted to DECIMAL.

    [localhost:21000] > select cast(1 as decimal(1,0));
    +-------------------------+
    | cast(1 as decimal(1,0)) |
    +-------------------------+
    | 1                       |
    +-------------------------+
    [localhost:21000] > select cast(9 as decimal(1,0));
    +-------------------------+
    | cast(9 as decimal(1,0)) |
    +-------------------------+
    | 9                       |
    +-------------------------+
    [localhost:21000] > select cast(10 as decimal(1,0));
    +--------------------------+
    | cast(10 as decimal(1,0)) |
    +--------------------------+
    | 10                       |
    +--------------------------+
    [localhost:21000] > select cast(10 as decimal(1,1));
    +--------------------------+
    | cast(10 as decimal(1,1)) |
    +--------------------------+
    | 10.0                     |
    +--------------------------+
    [localhost:21000] > select cast(100 as decimal(1,1));
    +---------------------------+
    | cast(100 as decimal(1,1)) |
    +---------------------------+
    | 100.0                     |
    +---------------------------+
    [localhost:21000] > select cast(1000 as decimal(1,1));
    +----------------------------+
    | cast(1000 as decimal(1,1)) |
    +----------------------------+
    | 1000.0                     |
    +----------------------------+
    
    
  • When a DECIMAL value is converted to any of the integer types, any fractional part is truncated (that is, rounded towards zero):

    [localhost:21000] > create table num_dec_days (x decimal(4,1));
    [localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
    [localhost:21000] > insert into num_dec_days values (cast(0.1 as decimal(4,1))), (cast(.9 as decimal(4,1))), (cast(9.1 as decimal(4,1))), (cast(9.9 as decimal(4,1)));
    [localhost:21000] > select cast(x as int) from num_dec_days;
    +----------------+
    | cast(x as int) |
    +----------------+
    | 1              |
    | 2              |
    | 4              |
    | 0              |
    | 0              |
    | 9              |
    | 9              |
    +----------------+
    
    
  • You cannot directly cast TIMESTAMP or BOOLEAN values to or from DECIMAL values. You can turn a DECIMAL value into a time-related representation using a two-step process, by converting it to an integer value and then using that result in a call to a date and time function such as from_unixtime().

    [localhost:21000] > select from_unixtime(cast(cast(1000.0 as decimal) as bigint));
    +-------------------------------------------------------------+
    | from_unixtime(cast(cast(1000.0 as decimal(9,0)) as bigint)) |
    +-------------------------------------------------------------+
    | 1970-01-01 00:16:40                                         |
    +-------------------------------------------------------------+
    [localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- x is a DECIMAL column.
    
    [localhost:21000] > create table num_dec_days (x decimal(4,1));
    [localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
    [localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- The 4.5 value is truncated to 4 and becomes '4 days'.
    +--------------------------------------+
    | now() + interval cast(x as int) days |
    +--------------------------------------+
    | 2014-05-13 23:11:55.163284000        |
    | 2014-05-14 23:11:55.163284000        |
    | 2014-05-16 23:11:55.163284000        |
    +--------------------------------------+
    
    
  • Because values in INSERT statements are checked rigorously for type compatibility, be prepared to use CAST() function calls around literals, column references, or other expressions that you are inserting into a DECIMAL column.

NULL considerations: Casting any non-numeric value to this type produces a NULL value.

DECIMAL differences from integer and floating-point types:

With the DECIMAL type, you are concerned with the number of overall digits of a number rather than powers of 2 (as in TINYINT, SMALLINT, and so on). Therefore, the limits with integral values of DECIMAL types fall around 99, 999, 9999, and so on rather than 32767, 65535, 2 32 -1, and so on. For fractional values, you do not need to account for imprecise representation of the fractional part according to the IEEE-954 standard (as in FLOAT and DOUBLE). Therefore, when you insert a fractional value into a DECIMAL column, you can compare, sum, query, GROUP BY, and so on that column and get back the original values rather than some "close but not identical" value.

FLOAT and DOUBLE can cause problems or unexpected behavior due to inability to precisely represent certain fractional values, for example dollar and cents values for currency. You might find output values slightly different than you inserted, equality tests that do not match precisely, or unexpected values for GROUP BY columns. DECIMAL can help reduce unexpected behavior and rounding errors, at the expense of some performance overhead for assignments and comparisons.

Literals and expressions:
  • When you use an integer literal such as 1 or 999 in a SQL statement, depending on the context, Impala will treat it as either the smallest appropriate DECIMAL type, or the smallest integer type (TINYINT, SMALLINT, INT, or BIGINT). To minimize memory usage, Impala prefers to treat the literal as the smallest appropriate integer type.

  • When you use a floating-point literal such as 1.1 or 999.44 in a SQL statement, depending on the context, Impala will treat it as either the smallest appropriate DECIMAL type, or the smallest floating-point type (FLOAT or DOUBLE). To avoid loss of accuracy, Impala prefers to treat the literal as a DECIMAL.

Storage considerations:

  • Only the precision determines the storage size for DECIMAL values; the scale setting has no effect on the storage size.
  • Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these text-based file formats, leading zeros are not stored, but trailing zeros are stored. In these tables, each DECIMAL value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is present and an extra byte for negative values. Once the values are loaded into memory, they are represented in 4, 8, or 16 bytes as described in the following list items. The on-disk representation varies depending on the file format of the table.
  • Parquet and Avro tables use binary formats, In these tables, Impala stores each value in as few bytes as possible depending on the precision specified for the DECIMAL column.
    • In memory, DECIMAL values with precision of 9 or less are stored in 4 bytes.
    • In memory, DECIMAL values with precision of 10 through 18 are stored in 8 bytes.
    • In memory, DECIMAL values with precision greater than 18 are stored in 16 bytes.

File format considerations:

  • The DECIMAL data type can be stored in any of the file formats supported by Impala, as described in How Impala Works with Hadoop File Formats. Impala only writes to tables that use the Parquet and text formats, so those formats are the focus for file format compatibility.
  • Impala can query Avro, RCFile, or SequenceFile tables containing DECIMAL columns, created by other Hadoop components, on CDH 5 only.
  • You can use DECIMAL columns in Impala tables that are mapped to HBase tables. Impala can query and insert into such tables.
  • Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these tables, each DECIMAL value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is present. The binary format of Parquet or Avro files offers more compact storage for DECIMAL columns.
  • Parquet and Avro tables use binary formats, In these tables, Impala stores each value in 4, 8, or 16 bytes depending on the precision specified for the DECIMAL column.

UDF considerations: When writing a C++ UDF, use the DecimalVal data type defined in /usr/include/impala_udf/udf.h.

Partitioning:

You can use a DECIMAL column as a partition key. Doing so provides a better match between the partition key values and the HDFS directory names than using a DOUBLE or FLOAT partitioning column:

Schema evolution considerations:

  • For text-based formats (text, RCFile, and SequenceFile tables), you can issue an ALTER TABLE ... REPLACE COLUMNS statement to change the precision and scale of an existing DECIMAL column. As long as the values in the column fit within the new precision and scale, they are returned correctly by a query. Any values that do not fit within the new precision and scale are returned as NULL, and Impala reports the conversion error. Leading zeros do not count against the precision value, but trailing zeros after the decimal point do.
    [localhost:21000] > create table text_decimals (x string);
    [localhost:21000] > insert into text_decimals values ("1"), ("2"), ("99.99"), ("1.234"), ("000001"), ("1.000000000");
    [localhost:21000] > select * from text_decimals;
    +-------------+
    | x           |
    +-------------+
    | 1           |
    | 2           |
    | 99.99       |
    | 1.234       |
    | 000001      |
    | 1.000000000 |
    +-------------+
    [localhost:21000] > alter table text_decimals replace columns (x decimal(4,2));
    [localhost:21000] > select * from text_decimals;
    +-------+
    | x     |
    +-------+
    | 1.00  |
    | 2.00  |
    | 99.99 |
    | NULL  |
    | 1.00  |
    | NULL  |
    +-------+
    ERRORS:
    Backend 0:Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.234)
    file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/634d4bd3aa0
    e8420-b4b13bab7f1be787_56794587_data.0
    record: 1.234
    Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.000000000)
    file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/cd40dc68e20
    c565a-cc4bd86c724c96ba_311873428_data.0
    record: 1.000000000
    
    
  • For binary formats (Parquet and Avro tables), although an ALTER TABLE ... REPLACE COLUMNS statement that changes the precision or scale of a DECIMAL column succeeds, any subsequent attempt to query the changed column results in a fatal error. (The other columns can still be queried successfully.) This is because the metadata about the columns is stored in the data files themselves, and ALTER TABLE does not actually make any updates to the data files. If the metadata in the data files disagrees with the metadata in the metastore database, Impala cancels the query.

Examples:

CREATE TABLE t1 (x DECIMAL, y DECIMAL(5,2), z DECIMAL(25,0));
INSERT INTO t1 VALUES (5, 99.44, 123456), (300, 6.7, 999999999);
SELECT x+y, ROUND(y,1), z/98.6 FROM t1;
SELECT CAST(1000.5 AS DECIMAL);

Restrictions:

Currently, the COMPUTE STATS statement under CDH 4 does not store any statistics for DECIMAL columns. When Impala runs under CDH 5, which has better support for DECIMAL in the metastore database, COMPUTE STATS does collect statistics for DECIMAL columns and Impala uses the statistics to optimize query performance.

HBase considerations: This data type is fully compatible with HBase tables.

Parquet considerations: This type is fully compatible with Parquet tables.

Text table considerations: Values of this type are potentially larger in text tables than in tables using Parquet or other binary formats.

Column statistics considerations: Because this type has a fixed size, the maximum and average size fields are always filled in for column statistics, even before you run the COMPUTE STATS statement.

Related information:

Numeric Literals, TINYINT Data Type, SMALLINT Data Type, INT Data Type, BIGINT Data Type, DECIMAL Data Type (CDH 5.1 or higher only), Impala Mathematical Functions (especially PRECISION() and SCALE())