Impala Type Conversion Functions
Conversion functions are usually used in combination with other functions, to explicitly pass the expected data types. Impala has strict rules regarding data types for function parameters. For example, Impala does not automatically convert a DOUBLE value to FLOAT, a BIGINT value to INT, or other conversion where precision could be lost or overflow could occur. Also, for reporting or dealing with loosely defined schemas in big data contexts, you might frequently need to convert values to or from the STRING type.
Impala supports the following type conversion functions:
- CAST(expr AS type)
- Purpose: Converts the value of an expression to any other type. If the expression value is of a type that cannot be converted to the target type, the
result is NULL.
Usage notes: Use CAST when passing a column value or literal to a function that expects a parameter with a different type. Frequently used in SQL operations such as CREATE TABLE AS SELECT and INSERT ... VALUES to ensure that values from various sources are of the appropriate type for the destination columns. Where practical, do a one-time CAST() operation during the ingestion process to make each column into the appropriate type, rather than using many CAST() operations in each query; doing type conversions for each row during each query can be expensive for tables with millions or billions of rows.
The way this function deals with time zones when converting to or from TIMESTAMP values is affected by the --use_local_tz_for_unix_timestamp_conversions startup flag for the impalad daemon. See TIMESTAMP Data Type for details about how Impala handles time zone considerations for the TIMESTAMP data type.
SELECT CONCAT('Here are the first ',10,' results.'); -- Fails SELECT CONCAT('Here are the first ',CAST(10 AS STRING),' results.'); -- Succeeds
The following example starts with a text table where every column has a type of STRING, which might be how you ingest data of unknown schema until you can verify the cleanliness of the underly values. Then it uses CAST() to create a new Parquet table with the same data, but using specific numeric data types for the columns with numeric data. Using numeric types of appropriate sizes can result in substantial space savings on disk and in memory, and performance improvements in queries, over using strings or larger-than-necessary numeric types.
CREATE TABLE t1 (name STRING, x STRING, y STRING, z STRING); CREATE TABLE t2 STORED AS PARQUET AS SELECT name, CAST(x AS BIGINT) x, CAST(y AS TIMESTAMP) y, CAST(z AS SMALLINT) z FROM t1;
For details of casts from each kind of data type, see the description of the appropriate type: TINYINT Data Type, SMALLINT Data Type, INT Data Type, BIGINT Data Type, FLOAT Data Type, DOUBLE Data Type, DECIMAL Data Type (CDH 6.0 / Impala 3.0 or higher only), STRING Data Type, CHAR Data Type (CDH 5.2 or higher only), VARCHAR Data Type (CDH 5.2 or higher only), TIMESTAMP Data Type, BOOLEAN Data Type
- TYPEOF(type value)
- Purpose: Returns the name of the data type corresponding to an expression. For types with extra attributes, such as length for CHAR and VARCHAR, or precision and scale for DECIMAL, includes the full specification of the type.
Return type: STRING
Usage notes: Typically used in interactive exploration of a schema, or in application code that programmatically generates schema definitions such as CREATE TABLE statements, for example, to get the type of an expression such as col1 / col2 or CONCAT(col1, col2, col3). This function is especially useful for arithmetic expressions involving DECIMAL types because the precision and scale of the result is can be different than that of the operands.
Added in: CDH 5.5.0 / Impala 2.3.0
SELECT TYPEOF(2), TYPEOF(2+2); +-----------+---------------+ | typeof(2) | typeof(2 + 2) | +-----------+---------------+ | TINYINT | SMALLINT | +-----------+---------------+