This is the documentation for CDH 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Data Types

Continue reading:

For the notation to write literals of each of these data types, see Literals.

See SQL Differences Between Impala and Hive for differences between Impala and Hive data types.

BIGINT Data Type

An 8-byte integer data type used in CREATE TABLE and ALTER TABLE statements.

Range: -9223372036854775808 .. 9223372036854775807. There is no UNSIGNED subtype.

Conversions: Impala automatically converts to a floating-point type (FLOAT or DOUBLE) automatically. Use CAST() to convert to TINYINT, SMALLINT, INT, STRING, or TIMESTAMP. Casting an integer value N to TIMESTAMP produces a value that is N seconds past the start of the epoch date (January 1, 1970).

Related information: Literals, INT Data Type, SMALLINT Data Type, TINYINT Data Type, Mathematical Functions

BOOLEAN Data Type

A data type used in CREATE TABLE and ALTER TABLE statements, representing a single true/false choice.

Range: TRUE or FALSE. Do not use quotation marks around the TRUE and FALSE literal values. You can write the literal values in uppercase, lowercase, or mixed case. The values queried from a table are always returned in lowercase, true or false.

Conversions: Impala does not automatically convert any other type to BOOLEAN. You can use CAST() to convert any integer or float-point type to BOOLEAN: a value of 0 represents false, and any non-zero value is converted to true. You cannot cast a STRING value to BOOLEAN, although you can cast a BOOLEAN value to STRING, returning '1' for true values and '0' for false values.

Related information: Literals, Conditional Functions

DOUBLE Data Type

An 8-byte (double precision) floating-point data type used in CREATE TABLE and ALTER TABLE statements.

Range: 4.94065645841246544e-324d .. 1.79769313486231570e+308, positive or negative

Conversions: Impala does not automatically convert DOUBLE to any other type. You can use CAST() to convert DOUBLE values to FLOAT, TINYINT, SMALLINT, INT, BIGINT, STRING, TIMESTAMP, or BOOLEAN. You can use exponential notation in DOUBLE literals or when casting from STRING, for example 1.0e6 to represent one million.

The data type REAL is an alias for DOUBLE.

Related information: Literals, Mathematical Functions

FLOAT Data Type

A 4-byte (single precision) floating-point data type used in CREATE TABLE and ALTER TABLE statements.

Range: 1.40129846432481707e-45 .. 3.40282346638528860e+38, positive or negative

Conversions: Impala automatically converts FLOAT to more precise DOUBLE values, but not the other way around. You can use CAST() to convert FLOAT values to TINYINT, SMALLINT, INT, BIGINT, STRING, TIMESTAMP, or BOOLEAN. You can use exponential notation in FLOAT literals or when casting from STRING, for example 1.0e6 to represent one million.

Related information: Literals, Mathematical Functions

INT Data Type

A 4-byte integer data type used in CREATE TABLE and ALTER TABLE statements.

Range: -2147483648 .. 2147483647. There is no UNSIGNED subtype.

Conversions: Impala automatically converts to a larger integer type (BIGINT) or a floating-point type (FLOAT or DOUBLE) automatically. Use CAST() to convert to TINYINT, SMALLINT, STRING, or TIMESTAMP. Casting an integer value N to TIMESTAMP produces a value that is N seconds past the start of the epoch date (January 1, 1970).

The data type INTEGER is an alias for INT.

Related information: Literals, TINYINT Data Type, BIGINT Data Type, SMALLINT Data Type, TINYINT Data Type, Mathematical Functions

REAL Data Type

An alias for the DOUBLE data type. See DOUBLE Data Type for details.

Examples:

These examples show how you can use the type names REAL and DOUBLE interchangeably, and behind the scenes Impala treats them always as DOUBLE.

[localhost:21000] > create table r1 (x real);
[localhost:21000] > describe r1;
+------+--------+---------+
| name | type   | comment |
+------+--------+---------+
| x    | double |         |
+------+--------+---------+
[localhost:21000] > insert into r1 values (1.5), (cast (2.2 as double));
[localhost:21000] > select cast (1e6 as real);
+---------------------------+
| cast(1000000.0 as double) |
+---------------------------+
| 1000000                   |
+---------------------------+

SMALLINT Data Type

A 2-byte integer data type used in CREATE TABLE and ALTER TABLE statements.

Range: -32768 .. 32767. There is no UNSIGNED subtype.

Conversions: Impala automatically converts to a larger integer type (INT or BIGINT) or a floating-point type (FLOAT or DOUBLE) automatically. Use CAST() to convert to TINYINT, STRING, or TIMESTAMP. Casting an integer value N to TIMESTAMP produces a value that is N seconds past the start of the epoch date (January 1, 1970).

Related information: Literals, TINYINT Data Type, BIGINT Data Type, TINYINT Data Type, INT Data Type, Mathematical Functions

STRING Data Type

A data type used in CREATE TABLE and ALTER TABLE statements.

Length: 32,767 bytes. (Strictly speaking, the maximum length corresponds to the C/C++ constant INT_MAX, which is 32,767 for typical Linux systems.) Do not use any length constraint when declaring STRING columns, as you might be familiar with from VARCHAR, CHAR, or similar column types from relational database systems.

Character sets: For full support in all Impala subsystems, restrict string values to the ASCII character set. UTF-8 character data can be stored in Impala and retrieved through queries, but UTF-8 strings containing non-ASCII characters are not guaranteed to work properly with string manipulation functions, comparison operators, or the ORDER BY clause. For any national language aspects such as collation order or interpreting extended ASCII variants such as ISO-8859-1 or ISO-8859-2 encodings, Impala does not include such metadata with the table definition. If you need to sort, manipulate, or display data depending on those national language characteristics of string data, use logic on the application side.

Conversions:

  • Impala does not automatically convert STRING to any numeric type. Impala does automatically convert STRING to TIMESTAMP if the value matches one of the accepted TIMESTAMP formats; see TIMESTAMP Data Type for details.

  • You can use CAST() to convert STRING values to TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, or TIMESTAMP.

  • You cannot directly cast a STRING value to BOOLEAN. You can use a CASE expression to evaluate string values such as 'T', 'true', and so on and return Boolean true and false values as appropriate.

  • You can cast a BOOLEAN value to STRING, returning '1' for true values and '0' for false values.

Related information: Literals, String Functions, Date and Time Functions

TIMESTAMP Data Type

A data type used in CREATE TABLE and ALTER TABLE statements, representing a point in time.

Range: Internally, the resolution of the time portion of a TIMESTAMP value is in nanoseconds.

INTERVAL expressions:

You can perform date arithmetic by adding or subtracting a specified number of time units, using the INTERVAL keyword and the + and - operators or date_add() and date_sub() functions. You can specify units as YEAR[S], MONTH[S], WEEK[S], DAY[S], HOUR[S], MINUTE[S], SECOND[S], MILLISECOND[S], MICROSECOND[S], and NANOSECOND[S]. You can only specify one time unit in each interval expression, for example INTERVAL 3 DAYS or INTERVAL 25 HOURS, but you can produce any granularity by adding together successive INTERVAL values, such as timestamp_value + INTERVAL 3 WEEKS - INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS.

For example:

select now() + interval 1 day;
select date_sub(now(), interval 5 minutes);
insert into auction_details
  select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours
  from new_auctions;

Time zones: Impala does not store timestamps using the local timezone to avoid undesired results from unexpected time zone issues. Timestamps are stored relative to GMT.

Conversions: Impala automatically converts STRING literals of the correct format into TIMESTAMP values. Timestamp values are accepted in the format YYYY-MM-DD HH:MM:SS.sssssssss, and can consist of just the date, or just the time, with or without the fractional second portion. For example, you can specify TIMESTAMP values such as '1966-07-30', '08:30:00', or '1985-09-25 17:45:30.005'. You can cast an integer or floating-point value N to TIMESTAMP, producing a value that is N seconds past the start of the epoch date (January 1, 1970).

  Note: In Impala 1.3 and higher, the FROM_UNIXTIME() and UNIX_TIMESTAMP() functions allow a wider range of format strings, with more flexibility in element order, repetition of letter placeholders, and separator characters. See Date and Time Functions for details.

Partitioning:

Although you cannot use a TIMESTAMP column as a partition key, you can extract the individual years, months, days, hours, and so on and partition based on those columns. Because the partition key column values are represented in HDFS directory names, rather than as fields in the data files themselves, you can also keep the original TIMESTAMP values if desired, without duplicating data or wasting storage space. See Partition Key Columns for more details on partitioning with date and time values.

Examples:

select cast('1966-07-30' as timestamp);
select cast('1985-09-25 17:45:30.005' as timestamp);
select cast('08:30:00' as timestamp);
select hour('1970-01-01 15:30:00');         -- Succeeds, returns 15.
select hour('1970-01-01 15:30');            -- Returns NULL because seconds field required.
select hour('1970-01-01 27:30:00');         -- Returns NULL because hour value out of range.
select dayofweek('2004-06-13');             -- Returns 1, representing Sunday.
select dayname('2004-06-13');               -- Returns 'Sunday'.
select date_add('2004-06-13', 365);         -- Returns 2005-06-13 with zeros for hh:mm:ss fields.
select day('2004-06-13');                   -- Returns 13.
select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates?
select now();                               -- Returns current date and time in UTC timezone.

create table dates_and_times (t timestamp);
insert into dates_and_times values
  ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now());

Related information: Literals; to convert to or from different date formats, or perform date arithmetic, use the date and time functions described in Date and Time Functions. In particular, the from_unixtime() function requires a case-sensitive format string such as "yyyy-MM-dd HH:mm:ss.SSSS", matching one of the allowed variations of TIMESTAMP value (date plus time, only date, only time, optional fractional seconds).

TINYINT Data Type

A 1-byte integer data type used in CREATE TABLE and ALTER TABLE statements.

Range: -128 .. 127. There is no UNSIGNED subtype.

Conversions: Impala automatically converts to a larger integer type (SMALLINT, INT, or BIGINT) or a floating-point type (FLOAT or DOUBLE) automatically. Use CAST() to convert to STRING or TIMESTAMP. Casting an integer value N to TIMESTAMP produces a value that is N seconds past the start of the epoch date (January 1, 1970).

Related information: Literals, INT Data Type, BIGINT Data Type, SMALLINT Data Type, Mathematical Functions

Page generated September 3, 2015.