Overview of Impala Identifiers

Identifiers are the names of databases, tables, or columns that you specify in a SQL statement. The rules for identifiers govern what names you can give to things you create, the notation for referring to names containing unusual characters, and other aspects such as case sensitivity.

  • The minimum length of an identifier is 1 character.

  • The maximum length of an identifier is currently 128 characters, enforced by the metastore database.

  • An identifier must start with an alphanumeric or underscore character. The remainder can contain any combination of alphanumeric characters and underscores. Quoting the identifier with backticks has no effect on the allowed characters in the name.

  • An identifier can contain only ASCII characters.

  • To use an identifier name that matches one of the Impala reserved keywords (listed in Impala Reserved Words), surround the identifier with `` characters (backticks). Quote the reserved word even if it is part of a fully qualified name. The following example shows how a reserved word can be used as a column name if it is quoted with backticks in the CREATE TABLE statement, and how the column name must also be quoted with backticks in a query:

    [localhost:21000] > create table reserved (`data` string);
    
    [localhost:21000] > select data from reserved;
    ERROR: AnalysisException: Syntax error in line 1:
    select data from reserved
           ^
    Encountered: DATA
    Expected: ALL, CASE, CAST, DISTINCT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, STRAIGHT_JOIN, TRUE, IDENTIFIER
    CAUSED BY: Exception: Syntax error
    
    [localhost:21000] > select reserved.data from reserved;
    ERROR: AnalysisException: Syntax error in line 1:
    select reserved.data from reserved
                    ^
    Encountered: DATA
    Expected: IDENTIFIER
    CAUSED BY: Exception: Syntax error
    
    [localhost:21000] > select reserved.`data` from reserved;
    
    [localhost:21000] >
    
  • Impala identifiers are always case-insensitive. That is, tables named t1 and T1 always refer to the same table, regardless of quote characters. Internally, Impala always folds all specified table and column names to lowercase. This is why the column headers in query output are always displayed in lowercase.

See Overview of Impala Aliases for how to define shorter or easier-to-remember aliases if the original names are long or cryptic identifiers. Aliases follow the same rules as identifiers when it comes to case insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can include additional characters such as spaces and dashes when they are quoted using backtick characters.

Another way to define different names for the same tables or columns is to create views. See Overview of Impala Views for details.