Running Commands and SQL Statements in impala-shell

For information on available commands, see impala-shell Command Reference. You can see the full set of available commands by pressing TAB twice, for example:

[impalad-host:21000] >
connect   describe  explain   help      history   insert    quit      refresh   select    set       shell     show      use       version
[impalad-host:21000] >

For example:

[localhost:21000] > select *
                  > from t1
                  > limit 5;
+---------+-----------+
| s1      | s2        |
+---------+-----------+
| hello   | world     |
| goodbye | cleveland |
+---------+-----------+

A comment is considered part of the statement it precedes, so when you enter a -- or /* */ comment, you get a continuation prompt until you finish entering a statement ending with a semicolon:

[localhost:21000] > -- This is a test comment
                  > show tables like 't*';
+--------+
| name   |
+--------+
| t1     |
| t2     |
| tab1   |
| tab2   |
| tab3   |
| text_t |
+--------+

Use the up-arrow and down-arrow keys to cycle through and edit previous commands. impala-shell uses the readline library and so supports a standard set of keyboard shortcuts for editing and cursor movement, such as Ctrl-A for beginning of line and Ctrl-E for end of line.

In CDH 5.7 / Impala 2.5 and higher, you can define substitution variables to be used within SQL statements processed by impala-shell. On the command line, you specify the option --var=variable_name=value. Within an interactive session or a script file processed by the -f option, you specify a SET command using the notation SET VAR:variable_name=value. Within a SQL statement, you substitute the value by using the notation ${var:variable_name}.

For example, here are some impala-shell commands that define substitution variables and then use them in SQL statements executed through the -q and -f options. Notice how the -q argument strings are single-quoted to prevent shell expansion of the ${var:value} notation, and any string literals within the queries are enclosed by double quotation marks.

$ impala-shell --var=tname=table1 --var=colname=x --var=coltype=string -q 'create table ${var:tname} (${var:colname} ${var:coltype}) stored as parquet'
Starting Impala Shell without Kerberos authentication
Connected to hostname
Server version: impalad_version
Query: create table table1 (x string) stored as parquet

$ NEW_STRING="hello world"
$ impala-shell --var=tname=table1 --var=insert_val="$NEW_STRING" -q 'insert into ${var:tname} values ("${var:insert_val}")'
Starting Impala Shell without Kerberos authentication
Connected to hostname
Server version: impalad_version
Query: insert into table1 values ("hello world")
Inserted 1 row(s) in 1.40s

$ for VAL in foo bar bletch
do
  impala-shell --var=tname=table1 --var=insert_val="$VAL" -q 'insert into ${var:tname} values ("${var:insert_val}")'
done
...
Query: insert into table1 values ("foo")
Inserted 1 row(s) in 0.22s
Query: insert into table1 values ("bar")
Inserted 1 row(s) in 0.11s
Query: insert into table1 values ("bletch")
Inserted 1 row(s) in 0.21s

$ echo "Search for what substring?" ; read answer
Search for what substring?
b
$ impala-shell --var=tname=table1 -q 'select x from ${var:tname} where x like "%${var:answer}%"'
Starting Impala Shell without Kerberos authentication
Connected to hostname
Server version: impalad_version
Query: select x from table1 where x like "%b%"
+--------+
| x      |
+--------+
| bletch |
| bar    |
+--------+
Fetched 2 row(s) in 0.83s

Here is a substitution variable passed in by the --var option, and then referenced by statements issued interactively. Then the variable is cleared with the UNSET command, and defined again with the SET command.

$ impala-shell --quiet --var=tname=table1
Starting Impala Shell without Kerberos authentication
***********************************************************************************
banner_message
***********************************************************************************
[hostname:21000] > select count(*) from ${var:tname};
+----------+
| count(*) |
+----------+
| 4        |
+----------+
[hostname:21000] > unset var:tname;
Unsetting variable TNAME
[hostname:21000] > select count(*) from ${var:tname};
Error: Unknown variable TNAME
[hostname:21000] > set var:tname=table1;
[hostname:21000] > select count(*) from ${var:tname};
+----------+
| count(*) |
+----------+
| 4        |
+----------+

The following example shows how the SOURCE command can execute a series of statements from a file:

$ cat commands.sql
show databases;
show tables in default;
show functions in _impala_builtins like '*minute*';

$ impala-shell -i localhost
...
[localhost:21000] > source commands.sql;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.06s
Query: show tables in default
+-----------+
| name      |
+-----------+
| customers |
| sample_07 |
| sample_08 |
| web_logs  |
+-----------+
Fetched 4 row(s) in 0.02s
Query: show functions in _impala_builtins like '*minute*'
+-------------+--------------------------------+-------------+---------------+
| return type | signature                      | binary type | is persistent |
+-------------+--------------------------------+-------------+---------------+
| INT         | minute(TIMESTAMP)              | BUILTIN     | true          |
| TIMESTAMP   | minutes_add(TIMESTAMP, BIGINT) | BUILTIN     | true          |
| TIMESTAMP   | minutes_add(TIMESTAMP, INT)    | BUILTIN     | true          |
| TIMESTAMP   | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN     | true          |
| TIMESTAMP   | minutes_sub(TIMESTAMP, INT)    | BUILTIN     | true          |
+-------------+--------------------------------+-------------+---------------+
Fetched 5 row(s) in 0.03s

The following example shows how a file that is run by the SOURCE command, or through the -q or -f options of impala-shell, can contain additional SOURCE commands. The first file, nested1.sql, runs an impala-shell command and then also runs the commands from nested2.sql. This ability for scripts to call each other is often useful for code that sets up schemas for applications or test environments.

$ cat nested1.sql
show functions in _impala_builtins like '*minute*';
source nested2.sql
$ cat nested2.sql
show functions in _impala_builtins like '*hour*'

$ impala-shell -i localhost -f nested1.sql
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
...
Query: show functions in _impala_builtins like '*minute*'
+-------------+--------------------------------+-------------+---------------+
| return type | signature                      | binary type | is persistent |
+-------------+--------------------------------+-------------+---------------+
| INT         | minute(TIMESTAMP)              | BUILTIN     | true          |
| TIMESTAMP   | minutes_add(TIMESTAMP, BIGINT) | BUILTIN     | true          |
| TIMESTAMP   | minutes_add(TIMESTAMP, INT)    | BUILTIN     | true          |
| TIMESTAMP   | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN     | true          |
| TIMESTAMP   | minutes_sub(TIMESTAMP, INT)    | BUILTIN     | true          |
+-------------+--------------------------------+-------------+---------------+
Fetched 5 row(s) in 0.01s
Query: show functions in _impala_builtins like '*hour*'
+-------------+------------------------------+-------------+---------------+
| return type | signature                    | binary type | is persistent |
+-------------+------------------------------+-------------+---------------+
| INT         | hour(TIMESTAMP)              | BUILTIN     | true          |
| TIMESTAMP   | hours_add(TIMESTAMP, BIGINT) | BUILTIN     | true          |
| TIMESTAMP   | hours_add(TIMESTAMP, INT)    | BUILTIN     | true          |
| TIMESTAMP   | hours_sub(TIMESTAMP, BIGINT) | BUILTIN     | true          |
| TIMESTAMP   | hours_sub(TIMESTAMP, INT)    | BUILTIN     | true          |
+-------------+------------------------------+-------------+---------------+
Fetched 5 row(s) in 0.01s

Rerunning impala-shell Commands

In CDH 5.13 / Impala 2.10 and higher, you can use the rerun command, or its abbreviation @, to re-execute commands from the history list. The argument can be a positive integer (reflecting the number shown in history output) or a negative integer (reflecting the N'th last command in the history output. For example:

[localhost:21000] > select * from p1 order by t limit 5;
...
[localhost:21000] > show table stats p1;
+-----------+--------+--------+------------------------------------------------------------+
| #Rows     | #Files | Size   | Location                                                   |
+-----------+--------+--------+------------------------------------------------------------+
| 134217728 | 50     | 4.66MB | hdfs://test.example.com:8020/user/hive/warehouse/jdr.db/p1 |
+-----------+--------+--------+------------------------------------------------------------+
[localhost:21000] > compute stats p1;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 3 column(s). |
+-----------------------------------------+
[localhost:21000] > history;
[1]: use jdr;
[2]: history;
[3]: show tables;
[4]: select * from p1 order by t limit 5;
[5]: show table stats p1;
[6]: compute stats p1;
[7]: history;
[localhost:21000] > @-2; <- Rerun the 2nd last command in the history list
Rerunning compute stats p1;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 3 column(s). |
+-----------------------------------------+
[localhost:21000] > history; <- History list is not updated by rerunning commands
                                or by repeating the last command, in this case 'history'.
[1]: use jdr;
[2]: history;
[3]: show tables;
[4]: select * from p1 order by t limit 5;
[5]: show table stats p1;
[6]: compute stats p1;
[7]: history;
[localhost:21000] > @4; <- Rerun command #4 in the history list using short form '@'.
Rerunning select * from p1 order by t limit 5;
...
[localhost:21000] > rerun 4; <- Rerun command #4 using long form 'rerun'.
Rerunning select * from p1 order by t limit 5;
...