Using the Impala Shell (impala-shell Command)

You can use the Impala shell tool (impala-shell) to set up databases and tables, insert data, and issue queries. For ad hoc queries and exploration, you can submit SQL statements in an interactive session. To automate your work, you can specify command-line options to process a single statement or a script file. The impala-shell interpreter accepts all the SQL statements listed in Impala SQL Statements, plus some shell-only commands for tuning performance and diagnosing problems.

The impala-shell command fits into the familiar Unix toolchain:

  • The -q option lets you issue a single query from the command line, without starting the interactive interpreter. Use this option to run impala-shell from inside a shell script, or to invoke it as an external command from a Python, Perl, or other script.
  • The -f option lets you process a file containing multiple SQL statements, such as a set of reports or DDL statements to create a group of tables and views.
  • The --var option lets you pass substitution variables to the statements that are executed by that impala-shell session, for example, the statements in a script file processed by the -f option. Define each variable on the command line with the notation --var=variable_name=value, and reference its value within a SQL statement with the notation ${var:variable_name}. This feature is available in CDH 5.7 / Impala 2.5 and higher.
  • The -o option lets you save query output to a file.
  • The -B option turns off pretty-printing, so that you can produce comma-separated, tab-separated, or other delimited text files as output. (Use the --output_delimiter option to choose the delimiter character; the default is the tab character.)
  • In non-interactive mode, query output is printed to stdout or to the file specified by the -o option, while incidental output is printed to stderr, so that you can process just the query output as part of a Unix pipeline.
  • In interactive mode, impala-shell uses the readline facility to recall and edit previous commands.
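The options above combine into a single non-interactive invocation. The following is a minimal sketch, assuming Python 3, of how a calling script might assemble the argument list for such a run. The function name build_impala_shell_args and its parameters are hypothetical; the function only builds the list of flags described above (-q, -f, --var, -B, --output_delimiter, -o) and does not run impala-shell itself.

```python
from typing import List, Mapping, Optional


def build_impala_shell_args(
    query: Optional[str] = None,
    script: Optional[str] = None,
    variables: Optional[Mapping[str, str]] = None,
    delimited: bool = False,
    delimiter: Optional[str] = None,
    output_file: Optional[str] = None,
) -> List[str]:
    """Return an argv list for a non-interactive impala-shell run (hypothetical helper)."""
    # Exactly one input source: a single statement (-q) or a script file (-f).
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    args = ["impala-shell"]
    if query is not None:
        args += ["-q", query]
    else:
        args += ["-f", script]
    # Each substitution variable is passed as --var=variable_name=value.
    for name, value in (variables or {}).items():
        args.append(f"--var={name}={value}")
    # -B turns off pretty-printing; --output_delimiter picks the field separator
    # (tab by default). This sketch rejects a delimiter without -B.
    if delimited:
        args.append("-B")
        if delimiter is not None:
            args.append(f"--output_delimiter={delimiter}")
    elif delimiter is not None:
        raise ValueError("a delimiter is only meaningful together with -B")
    # -o saves the query output to a file instead of stdout.
    if output_file is not None:
        args += ["-o", output_file]
    return args
```

To execute the command, a script could pass the list to subprocess.run(args, capture_output=True, text=True). Passing a list rather than a single command string avoids shell quoting problems when the query text contains spaces or quotes.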
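To make the --var notation concrete, here is a simplified Python emulation of how a ${var:variable_name} reference is replaced by the value supplied on the command line. It is an illustration only, not impala-shell's own implementation: the function name substitute_vars is hypothetical, the sketch assumes variable names made of letters, digits, and underscores, matches them case-sensitively, and raises KeyError for an undefined variable. The real shell's handling of case and errors may differ.

```python
import re
from typing import Mapping

# Matches ${var:variable_name}; the name rules here are an assumption of this sketch.
_VAR_PATTERN = re.compile(r"\$\{var:([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_vars(statement: str, variables: Mapping[str, str]) -> str:
    """Replace each ${var:name} in statement with variables[name] (illustrative only)."""

    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined substitution variable: {name}")
        return variables[name]

    return _VAR_PATTERN.sub(replace, statement)
```

For example, with --var=tname=t1 --var=n=10 on the command line, the statement select * from ${var:tname} limit ${var:n} would be executed as select * from t1 limit 10.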
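Because -B output on stdout is plain delimited text, the downstream stage of a pipeline can split it with ordinary string handling. The sketch below, with the hypothetical name parse_delimited_output, turns such text (for example, the stdout captured from a subprocess run) into rows of string fields. It assumes the output contains no header line and that no field value contains the delimiter or a newline; data that breaks those assumptions needs a different delimiter or a more careful format.

```python
from typing import List


def parse_delimited_output(stdout_text: str, delimiter: str = "\t") -> List[List[str]]:
    """Split plain (-B) query output into rows of fields; tab matches the default delimiter."""
    rows = []
    for line in stdout_text.splitlines():
        if line:  # skip blank lines, such as a trailing newline
            rows.append(line.split(delimiter))
    return rows
```

Pass the same character given to --output_delimiter as the delimiter argument, or rely on the tab default when the option was not used.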

Cloudera Manager installs impala-shell automatically. On systems not managed by Cloudera Manager, you might install impala-shell manually so that you can issue queries from client machines that do not run the Impala daemon or other Apache Hadoop components.

For information about establishing a connection to a DataNode running the impalad daemon through the impala-shell command, see Connecting to impalad through impala-shell.

For a list of the impala-shell command-line options, see impala-shell Configuration Options. For reference information about the impala-shell interactive commands, see impala-shell Command Reference.