Connecting to impalad through impala-shell

Within an impala-shell session, you can issue queries only while connected to an instance of the impalad daemon. You can specify the connection information in any of these ways:
  • Through command-line options when you run the impala-shell command.
  • Through a configuration file that is read when you run the impala-shell command.
  • During an impala-shell session, by issuing a CONNECT command.
See impala-shell Configuration Options for the command-line and configuration file options you can use.
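
As a rough illustration of the configuration-file route, the sketch below writes a small file and shows how it could be passed to impala-shell. The file name impalarc.example, the host name impalad-host.example.com, and the database name staging are placeholders invented for this example. The impalad and default_db keys and the --config_file option are assumed to match the ones listed in impala-shell Configuration Options, so check that page before relying on them.

```shell
# Hypothetical location for the example file. impala-shell is assumed to read
# $HOME/.impalarc by default when no --config_file option is given.
config_file="./impalarc.example"

# Write a minimal configuration. Host, port, and database are placeholders.
cat > "$config_file" <<'EOF'
[impala]
impalad=impalad-host.example.com:21000
default_db=staging
EOF

# To use it (not run here, because it needs a reachable impalad):
#   impala-shell --config_file="$config_file"
```

Keeping the connection details in a file like this means the same settings apply on every run without repeating the -i option on the command line.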

You can connect to any DataNode where an instance of impalad is running, and that host coordinates the execution of all queries sent to it.

For simplicity during development, you might always connect to the same host, perhaps running impala-shell on the same host as impalad and specifying the hostname as localhost.

In a production environment, you might enable load balancing, in which you connect to a specific host/port combination but queries are forwarded to arbitrary hosts. This technique spreads the overhead of acting as the coordinator node among all the DataNodes in the cluster. See Using Impala through a Proxy for High Availability for details.

To connect the Impala shell during shell startup:

  1. Locate the hostname of a DataNode within the cluster that is running an instance of the impalad daemon. If that DataNode uses a non-default port (something other than port 21000) for impala-shell connections, find out the port number also.
  2. Use the -i option to the impala-shell interpreter to specify the connection information for that instance of impalad:
    # When you are logged into the same machine running impalad.
    # The prompt will reflect the current hostname.
    $ impala-shell
    
    # When you are logged into the same machine running impalad.
    # The prompt will reflect the hostname 'localhost'.
    $ impala-shell -i localhost
    
    # When you are logged onto a different host, perhaps a client machine
    # outside the Hadoop cluster.
    $ impala-shell -i some.other.hostname
    
    # When you are logged onto a different host, and impalad is listening
    # on a non-default port. Perhaps a load balancer is forwarding requests
    # to a different host/port combination behind the scenes.
    $ impala-shell -i some.other.hostname:port_number
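
The host and port handling in the steps above can be sketched as a small helper for wrapper scripts. This is only an illustration: the function name impalad_address and the environment variables IMPALA_HOST and IMPALA_PORT are hypothetical names chosen for this example, not part of impala-shell.

```shell
# Default port for impala-shell connections, as stated in step 1.
DEFAULT_IMPALAD_PORT=21000

# impalad_address HOST [PORT]
# Prints HOST:PORT, falling back to the default port when PORT is
# omitted or empty. (Hypothetical helper, not an impala-shell feature.)
impalad_address() {
  host="$1"
  port="${2:-$DEFAULT_IMPALAD_PORT}"
  printf '%s:%s\n' "$host" "$port"
}

# Example (not run here, because it needs a reachable impalad):
#   impala-shell -i "$(impalad_address "${IMPALA_HOST:-localhost}" "$IMPALA_PORT")"
```

Spelling out the port even when it is the default makes it easier to see in logs which endpoint a script actually used.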
    

To connect the Impala shell after shell startup:

  1. Start the Impala shell with no connection:
    $ impala-shell

    You should see a prompt like the following:

    Welcome to the Impala shell. Press TAB twice to see a list of available commands.
    
    Copyright (c) 2012 Cloudera, Inc. All rights reserved.
    
    (Shell build version: Impala Shell v2.0.x (hash) built on date)
    [Not connected] > 
  2. Locate the hostname of a DataNode within the cluster that is running an instance of the impalad daemon. If that DataNode uses a non-default port (something other than port 21000) for impala-shell connections, find out the port number also.
  3. Use the connect command to connect to an Impala instance. Enter a command of the form:
    [Not connected] > connect impalad-host
    [impalad-host:21000] >

To start impala-shell in a specific database:

You can use all the same connection options as in previous examples. For simplicity, these examples assume that you are logged into one of the DataNodes that is running the impalad daemon.

  1. Find the name of the database containing the relevant tables, views, and so on that you want to operate on.
  2. Use the -d option to the impala-shell interpreter to connect and immediately switch to the specified database, without the need for a USE statement or fully qualified names:
    # Subsequent queries with unqualified names operate on
    # tables, views, and so on inside the database named 'staging'.
    $ impala-shell -i localhost -d staging
    
    # It is common during development, ETL, benchmarking, and so on
    # to have different databases containing the same table names
    # but with different contents or layouts.
    $ impala-shell -i localhost -d parquet_snappy_compression
    $ impala-shell -i localhost -d parquet_gzip_compression
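
When several databases hold the same table names, as in the last example, a short loop can issue one unqualified query against each of them. The sketch below is a hedged illustration: run_in_each_db and the IMPALA_SHELL override variable are hypothetical names, and sample_table in the usage comment is an invented table name.

```shell
# Command to invoke. Set IMPALA_SHELL to another command for a dry run.
# (IMPALA_SHELL is a hypothetical variable used only by this sketch.)
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# run_in_each_db QUERY DB...
# Runs the same unqualified query once per database through the -d option,
# stopping at the first failure.
run_in_each_db() {
  query="$1"
  shift
  for db in "$@"; do
    "$IMPALA_SHELL" -i localhost -d "$db" -q "$query" || return 1
  done
}

# Example (not run here; sample_table is a hypothetical table name):
#   run_in_each_db 'select count(*) from sample_table' \
#     parquet_snappy_compression parquet_gzip_compression
```

Because the query text never names a database, the same statement can be reused unchanged while the -d option selects which copy of the tables it reads.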
    

To run one or several statements in non-interactive mode:

You can use all the same connection options as in previous examples. For simplicity, these examples assume that you are logged into one of the DataNodes that is running the impalad daemon.

  1. Construct a statement, or a file containing a sequence of statements, that you want to run in an automated way, without typing or copying and pasting each time.
  2. Invoke impala-shell with the -q option to run a single statement, or the -f option to run a sequence of statements from a file. The impala-shell command runs the statements and then exits, without going into the interactive interpreter.
    # A utility command that you might run while developing shell scripts
    # to manipulate HDFS files.
    $ impala-shell -i localhost -d database_of_interest -q 'show tables'
    
    # A sequence of CREATE TABLE, CREATE VIEW, and similar DDL statements
    # can go into a file to make the setup process repeatable.
    $ impala-shell -i localhost -d database_of_interest -f recreate_tables.sql
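
In an automated job, it usually helps to check that the statement file exists and to act on the result of the run. The following sketch wraps the -f invocation in a function. It assumes that impala-shell exits with a non-zero status when a statement fails; run_sql_file and the IMPALA_SHELL override variable are hypothetical names introduced for this example.

```shell
# Command to invoke. Set IMPALA_SHELL to another command for a dry run.
# (IMPALA_SHELL is a hypothetical variable used only by this sketch.)
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# run_sql_file FILE [DATABASE]
# Runs every statement in FILE against DATABASE (the database named
# 'default' when omitted) and reports success or failure.
run_sql_file() {
  sql_file="$1"
  database="${2:-default}"
  if [ ! -r "$sql_file" ]; then
    echo "cannot read $sql_file" >&2
    return 2
  fi
  # Assumption: a non-zero exit status from impala-shell signals failure.
  if "$IMPALA_SHELL" -i localhost -d "$database" -f "$sql_file"; then
    echo "OK: $sql_file"
  else
    status=$?
    echo "FAILED ($status): $sql_file" >&2
    return "$status"
  fi
}

# Example (not run here, because it needs a reachable impalad):
#   run_sql_file recreate_tables.sql database_of_interest
```

Returning the original exit status lets a calling script or scheduler decide whether to retry, stop, or continue with later steps.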