Connecting to impalad
Within an impala-shell session, you can only issue queries while connected to an instance of the impalad daemon. You can specify the connection information through command-line options when you run the impala-shell command, or during an impala-shell session by issuing a CONNECT command. You can connect to any DataNode where an instance of impalad is running, and that node coordinates the execution of all queries sent to it.
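As a minimal sketch of the command-line route described above, the snippet below builds and prints an impala-shell invocation that connects at startup through the -i option (long form --impalad). The host name impalad-host is a placeholder and 21000 is the port that appears in the connected prompt in this section; both are assumptions to adjust for your cluster. The command runs only if impala-shell is installed on the machine.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with a node that runs impalad.
IMPALAD_HOST="impalad-host"
IMPALAD_PORT=21000

# -i names the impalad instance to connect to when the shell starts.
CONNECT_CMD="impala-shell -i ${IMPALAD_HOST}:${IMPALAD_PORT}"
echo "$CONNECT_CMD"

# Start the shell only when impala-shell is actually available here.
if command -v impala-shell >/dev/null 2>&1; then
  $CONNECT_CMD
fi
```

Connecting this way skips the separate CONNECT step, which can be convenient when you always target the same node.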
For simplicity, you might always connect to the same node, perhaps running impala-shell on the same node as impalad and specifying the host name as localhost. Routing all SQL statements to the same node can help to avoid issuing frequent REFRESH statements, as is necessary when table data or metadata is updated through a different node.
For load balancing or general flexibility, you might connect to an arbitrary node for each impala-shell session. In this case, depending on whether table data or metadata might have been updated through another node, you might issue a REFRESH statement to bring the metadata for all tables up to date on this node (for a long-lived session that will query many tables) or issue specific REFRESH table_name statements just for the tables you intend to query.
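The per-table approach can be sketched as a short script that issues one REFRESH table_name statement per table through the -q option of impala-shell, which passes a single statement from the command line. The host name and the table names sales and customers are hypothetical placeholders, not names from this guide. Each statement is printed, and it is sent only if impala-shell is installed.

```shell
#!/bin/sh
# Hypothetical values (assumptions): adjust for your environment.
IMPALAD_HOST="impalad-host"
TABLES="sales customers"

for t in $TABLES; do
  # Refresh only the tables this session intends to query.
  stmt="REFRESH $t"
  echo "$stmt"
  if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -i "$IMPALAD_HOST" -q "$stmt"
  fi
done
```

Refreshing only the tables you plan to query keeps the work proportional to the session, whereas a long-lived session that touches many tables may be better served by a single REFRESH of all metadata.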
To connect the Impala shell to any DataNode with an impalad daemon:
Start the Impala shell with no connection by running the impala-shell command without any connection options.
You should see a prompt like the following:
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v1.0.1 (9ef893a) built on Fri May 31 17:50:30 PDT 2013)
[Not connected] >
Use the connect command to connect to an Impala instance. Enter a command of the form:
[Not connected] > connect impalad-host
[impalad-host:21000] >
Note: Replace impalad-host with the host name you have configured for any DataNode running Impala in your environment. The changed prompt indicates a successful connection.