Cloudera Impala is an open-source add-on to the Cloudera Enterprise Core that returns rapid responses to queries.
Under CDH 5, Impala is included as part of the CDH installation and no separate steps are needed. Therefore, the installation steps in this section apply to CDH 4 only.
What is Included in an Impala Installation
Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster. The key installation step for performance is to install the impalad daemon (which does most of the query processing work) on all data nodes in the cluster.
The Impala package installs these binaries:
impalad - The Impala daemon. Plans and executes queries against HDFS and HBase data. Run one impalad process on each node in the cluster that has a data node.
statestored - Name service that tracks location and status of all impalad instances in the cluster. Run one instance of this daemon on a node in your cluster. Most production deployments run this daemon on the namenode.
catalogd - Metadata coordination service that broadcasts changes from Impala DDL and DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are immediately visible to queries submitted through any Impala node. (Prior to Impala 1.2, you had to run the REFRESH or INVALIDATE METADATA statement on each node to synchronize changed metadata. Now those statements are only required if you perform the DDL or DML through Hive.) Run one instance of this daemon on a node in your cluster, preferably on the same host as the statestored daemon.
impala-shell - Command-line interface for issuing queries to the Impala daemon. You install this on one or more hosts anywhere on your network, not necessarily data nodes or even within the same cluster as Impala. It can connect remotely to any instance of the Impala daemon.
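The placement guidelines above can be expressed as a small planning check. The following Python sketch is illustrative only and is not part of Impala or Cloudera Manager: the ClusterPlan class, the check_plan function, and the host names are all hypothetical, and the script does not inspect a real cluster. It only compares a hand-written plan against the rules stated in this section.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ClusterPlan:
    """Hypothetical description of where each daemon is planned to run."""
    data_nodes: Set[str]          # hosts that run an HDFS data node
    impalad_hosts: Set[str]       # hosts planned to run impalad
    statestored_hosts: Set[str]   # hosts planned to run statestored
    catalogd_hosts: Set[str]      # hosts planned to run catalogd


def check_plan(plan: ClusterPlan) -> List[str]:
    """Return a list of problems; an empty list means the plan follows the guidelines."""
    problems = []

    # Guideline: run one impalad on each node that has a data node.
    missing = sorted(plan.data_nodes - plan.impalad_hosts)
    if missing:
        problems.append("impalad missing on data nodes: " + ", ".join(missing))

    # Guideline: run one statestored instance and one catalogd instance per cluster.
    if len(plan.statestored_hosts) != 1:
        problems.append(
            "expected 1 statestored instance, found %d" % len(plan.statestored_hosts)
        )
    if len(plan.catalogd_hosts) != 1:
        problems.append(
            "expected 1 catalogd instance, found %d" % len(plan.catalogd_hosts)
        )

    # Preference (not a hard requirement): catalogd on the same host as statestored.
    if (
        len(plan.statestored_hosts) == 1
        and len(plan.catalogd_hosts) == 1
        and plan.statestored_hosts != plan.catalogd_hosts
    ):
        problems.append("catalogd is not on the same host as statestored (preferred)")

    return problems


# Example plan with made-up host names: dn3 has a data node but no impalad.
plan = ClusterPlan(
    data_nodes={"dn1", "dn2", "dn3"},
    impalad_hosts={"dn1", "dn2"},
    statestored_hosts={"nn1"},
    catalogd_hosts={"nn1"},
)

for problem in check_plan(plan):
    print(problem)
```

impala-shell is deliberately left out of the check, because it can be installed on any host on the network and has no placement constraint.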
Before doing the installation, ensure that you have all necessary prerequisites. See Cloudera Impala Requirements for details.
Impala Installation Procedure for CDH 4 Users
You can install Impala under CDH 4 in one of two ways:
- Using the Cloudera Manager installer. This is the recommended technique for doing a reliable and verified Impala installation. Cloudera Manager 4.8 or higher can automatically install, configure, manage, and monitor Impala 1.2.1 and higher. The latest Cloudera Manager is always preferable, because newer Cloudera Manager releases have configuration settings for the most recent Impala features.
- Using a manual process for systems not managed by Cloudera Manager. You must do additional verification steps in this case, to check that Impala can interact with other Hadoop components correctly, and that your cluster is configured for efficient Impala execution.