Installing Impala without Cloudera Manager
Before installing Impala manually, make sure all applicable nodes have the appropriate hardware configuration, levels of operating system and CDH, and any other software prerequisites. See Cloudera Impala Requirements for details.
You can install Impala across many nodes or on one node:
- Installing Impala across multiple machines creates a distributed configuration. For best performance, install Impala on all DataNodes.
- Installing Impala on a single machine produces a pseudo-distributed cluster.
To install Impala on a node:
- Install CDH as described in the Installation section of the CDH 5 Installation Guide.
- Install the Hive metastore somewhere in your cluster, as described in Hive Installation in the CDH 5 Installation Guide. As part of this process, you configure the Hive metastore to use an external database. Impala uses this same database for its own table metadata. You can choose either MySQL or PostgreSQL as the metastore database. The process for configuring each type of database is described in the CDH 5 Installation Guide.
Cloudera recommends setting up a Hive metastore service rather than connecting directly to the metastore database; this configuration is required when running Impala under CDH 4.1. Make sure the /etc/impala/conf/hive-site.xml file contains the following settings, substituting the appropriate host name for metastore_server_host:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore_server_host:9083</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>3600</value>
  <description>MetaStore Client socket timeout in seconds</description>
</property>
(Optional) If you installed the full Hive component on any node, you can verify that the metastore is configured properly by starting the Hive console and querying for the list of available tables. Once you confirm that the console starts, exit the console to continue the installation:
$ hive
Hive history file=/tmp/root/hive_job_log_root_201207272011_678722950.txt
hive> show tables;
table1
table2
hive> quit;
$
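As a quick check of the hive.metastore.uris setting described above, the following is a minimal sketch of a shell helper that prints the configured value from a hive-site.xml file. The function name metastore_uri is hypothetical, and the text matching assumes the <name> and <value> elements of the property are adjacent, as in the example above; it is not a general XML parser.

```shell
# Hypothetical helper (not part of Impala or Hive): print the value of
# hive.metastore.uris from the hive-site.xml file given as the first argument.
# Assumes <value> directly follows <name> inside the property element.
metastore_uri() {
  # Join all lines so the <name> and <value> elements can be matched together.
  tr -d '\n' < "$1" |
    sed -n 's|.*<name>hive\.metastore\.uris</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p'
}

# Example use: pass the path of the hive-site.xml file that Impala reads.
#   metastore_uri /path/to/hive-site.xml
```

If the helper prints nothing, the property is either missing or laid out differently from the example; in that case, inspect the file by hand.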
- Confirm that your package management command is aware of the Impala repository settings, as described in Cloudera Impala Requirements. (This is a different repository than for CDH.) You might need to download a repo or list file into a system directory underneath /etc.
- Use one of the following sets of commands to install the Impala package:
For RHEL, Oracle Linux, or CentOS systems:
$ sudo yum install impala             # Binaries for daemons
$ sudo yum install impala-server      # Service start/stop script
$ sudo yum install impala-state-store # Service start/stop script
$ sudo yum install impala-catalog     # Service start/stop script
For SUSE systems:
$ sudo zypper install impala             # Binaries for daemons
$ sudo zypper install impala-server      # Service start/stop script
$ sudo zypper install impala-state-store # Service start/stop script
$ sudo zypper install impala-catalog     # Service start/stop script
For Debian or Ubuntu systems:
$ sudo apt-get install impala             # Binaries for daemons
$ sudo apt-get install impala-server      # Service start/stop script
$ sudo apt-get install impala-state-store # Service start/stop script
$ sudo apt-get install impala-catalog     # Service start/stop script
Note: Cloudera recommends that you not install Impala on any HDFS NameNode. Installing Impala on NameNodes provides no additional data locality, and executing queries with such a configuration might cause memory contention and negatively impact the HDFS NameNode.
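When the same installation step is scripted across mixed distributions, the choice of package manager can be made from the distribution ID. The sketch below only prints the matching command rather than running it. The function name impala_install_cmd is hypothetical, and the ID strings (rhel, centos, ol, sles, debian, ubuntu) assume the values found in the ID field of /etc/os-release, which older releases might not provide.

```shell
# Hypothetical helper: print (do not run) the install command for the four
# Impala packages, given a distribution ID such as the ID field of /etc/os-release.
impala_install_cmd() {
  pkgs="impala impala-server impala-state-store impala-catalog"
  case "$1" in
    rhel|centos|ol) echo "sudo yum install $pkgs" ;;
    sles)           echo "sudo zypper install $pkgs" ;;
    debian|ubuntu)  echo "sudo apt-get install $pkgs" ;;
    *)              echo "unsupported distribution ID: $1" >&2; return 1 ;;
  esac
}

# Example use on a system that provides /etc/os-release:
#   . /etc/os-release && impala_install_cmd "$ID"
```

Printing the command keeps the sketch safe to try; review the output and run it by hand once the Impala repository is configured.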
- Copy the client hive-site.xml, core-site.xml, and hdfs-site.xml configuration files to the Impala configuration directory, which defaults to /etc/impala/conf. Create this directory if it does not already exist.
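The copy step above can be sketched as a small shell function. The function name copy_impala_client_configs is hypothetical, and the source directories are parameters because the location of the client configuration files depends on your installation; the /etc/hive/conf and /etc/hadoop/conf paths in the usage comment are assumptions, not values taken from this guide.

```shell
# Hypothetical helper: copy the three client configuration files into the
# Impala configuration directory, creating that directory if needed.
# Usage: copy_impala_client_configs HIVE_CONF_DIR HADOOP_CONF_DIR [IMPALA_CONF_DIR]
copy_impala_client_configs() {
  hive_conf=$1
  hadoop_conf=$2
  impala_conf=${3:-/etc/impala/conf}   # default directory named in the step above

  mkdir -p "$impala_conf" || return 1
  cp "$hive_conf/hive-site.xml" \
     "$hadoop_conf/core-site.xml" \
     "$hadoop_conf/hdfs-site.xml" \
     "$impala_conf/"
}

# Example use, run with sufficient privileges; both source paths are assumptions:
#   copy_impala_client_configs /etc/hive/conf /etc/hadoop/conf
```

The function returns a non-zero status if the directory cannot be created or any of the three files is missing, so a calling script can stop before starting the Impala daemons.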
- Use one of the following commands to install impala-shell on the machines from which you want to issue queries. You can install impala-shell on any supported machine that can connect to DataNodes that are running impalad.
For RHEL/CentOS systems:
$ sudo yum install impala-shell
For SUSE systems:
$ sudo zypper install impala-shell
For Debian/Ubuntu systems:
$ sudo apt-get install impala-shell
- Complete any required or recommended configuration, as described in Post-Installation Configuration for Impala. Some of these configuration changes are mandatory. (They are applied automatically when you install using Cloudera Manager.)
Once installation and configuration are complete, see Starting Impala for how to activate the software on the appropriate nodes in your cluster.
If this is your first time setting up and using Impala in this cluster, run through some of the exercises in Impala Tutorial to verify that you can do basic operations such as creating tables and querying them.