This is the documentation for CDH 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Installing Impala without Cloudera Manager

Before installing Impala manually, make sure all applicable nodes have the appropriate hardware configuration, supported versions of the operating system and CDH, and any other software prerequisites. See Cloudera Impala Requirements for details.

You can install Impala across many hosts or on one host:

  • Installing Impala across multiple machines creates a distributed configuration. For best performance, install Impala on all DataNodes.
  • Installing Impala on a single machine produces a pseudo-distributed cluster.

To install Impala on a host:

  1. Install CDH as described in the Installation section of the CDH 4 Installation Guide or the CDH 5 Installation Guide.
  2. Install the Hive metastore somewhere in your cluster, as described in the Hive Installation topic in the CDH 4 Installation Guide or the CDH 5 Installation Guide. As part of this process, you configure the Hive metastore to store its data in an external database. Impala uses this same database for its own table metadata. You can choose either a MySQL or PostgreSQL database as the metastore. The process for configuring each type of database is described in the CDH Installation Guide.

    Cloudera recommends setting up a Hive metastore service rather than connecting directly to the metastore database; this configuration is required when running Impala under CDH 4.1. Make sure the /etc/impala/hive-site.xml file contains the following setting, substituting the appropriate host name for metastore_server_host:

    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://metastore_server_host:9083</value>
    </property>
    <property>
      <name>hive.metastore.client.socket.timeout</name>
      <value>3600</value>
      <description>MetaStore Client socket timeout in seconds</description>
    </property>
  3. (Optional) If you installed the full Hive component on any host, you can verify that the metastore is configured properly by starting the Hive console and querying for the list of available tables. Once you confirm that the console starts, exit the console to continue the installation:
    $ hive
    Hive history file=/tmp/root/hive_job_log_root_201207272011_678722950.txt
    hive> show tables;
    table1
    table2
    hive> quit;
    $
  4. Confirm that your package management command is aware of the Impala repository settings, as described in Cloudera Impala Requirements. (For CDH 4, this is a different repository than for CDH.) You might need to download a repo or list file into a system directory underneath /etc.
  5. Use one of the following sets of commands to install the Impala package:

    For RHEL, Oracle Linux, or CentOS systems:

    $ sudo yum install impala             # Binaries for daemons
    $ sudo yum install impala-server      # Service start/stop script
    $ sudo yum install impala-state-store # Service start/stop script
    $ sudo yum install impala-catalog     # Service start/stop script
    

    For SUSE systems:

    $ sudo zypper install impala             # Binaries for daemons
    $ sudo zypper install impala-server      # Service start/stop script
    $ sudo zypper install impala-state-store # Service start/stop script
    $ sudo zypper install impala-catalog     # Service start/stop script
    

    For Debian or Ubuntu systems:

    $ sudo apt-get install impala             # Binaries for daemons
    $ sudo apt-get install impala-server      # Service start/stop script
    $ sudo apt-get install impala-state-store # Service start/stop script
    $ sudo apt-get install impala-catalog     # Service start/stop script
    
      Note: Cloudera recommends that you not install Impala on any HDFS NameNode. Installing Impala on NameNodes provides no additional data locality, and executing queries with such a configuration might cause memory contention and negatively impact the HDFS NameNode.
  6. Copy the client hive-site.xml, core-site.xml, hdfs-site.xml, and hbase-site.xml configuration files to the Impala configuration directory, which defaults to /etc/impala/conf. Create this directory if it does not already exist.
  7. Use one of the following commands to install impala-shell on the machines from which you want to issue queries. You can install impala-shell on any supported machine that can connect to DataNodes that are running impalad.

    For RHEL/CentOS systems:

    $ sudo yum install impala-shell

    For SUSE systems:

    $ sudo zypper install impala-shell

    For Debian/Ubuntu systems:

    $ sudo apt-get install impala-shell
  8. Complete any required or recommended configuration, as described in Post-Installation Configuration for Impala. Some of these configuration changes are mandatory. (They are applied automatically when you install using Cloudera Manager.)
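
The package commands in step 5 differ only in the package manager name, so they can be generated from one list. The following is a minimal sketch, not part of the Cloudera tooling: impala_install_commands is a hypothetical helper name, and it only prints the commands so that you can review them before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the step 5 install commands
# for one of the three package managers named in this topic.
impala_install_commands() {
  pm="$1"
  case "$pm" in
    yum|zypper|apt-get) ;;   # the only package managers covered here
    *) echo "unsupported package manager: $pm" >&2; return 1 ;;
  esac
  # Package names are the ones listed in step 5.
  for pkg in impala impala-server impala-state-store impala-catalog; do
    echo "sudo $pm install $pkg"
  done
}
```

For example, impala_install_commands yum prints the four yum commands shown for RHEL, Oracle Linux, or CentOS systems. The impala-shell package from step 7 is left out on purpose, because it is typically installed on client machines rather than on every node.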

Once installation and configuration are complete, see Starting Impala for how to activate the software on the appropriate nodes in your cluster.
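
Before starting the daemons, it can help to confirm that the configuration files from step 6 were actually copied. The sketch below assumes the default directory named in that step (/etc/impala/conf) unless another directory is passed in; check_impala_conf is a hypothetical helper name, and the check only tests that each file exists, not that its contents are valid.

```shell
#!/bin/sh
# Hypothetical helper: report which of the four client configuration
# files from step 6 are missing from the Impala configuration directory.
check_impala_conf() {
  conf_dir="${1:-/etc/impala/conf}"   # default taken from step 6
  missing=0
  for f in hive-site.xml core-site.xml hdfs-site.xml hbase-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f"
      missing=1
    fi
  done
  # Return 0 when all four files are present, 1 otherwise.
  return "$missing"
}
```

Running check_impala_conf with no arguments checks the default directory. A nonzero exit status means at least one file still needs to be copied.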

If this is your first time setting up and using Impala in this cluster, run through some of the exercises in Impala Tutorial to verify that you can do basic operations such as creating tables and querying them.
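
If the first queries do not show the tables you expect, one thing worth ruling out is a missing or mistyped hive.metastore.uris setting from step 2. The following sketch prints that value from a hive-site.xml file. It is not a real XML parser: it assumes the layout shown in step 2, with the name and value elements on separate lines, and metastore_uri is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of hive.metastore.uris from the
# hive-site.xml file given as the first argument.
# Assumption: <name> and <value> are on separate lines, as in step 2.
metastore_uri() {
  awk '
    /<name>hive\.metastore\.uris<\/name>/ { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")     # drop everything up to the opening tag
      sub(/<\/value>.*/, "")   # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}
```

Pass the path of the hive-site.xml file that Impala reads, for example the copy in the Impala configuration directory. The function prints nothing if the property is not present in the file.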

Page generated September 3, 2015.