This is the documentation for Cloudera Manager 4.8.4.
Documentation for other versions is available at Cloudera Documentation.

The Impala Service

  Warning: Cloudera Manager 4.8 supports only Impala 1.2, and does not support Impala 1.1.1 or earlier. (See http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Release-Notes/cmrn_incompat_changes.html for more information.)

You can install Cloudera Impala through the Cloudera Manager installation wizard, using either parcels or packages, and have the service started as part of the First Run process. All configuration settings, including the Hive metastore setup, are handled by Cloudera Manager as part of the installation wizard. See Installation Path A - Automated Installation by Cloudera Manager for more information.

If you elect not to include the Impala service using the Installation Wizard, you can use the Add Service wizard to perform the installation.

Impala depends on ZooKeeper, HDFS, HBase, and Hive. All these services must be present in order to run the Impala service.

Simply follow the steps in the Add Service wizard. It will automatically configure and start the dependent services and the Impala service. See Adding Services for instructions.

Installing Impala after Upgrading Cloudera Manager

If you have just upgraded Cloudera Manager from a version that did not support Impala, the Impala software is not installed automatically. (Upgrading Cloudera Manager does not automatically upgrade CDH or other Cloudera products.)

You can add Impala using parcels: go to the Hosts tab and select the Parcels tab. You should see at least one Impala parcel available for download. See Using Parcels for detailed instructions on using parcels to install or upgrade Impala.

If you do not see any Impala parcels available, verify that the Impala parcel repo URL (http://archive.cloudera.com/impala/parcels/latest/) has been configured in the Parcels configuration page. Click the Edit Settings button on the Parcels page to go to the Parcel configuration settings. See Parcel Configuration Settings for more details.

Configuring the Impala Service

There are several types of configuration settings you may need to apply, depending on your situation.

Managing Resources for Impala

Once you have installed Impala, you can coordinate its use of cluster resources in relation to MapReduce needs for the same resources. See Setting up a Multi-tenant Cluster for Impala and MapReduce below, as well as Resource Management in Managing Clusters with Cloudera Manager.

Running Impala with CDH 4.1

  Note: If you are running CDH 4.1, and the Bypass Hive Metastore Server option is enabled, you must add the following to the Impala Safety Valve for hive-site.xml, replacing <hive_metastore_server_host> with the name of your Hive metastore server host:
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<hive_metastore_server_host>:9083</value>
</property>
Otherwise, Impala queries will fail.
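
The two properties above can also be generated instead of typed by hand. The following is a minimal Python sketch and is not part of Cloudera Manager; the function name metastore_safety_valve and the example host name are hypothetical. It builds the same XML fragment for a given Hive metastore server host so that the output can be pasted into the safety valve field.

```python
import xml.etree.ElementTree as ET


def metastore_safety_valve(host, port=9083):
    """Return the two hive-site.xml properties as an XML fragment string.

    Hypothetical helper for illustration only; 9083 is the port that
    appears in the snippet above.
    """
    settings = [
        ("hive.metastore.local", "false"),
        ("hive.metastore.uris", "thrift://%s:%d" % (host, port)),
    ]
    fragments = []
    for name, value in settings:
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        # Serialize each <property> element on its own line.
        fragments.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(fragments)


if __name__ == "__main__":
    # "metastore01.example.com" is a placeholder host name.
    print(metastore_safety_valve("metastore01.example.com"))
```

Building the elements with an XML library, rather than by string concatenation, avoids stray whitespace inside the name and value elements.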

Configuring Hive Table Statistics

Configuring Hive table statistics is highly recommended when using Impala. Statistics allow Impala to make optimizations that can result in significant (over 10x) performance improvements for some joins. If statistics are not available, Impala still functions, but with lower performance.

Configuring Hive to Store Statistics in MySQL

By default, Hive writes statistics to a Derby database backed by a file named /var/lib/hive/TempStatsStore. However, in production systems Cloudera recommends that you store statistics in a database. Hive table statistics are not supported for PostgreSQL or Oracle. To configure Hive to store statistics in MySQL:
  1. Set up a MySQL server. For instructions on setting up MySQL, see Installing and Configuring a MySQL Database.

    This database will be heavily loaded, so it should not be installed on the same host as anything critical such as the Hive Metastore Server, the database hosting the Hive Metastore, or Cloudera Manager Server. When collecting statistics on a large table and/or in a large cluster, this host may become slow or unresponsive.

  2. Create a statistics database in MySQL:
    mysql> create database stats_db_name DEFAULT CHARACTER SET utf8;
    Query OK, 1 row affected (0.00 sec)
    
    mysql> grant all on stats_db_name.* TO 'stats_user'@'%' IDENTIFIED BY 'stats_password';
    Query OK, 0 rows affected (0.00 sec)
  3. Add the following into the HiveServer2 Configuration Safety Valve for hive-site.xml:
    <property>
      <name>hive.stats.dbclass</name>
      <value>jdbc:mysql</value>
    </property>
    <property>
      <name>hive.stats.jdbcdriver</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>hive.stats.dbconnectionstring</name>
      <value>jdbc:mysql://<stats_mysql_host>:3306/<stats_db_name>?useUnicode=true&amp;characterEncoding=UTF-8&amp;user=<stats_user>&amp;password=<stats_password></value>
    </property>
    <property> 
      <name>hive.aux.jars.path</name> 
      <value>file:///usr/share/java/mysql-connector-java.jar</value>
    </property>  
  4. Restart HiveServer2.
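
The value of hive.stats.dbconnectionstring in step 3 must be a single line with no embedded whitespace, and every "&" must be written as "&amp;" because the value lives inside an XML file. The following Python sketch shows one way to assemble that value; the function name stats_connection_value and the example arguments are hypothetical and only illustrate the shape of the string.

```python
from xml.sax.saxutils import escape


def stats_connection_value(host, db, user, password, port=3306):
    """Build the XML-escaped text for hive.stats.dbconnectionstring.

    Hypothetical helper; the parameter names mirror the placeholders
    used in step 3 (<stats_mysql_host>, <stats_db_name>, and so on).
    """
    params = [
        ("useUnicode", "true"),
        ("characterEncoding", "UTF-8"),
        ("user", user),
        ("password", password),
    ]
    query = "&".join("%s=%s" % pair for pair in params)
    url = "jdbc:mysql://%s:%d/%s?%s" % (host, port, db, query)
    # escape() rewrites each "&" as "&amp;" so the value is valid XML text.
    return escape(url)


if __name__ == "__main__":
    # All four arguments are placeholder values.
    print(stats_connection_value("db01.example.com", "stats_db_name",
                                 "stats_user", "stats_password"))
```

This sketch does not encode special characters in the user name or password; if your credentials contain characters such as "&" or "=", check the MySQL JDBC driver documentation for the encoding it expects.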

Configuring Secure Access for the Impala Web Server

Cloudera Manager supports two methods of authentication for secure access to the Impala web server interfaces: password-based authentication and SSL certificate support. Both of these can be configured through properties of the Impala and Impala StateStore daemons. Authentication for the two types of daemons can be configured independently.

The Impala StateStore Daemon Web UI can be accessed from a link on the menu bar of the Impala Service. To access the Web UI for an Impala Daemon, you must go to the Instances tab and select the Impala Daemon instance you want to contact. The Impala Daemon Web UI link is found on the menu bar for the specific daemon instance.

Note that you can disable/enable access to both the Impala StateStore web server and the Impala daemon web server via configuration properties.

Configuring Password Authentication

To configure password-based authentication:

  1. Go to the Impala service page (from the Services menu, select Impala).
  2. Under the Configuration menu, select View and Edit.
  3. Search for "password" using the Search box within the Configuration page.

    This should display the password-related properties (Username and Password properties) for the Impala daemon and the Impala StateStore daemon. Note that if there are multiple role groups configured for Impala daemon instances, the search should display all of them.

  4. Enter a username and password into these fields, and Save Changes.
  5. Restart the Impala service in order to have these configuration changes take effect. (You can do this from the Actions menu at the top of the page.)

Now when you access the Web UI for the Impala daemon or StateStore daemon, you are asked to log in before access is granted.
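
To confirm from a script that the login prompt is in place after the restart, you can request the Web UI without credentials and check that the server refuses the request. The following Python sketch assumes the web server answers unauthenticated requests with HTTP status 401; the function name requires_login is hypothetical, and the URL to test is whatever address the Web UI link in Cloudera Manager opens.

```python
import urllib.error
import urllib.request


def requires_login(url, timeout=5.0):
    """Return True if an unauthenticated GET to `url` is answered with HTTP 401.

    Hypothetical helper. Connection problems (host down, wrong port) are
    not handled here and surface as urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            # The page was served without asking for credentials.
            return False
    except urllib.error.HTTPError as err:
        # 401 Unauthorized means the server wants a user name and password.
        return err.code == 401
```

Call the function once for the StateStore daemon URL and once for each Impala daemon URL, because authentication for the two types of daemons is configured independently.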

Configuring SSL Certificate Support

To configure certificate-based authentication:

  1. Create or obtain an SSL certificate.
  2. Place the certificate, in .pem format, on the host where the Impala StateStore daemon is running, and on each host where an Impala daemon is running. It can be placed in any location (path) you choose. (Note that if all your Impala daemons are members of the same role group, then the .pem file must have the same path on each host.)
  3. Go to the Impala service.
  4. Select Configuration > View and Edit.
  5. Search for "certificate" using the Search box within the Configuration page.

    This should display the certificate file location properties for the Impala daemon and the Impala StateStore daemon. Note that if there are multiple role groups configured for Impala daemon instances, the search should display all of them.

  6. In the property fields, enter the full path name to your certificate file, and Save Changes.
  7. Restart the Impala service in order to have these configuration changes take effect. (You can do this from the Actions menu at the top of the page.) Note that if Cloudera Manager cannot find the .pem file on the host for a specific role instance, that role will fail to start.

Now when you access the Web UI for the Impala daemon or StateStore daemon, HTTPS is used.
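
Because a role fails to start when its .pem file cannot be found, it can help to check the file on each host before restarting. The following Python sketch is an illustration only; the function name check_pem is hypothetical. It verifies that the path is absolute, that the file exists, and that it contains a well-formed PEM certificate block. It does not validate the certificate itself (expiry, issuer, or host name).

```python
import os
import ssl
import sys

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def check_pem(path):
    """Return None if `path` looks like a PEM certificate, else a reason string."""
    if not os.path.isabs(path):
        return "path is not absolute"
    if not os.path.isfile(path):
        return "file not found"
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        text = handle.read()
    start = text.find(BEGIN)
    end = text.find(END)
    if start == -1 or end == -1:
        return "no certificate block"
    try:
        # Raises ValueError if the block is not valid base64 between the markers.
        ssl.PEM_cert_to_DER_cert(text[start:end + len(END)])
    except ValueError as err:
        return "malformed certificate block: %s" % err
    return None


if __name__ == "__main__":
    # Usage: run on each host with the configured certificate path(s).
    for pem_path in sys.argv[1:]:
        print(pem_path, "->", check_pem(pem_path) or "OK")
```

Run the script on the StateStore host and on every Impala daemon host with the same path you entered in the certificate file location property.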

Deploying Impala with Hue

For CDH 4.3 and earlier, in order to use Cloudera Impala with Hue, you must add the host name of the Impala Daemon in the Hue Server safety valve.
  1. From the Services menu, select the Hue service.
  2. Select Configuration > View and Edit.
  3. Search for the word "safety". This will display a set of Hue Safety Valve properties.
  4. Add information about your Impala Daemon host to the Hue Server Configuration Safety Valve for hue_safety_valve_server.ini found under the Hue Server (Default) / Advanced category:
    [impala]
    server_host=<impalad_hostname>
    server_port=21000

    Substitute your actual hostname for <impalad_hostname>. If you have more than one Impala Daemon host, you can choose any one of them.

  5. Click Save Changes.
  6. Restart the Hue Service.
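
The safety valve entry from step 4 can be produced with a few lines of Python if you prefer to script it. This is a minimal sketch, not a Cloudera Manager or Hue API; the function name hue_impala_section and the example host name are hypothetical, and 21000 is the port shown in step 4.

```python
import configparser
import io


def hue_impala_section(impalad_host, port=21000):
    """Return the [impala] section text for hue_safety_valve_server.ini."""
    parser = configparser.ConfigParser(interpolation=None)
    parser["impala"] = {
        "server_host": impalad_host,
        "server_port": str(port),
    }
    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the snippet in step 4.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue().strip()


if __name__ == "__main__":
    # "impalad01.example.com" is a placeholder host name.
    print(hue_impala_section("impalad01.example.com"))
```

Paste the printed text into the Hue Server Configuration Safety Valve field, then continue with steps 5 and 6.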

Configuring Sentry (Hive Authorization) for Impala

To configure Impala to use Sentry for Hive authorization, you must first configure Hive to use Sentry.

To enable Sentry for Hive, see Setting Up Hive Authorization with Sentry.

When that has been done, you can enable Sentry for Impala following the instructions at Enabling Sentry for Impala.