This is the documentation for CDH 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Appendix B - Troubleshooting Impala

Use the following steps to diagnose and debug problems with any aspect of Impala.

In general, if queries issued against Impala fail, you can try running these same queries against Hive.

  • If a query fails against both Impala and Hive, the problem is likely in the query itself or elsewhere in your environment.
    • Review the Language Reference to ensure your query is valid.
    • Review the contents of the Impala logs for any information that may be useful in identifying the source of the problem.
  • If a query fails against Impala but not Hive, it is likely that there is a problem with your Impala installation.
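
The decision rule above can be written as a small function. This is only a sketch: the name `triage` and the returned strings are hypothetical, and the two booleans stand for whether the same query succeeded on each engine.

```python
def triage(impala_ok: bool, hive_ok: bool) -> str:
    """Map the outcome of running one query on both engines to a likely cause."""
    if impala_ok:
        # The query works on Impala, so there is no Impala failure to explain.
        return "no Impala failure to diagnose"
    if hive_ok:
        # Fails on Impala only: suspect the Impala installation.
        return "check the Impala installation"
    # Fails on both engines: suspect the query itself or the environment.
    return "check the query and the environment"
```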

The following table lists common problems and potential solutions.

Symptom

Explanation

Recommendation

Joins fail to complete.

There may be insufficient memory. During a join, data from the second and each subsequent data set being joined is loaded into memory. If Impala chooses an inefficient join order or join mechanism, the query could exceed the total memory available.

Start by gathering statistics with the COMPUTE STATS statement for each table involved in the join. Consider specifying the [SHUFFLE] hint so that data from the joined tables is split up between nodes rather than broadcast to each node. If tuning at the SQL level is not sufficient, add more memory to your system or join smaller data sets.
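
As an illustration of the recommendation above, the following sketch builds the statements as strings that you could paste into impala-shell. The helper names and the table names `sales` and `customers` are hypothetical, and the sketch assumes the hint is written immediately after the JOIN keyword.

```python
def compute_stats_statements(tables):
    """Return one COMPUTE STATS statement per table, preserving order."""
    return [f"COMPUTE STATS {table};" for table in tables]


def shuffle_join(left, right, on):
    """Build a two-table join that carries the [SHUFFLE] hint.

    ASSUMPTION: the hint is placed directly after the JOIN keyword.
    """
    return f"SELECT * FROM {left} JOIN [SHUFFLE] {right} ON {on};"


# Hypothetical tables used only for illustration.
for statement in compute_stats_statements(["sales", "customers"]):
    print(statement)
print(shuffle_join("sales", "customers", "sales.cust_id = customers.id"))
```

Gather statistics first and add the hint only if the query still exceeds available memory, because up-to-date statistics may already lead Impala to a better join order.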

Queries return incorrect results.

Impala metadata may be outdated after changes are performed in Hive.

Where possible, use the appropriate Impala statement (INSERT, LOAD DATA, CREATE TABLE, ALTER TABLE, COMPUTE STATS, and so on) rather than switching back and forth between Impala and Hive. Impala automatically broadcasts the results of DDL and DML operations to all Impala nodes in the cluster, but does not automatically recognize when such changes are made through Hive. After inserting data, adding a partition, or other operation in Hive, refresh the metadata for the table as described in REFRESH Statement.
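
If several tables were changed through Hive, one REFRESH per table is enough. The sketch below turns a list of changed tables into the statements to run in Impala. The function name and the table names are hypothetical.

```python
def refresh_statements(tables_changed_in_hive):
    """Return one REFRESH statement per distinct table, in first-seen order."""
    seen = set()
    statements = []
    for table in tables_changed_in_hive:
        if table not in seen:
            seen.add(table)
            statements.append(f"REFRESH {table};")
    return statements


# "logs" was modified twice in Hive but needs only one REFRESH.
print(refresh_statements(["logs", "logs", "users"]))
```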

Queries are slow to return results.

Some impalad instances may not have started. Using a browser, connect to the host running the Impala state store at an address of the form http://hostname:port/metrics.

  Note: Replace hostname and port with the hostname and port of your Impala state store host machine and web server port. The default port is 25010.
The number of impalad instances listed should match the number of impalad instances installed in the cluster. There should also be one impalad instance installed on each DataNode.

Ensure Impala is installed on all DataNodes. Start any impalad instances that are not running.
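
The same check can be scripted instead of done in a browser. The sketch below fetches the metrics page and compares the reported instance count with the number you expect. It rests on two assumptions that you should verify against your own metrics page: that the page contains lines of the form `name:value`, and that the count appears under a key named `statestore.live-backends`. All function names are hypothetical.

```python
from urllib.request import urlopen


def fetch_metrics(hostname, port=25010, timeout=10):
    """Download the state store metrics page as text."""
    with urlopen(f"http://{hostname}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def live_backend_count(metrics_text, key="statestore.live-backends"):
    """Return the integer stored under `key`, or None if it cannot be read.

    ASSUMPTION: the page has lines of the form `name:value`, and `key`
    is the metric that counts registered instances.
    """
    for line in metrics_text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name.strip() == key:
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def missing_instances(metrics_text, expected):
    """Return how many expected instances are not reported, or None if unknown."""
    count = live_backend_count(metrics_text)
    if count is None:
        return None
    return max(expected - count, 0)
```

For example, `missing_instances(fetch_metrics("hostname"), expected=5)` returns the number of impalad instances still to be started, with `hostname` replaced as described in the note above.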

Queries are slow to return results.

Impala may not be configured to use native checksumming. Native checksumming uses machine-specific instructions to compute checksums over HDFS data very quickly. Review the Impala logs. If the logs contain no "INFO util.NativeCodeLoader: Loaded the native-hadoop" messages, native checksumming is not enabled.

Ensure Impala is configured to use native checksumming as described in Post-Installation Configuration for Impala.
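
When reviewing large logs, it can help to filter for the lines that mention the native code loader. The sketch below does only that and leaves the interpretation to the reader. The function name is hypothetical, and the input can be any iterable of log lines, such as an open log file.

```python
def native_loader_lines(log_lines, marker="util.NativeCodeLoader"):
    """Return the log lines that mention the native code loader."""
    return [line.rstrip("\n") for line in log_lines if marker in line]


# Made-up sample input; real logs will contain many unrelated lines.
sample_log = [
    "unrelated startup message\n",
    "INFO util.NativeCodeLoader: Loaded the native-hadoop\n",
]
print(native_loader_lines(sample_log))
```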

Queries are slow to return results.

Impala may not be configured to use data locality tracking.

Test Impala for data locality tracking and make configuration changes as necessary. Information on this process can be found in Post-Installation Configuration for Impala.

Attempts to complete Impala tasks such as executing INSERT-SELECT actions fail. The Impala logs report that files could not be opened because permission was denied.

This can be the result of permissions issues. For example, you could use the Hive shell as the hive user to create a table, and then attempt some action on that table, such as an INSERT-SELECT, through Impala. Because the table was created by one user and the INSERT-SELECT is attempted by another, the action may fail due to permissions issues.

In general, ensure the Impala user has sufficient permissions. In the preceding example, ensure the Impala user has sufficient permissions to the table that the Hive user created.

Impala fails to start up, with the impalad logs referring to errors connecting to the statestore service and attempts to re-register.

A large number of databases, tables, partitions, and so on can require metadata synchronization on startup that takes longer than the default timeout for the statestore service.

Increase the statestore timeout value above its default of 10 seconds. For instructions, see Increasing the Statestore Timeout.

Page generated September 3, 2015.