Impala provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than the long batch jobs traditionally associated with SQL-on-Hadoop technologies. (You will often see the term "interactive" applied to these kinds of fast queries with human-scale response times.)
Impala integrates with the Apache Hive metastore database, so that Impala and Hive share the same databases and tables. This high level of integration with Hive, together with compatibility with the HiveQL syntax, lets you use either Impala or Hive to create tables, issue queries, load data, and so on.
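As a rough illustration of the create/load/query workflow mentioned above, the following sketch holds three SQL statements written in the subset of syntax that Impala shares with HiveQL and runs them through any Python DB-API 2.0 cursor. The table name, columns, and HDFS path are hypothetical, and the `run_statements` helper is an assumption made for this example, not part of Impala.

```python
# Illustrative sketch only: web_logs, its columns, and the HDFS path
# are hypothetical names chosen for this example.
STATEMENTS = [
    # Create a table; its definition is recorded in the shared metastore.
    """CREATE TABLE IF NOT EXISTS web_logs (
           ts STRING, url STRING, status INT)
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ','""",
    # Move an existing HDFS file into the table's data directory.
    "LOAD DATA INPATH '/user/demo/web_logs.csv' INTO TABLE web_logs",
    # Run an interactive-style aggregate query.
    "SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
]


def run_statements(cursor, statements=STATEMENTS):
    """Execute each statement in order on a DB-API 2.0 cursor.

    Returns the rows produced by the final statement, which is
    assumed to be a query.
    """
    for sql in statements:
        cursor.execute(sql)
    return cursor.fetchall()
```

The cursor could come from any DB-API driver connected to Impala or Hive; the choice of driver and the connection details are left out here because they depend on the cluster. A table created through Hive may not be visible to Impala until Impala refreshes its metadata, for example with an `INVALIDATE METADATA` statement.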
The following are some of the key advantages of Impala:
- Impala integrates with the existing CDH ecosystem, meaning data can be stored, shared, and accessed using the various solutions included with CDH. This also avoids data silos and minimizes expensive data movement.
- Impala provides access to data stored in CDH without the Java skills needed to write MapReduce jobs. Impala can access data directly from HDFS. Impala also provides a SQL front-end to access data in the HBase database system, or in the Amazon Simple Storage Service (S3).
- Impala typically returns results within seconds or a few minutes, rather than the many minutes or hours that Hive queries often take to complete.
- Impala is pioneering the use of the Parquet file format, a columnar storage layout that is optimized for large-scale queries typical in data warehouse scenarios.
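To make the Parquet point concrete, the sketch below builds two SQL strings: a `CREATE TABLE ... STORED AS PARQUET AS SELECT` statement that copies an existing table into Parquet format, and a query that reads only a few named columns, which is the access pattern a columnar layout favors. The helper functions and the table and column names are hypothetical; the helpers do no identifier validation, so they are suitable only for trusted input.

```python
def parquet_copy_sql(source_table, target_table):
    """Build a CREATE TABLE AS SELECT statement that rewrites
    source_table as a new Parquet-format table."""
    return (
        f"CREATE TABLE {target_table} STORED AS PARQUET "
        f"AS SELECT * FROM {source_table}"
    )


def column_query_sql(table, columns):
    """Build a query that reads only the listed columns, so a
    columnar format can skip the data for all other columns."""
    return f"SELECT {', '.join(columns)} FROM {table}"


# Hypothetical table and column names, for illustration only.
copy_stmt = parquet_copy_sql("web_logs", "web_logs_parquet")
query_stmt = column_query_sql("web_logs_parquet", ["url", "status"])
```

The resulting strings can be submitted through impala-shell or any SQL client connected to Impala.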

The Impala documentation covers the following topics:
- Impala Concepts and Architecture
- Planning for Impala Deployment
- Impala Tutorials
- Impala Administration
- Impala SQL Language Reference
- Using the Impala Shell (impala-shell Command)
- Tuning Impala for Performance
- Scalability Considerations for Impala
- Partitioning for Impala Tables
- How Impala Works with Hadoop File Formats
- Using Impala to Query HBase Tables
- Using Impala to Query the Amazon S3 Filesystem (Unsupported Preview)
- Using Impala with Isilon Storage
- Using Impala Logging
- Troubleshooting Impala
- Ports Used by Impala
- Impala Reserved Words
Related information throughout the CDH 5 library:
In CDH 5, the Impala documentation for Release Notes, Installation, Upgrading, and Security has been integrated alongside the corresponding information for other Hadoop components: