Cloudera Impala User Guide
Cloudera Impala provides high-performance, low-latency SQL queries on data
stored in popular Apache Hadoop file formats.
The fast query response enables interactive exploration and
fine-tuning of analytic queries, rather than the long batch jobs
traditionally associated with SQL-on-Hadoop technologies.
(You will often see the term
Impala integrates with the Apache Hive metastore database to share databases and tables between the two components. This high level of integration with Hive, together with compatibility with the HiveQL syntax, lets you use either Impala or Hive to create tables, issue queries, load data, and so on.
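As a rough illustration of this shared metastore, the following sketch creates and loads a table through Hive and then queries the same table through Impala. The table name `sales_demo`, its columns, and the HDFS path are hypothetical, chosen only for this example. Whether the `INVALIDATE METADATA` step is needed may depend on your Impala version and configuration.

```sql
-- In the Hive shell: create a table and load a data file.
-- (sales_demo and the HDFS path are hypothetical, for illustration only.)
CREATE TABLE sales_demo (id INT, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/hypothetical/sales.csv' INTO TABLE sales_demo;

-- In impala-shell: pick up the table that was created through Hive,
-- then query it using the same table name.
INVALIDATE METADATA sales_demo;
SELECT COUNT(*), SUM(amount) FROM sales_demo;
```

Both components read the same table definition from the metastore, so no export or copy step sits between the Hive statements and the Impala query.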
The following are some of the key advantages of Impala:
- Impala integrates with the existing CDH ecosystem, meaning data can be stored, shared, and accessed using the various solutions included with CDH. This also avoids data silos and minimizes expensive data movement.
- Impala provides access to data stored in CDH without the Java skills needed to write MapReduce jobs. Impala can access data directly from HDFS. Impala also provides a SQL front end for data in the HBase database system.
- Impala typically returns results within seconds or a few minutes, rather than the many minutes or hours that Hive queries often take to complete.
- Impala is pioneering the use of the Parquet file format, a columnar storage layout that is optimized for large-scale queries typical in data warehouse scenarios.
In CDH 5, the Impala documentation for Release Notes, Installation, Upgrading, and Security has been integrated alongside the corresponding information for other Hadoop components: