Cloudera Impala Guide
Cloudera Impala provides high-performance, low-latency SQL queries on data
stored in popular Apache Hadoop file formats.
The fast query response enables interactive exploration and
fine-tuning of analytic queries, rather than the long batch jobs
traditionally associated with SQL-on-Hadoop technologies.
(You will often see the term
Impala integrates with the Apache Hive metastore database, so that both components share the same databases and tables. This high level of integration with Hive, together with compatibility with the HiveQL syntax, lets you use either Impala or Hive to create tables, issue queries, load data, and so on.
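As a minimal sketch of what this shared syntax looks like, the statements below create a table, load data into it, and query it. The table name, column names, and HDFS path are hypothetical and chosen only for illustration; the statements use syntax common to both Impala and HiveQL.

```sql
-- Hypothetical table and columns, for illustration only.
-- The table definition is recorded in the Hive metastore.
CREATE TABLE web_logs (
  event_time TIMESTAMP,
  user_id    BIGINT,
  url        STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Move a data file that already resides in HDFS into the table.
-- The path is an assumption; substitute a real staging location.
LOAD DATA INPATH '/user/hypothetical/staging/web_logs.csv' INTO TABLE web_logs;

-- Find the ten most frequently requested URLs.
SELECT url, COUNT(*) AS hits
FROM web_logs
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```

If a table is created through Hive rather than Impala, Impala typically needs an INVALIDATE METADATA statement for that table before the new table becomes visible to Impala queries.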
The following are some of the key advantages of Impala:
- Impala integrates with the existing CDH ecosystem, meaning data can be stored, shared, and accessed using the various solutions included with CDH. This avoids data silos and minimizes expensive data movement.
- Impala provides access to data stored in CDH without the Java skills needed to write MapReduce jobs. Impala can access data directly from HDFS. Impala also provides a SQL front end for data in the HBase database system.
- Impala typically returns results within seconds or a few minutes, rather than the many minutes or hours that Hive queries often require to complete.
- Impala is pioneering the use of the Parquet file format, a columnar storage layout that is optimized for large-scale queries typical in data warehouse scenarios.
- Impala Concepts
- Planning for Impala Deployment
- Impala Tutorial
- Impala Administration
- Impala SQL Language Reference
- Using the Impala Shell (impala-shell Command)
- Tuning Impala for Performance
- Scalability Considerations for Impala
- How Impala Works with Hadoop File Formats
- Using Impala to Query HBase Tables
- Using Impala Logging
- Troubleshooting Impala
- Ports Used by Impala
- Impala Reserved Words
In CDH 5, the Impala documentation for Release Notes, Installation, Upgrading, and Security has been integrated alongside the corresponding information for other Hadoop components: