Overview of Impala Views
Views are lightweight logical constructs that act as aliases for queries. You can specify a view name in a query (a SELECT statement or the SELECT portion of an INSERT statement) where you would usually specify a table name.
A view lets you:
- Issue complicated queries with compact and simple syntax:
-- Take a complicated reporting query, plug it into a CREATE VIEW statement... create view v1 as select c1, c2, avg(c3) from t1 group by c3 order by c1 desc limit 10; -- ... and now you can produce the report with 1 line of code. select * from v1;
- Reduce maintenance, by avoiding the duplication of complicated queries across multiple applications in multiple languages:
create view v2 as select t1.c1, t1.c2, t2.c3 from t1 join t2 on (t1.id = t2.id); -- This simple query is safer to embed in reporting applications than the longer query above. -- The view definition can remain stable even if the structure of the underlying tables changes. select c1, c2, c3 from v2;
- Build a new, more refined query on top of the original query by adding new clauses, select-list expressions, function calls, and so on:
create view average_price_by_category as select category, avg(price) as avg_price from products group by category; create view expensive_categories as select category, avg_price from average_price_by_category order by avg_price desc limit 10000; create view top_10_expensive_categories as select category, avg_price from expensive_categories limit 10;This technique lets you build up several more or less granular variations of the same query, and switch between them when appropriate.
- Set up aliases with intuitive names for tables, columns, result sets from joins, and so on:
-- The original tables might have cryptic names inherited from a legacy system. create view action_items as select rrptsk as assignee, treq as due_date, dmisc as notes from vxy_t1_br; -- You can leave original names for compatibility, build new applications using more intuitive ones. select assignee, due_date, notes from action_items;
- Swap tables with others that use different file formats, partitioning schemes, and so on without any downtime for data copying or conversion:
create table slow (x int, s string) stored as textfile; create view report as select s from slow where x between 20 and 30; -- Query is kind of slow due to inefficient table definition, but it works. select * from report; create table fast (s string) partitioned by (x int) stored as parquet; -- ...Copy data from SLOW to FAST. Queries against REPORT view continue to work... -- After changing the view definition, queries will be faster due to partitioning, -- binary format, and compression in the new table. alter view report as select s from fast where x between 20 and 30; select * from report;
- Avoid coding lengthy subqueries and repeating the same subquery text in many other queries.
- Set up fine-grained security where a user can query some columns from a table but not other columns. Because CDH 5.5 / Impala 2.3 and higher support column-level authorization, this technique is no longer required. If you formerly implemented column-level security through views, see Hive SQL Syntax for Use with Sentry for details about the column-level authorization feature.
The SQL statements that configure views are CREATE VIEW Statement, ALTER VIEW Statement, and DROP VIEW Statement. You can specify view names when querying data (SELECT Statement) and copying data from one table to another (INSERT Statement). The WITH clause creates an inline view, that only exists for the duration of a single query.
[localhost:21000] > create view trivial as select * from customer; [localhost:21000] > create view some_columns as select c_first_name, c_last_name, c_login from customer; [localhost:21000] > select * from some_columns limit 5; Query finished, fetching results ... +--------------+-------------+---------+ | c_first_name | c_last_name | c_login | +--------------+-------------+---------+ | Javier | Lewis | | | Amy | Moses | | | Latisha | Hamilton | | | Michael | White | | | Robert | Moran | | +--------------+-------------+---------+ [localhost:21000] > create view ordered_results as select * from some_columns order by c_last_name desc, c_first_name desc limit 1000; [localhost:21000] > select * from ordered_results limit 5; Query: select * from ordered_results limit 5 Query finished, fetching results ... +--------------+-------------+---------+ | c_first_name | c_last_name | c_login | +--------------+-------------+---------+ | Thomas | Zuniga | | | Sarah | Zuniga | | | Norma | Zuniga | | | Lloyd | Zuniga | | | Lisa | Zuniga | | +--------------+-------------+---------+ Returned 5 row(s) in 0.48s
The previous example uses descending order for ORDERED_RESULTS because in the sample TPCD-H data, there are some rows with empty strings for both C_FIRST_NAME and C_LAST_NAME, making the lowest-ordered names unuseful in a sample query.
create view visitors_by_day as select day, count(distinct visitors) as howmany from web_traffic group by day; create view top_10_days as select day, howmany from visitors_by_day order by howmany limit 10; select * from top_10_days;
[localhost:21000] > create view v1 as select * from t1; [localhost:21000] > describe formatted v1; Query finished, fetching results ... +------------------------------+------------------------------+------------+ | name | type | comment | +------------------------------+------------------------------+------------+ | # col_name | data_type | comment | | | NULL | NULL | | x | int | None | | y | int | None | | s | string | None | | | NULL | NULL | | # Detailed Table Information | NULL | NULL | | Database: | views | NULL | | Owner: | doc_demo | NULL | | CreateTime: | Mon Jul 08 15:56:27 EDT 2013 | NULL | | LastAccessTime: | UNKNOWN | NULL | | Protect Mode: | None | NULL | | Retention: | 0 | NULL | | Table Type: | VIRTUAL_VIEW | NULL | | Table Parameters: | NULL | NULL | | | transient_lastDdlTime | 1373313387 | | | NULL | NULL | | # Storage Information | NULL | NULL | | SerDe Library: | null | NULL | | InputFormat: | null | NULL | | OutputFormat: | null | NULL | | Compressed: | No | NULL | | Num Buckets: | 0 | NULL | | Bucket Columns: |  | NULL | | Sort Columns: |  | NULL | | | NULL | NULL | | # View Information | NULL | NULL | | View Original Text: | SELECT * FROM t1 | NULL | | View Expanded Text: | SELECT * FROM t1 | NULL | +------------------------------+------------------------------+------------+
Prior to Impala 1.4.0, it was not possible to use the CREATE TABLE LIKE view_name syntax. In Impala 1.4.0 and higher, you can create a table with the same column definitions as a view using the CREATE TABLE LIKE technique. Although CREATE TABLE LIKE normally inherits the file format of the original table, a view has no underlying file format, so CREATE TABLE LIKE view_name produces a text table by default. To specify a different file format, include a STORED AS file_format clause at the end of the CREATE TABLE LIKE statement.
Complex type considerations:
For tables containing complex type columns (ARRAY, STRUCT, or MAP), you typically use join queries to refer to the complex values. You can use views to hide the join notation, making such tables seem like traditional denormalized tables, and making those tables queryable by business intelligence tools that do not have built-in support for those complex types. See Accessing Complex Type Data in Flattened Form Using Views for details.
You cannot insert into an Impala view. (In some database systems, this operation is allowed and inserts rows into the base table.) You can use a view name on the right-hand side of an INSERT statement, in the SELECT part.
If a view applies to a partitioned table, any partition pruning considers the clauses on both the original query and any additional WHERE predicates in the query that refers to the view. Prior to Impala 1.4, only the WHERE clauses on the original query from the CREATE VIEW statement were used for partition pruning.
An ORDER BY clause without an additional LIMIT clause is ignored in any view definition. If you need to sort the entire result set from a view, use an ORDER BY clause in the SELECT statement that queries the view. You can still make a simple "top 10" report by combining the ORDER BY and LIMIT clauses in the same view definition:
[localhost:21000] > create table unsorted (x bigint); [localhost:21000] > insert into unsorted values (1), (9), (3), (7), (5), (8), (4), (6), (2); [localhost:21000] > create view sorted_view as select x from unsorted order by x; [localhost:21000] > select x from sorted_view; -- ORDER BY clause in view has no effect. +---+ | x | +---+ | 1 | | 9 | | 3 | | 7 | | 5 | | 8 | | 4 | | 6 | | 2 | +---+ [localhost:21000] > select x from sorted_view order by x; -- View query requires ORDER BY at outermost level. +---+ | x | +---+ | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | +---+ [localhost:21000] > create view top_3_view as select x from unsorted order by x limit 3; [localhost:21000] > select x from top_3_view; -- ORDER BY and LIMIT together in view definition are preserved. +---+ | x | +---+ | 1 | | 2 | | 3 | +---+
The TABLESAMPLE clause of the SELECT statement does not apply to a table reference derived from a view, a subquery, or anything other than a real base table. This clause only works for tables backed by HDFS or HDFS-like data files, therefore it does not apply to Kudu or HBase tables.