This is the documentation for Cloudera 5.4.x. Documentation for other versions is available at Cloudera Documentation.

SHOW Statement

The SHOW statement is a flexible way to get information about different types of Impala objects.

Syntax:

SHOW DATABASES [[LIKE] 'pattern']
SHOW SCHEMAS [[LIKE] 'pattern'] - an alias for SHOW DATABASES
SHOW TABLES [IN database_name] [[LIKE] 'pattern']
SHOW [AGGREGATE | ANALYTIC] FUNCTIONS [IN database_name] [[LIKE] 'pattern']
SHOW CREATE TABLE [database_name].table_name
SHOW TABLE STATS [database_name.]table_name
SHOW COLUMN STATS [database_name.]table_name
SHOW PARTITIONS [database_name.]table_name
SHOW FILES IN [database_name.]table_name [PARTITION (key_col=value [, key_col=value]]

SHOW ROLES
SHOW CURRENT ROLES
SHOW ROLE GRANT GROUP group_name
SHOW GRANT ROLE role_name

Issue a SHOW object_type statement to see the appropriate objects in the current database, or SHOW object_type IN database_name to see objects in a specific database.

The optional pattern argument is a quoted string literal, using Unix-style * wildcards and allowing | for alternation. The preceding LIKE keyword is also optional. All object names are stored in lowercase, so use all lowercase letters in the pattern string. For example:

show databases 'a*';
show databases like 'a*';
show tables in some_db like '*fact*';
use some_db;
show tables '*dim*|*fact*';

Cancellation: Cannot be cancelled.

Continue reading:

SHOW FILES Statement

The SHOW FILES statement displays the files that constitute a specified table, or a partition within a partitioned table. This syntax is available in CDH 5.4 and higher only. The output includes the names of the files, the size of each file, and the applicable partition for a partitioned table. The size includes a suffix of B for bytes, MB for megabytes, and GB for gigabytes.

  Note: This statement applies to tables and partitions stored on HDFS, or in the Amazon Simple Storage System (S3). It does not apply to views. It does not apply to tables mapped onto HBase, because HBase does not use the same file-based storage layout.

Usage notes:

You can use this statement to verify the results of your ETL process: that is, that the expected files are present, with the expected sizes. You can examine the file information to detect conditions such as empty files, missing files, or inefficient layouts due to a large number of small files. When you use INSERT statements to copy from one table to another, you can see how the file layout changes due to file format conversions, compaction of small input files into large data blocks, and multiple output files from parallel queries and partitioned inserts.

The output from this statement does not include files that Impala considers to be hidden or invisible, such as those whose names start with a dot or an underscore, or that end with the suffixes .copying or .tmp.

The information for partitioned tables complements the output of the SHOW PARTITIONS statement, which summarizes information about each partition. SHOW PARTITIONS produces some output for each partition, while SHOW FILES does not produce any output for empty partitions because they do not include any data files.

HDFS permissions:

The user ID that the impalad daemon runs under, typically the impala user, must have read permission for all the table files, read and execute permission for all the directories that make up the table, and execute permission for the database directory and all its parent directories.

Examples:

The following example constructs SHOW FILES statements for an unpartitioned tables using text format:

[localhost:21000] > create table unpartitioned_text (x bigint, s string);
[localhost:21000] > insert into unpartitioned_text (x, s) select id, name from oreilly.sample_data limit 20e6;
[localhost:21000] > show files in unpartitioned_text;
+-------------------------------------------------------------------------------------+----------+-----------+
| path                                                                                | size     | partition |
+-------------------------------------------------------------------------------------+----------+-----------+
| hdfs://impala_data_dir/show_files.db/unpartitioned_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB |           |
+-------------------------------------------------------------------------------------+----------+-----------+
[localhost:21000] > insert into unpartitioned_text (x, s) select id, name from oreilly.sample_data limit 100e6;
[localhost:21000] > show files in unpartitioned_text;
+---------------------------------------------------------------------------------------------+----------+-----------+
| path                                                                                        | size     | partition |
+---------------------------------------------------------------------------------------------+----------+-----------+
| hdfs://impala_data_dir/show_files.db/unpartitioned_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_text/ac3dba252a8952b8_1663177415_data.0. | 2.19GB   |           |
+---------------------------------------------------------------------------------------------+----------+-----------+

This example illustrates how, after issuing some INSERT ... VALUES statements, the table now contains some tiny files of just a few bytes. Such small files could cause inefficient processing of parallel queries that are expecting multi-megabyte input files. The example shows how you might compact the small files by doing an INSERT ... SELECT into a different table, possibly converting the data to Parquet in the process:

[localhost:21000] > insert into unpartitioned_text values (10,'hello'), (20, 'world');
[localhost:21000] > insert into unpartitioned_text values (-1,'foo'), (-1000, 'bar');
[localhost:21000] > show files in unpartitioned_text;
+---------------------------------------------------------------------------------------------+----------+-----------+
| path                                                                                        | size     | partition |
+---------------------------------------------------------------------------------------------+----------+-----------+
| hdfs://impala_data_dir/show_files.db/unpartitioned_text/4f11b8bdf8b6aa92_238145083_data.0.  | 18B      |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_text/ac3dba252a8952b8_1663177415_data.0. | 2.19GB   |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_text/cfb8252452445682_1868457216_data.0. | 17B      |           |
+---------------------------------------------------------------------------------------------+----------+-----------+
[localhost:21000] > create table unpartitioned_parquet stored as parquet as select * from unpartitioned_text;
+---------------------------+
| summary                   |
+---------------------------+
| Inserted 120000002 row(s) |
+---------------------------+
[localhost:21000] > show files in unpartitioned_parquet;
+----------------------------------------------------------------------------------------------------+----------+-----------+
| path                                                                                               | size     | partition |
+----------------------------------------------------------------------------------------------------+----------+-----------+
| hdfs://impala_data_dir/show_files.db/unpartitioned_parquet/60798d96ba630184_549959007_data.0.parq  | 255.36MB |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_parquet/60798d96ba630184_549959007_data.1.parq  | 178.52MB |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_parquet/60798d96ba630185_549959007_data.0.parq  | 255.37MB |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_parquet/60798d96ba630185_549959007_data.1.parq  | 57.71MB  |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_parquet/60798d96ba630186_2141167244_data.0.parq | 255.40MB |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_parquet/60798d96ba630186_2141167244_data.1.parq | 175.52MB |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_parquet/60798d96ba630187_1006832086_data.0.parq | 255.40MB |           |
| hdfs://impala_data_dir/show_files.db/unpartitioned_parquet/60798d96ba630187_1006832086_data.1.parq | 214.61MB |           |
+----------------------------------------------------------------------------------------------------+----------+-----------+

The following example shows a SHOW FILES statement for a partitioned text table with data in two different partitions, and two empty partitions. The partitions with no data are not represented in the SHOW FILES output.

[localhost:21000] > create table partitioned_text (x bigint, y int, s string) partitioned by (year bigint, month bigint, day bigint);
[localhost:21000] > insert overwrite partitioned_text (x, y, s) partition (year=2014,month=1,day=1) select id, val, name from oreilly.normalized_parquet
where id between 1 and 1000000;
[localhost:21000] > insert overwrite partitioned_text (x, y, s) partition (year=2014,month=1,day=2) select id, val, name from oreilly.normalized_parquet
where id between 1000001 and 2000000;
[localhost:21000] > alter table partitioned_text add partition (year=2014,month=1,day=3);
[localhost:21000] > alter table partitioned_text add partition (year=2014,month=1,day=4);
[localhost:21000] > show partitions partitioned_text;
+-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
| 2014  | 1     | 1   | -1    | 4      | 25.16MB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| 2014  | 1     | 2   | -1    | 4      | 26.22MB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| 2014  | 1     | 3   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| 2014  | 1     | 4   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| Total |       |     | -1    | 8      | 51.38MB | 0B           |                   |        |                   |
+-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
[localhost:21000] > show files in partitioned_text;
+----------------------------------------------------------------------------------------------------------------+--------+-------------------------+
| path                                                                                                           | size   | partition               |
+----------------------------------------------------------------------------------------------------------------+--------+-------------------------+
| hdfs://impala_data_dir/show_files.db/partitioned_text/year=2014/month=1/day=1/80732d9dc80689f_1418645991_data.0.  | 5.77MB | year=2014/month=1/day=1 |
| hdfs://impala_data_dir/show_files.db/partitioned_text/year=2014/month=1/day=1/80732d9dc8068a0_1418645991_data.0.  | 6.25MB | year=2014/month=1/day=1 |
| hdfs://impala_data_dir/show_files.db/partitioned_text/year=2014/month=1/day=1/80732d9dc8068a1_147082319_data.0.   | 7.16MB | year=2014/month=1/day=1 |
| hdfs://impala_data_dir/show_files.db/partitioned_text/year=2014/month=1/day=1/80732d9dc8068a2_2111411753_data.0.  | 5.98MB | year=2014/month=1/day=1 |
| hdfs://impala_data_dir/show_files.db/partitioned_text/year=2014/month=1/day=2/21a828cf494b5bbb_501271652_data.0.  | 6.42MB | year=2014/month=1/day=2 |
| hdfs://impala_data_dir/show_files.db/partitioned_text/year=2014/month=1/day=2/21a828cf494b5bbc_501271652_data.0.  | 6.62MB | year=2014/month=1/day=2 |
| hdfs://impala_data_dir/show_files.db/partitioned_text/year=2014/month=1/day=2/21a828cf494b5bbd_1393490200_data.0. | 6.98MB | year=2014/month=1/day=2 |
| hdfs://impala_data_dir/show_files.db/partitioned_text/year=2014/month=1/day=2/21a828cf494b5bbe_1393490200_data.0. | 6.20MB | year=2014/month=1/day=2 |
+----------------------------------------------------------------------------------------------------------------+--------+-------------------------+

The following example shows a SHOW FILES statement for a partitioned Parquet table. The number and sizes of files are different from the equivalent partitioned text table used in the previous example, because INSERT operations for Parquet tables are parallelized differently than for text tables. (Also, the amount of data is so small that it can be written to Parquet without involving all the hosts in this 4-node cluster.)

[localhost:21000] > create table partitioned_parquet (x bigint, y int, s string) partitioned by (year bigint, month bigint, day bigint) stored as parquet;
[localhost:21000] > insert into partitioned_parquet partition (year,month,day) select x, y, s, year, month, day from partitioned_text;
[localhost:21000] > show partitions partitioned_parquet;
+-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats |
+-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
| 2014  | 1     | 1   | -1    | 3      | 17.89MB | NOT CACHED   | NOT CACHED        | PARQUET | false             |
| 2014  | 1     | 2   | -1    | 3      | 17.89MB | NOT CACHED   | NOT CACHED        | PARQUET | false             |
| Total |       |     | -1    | 6      | 35.79MB | 0B           |                   |         |                   |
+-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
[localhost:21000] > show files in partitioned_parquet;
+---------------------------------------------------------------------------------------------------------+--------+-------------------------+
| path                                                                                                    | size   | partition               |
+---------------------------------------------------------------------------------------------------------+--------+-------------------------+
| hdfs://impala_data_dir/show_files.db/partitioned_parquet/year=2014/month=1/day=1/1134113650_data.0.parq | 4.49MB | year=2014/month=1/day=1 |
| hdfs://impala_data_dir/show_files.db/partitioned_parquet/year=2014/month=1/day=1/617567880_data.0.parq  | 5.14MB | year=2014/month=1/day=1 |
| hdfs://impala_data_dir/show_files.db/partitioned_parquet/year=2014/month=1/day=1/2099499416_data.0.parq | 8.27MB | year=2014/month=1/day=1 |
| hdfs://impala_data_dir/show_files.db/partitioned_parquet/year=2014/month=1/day=2/945567189_data.0.parq  | 8.80MB | year=2014/month=1/day=2 |
| hdfs://impala_data_dir/show_files.db/partitioned_parquet/year=2014/month=1/day=2/2145850112_data.0.parq | 4.80MB | year=2014/month=1/day=2 |
| hdfs://impala_data_dir/show_files.db/partitioned_parquet/year=2014/month=1/day=2/665613448_data.0.parq  | 4.29MB | year=2014/month=1/day=2 |
+---------------------------------------------------------------------------------------------------------+--------+-------------------------+

The following example shows output from the SHOW FILES statement for a table where the data files are stored in Amazon S3:

[localhost:21000] > show files in s3_testing.sample_data_s3;
+-----------------------------------------------------------------------+---------+-----------+
| path                                                                  | size    | partition |
+-----------------------------------------------------------------------+---------+-----------+
| s3a://impala-demo/sample_data/e065453cba1988a6_1733868553_data.0.parq | 24.84MB |           |
+-----------------------------------------------------------------------+---------+-----------+

SHOW ROLES Statement

The SHOW ROLES statement displays roles. This syntax is available in CDH 5.2 and later only, when you are using the Sentry authorization framework along with the Sentry service, as described in Using Impala with the Sentry Service (CDH 5.1 or higher only). It does not apply when you use the Sentry framework with privileges defined in a policy file.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

Examples:

Depending on the roles set up within your organization by the CREATE ROLE statement, the output might look something like this:

show roles;
+-----------+
| role_name |
+-----------+
| analyst   |
| role1     |
| sales     |
| superuser |
| test_role |
+-----------+

HDFS permissions: This statement does not touch and HDFS files or directories, therefore no HDFS permissions are required.

Enabling Sentry Authorization for Impala

SHOW CURRENT ROLE

The SHOW CURRENT ROLE statement displays roles assigned to the current user. This syntax is available in CDH 5.2 and later only, when you are using the Sentry authorization framework along with the Sentry service, as described in Using Impala with the Sentry Service (CDH 5.1 or higher only). It does not apply when you use the Sentry framework with privileges defined in a policy file.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

Examples:

Depending on the roles set up within your organization by the CREATE ROLE statement, the output might look something like this:

show current roles;
+-----------+
| role_name |
+-----------+
| role1     |
| superuser |
+-----------+

HDFS permissions: This statement does not touch and HDFS files or directories, therefore no HDFS permissions are required.

Enabling Sentry Authorization for Impala

SHOW ROLE GRANT Statement

The SHOW ROLE GRANT statement lists all the roles assigned to the specified group. This statement is only allowed for Sentry administrative users and others users that are part of the specified group. This syntax is available in CDH 5.2 and later only, when you are using the Sentry authorization framework along with the Sentry service, as described in Using Impala with the Sentry Service (CDH 5.1 or higher only). It does not apply when you use the Sentry framework with privileges defined in a policy file.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

HDFS permissions: This statement does not touch and HDFS files or directories, therefore no HDFS permissions are required.

Enabling Sentry Authorization for Impala

SHOW GRANT ROLE Statement

The SHOW GRANT ROLE statement list all the grants for the given role name. This statement is only allowed for Sentry administrative users and other users that have been granted the specified role. This syntax is available in CDH 5.2 and later only, when you are using the Sentry authorization framework along with the Sentry service, as described in Using Impala with the Sentry Service (CDH 5.1 or higher only). It does not apply when you use the Sentry framework with privileges defined in a policy file.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

HDFS permissions: This statement does not touch and HDFS files or directories, therefore no HDFS permissions are required.

Enabling Sentry Authorization for Impala

SHOW DATABASES

The SHOW DATABASES statement is often the first one you issue when connecting to an instance for the first time. You typically issue SHOW DATABASES to see the names you can specify in a USE db_name statement, then after switching to a database you issue SHOW TABLES to see the names you can specify in SELECT and INSERT statements.

The output of SHOW DATABASES includes the special _impala_builtins database, which lets you view definitions of built-in functions, as described under SHOW FUNCTIONS.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

Examples:

This example shows how you might locate a particular table on an unfamiliar system. The DEFAULT database is the one you initially connect to; a database with that name is present on every system. You can issue SHOW TABLES IN db_name without going into a database, or SHOW TABLES once you are inside a particular database.

[localhost:21000] > show databases;
+--------------------+
| name               |
+--------------------+
| _impala_builtins   |
| analyze_testing    |
| avro               |
| ctas               |
| d1                 |
| d2                 |
| d3                 |
| default            |
| file_formats       |
| hbase              |
| load_data          |
| partitioning       |
| regexp_testing     |
| reports            |
| temporary          |
+--------------------+
Returned 14 row(s) in 0.02s
[localhost:21000] > show tables in file_formats;
+--------------------+
| name               |
+--------------------+
| parquet_table      |
| rcfile_table       |
| sequencefile_table |
| textfile_table     |
+--------------------+
Returned 4 row(s) in 0.01s
[localhost:21000] > use file_formats;
[localhost:21000] > show tables like '*parq*';
+--------------------+
| name               |
+--------------------+
| parquet_table      |
+--------------------+
Returned 1 row(s) in 0.01s

HDFS permissions: This statement does not touch and HDFS files or directories, therefore no HDFS permissions are required.

Databases, CREATE DATABASE Statement, DROP DATABASE Statement, USE Statement SHOW TABLES Statement, SHOW FUNCTIONS Statement

SHOW TABLES Statement

Displays the names of tables. By default, lists tables in the current database, or with the IN clause, in a specified database. By default, lists all tables, or with the LIKE clause, only those whose name match a pattern with * wildcards.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

The user ID that the impalad daemon runs under, typically the impala user, must have read and execute permissions for all directories that are part of the table. (A table could span multiple different HDFS directories if it is partitioned. The directories could be widely scattered because a partition can reside in an arbitrary HDFS directory based on its LOCATION attribute.)

HDFS permissions: This statement does not touch and HDFS files or directories, therefore no HDFS permissions are required.

Tables, CREATE TABLE Statement, ALTER TABLE Statement, DROP TABLE Statement, DESCRIBE Statement, SHOW CREATE TABLE Statement, SHOW TABLE STATS Statement, SHOW DATABASES, SHOW FUNCTIONS Statement

SHOW CREATE TABLE Statement

As a schema changes over time, you might run a CREATE TABLE statement followed by several ALTER TABLE statements. To capture the cumulative effect of all those statements, SHOW CREATE TABLE displays a CREATE TABLE statement that would reproduce the current structure of a table. You can use this output in scripts that set up or clone a group of tables, rather than trying to reproduce the original sequence of CREATE TABLE and ALTER TABLE statements. When creating variations on the original table, or cloning the original table on a different system, you might need to edit the SHOW CREATE TABLE output to change things such as the database name, LOCATION field, and so on that might be different on the destination system.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

HDFS permissions: This statement does not touch and HDFS files or directories, therefore no HDFS permissions are required.

Examples:

The following example shows how various clauses from the CREATE TABLE statement are represented in the output of SHOW CREATE TABLE.

create table show_create_table_demo (id int comment "Unique ID", y double, s string)
  partitioned by (year smallint)
  stored as parquet;

show create table show_create_table_demo;
+----------------------------------------------------------------------------------------+
| result                                                                                 |
+----------------------------------------------------------------------------------------+
| CREATE TABLE scratch.show_create_table_demo (                                          |
|   id INT COMMENT 'Unique ID',                                                          |
|   y DOUBLE,                                                                            |
|   s STRING                                                                             |
| )                                                                                      |
| PARTITIONED BY (                                                                       |
|   year SMALLINT                                                                        |
| )                                                                                      |
| STORED AS PARQUET                                                                      |
| LOCATION 'hdfs://127.0.0.1:8020/user/hive/warehouse/scratch.db/show_create_table_demo' |
| TBLPROPERTIES ('transient_lastDdlTime'='1418152582')                                   |
+----------------------------------------------------------------------------------------+

The following example shows how, after a sequence of ALTER TABLE statements, the output from SHOW CREATE TABLE represents the current state of the table. This output could be used to create a matching table rather than executing the original CREATE TABLE and sequence of ALTER TABLE statements.

alter table show_create_table_demo drop column s;
alter table show_create_table_demo set fileformat textfile;

show create table show_create_table_demo;
+----------------------------------------------------------------------------------------+
| result                                                                                 |
+----------------------------------------------------------------------------------------+
| CREATE TABLE scratch.show_create_table_demo (                                          |
|   id INT COMMENT 'Unique ID',                                                          |
|   y DOUBLE                                                                             |
| )                                                                                      |
| PARTITIONED BY (                                                                       |
|   year SMALLINT                                                                        |
| )                                                                                      |
| STORED AS TEXTFILE                                                                     |
| LOCATION 'hdfs://127.0.0.1:8020/user/hive/warehouse/demo.db/show_create_table_demo'    |
| TBLPROPERTIES ('transient_lastDdlTime'='1418152638')                                   |
+----------------------------------------------------------------------------------------+

CREATE TABLE Statement, DESCRIBE Statement, SHOW TABLES Statement

SHOW TABLE STATS Statement

The SHOW TABLE STATS and SHOW COLUMN STATS variants are important for tuning performance and diagnosing performance issues, especially with the largest tables and the most complex join queries.

Any values that are not available (because the COMPUTE STATS statement has not been run yet) are displayed as -1.

SHOW TABLE STATS provides some general information about the table, such as the number of files, overall size of the data, whether some or all of the data is in the HDFS cache, and the file format, that is useful whether or not you have run the COMPUTE STATS statement. A -1 in the #Rows output column indicates that the COMPUTE STATS statement has never been run for this table. If the table is partitioned, SHOW TABLE STATS provides this information for each partition. (It produces the same output as the SHOW PARTITIONS statement in this case.)

The output of SHOW COLUMN STATS is primarily only useful after the COMPUTE STATS statement has been run on the table. A -1 in the #Distinct Values output column indicates that the COMPUTE STATS statement has never been run for this table. Currently, Impala always leaves the #Nulls column as -1, even after COMPUTE STATS has been run.

These SHOW statements work on actual tables only, not on views.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

Examples:

The following examples show how the SHOW TABLE STATS statement displays physical information about a table and the associated data files:

show table stats store_sales;
+-------+--------+----------+--------------+--------+-------------------+
| #Rows | #Files | Size     | Bytes Cached | Format | Incremental stats |
+-------+--------+----------+--------------+--------+-------------------+
| -1    | 1      | 370.45MB | NOT CACHED   | TEXT   | false             |
+-------+--------+----------+--------------+--------+-------------------+

show table stats customer;
+-------+--------+---------+--------------+--------+-------------------+
| #Rows | #Files | Size    | Bytes Cached | Format | Incremental stats |
+-------+--------+---------+--------------+--------+-------------------+
| -1    | 1      | 12.60MB | NOT CACHED   | TEXT   | false             |
+-------+--------+---------+--------------+--------+-------------------+

The following example shows how, after a COMPUTE STATS or COMPUTE INCREMENTAL STATS statement, the #Rows field is now filled in. Because the STORE_SALES table in this example is not partitioned, the COMPUTE INCREMENTAL STATS statement produces regular stats rather than incremental stats, therefore the Incremental stats field remains false.

compute stats customer;
+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 1 partition(s) and 18 column(s). |
+------------------------------------------+

show table stats customer;
+--------+--------+---------+--------------+--------+-------------------+
| #Rows  | #Files | Size    | Bytes Cached | Format | Incremental stats |
+--------+--------+---------+--------------+--------+-------------------+
| 100000 | 1      | 12.60MB | NOT CACHED   | TEXT   | false             |
+--------+--------+---------+--------------+--------+-------------------+

compute incremental stats store_sales;
+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 1 partition(s) and 23 column(s). |
+------------------------------------------+

show table stats store_sales;
+---------+--------+----------+--------------+--------+-------------------+
| #Rows   | #Files | Size     | Bytes Cached | Format | Incremental stats |
+---------+--------+----------+--------------+--------+-------------------+
| 2880404 | 1      | 370.45MB | NOT CACHED   | TEXT   | false             |
+---------+--------+----------+--------------+--------+-------------------+

HDFS permissions:

The user ID that the impalad daemon runs under, typically the impala user, must have read and execute permissions for all directories that are part of the table. (A table could span multiple different HDFS directories if it is partitioned. The directories could be widely scattered because a partition can reside in an arbitrary HDFS directory based on its LOCATION attribute.) The Impala user must also have execute permission for the database directory, and any parent directories of the database directory in HDFS.

COMPUTE STATS Statement, SHOW COLUMN STATS Statement

See How Impala Uses Statistics for Query Optimization for usage information and examples.

SHOW COLUMN STATS Statement

The SHOW TABLE STATS and SHOW COLUMN STATS variants are important for tuning performance and diagnosing performance issues, especially with the largest tables and the most complex join queries.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

Examples:

The following examples show the output of the SHOW COLUMN STATS statement for some tables, before the COMPUTE STATS statement is run. Impala deduces some information, such as maximum and average size for fixed-length columns, and leaves and unknown values as -1.

show column stats customer;
+------------------------+--------+------------------+--------+----------+----------+
| Column                 | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
+------------------------+--------+------------------+--------+----------+----------+
| c_customer_sk          | INT    | -1               | -1     | 4        | 4        |
| c_customer_id          | STRING | -1               | -1     | -1       | -1       |
| c_current_cdemo_sk     | INT    | -1               | -1     | 4        | 4        |
| c_current_hdemo_sk     | INT    | -1               | -1     | 4        | 4        |
| c_current_addr_sk      | INT    | -1               | -1     | 4        | 4        |
| c_first_shipto_date_sk | INT    | -1               | -1     | 4        | 4        |
| c_first_sales_date_sk  | INT    | -1               | -1     | 4        | 4        |
| c_salutation           | STRING | -1               | -1     | -1       | -1       |
| c_first_name           | STRING | -1               | -1     | -1       | -1       |
| c_last_name            | STRING | -1               | -1     | -1       | -1       |
| c_preferred_cust_flag  | STRING | -1               | -1     | -1       | -1       |
| c_birth_day            | INT    | -1               | -1     | 4        | 4        |
| c_birth_month          | INT    | -1               | -1     | 4        | 4        |
| c_birth_year           | INT    | -1               | -1     | 4        | 4        |
| c_birth_country        | STRING | -1               | -1     | -1       | -1       |
| c_login                | STRING | -1               | -1     | -1       | -1       |
| c_email_address        | STRING | -1               | -1     | -1       | -1       |
| c_last_review_date     | STRING | -1               | -1     | -1       | -1       |
+------------------------+--------+------------------+--------+----------+----------+

show column stats store_sales;
+-----------------------+-------+------------------+--------+----------+----------+
| Column                | Type  | #Distinct Values | #Nulls | Max Size | Avg Size |
+-----------------------+-------+------------------+--------+----------+----------+
| ss_sold_date_sk       | INT   | -1               | -1     | 4        | 4        |
| ss_sold_time_sk       | INT   | -1               | -1     | 4        | 4        |
| ss_item_sk            | INT   | -1               | -1     | 4        | 4        |
| ss_customer_sk        | INT   | -1               | -1     | 4        | 4        |
| ss_cdemo_sk           | INT   | -1               | -1     | 4        | 4        |
| ss_hdemo_sk           | INT   | -1               | -1     | 4        | 4        |
| ss_addr_sk            | INT   | -1               | -1     | 4        | 4        |
| ss_store_sk           | INT   | -1               | -1     | 4        | 4        |
| ss_promo_sk           | INT   | -1               | -1     | 4        | 4        |
| ss_ticket_number      | INT   | -1               | -1     | 4        | 4        |
| ss_quantity           | INT   | -1               | -1     | 4        | 4        |
| ss_wholesale_cost     | FLOAT | -1               | -1     | 4        | 4        |
| ss_list_price         | FLOAT | -1               | -1     | 4        | 4        |
| ss_sales_price        | FLOAT | -1               | -1     | 4        | 4        |
| ss_ext_discount_amt   | FLOAT | -1               | -1     | 4        | 4        |
| ss_ext_sales_price    | FLOAT | -1               | -1     | 4        | 4        |
| ss_ext_wholesale_cost | FLOAT | -1               | -1     | 4        | 4        |
| ss_ext_list_price     | FLOAT | -1               | -1     | 4        | 4        |
| ss_ext_tax            | FLOAT | -1               | -1     | 4        | 4        |
| ss_coupon_amt         | FLOAT | -1               | -1     | 4        | 4        |
| ss_net_paid           | FLOAT | -1               | -1     | 4        | 4        |
| ss_net_paid_inc_tax   | FLOAT | -1               | -1     | 4        | 4        |
| ss_net_profit         | FLOAT | -1               | -1     | 4        | 4        |
+-----------------------+-------+------------------+--------+----------+----------+

The following examples show the output of the SHOW COLUMN STATS statement for some tables, after the COMPUTE STATS statement is run. Now most of the -1 values are changed to reflect the actual table data. The #Nulls column remains -1 because Impala does not use the number of NULL values to influence query planning.

compute stats customer;
+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 1 partition(s) and 18 column(s). |
+------------------------------------------+

compute stats store_sales;
+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 1 partition(s) and 23 column(s). |
+------------------------------------------+

show column stats customer;
+------------------------+--------+------------------+--------+----------+--------+
| Column                 | Type   | #Distinct Values | #Nulls | Max Size | Avg Size
+------------------------+--------+------------------+--------+----------+--------+
| c_customer_sk          | INT    | 139017           | -1     | 4        | 4      |
| c_customer_id          | STRING | 111904           | -1     | 16       | 16     |
| c_current_cdemo_sk     | INT    | 95837            | -1     | 4        | 4      |
| c_current_hdemo_sk     | INT    | 8097             | -1     | 4        | 4      |
| c_current_addr_sk      | INT    | 57334            | -1     | 4        | 4      |
| c_first_shipto_date_sk | INT    | 4374             | -1     | 4        | 4      |
| c_first_sales_date_sk  | INT    | 4409             | -1     | 4        | 4      |
| c_salutation           | STRING | 7                | -1     | 4        | 3.1308 |
| c_first_name           | STRING | 3887             | -1     | 11       | 5.6356 |
| c_last_name            | STRING | 4739             | -1     | 13       | 5.9106 |
| c_preferred_cust_flag  | STRING | 3                | -1     | 1        | 0.9656 |
| c_birth_day            | INT    | 31               | -1     | 4        | 4      |
| c_birth_month          | INT    | 12               | -1     | 4        | 4      |
| c_birth_year           | INT    | 71               | -1     | 4        | 4      |
| c_birth_country        | STRING | 205              | -1     | 20       | 8.4001 |
| c_login                | STRING | 1                | -1     | 0        | 0      |
| c_email_address        | STRING | 94492            | -1     | 46       | 26.485 |
| c_last_review_date     | STRING | 349              | -1     | 7        | 6.7561 |
+------------------------+--------+------------------+--------+----------+--------+

show column stats store_sales;
+-----------------------+-------+------------------+--------+----------+----------+
| Column                | Type  | #Distinct Values | #Nulls | Max Size | Avg Size |
+-----------------------+-------+------------------+--------+----------+----------+
| ss_sold_date_sk       | INT   | 4395             | -1     | 4        | 4        |
| ss_sold_time_sk       | INT   | 63617            | -1     | 4        | 4        |
| ss_item_sk            | INT   | 19463            | -1     | 4        | 4        |
| ss_customer_sk        | INT   | 122720           | -1     | 4        | 4        |
| ss_cdemo_sk           | INT   | 242982           | -1     | 4        | 4        |
| ss_hdemo_sk           | INT   | 8097             | -1     | 4        | 4        |
| ss_addr_sk            | INT   | 70770            | -1     | 4        | 4        |
| ss_store_sk           | INT   | 6                | -1     | 4        | 4        |
| ss_promo_sk           | INT   | 355              | -1     | 4        | 4        |
| ss_ticket_number      | INT   | 304098           | -1     | 4        | 4        |
| ss_quantity           | INT   | 105              | -1     | 4        | 4        |
| ss_wholesale_cost     | FLOAT | 9600             | -1     | 4        | 4        |
| ss_list_price         | FLOAT | 22191            | -1     | 4        | 4        |
| ss_sales_price        | FLOAT | 20693            | -1     | 4        | 4        |
| ss_ext_discount_amt   | FLOAT | 228141           | -1     | 4        | 4        |
| ss_ext_sales_price    | FLOAT | 433550           | -1     | 4        | 4        |
| ss_ext_wholesale_cost | FLOAT | 406291           | -1     | 4        | 4        |
| ss_ext_list_price     | FLOAT | 574871           | -1     | 4        | 4        |
| ss_ext_tax            | FLOAT | 91806            | -1     | 4        | 4        |
| ss_coupon_amt         | FLOAT | 228141           | -1     | 4        | 4        |
| ss_net_paid           | FLOAT | 493107           | -1     | 4        | 4        |
| ss_net_paid_inc_tax   | FLOAT | 653523           | -1     | 4        | 4        |
| ss_net_profit         | FLOAT | 611934           | -1     | 4        | 4        |
+-----------------------+-------+------------------+--------+----------+----------+

HDFS permissions:

The user ID that the impalad daemon runs under, typically the impala user, must have read and execute permissions for all directories that are part of the table. (A table could span multiple different HDFS directories if it is partitioned. The directories could be widely scattered because a partition can reside in an arbitrary HDFS directory based on its LOCATION attribute.) The Impala user must also have execute permission for the database directory, and any parent directories of the database directory in HDFS.

COMPUTE STATS Statement, SHOW TABLE STATS Statement

See How Impala Uses Statistics for Query Optimization for usage information and examples.

SHOW PARTITIONS Statement

SHOW PARTITIONS displays information about each partition for a partitioned table. (The output is the same as the SHOW TABLE STATS statement, but SHOW PARTITIONS only works on a partitioned table.) Because it displays table statistics for all partitions, the output is more informative if you have run the COMPUTE STATS statement after creating all the partitions. See COMPUTE STATS Statement for details. For example, on a CENSUS table partitioned on the YEAR column:

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

Examples:

[localhost:21000] > show partitions census;
+-------+-------+--------+------+---------+
| year  | #Rows | #Files | Size | Format  |
+-------+-------+--------+------+---------+
| 2000  | -1    | 0      | 0B   | TEXT    |
| 2004  | -1    | 0      | 0B   | TEXT    |
| 2008  | -1    | 0      | 0B   | TEXT    |
| 2010  | -1    | 0      | 0B   | TEXT    |
| 2011  | 4     | 1      | 22B  | TEXT    |
| 2012  | 4     | 1      | 22B  | TEXT    |
| 2013  | 1     | 1      | 231B | PARQUET |
| Total | 9     | 3      | 275B |         |
+-------+-------+--------+------+---------+

HDFS permissions:

The user ID that the impalad daemon runs under, typically the impala user, must have read and execute permissions for all directories that are part of the table. (A table could span multiple different HDFS directories if it is partitioned. The directories could be widely scattered because a partition can reside in an arbitrary HDFS directory based on its LOCATION attribute.) The Impala user must also have execute permission for the database directory, and any parent directories of the database directory in HDFS.

See How Impala Uses Statistics for Query Optimization for usage information and examples.

SHOW TABLE STATS Statement, Partitioning for Impala Tables

SHOW FUNCTIONS Statement

By default, SHOW FUNCTIONS displays user-defined functions (UDFs) and SHOW AGGREGATE FUNCTIONS displays user-defined aggregate functions (UDAFs) associated with a particular database. The output from SHOW FUNCTIONS includes the argument signature of each function. You specify this argument signature as part of the DROP FUNCTION statement. You might have several UDFs with the same name, each accepting different argument data types.

Security considerations:

When authorization is enabled, the output of the SHOW statement is limited to those objects for which you have some privilege. There might be other database, tables, and so on, but their names are concealed. If you believe an object exists but you cannot see it in the SHOW output, check with the system administrator if you need to be granted a new privilege for that object. See Enabling Sentry Authorization for Impala for how to set up authorization and add privileges for specific kinds of objects.

HDFS permissions: This statement does not touch and HDFS files or directories, therefore no HDFS permissions are required.

Examples:

To display Impala built-in functions, specify the special database name _impala_builtins:

show functions in _impala_builtins;
+----------------+----------------------------------------+
| return type    | signature                              |
+----------------+----------------------------------------+
| BOOLEAN        | ifnull(BOOLEAN, BOOLEAN)               |
| TINYINT        | ifnull(TINYINT, TINYINT)               |
| SMALLINT       | ifnull(SMALLINT, SMALLINT)             |
| INT            | ifnull(INT, INT)                       |
...

show functions in _impala_builtins like '*week*';
+-------------+------------------------------+
| return type | signature                    |
+-------------+------------------------------+
| INT         | weekofyear(TIMESTAMP)        |
| TIMESTAMP   | weeks_add(TIMESTAMP, INT)    |
| TIMESTAMP   | weeks_add(TIMESTAMP, BIGINT) |
| TIMESTAMP   | weeks_sub(TIMESTAMP, INT)    |
| TIMESTAMP   | weeks_sub(TIMESTAMP, BIGINT) |
| INT         | dayofweek(TIMESTAMP)         |
+-------------+------------------------------+

To search for functions that use a particular data type, specify a case-sensitive data type name in all capitals:

show functions in _impala_builtins like '*BIGINT*';
+----------------------------------------+
| name                                   |
+----------------------------------------+
| adddate(TIMESTAMP, BIGINT)             |
| bin(BIGINT)                            |
| coalesce(BIGINT...)                    |
...

Functions, Built-In Functions, Impala User-Defined Functions (UDFs), SHOW DATABASES, SHOW TABLES Statement