Hive Table Statistics

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

If your cluster has Impala then you can use the Impala implementation to compute statistics. The Impala implementation to compute table statistics is available in CDH 5.0.0 or higher and in Impala version 1.2.2 or higher. The Impala implementation of COMPUTE STATS requires no setup steps and is preferred over the Hive implementation. See Overview of Table Statistics. If you are running an older version of Impala, you can collect statistics on a Hive table by running the following command from a Beeline client connected to HiveServer2:
analyze table <table name> compute statistics;
analyze table <table name> compute statistics for columns <all columns of a table>;