Scaling Limits and Guidelines

This topic lists the scalability limitation in Impala. For a given functional feature, it is recommended that you respect these limitations to achieve optimal scalability and performance. For example, while you might be able to create a table with 2000 columns, you will experience performance problems while querying the table. This topic does not cover functional limitations in Impala.

Unless noted otherwise, the limits were tested and certified.

The limits noted as "generally safe" are not certified, but recommended as generally safe. A safe range is not a hard limit as unforeseen errors or troubles in your particular environment can affect the range.

Deployment Limits

  • Number of Impalad Executors
    • 80 nodes in CDH 5.14 and lower
    • 150 nodes in CDH 5.15 and higher
  • Number of Impalad Coordinators: 1 coordinator for at most every 50 executors

    See Dedicated Coordinators for details.

  • The number of Impala clusters per deployment
    • 1 Impala cluster in Impala 3.1 and lower
    • Multiple clusters in Impala 3.2 and higher is generally safe.

Data Storage Limits

There are no hard limits for the following, but you will experience gradual performance degradation as you increase these numbers.

  • Number of databases
  • Number of tables - total, per database
  • Number of partitions - total, per table
  • Number of files - total, per table, per table per partition
  • Number of views - total, per database
  • Number of user-defined functions - total, per database
  • Parquet
    • Number of columns per row group
    • Number of row groups per block
    • Number of HDFS blocks per file

Schema Design Limits

  • Number of columns

Security Limits

  • Number of roles: 10,000 for Sentry

Query Limits - Compile Time

  • Maximum number of columns in a query, included in a SELECT list, INSERT, and in an expression: no limit
  • Number of tables referenced: no limit
  • Number of plan nodes: no limit
  • Number of plan fragments: no limit
  • Depth of expression tree: 1000 hard limit
  • Width of expression tree: 10,000 hard limit

Query Limits - Runtime Time

  • Codegen
    • Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage. Setting the query option disable_codegen=true may reduce the impact, at a cost of longer query runtime.