RUNTIME_BLOOM_FILTER_SIZE Query Option
Size, in bytes, of the Bloom filter data structure used by the runtime filtering feature.
Default: 1048576 (1 MB)
Maximum: 16777216 (16 MB)
Added in: CDH 5.7.0 (Impala 2.5.0)
This setting affects optimizations for large and complex queries, such as dynamic partition pruning for partitioned tables, and join optimization for queries that join large tables. Larger filters handle higher-cardinality input sets more effectively, but consume more memory per filter.
If your query filters on high-cardinality columns (for example, millions of different values) and you do not get the expected speedup from the runtime filtering mechanism, consider benchmarking with a higher value for RUNTIME_BLOOM_FILTER_SIZE. The extra memory devoted to the Bloom filter data structures can make the filtering more accurate, because a larger filter produces fewer false positives for the same number of distinct values.
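To get a rough feel for why a larger filter helps with high-cardinality columns, the following sketch estimates the false-positive rate of a textbook Bloom filter at several sizes. It is only an approximation: the formula is the standard one for a classic Bloom filter, the function name bloom_false_positive_rate and the choice of 10 million distinct values are illustrative assumptions, and the filter that Impala actually builds may be laid out differently, so treat the numbers as a guide for choosing values to benchmark rather than as a prediction.

```python
import math


def bloom_false_positive_rate(size_bytes, distinct_values, num_hashes=None):
    """Estimate the false-positive rate of a textbook Bloom filter.

    Assumption: a classic Bloom filter with m bits, n inserted values and
    k hash functions, where p = (1 - e^(-k*n/m))^k. This is not taken from
    the Impala source code.
    """
    if distinct_values <= 0:
        return 0.0  # an empty filter never reports a false positive
    m = size_bytes * 8  # filter size in bits
    n = distinct_values
    if num_hashes is None:
        # Textbook optimum k = (m/n) * ln 2, with at least one hash function.
        k = max(1, round(m / n * math.log(2)))
    else:
        k = num_hashes
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    distinct = 10_000_000  # hypothetical high-cardinality join column
    for megabytes in (1, 2, 4, 8, 16):
        size = megabytes * 1048576
        rate = bloom_false_positive_rate(size, distinct)
        print(f"RUNTIME_BLOOM_FILTER_SIZE={size}: estimated false positives {rate:.4f}")
```

Under these assumptions the estimated rate falls steadily as the size grows from the 1 MB default toward the 16 MB maximum, which is the effect to look for when benchmarking.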
Because the runtime filtering feature is enabled by default only for local processing, this and the other filtering-related query options have the greatest effect when used in combination with the setting RUNTIME_FILTER_MODE=GLOBAL.
Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, adjust this query option only when tuning long-running queries that involve some combination of large partitioned tables and joins between large tables.
Because the effectiveness of this setting depends so much on query characteristics and data distribution, you typically use it only for specific queries that need extra tuning, and the ideal value depends on the query. Consider setting this query option immediately before the expensive query and unsetting it immediately afterward.
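The set-then-unset pattern can be sketched as a small Python context manager around a DB-API style cursor. Several things here are assumptions rather than documented behavior: the helper name query_option is hypothetical, the cursor is assumed to keep one session across execute() calls so that a SET statement affects the next query, and the statement SET option="" is assumed to restore the default on your Impala version. The RecordingCursor class and the table names are stand-ins so that the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def query_option(cursor, name, value):
    """Set a query option for the enclosed block, then unset it.

    Assumes SET statements issued through `cursor` persist for the session
    and that SET <name>="" restores the default value.
    """
    cursor.execute(f"SET {name}={value}")
    try:
        yield cursor
    finally:
        # Runs even if the query fails, so the option does not leak
        # into later queries in the same session.
        cursor.execute(f'SET {name}=""')


class RecordingCursor:
    """Hypothetical stand-in that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


if __name__ == "__main__":
    cur = RecordingCursor()
    with query_option(cur, "RUNTIME_BLOOM_FILTER_SIZE", 16777216):
        # Placeholder for the expensive query being tuned.
        cur.execute("SELECT COUNT(*) FROM big_fact f JOIN big_dim d ON f.id = d.id")
    for statement in cur.statements:
        print(statement)
```

With a real client, pass its cursor in place of RecordingCursor. The same wrapper can be nested to apply RUNTIME_FILTER_MODE=GLOBAL around the same query.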