This is the documentation for Impala 2.1.x, included as part of CDH 5.3.x.
Latest Version of this Page | All Cloudera Docs


Specifies the maximum size of each Parquet data file produced by Impala INSERT statements.


Specify the size in bytes, or with a trailing m or g character to indicate megabytes or gigabytes. For example:

-- 128 megabytes.
set PARQUET_FILE_SIZE=134217728
INSERT OVERWRITE parquet_table SELECT * FROM text_table;

-- 512 megabytes.
INSERT OVERWRITE parquet_table SELECT * FROM text_table;

-- 1 gigabyte.
INSERT OVERWRITE parquet_table SELECT * FROM text_table;

Usage notes:

With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB in Impala 2.0 and later) could be much larger than needed for each data file. For INSERT operations into such tables, you can increase parallelism by specifying a smaller PARQUET_FILE_SIZE value, resulting in more HDFS blocks that can be processed by different nodes.

Type: numeric, with optional unit specifier


Currently, the maximum value for this setting is 1 gigabyte (1g). Setting a value higher than 1 gigabyte could result in errors during an INSERT operation.

Default: 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)

For information about the Parquet file format, and how the number and size of data files affects query performance, see Using the Parquet File Format with Impala Tables.

Page generated September 15, 2016.