This is the documentation for Cloudera Impala 2.1.x.
Documentation for other versions is available at Cloudera.com.

PARQUET_FILE_SIZE

Specifies the maximum size of each Parquet data file produced by Impala INSERT statements.

Syntax:

Specify the size in bytes, or with a trailing m or g character to indicate megabytes or gigabytes. For example:

-- 128 megabytes.
set PARQUET_FILE_SIZE=134217728
INSERT OVERWRITE parquet_table SELECT * FROM text_table;

-- 512 megabytes.
set PARQUET_FILE_SIZE=512m;
INSERT OVERWRITE parquet_table SELECT * FROM text_table;

-- 1 gigabyte.
set PARQUET_FILE_SIZE=1g;
INSERT OVERWRITE parquet_table SELECT * FROM text_table;

Usage notes:

With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB in Impala 2.0 and later) could be much larger than needed for each data file. For INSERT operations into such tables, you can increase parallelism by specifying a smaller PARQUET_FILE_SIZE value, resulting in more HDFS blocks that can be processed by different nodes.

Default: 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)

For information about the Parquet file format, and how the number and size of data files affects query performance, see Using the Parquet File Format with Impala Tables.

Examples: