Apache Parquet Known Issues
Parquet file writes run out of memory if (number of partitions) times (block size) exceeds available memory
The Parquet output writer allocates one block buffer for each table partition it is processing and writes all partitions in parallel. As a result, the MapReduce or YARN task runs out of memory when (number of partitions) times (Parquet block size) exceeds the memory available to the task.
Cloudera Bug: CDH-20157, CDH-20253
Workaround: Reduce the number of partitions in the table so that (number of partitions) times (Parquet block size) fits within the available memory.
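The memory condition above can be sketched as simple arithmetic. The block size and task heap values below are hypothetical examples, not defaults read from any cluster configuration:

```python
# Illustrative arithmetic only: estimate Parquet writer memory pressure.
# The writer buffers one block per open partition, so peak memory grows
# as (number of partitions) x (Parquet block size).

def parquet_write_memory_bytes(num_partitions, block_size_bytes):
    """Rough peak memory needed by the Parquet output writer."""
    return num_partitions * block_size_bytes

BLOCK_SIZE = 128 * 1024 * 1024        # hypothetical 128 MB Parquet block
AVAILABLE = 1024 * 1024 * 1024        # hypothetical 1 GB task heap

# 4 partitions (512 MB) fit in a 1 GB heap; 100 partitions (12.5 GB) do not.
print(parquet_write_memory_bytes(4, BLOCK_SIZE) < AVAILABLE)    # True
print(parquet_write_memory_bytes(100, BLOCK_SIZE) < AVAILABLE)  # False
```

With a fixed block size, the only lever left is the partition count, which is why the workaround is to reduce the number of partitions.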
parquet-thrift cannot read Parquet data written by Hive
parquet-thrift cannot read Parquet data written by Hive. parquet-avro can read it, but exposes an additional record level, named array_element, in lists.
Cloudera Bug: CDH-22189, CDH-22220
Workaround: None; arrays written by parquet-avro or parquet-thrift cannot currently be read by parquet-hive.
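The extra record level can be pictured with plain Python values. This is a hypothetical illustration of the shape of the data (the list values and field access below are not output from any real library call; only the array_element name comes from the issue description):

```python
# A Hive list column written as [1, 2, 3] does not come back through
# parquet-avro as a flat list; each element is wrapped in an extra
# record level whose single field is named "array_element".
hive_value = [1, 2, 3]
parquet_avro_view = [
    {"array_element": 1},
    {"array_element": 2},
    {"array_element": 3},
]

# Unwrapping that extra level recovers the original flat list.
unwrapped = [record["array_element"] for record in parquet_avro_view]
print(unwrapped == hive_value)  # True
```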