This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

NUM_NODES

Limit the number of nodes that process a query, typically during debugging. Only accepts the values 0 (meaning all nodes) or 1 (meaning all work is done on the coordinator node). If you are diagnosing a problem that you suspect is due to a timing issue due to distributed query processing, you can set NUM_NODES=1 to verify if the problem still occurs when all the work is done on a single node.

You might set the NUM_NODES option to 1 briefly, during INSERT or CREATE TABLE AS SELECT statements. Normally, those statements produce one or more data files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a partitioned table, the default behavior could produce many small files when intuitively you might expect only a single output file. SET NUM_NODES=1 turns off the "distributed" aspect of the write operation, making it more likely to produce only one or a few data files.

Default: 0

Page generated September 3, 2015.