Troubleshooting Hive

This section provides guidance on problems you may encounter while installing, upgrading, or running Hive.

With Hive, the most common problems involve performance and disk space management. Because Hive runs on an underlying compute engine such as MapReduce or Spark, troubleshooting sometimes requires diagnosing and changing configuration in those lower layers.

Too Many Small Partitions

It can be tempting to split your data into many small partitions to try to increase speed and concurrency. However, Hive functions best when data is divided into fewer, larger partitions. For example, consider partitioning a 100 TB table into 10,000 partitions, each 10 GB in size. Do not use more than 10,000 partitions per table. Having too many small partitions puts significant strain on the Hive MetaStore and does not improve performance.
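The sizing arithmetic in the example above can be checked with a short sketch. The function names and the decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) are assumptions made for illustration; only the 10,000-partition ceiling and the 100 TB figure come from this section.

```python
# Illustrative helper for sanity-checking a partitioning plan.
# Units are decimal (an assumption): 1 TB = 10**12 bytes, 1 GB = 10**9 bytes.
TB = 10**12
GB = 10**9

# Upper bound on partitions per table recommended in this section.
MAX_PARTITIONS_PER_TABLE = 10_000


def average_partition_size_gb(table_bytes: int, num_partitions: int) -> float:
    """Return the average partition size in GB for an evenly partitioned table."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return table_bytes / num_partitions / GB


def within_partition_limit(num_partitions: int) -> bool:
    """Return True if the partition count stays at or below the recommended ceiling."""
    return num_partitions <= MAX_PARTITIONS_PER_TABLE


if __name__ == "__main__":
    size = average_partition_size_gb(100 * TB, 10_000)
    print(f"Average partition size: {size} GB")
    print(f"Within partition limit: {within_partition_limit(10_000)}")
```

Real tables are rarely partitioned evenly, so treat the average as a rough guide rather than a guarantee about any single partition.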

Hive Queries Fail with "Too many counters" Error

Hive operations use various counters while executing MapReduce jobs. These per-operator counters are enabled by the configuration setting hive.task.progress. The setting is disabled by default; when it is enabled, Hive may create a large number of counters (4 counters per operator, plus another 20).

By default, CDH restricts the number of MapReduce counters to 120. Hive queries that require more counters will fail with the "Too many counters" error.
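To get a rough sense of when a query might hit the limit, the figures above (4 counters per operator, plus another 20, against a default limit of 120) can be combined as in the following sketch. The function names are hypothetical, and the result is only an estimate: it assumes the per-operator figures are the only counters in play, which may not hold for a real job.

```python
# Rough estimate based on the figures quoted in this section.
COUNTERS_PER_OPERATOR = 4      # counters Hive may create per operator
FIXED_COUNTERS = 20            # the additional counters mentioned above
DEFAULT_COUNTER_LIMIT = 120    # default CDH limit on MapReduce counters


def estimated_counters(num_operators: int) -> int:
    """Estimate the counters a query with num_operators operators may create."""
    return COUNTERS_PER_OPERATOR * num_operators + FIXED_COUNTERS


def may_exceed_limit(num_operators: int, limit: int = DEFAULT_COUNTER_LIMIT) -> bool:
    """Return True if the estimated counter count is above the given limit."""
    return estimated_counters(num_operators) > limit


if __name__ == "__main__":
    for operators in (10, 25, 26):
        print(operators, estimated_counters(operators), may_exceed_limit(operators))
```

Under these assumptions, a query with 25 operators sits exactly at the default limit, and anything larger would be expected to fail when hive.task.progress is enabled.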

What To Do

If you run into this error, set mapreduce.job.counters.max in mapred-site.xml to a higher value.
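
As a sketch, the property could be raised with an entry like the following in mapred-site.xml. The value 500 is only an illustrative assumption, not a recommendation from this guide; choose a limit based on the number of counters your queries actually need.

```xml
<!-- mapred-site.xml: raise the MapReduce counter limit.
     The value 500 is an example only; pick one that fits your workload. -->
<property>
  <name>mapreduce.job.counters.max</name>
  <value>500</value>
</property>
```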