Troubleshooting Hive on Spark

Problem: Delayed result from the first query after starting a new Hive on Spark session

The first query after starting a new Hive on Spark session might be delayed due to the start-up time for the Spark on YARN cluster. The query waits for YARN containers to initialize. Subsequent queries will be faster.

Problem: Exception Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0) and HiveServer2 is down

The HiveServer2 Java heap size is set too low. For details, check the HiveServer2 stdout log. To fix this issue:

1. In Cloudera Manager, go to the Hive service.
2. Click Configuration.
3. Search for Java Heap Size of HiveServer2 in Bytes, and increase the value. Cloudera recommends a minimum value of 256 MB.
4. Restart HiveServer2.
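
Because the property name specifies bytes while the recommendation is given in megabytes, a quick conversion may help if the field expects a raw byte count. This is only an arithmetic sketch; the variable names are illustrative, and Cloudera Manager may also let you choose the unit directly.

```
# Convert a heap size in MB to bytes (1 MB = 1024 * 1024 bytes here).
heap_mb=256
heap_bytes=$((heap_mb * 1024 * 1024))
echo "$heap_bytes"
```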

Problem: Out-of-memory error

You might get an out-of-memory error similar to the following:

15/03/19 03:43:17 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x9e79a9b1, /10.20.118.103:45603 => /10.20.120.116:39110] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
      java.lang.OutOfMemoryError: Java heap space

This error indicates that the Spark driver does not have enough memory. Increase the driver heap by setting spark.driver.memory, or increase the off-heap allowance for the driver container by setting spark.yarn.driver.memoryOverhead.
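
As a rough sketch, the two properties named above could be collected in a small settings file and applied at the start of a Hive session. The file name and the values (4g and 512) are hypothetical placeholders, not recommendations, and your cluster may require these properties to be set in Cloudera Manager instead of per session.

```
# Hypothetical file name and values; size them for your own workload.
hql_file="hos-memory.hql"
driver_memory="4g"          # driver JVM heap
driver_overhead_mb=512      # off-heap allowance, in MB

# Write the Hive "set" statements to the file.
cat > "$hql_file" <<EOF
set spark.driver.memory=${driver_memory};
set spark.yarn.driver.memoryOverhead=${driver_overhead_mb};
EOF

cat "$hql_file"
```

One way to use such a file is as a Beeline initialization script (the -i option), so the settings are in place before the first query starts a Spark session.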

Problem: Hive on Spark does not work with HBase

Hive on Spark with HBase is not supported. If you use HBase, use Hive on MapReduce (set hive.execution.engine to mr) instead of Hive on Spark.

Problem: Spark applications stay alive forever and occupy cluster resources

This can occur if there are multiple concurrent Hive sessions. To manually terminate the Spark applications:

1. Find the YARN application IDs for the applications by going to Cloudera Manager and clicking YARN > ResourceManager > ResourceManager Web UI.
2. Log in to the YARN ResourceManager host.
3. Open a terminal and run:
```
yarn application -kill <applicationID>
```
Replace <applicationID> with each YARN application ID you found in step 1, and run the command once per application.
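
When several applications need to be terminated, the kill command can be wrapped in a loop. The following is a minimal sketch: the function name kill_yarn_apps, the DRY_RUN switch, and the example IDs are all hypothetical, and only the yarn application -kill command itself comes from the steps above.

```
# Kill each YARN application ID passed as an argument.
# Set DRY_RUN=1 to print the commands instead of running them.
kill_yarn_apps() {
  for app_id in "$@"; do
    # Skip anything that does not look like a YARN application ID.
    case "$app_id" in
      application_*) ;;
      *) echo "skipping unexpected ID: $app_id" >&2; continue ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "yarn application -kill $app_id"
    else
      yarn application -kill "$app_id"
    fi
  done
}

# Dry run with made-up IDs; nothing is killed.
DRY_RUN=1
kill_yarn_apps application_1426700000000_0001 application_1426700000000_0002
```

Running with DRY_RUN=1 first lets you confirm the list of IDs before anything is terminated; unset it to issue the real commands.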

Configuring Hive on Spark

Configuring Hive on Spark for Hive CLI