Troubleshooting Hive on Spark

Problem: Delayed result from the first query after starting a new Hive on Spark session
The first query after starting a new Hive on Spark session might be delayed due to the start-up time for the Spark on YARN cluster. The query waits for YARN containers to initialize. Subsequent queries will be faster.

Problem: Exception Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0) and HiveServer2 is down
The HiveServer2 Java heap size is set too small. For details, check the HiveServer2 stdout log. To fix this issue:
  1. In Cloudera Manager, go to the Hive service.
  2. Click Configuration.
  3. Search for Java Heap Size of HiveServer2 in Bytes, and increase its value. Cloudera recommends a minimum value of 256 MB.
  4. Restart HiveServer2.
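
Because the property is expressed in bytes, it can help to convert the target heap size before entering it. The following is a minimal sketch that uses the 256 MB minimum from step 3; the variable names are illustrative only, and the conversion is needed only if you enter the value as a raw byte count.

```shell
# Target heap size in MB (256 MB is the minimum recommended in step 3).
HEAP_MB=256

# Convert MB to bytes: 1 MB = 1024 * 1024 bytes.
HEAP_BYTES=$((HEAP_MB * 1024 * 1024))

echo "${HEAP_BYTES}"   # prints 268435456
```
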

Problem: Out-of-memory error
You might get an out-of-memory error similar to the following:
15/03/19 03:43:17 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x9e79a9b1, /10.20.118.103:45603 => /10.20.120.116:39110] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
      java.lang.OutOfMemoryError: Java heap space

This error indicates that the Spark driver does not have enough memory. Increase the driver heap by setting spark.driver.memory, or increase the driver's off-heap allowance by setting spark.yarn.driver.memoryOverhead.
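
One possible way to apply these two properties to a single session is to pass them as --hiveconf options when starting Beeline. The sketch below only builds and prints such a command. The sizes (4g and 1024 MB) and the <jdbc-url> placeholder are assumptions for illustration, not recommended values, and your deployment must allow these properties to be set at the session level.

```shell
# Illustrative sizes only; choose values that fit your cluster.
DRIVER_MEMORY="4g"          # JVM heap for the Spark driver
DRIVER_OVERHEAD_MB="1024"   # off-heap overhead for the driver, in MB

# Build the per-session overrides as Beeline --hiveconf options.
HIVECONF_FLAGS="--hiveconf spark.driver.memory=${DRIVER_MEMORY} --hiveconf spark.yarn.driver.memoryOverhead=${DRIVER_OVERHEAD_MB}"

# Print the resulting command; replace <jdbc-url> with your HiveServer2 JDBC URL before running it.
echo "beeline -u <jdbc-url> ${HIVECONF_FLAGS}"
```

The new values apply to Spark applications started after the change, so start a new Hive session for them to take effect.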

Problem: Hive on Spark does not work with HBase
Hive on Spark with HBase is not supported. If you use HBase, use Hive on MapReduce instead of Hive on Spark.

Problem: Spark applications stay alive forever and occupy cluster resources
This can occur if there are multiple concurrent Hive sessions. To manually terminate the Spark applications:
  1. Find the YARN application IDs for the applications by going to Cloudera Manager and clicking YARN > ResourceManager > ResourceManager Web UI.
  2. Log in to the YARN ResourceManager host.
  3. Open a terminal and run:
    yarn application -kill <applicationID>

    Replace <applicationID> with a YARN application ID that you found in step 1, and run the command once for each application.
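
When several applications must be terminated, the command in step 3 can be wrapped in a small loop. This is a minimal sketch: kill_yarn_apps is a hypothetical helper name, and the application IDs in the usage comment are placeholders, not real IDs.

```shell
# Hypothetical helper: kill every YARN application whose ID is passed as an argument.
kill_yarn_apps() {
  for app_id in "$@"; do
    # Same command as in step 3, run once per application ID.
    yarn application -kill "$app_id"
  done
}

# Example usage (placeholder IDs; substitute the IDs found in step 1):
# kill_yarn_apps application_1426000000000_0001 application_1426000000000_0002
```

Review the list of IDs before running the helper, because a killed application cannot be resumed.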