This is the documentation for CDH 4.6.0.
Documentation for other versions is available at Cloudera Documentation.

Configuring HiveServer2

You must make the following configuration changes before using HiveServer2. Failure to do so may result in unpredictable behavior.

Table Lock Manager (Required)

You must properly configure and enable Hive's Table Lock Manager. This requires installing ZooKeeper and setting up a ZooKeeper ensemble; see ZooKeeper Installation.

  Important:

Failure to do this will prevent HiveServer2 from handling concurrent query requests and may result in data corruption.

Enable the lock manager by setting properties in /etc/hive/conf/hive-site.xml as follows (substitute your actual ZooKeeper node names for those in the example):

<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>

<property>
  <name>hive.zookeeper.quorum</name>
  <description>ZooKeeper quorum used by Hive's Table Lock Manager</description>
  <value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
  Important:

Enabling the Table Lock Manager without specifying a list of valid ZooKeeper quorum nodes will result in unpredictable behavior. Make sure that both properties are properly configured.

hive.zookeeper.client.port

If ZooKeeper is not using the default client port, you need to set hive.zookeeper.client.port in /etc/hive/conf/hive-site.xml to the same value that ZooKeeper is using. Check /etc/zookeeper/conf/zoo.cfg to find the value of clientPort; if it is set to any value other than 2181 (the default), set hive.zookeeper.client.port to match. For example, if clientPort is set to 2222, set hive.zookeeper.client.port to 2222 as well:

<property>
  <name>hive.zookeeper.client.port</name>
  <value>2222</value>
  <description>
  The port at which the clients will connect.
  </description>
</property>

JDBC driver

The connection URL format and the driver class are different for HiveServer2 and HiveServer1:

HiveServer version    Connection URL                Driver Class

HiveServer2           jdbc:hive2://<host>:<port>    org.apache.hive.jdbc.HiveDriver
HiveServer1           jdbc:hive://<host>:<port>     org.apache.hadoop.hive.jdbc.HiveDriver
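As an illustration, a minimal Java client for HiveServer2 might build the connection URL as shown in the table and then connect through the standard JDBC API. This is a sketch: the host name, port, and user below are placeholders, and the Hive JDBC driver jar (plus its dependencies) must be on the classpath before the commented-out connection code will work.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveServer2Client {

    // Build a HiveServer2 connection URL from a host and port.
    static String connectionUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port;
    }

    public static void main(String[] args) throws Exception {
        // hs2.example.com and 10000 are placeholder values.
        String url = connectionUrl("hs2.example.com", 10000);
        System.out.println(url);

        // To connect for real, uncomment the following (requires the
        // Hive JDBC driver jar on the classpath):
        //
        // Class.forName("org.apache.hive.jdbc.HiveDriver");
        // try (Connection conn = DriverManager.getConnection(url, "hive", "");
        //      Statement stmt = conn.createStatement();
        //      ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
        //     while (rs.next()) {
        //         System.out.println(rs.getString(1));
        //     }
        // }
    }
}
```

A HiveServer1 client differs only in the URL prefix (jdbc:hive://) and the driver class (org.apache.hadoop.hive.jdbc.HiveDriver).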

Authentication

HiveServer2 can be configured to authenticate all connections; by default, it allows any client to connect. HiveServer2 supports either Kerberos or LDAP authentication, configured through the hive.server2.authentication property in the hive-site.xml file. You can also configure Pluggable Authentication, which allows you to use a custom authentication provider for HiveServer2, and HiveServer2 Impersonation, which allows users to execute queries and access HDFS files as the connected user rather than the superuser who started the HiveServer2 daemon. For more information, see Hive Security Configuration.
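For example, to select LDAP authentication, a hive-site.xml entry might look like the following sketch (the LDAP server URL is a placeholder; NONE is the default value and KERBEROS is the other built-in option):

```xml
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
  <description>NONE (default), KERBEROS, or LDAP</description>
</property>

<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://ldap.myco.com</value>
  <description>LDAP server URL (ldap.myco.com is a placeholder)</description>
</property>
```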

In addition, for non-Kerberos connections, you can configure Secure Sockets Layer (SSL) communication between HiveServer2 and clients. See Configuring Encrypted Client/Server Communication for non-Kerberos HiveServer2 Connections.

Configuring HiveServer2 for YARN

To use HiveServer2 with YARN, you must set the HADOOP_MAPRED_HOME environment variable: add the following line to /etc/default/hive-server2:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Running HiveServer2 and HiveServer Concurrently

Cloudera recommends running HiveServer2 instead of the original HiveServer (HiveServer1) package in most cases; HiveServer1 is included for backward compatibility. Both HiveServer2 and HiveServer1 can be run concurrently on the same system, sharing the same data sets. This allows you to run HiveServer1 to support, for example, Perl or Python scripts that use the native HiveServer1 Thrift bindings.

Both HiveServer2 and HiveServer1 bind to port 10000 by default, so at least one of them must be configured to use a different port. You can set the port for HiveServer2 in hive-site.xml by means of the hive.server2.thrift.port property. For example:
<property>
  <name>hive.server2.thrift.port</name>
  <value>10001</value>
  <description>TCP port number to listen on, default 10000</description>
</property>

You can also specify the port (and the host IP address in the case of HiveServer2) by setting these environment variables:

HiveServer version    Port                        Host Address

HiveServer2           HIVE_SERVER2_THRIFT_PORT    HIVE_SERVER2_THRIFT_BIND_HOST
HiveServer1           HIVE_PORT                   <Host bindings cannot be specified>
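For example, to run HiveServer2 on a non-default port bound to a specific interface, you might export these variables in the environment that starts the daemon (the port and address shown are placeholder values):

```shell
# Placeholder values: choose the port and bind address for your site.
export HIVE_SERVER2_THRIFT_PORT=10001
export HIVE_SERVER2_THRIFT_BIND_HOST=10.0.0.15

# HiveServer2 started from this environment picks these up.
echo "HiveServer2 will listen on ${HIVE_SERVER2_THRIFT_BIND_HOST}:${HIVE_SERVER2_THRIFT_PORT}"
```

Environment variables take effect only for daemons started from that environment; the hive.server2.thrift.port property in hive-site.xml is the persistent alternative for HiveServer2.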

Using Custom UDFs with HiveServer2

To use custom User-Defined Functions (UDFs) with HiveServer2, do the following:
  1. Copy the UDF JAR files to the machine(s) hosting the HiveServer2 server(s).

    Save the JARs to any directory you choose, and make a note of the path.

  2. Make the JARs available to the current instance of HiveServer2 by setting HIVE_AUX_JARS_PATH in hive-config.sh to the full path you noted in Step 1.
      Note:

    The path can be the directory, or each JAR's full pathname in a comma-separated list.

    If you are using Cloudera Manager, use the HiveServer2 Service Environment Safety Valve to set HIVE_AUX_JARS_PATH.

  3. Add each JAR file's full pathname to the hive.aux.jars.path configuration property in hive-site.xml and restart HiveServer2.

    This allows the JARs to be passed to MapReduce jobs started by Hive.
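Putting Steps 1 and 2 together, a sketch might look like the following (the jar location and host name are hypothetical; substitute your own paths):

```shell
# Step 1 (hypothetical): copy the UDF jars to the HiveServer2 host.
# scp my-udfs.jar hs2-host.myco.com:/opt/hive-udfs/

# Step 2: point HIVE_AUX_JARS_PATH at the directory containing the jars
# (or at a comma-separated list of individual jar paths) in hive-config.sh.
export HIVE_AUX_JARS_PATH=/opt/hive-udfs

echo "HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}"
```

Step 3 is then the corresponding hive.aux.jars.path property in hive-site.xml, which makes the jars available to MapReduce jobs as well.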