The Hive Service
Cloudera Manager has managed the Hive service since Cloudera Manager 4.5, which also introduced a new role type called the Hive Metastore Server. This role manages the metastore process when Hive is configured with a Remote Metastore.
Cloudera Manager also supports HiveServer2, an improved version of HiveServer that offers a new Thrift API tailored for JDBC and ODBC clients, Kerberos authentication, and multi-client concurrency. HiveServer2 also comes with a new CLI named BeeLine.
Cloudera Manager does not manage the older HiveServer (HiveServer1).
Cloudera recommends that you deploy HiveServer2 and use it whenever possible. You can still use the original HiveServer when you need to, and run it concurrently with HiveServer2. However, because Cloudera Manager does not manage the older HiveServer, you must configure and manage it outside Cloudera Manager. See the HiveServer2 documentation for more information.
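JDBC clients and BeeLine both reach HiveServer2 through a `jdbc:hive2://` URL. The following is a minimal sketch that builds such a URL and the matching BeeLine command line. The URL scheme, the `;principal=` session parameter used with Kerberos, and BeeLine's `-u` option come from Apache Hive; the host names, database, and principal are hypothetical, and 10000 is assumed as HiveServer2's usual default Thrift port, so check your own configuration.

```python
from typing import Optional

# Assumed default HiveServer2 Thrift port; your deployment may differ.
HS2_DEFAULT_PORT = 10000


def hs2_jdbc_url(host: str, port: int = HS2_DEFAULT_PORT, database: str = "default",
                 kerberos_principal: Optional[str] = None) -> str:
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if kerberos_principal:
        # With Kerberos, the HiveServer2 service principal is appended as a session parameter.
        url += f";principal={kerberos_principal}"
    return url


def beeline_command(url: str) -> list[str]:
    """Argument list that starts BeeLine connected to the given JDBC URL."""
    return ["beeline", "-u", url]
```

For example, `beeline_command(hs2_jdbc_url("hs2.example.com"))` yields `["beeline", "-u", "jdbc:hive2://hs2.example.com:10000/default"]`, which can be passed to `subprocess.run` on a host where BeeLine is installed.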
You are strongly encouraged to read Configuring the Hive Metastore.
- The Hive Metastore Server
- Considerations When Upgrading CDH
- Considerations When Upgrading Cloudera Manager
- Disabling Bypass Mode
- Using Hive Gateways
The Hive Metastore Server
Cloudera recommends using a Remote Metastore with Hive, especially for CDH 4.2 or later. Because the Remote Metastore is recommended, Cloudera Manager treats the Hive Metastore Server as a required role for all Hive services. A Remote Metastore setup is advantageous, especially in production settings, for a couple of key reasons:
- The Hive Metastore Database password and JDBC drivers don’t need to be shared with every Hive client; only the Hive Metastore Server needs them. Sharing passwords with many machines is a security concern.
- You can control activity on the Hive Metastore Database. To stop all activity on the database, just stop the Hive Metastore Server. This makes it easy to perform tasks such as backup and upgrade, which require all Hive activity to stop.
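To make the first point concrete, the following sketch renders the properties a Hive client's `hive-site.xml` would carry in each mode. It is only an illustration: Cloudera Manager generates client configurations itself. The property names (`hive.metastore.uris` and the `javax.jdo.option.*` connection properties) are standard Apache Hive properties; the host names, driver, and credentials are hypothetical, and port 9083 is assumed as the conventional metastore port.

```python
import xml.etree.ElementTree as ET


def render_hive_site(props: dict[str, str]) -> str:
    """Render Hive properties as the <configuration> body of a hive-site.xml file."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def client_props_remote(metastore_host: str, port: int = 9083) -> dict[str, str]:
    # Remote Metastore: the client only needs the Thrift address of the
    # Hive Metastore Server; no database URL, driver, or password.
    return {"hive.metastore.uris": f"thrift://{metastore_host}:{port}"}


def client_props_direct(db_url: str, db_driver: str, db_user: str, db_password: str) -> dict[str, str]:
    # Direct database access: every client must hold the JDBC URL, driver class,
    # and credentials for the metastore database.
    return {
        "javax.jdo.option.ConnectionURL": db_url,
        "javax.jdo.option.ConnectionDriverName": db_driver,
        "javax.jdo.option.ConnectionUserName": db_user,
        "javax.jdo.option.ConnectionPassword": db_password,
    }
```

Rendering `client_props_remote("metastore.example.com")` produces a single `hive.metastore.uris` property, while the direct variant puts the database password on every client host.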
Information about the initial configuration of a remote Hive Metastore Server with Cloudera Manager can be found at Installing and Configuring Databases.
Do not use the Hive Metastore Server with CDH3. If you are using CDH3, or you want to use the Local Metastore mode, enable the Bypass Hive Metastore Server mode in the Hive service configuration so that clients access the metastore database directly.
Considerations When Upgrading CDH
Hive underwent major version changes between CDH 4.0 and 4.1 and between CDH 4.1 and 4.2: CDH 4.0 includes Hive 0.8.0, CDH 4.1 includes Hive 0.9.0, and CDH 4.2 or later includes Hive 0.10.0. As a result, you must manually back up and upgrade your Hive metastore database when upgrading between these major Hive versions.
Follow the steps in the appropriate Cloudera Manager procedure for upgrading CDH to upgrade the metastore before you restart the Hive service. This applies whether you are upgrading with packages or parcels. The procedure for upgrading CDH using packages is at Upgrading CDH 4 Using Packages, where Step 4 covers upgrading the metastore. The procedure for upgrading with parcels is at Upgrading to a Newer CDH 4 Version with Parcels, where Step 2 covers the Hive metastore upgrade.
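The version mapping in this section can be turned into a quick pre-upgrade check. The following is a minimal sketch, assuming version strings such as `"4.1.3"`; the function names are hypothetical, and the check only says whether a metastore backup and upgrade is needed, it does not perform one.

```python
def hive_version_for_cdh4(cdh_version: str) -> str:
    """Hive version bundled with a CDH 4 release, per the mapping in this section."""
    parts = cdh_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected a version like '4.1.3', got {cdh_version!r}")
    major, minor = int(parts[0]), int(parts[1])
    if major != 4:
        raise ValueError(f"only CDH 4 releases are covered: {cdh_version!r}")
    if minor == 0:
        return "0.8.0"
    if minor == 1:
        return "0.9.0"
    # CDH 4.2 and later ship Hive 0.10.0.
    return "0.10.0"


def needs_metastore_upgrade(from_cdh: str, to_cdh: str) -> bool:
    """True when the CDH upgrade crosses a Hive version boundary, so the metastore
    database must be backed up and upgraded before the Hive service is restarted."""
    return hive_version_for_cdh4(from_cdh) != hive_version_for_cdh4(to_cdh)
```

For example, moving from CDH 4.1.x to 4.2.x requires the metastore upgrade, while moving between two 4.2-or-later releases does not.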
Considerations When Upgrading Cloudera Manager
When upgrading from a version of Cloudera Manager prior to 4.5, Cloudera Manager automatically creates new Hive service(s) to capture the previous implicit Hive dependency from Hue and Impala. Your previous services will continue to function without impact.
Note that if Hue was using a Derby Hive metastore, the newly created Hive service also uses Derby. Because Derby does not allow concurrent connections, Hue continues to work, but the new Hive Metastore Server fails to run. This failure is harmless, because nothing uses the new Hive Metastore Server at this point, and intentional, because it preserves the cluster functionality as it was before the upgrade. Cloudera discourages using a Derby metastore because of its limitations; consider switching to a different supported database type.
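To find out whether a metastore is on Derby, look at the database product named in its JDBC connection URL (the standard Hive property `javax.jdo.option.ConnectionURL`). The following sketch extracts that product name; the function names and sample URLs are hypothetical, and it assumes the usual `jdbc:<subprotocol>:...` URL layout.

```python
def metastore_db_product(connection_url: str) -> str:
    """Return the database product named in a JDBC URL, e.g. 'mysql' for jdbc:mysql://..."""
    scheme, sep, rest = connection_url.partition(":")
    if scheme.lower() != "jdbc" or not sep:
        raise ValueError(f"not a JDBC URL: {connection_url!r}")
    # The sub-protocol is the text between "jdbc:" and the next colon.
    return rest.split(":", 1)[0].lower()


def is_derby_metastore(connection_url: str) -> bool:
    # Derby does not allow concurrent connections, which is why a new Hive Metastore
    # Server fails to run when Hue already uses the same Derby metastore.
    return metastore_db_product(connection_url) == "derby"
```

A URL such as `jdbc:derby:;databaseName=metastore_db;create=true` is reported as Derby, while `jdbc:mysql://...` or `jdbc:postgresql://...` URLs are not.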
Cloudera Manager provides a Hive configuration option to bypass the Hive Metastore Server. When this option is enabled, Hive clients, Hue, and Impala connect directly to the Hive Metastore database. Before Cloudera Manager 4.5, Hue and Impala talked directly to the Hive Metastore database, so bypass mode is enabled by default when you upgrade to Cloudera Manager 4.5 or later. This ensures that the upgrade doesn't disrupt your existing setup. You should plan to disable bypass mode, especially when using CDH 4.2 or later: using the Hive Metastore Server is the recommended configuration, and the WebHCat Server role requires that the Hive Metastore Server not be bypassed. To disable bypass mode, see Disabling Bypass Mode.
Cloudera Manager 4.5 or later also supports HiveServer2 with CDH 4.2. HiveServer2 is not added by default, but you can add it as a new role under the Hive service (see Adding Role Instances).
Disabling Bypass Mode
In bypass mode, Hive clients directly access the metastore database instead of using the Hive Metastore Server for metastore information.
- Go to the Hive service.
- Select .
- Expand the category.
- Uncheck the Bypass Hive Metastore Server checkbox.
- Click Save Changes.
- Re-deploy Hive client configurations.
- Restart Hive and any Hue or Impala services configured to use that Hive service.
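After redeploying the client configuration, you can confirm on a client host that it now points at the Hive Metastore Server rather than at the database. The following is a minimal sketch, assuming you pass it the contents of the deployed client `hive-site.xml` (commonly `/etc/hive/conf/hive-site.xml` on CDH hosts; treat that path as an assumption). The function names are hypothetical; the property names are standard Hive properties.

```python
import xml.etree.ElementTree as ET


def read_hive_site(xml_text: str) -> dict[str, str]:
    """Parse hive-site.xml content into a {property name: value} dict."""
    root = ET.fromstring(xml_text)
    props: dict[str, str] = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def uses_metastore_server(props: dict[str, str]) -> bool:
    # Clients go through the Hive Metastore Server when hive.metastore.uris is set.
    return bool(props.get("hive.metastore.uris", "").strip())


def holds_db_password(props: dict[str, str]) -> bool:
    # With bypass mode disabled, clients should no longer need the database password.
    return "javax.jdo.option.ConnectionPassword" in props
```

A client whose configuration passes `uses_metastore_server` and fails `holds_db_password` is no longer relying on bypass mode.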
Using Hive Gateways
Because the Hive service does not have worker roles, another mechanism is needed to propagate client configurations automatically to the other nodes in your cluster. Gateway roles fulfill this function. Gateways are not true roles and have no state; they act as indicators of where client configurations should be placed. Hive gateways are created by default when the Hive service is added.