Hive Metastore High Availability

You can enable Hive metastore high availability (HA), so that your cluster is resilient to failures due to a metastore that becomes unavailable. Each metastore is independent; they do not use a quorum.

Prerequisites

  • Cloudera recommends that each instance of the metastore runs on a separate cluster host, to maximize high availability.
  • Hive metastore HA requires a database that is also highly available, such as MySQL with replication in active-active mode. Refer to the documentation for your database of choice to configure it correctly.

Limitations

Sentry HDFS synchronization does not support Hive metastore HA.

Enabling Hive Metastore High Availability Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the Hive service.
  2. Click the Configuration tab.
  3. Select Scope > Hive Metastore Server.
  4. Select Category > Advanced.
  5. Locate the Hive Metastore Delegation Token Store property or search for it by typing its name In the Search box.
  6. If you use a secure cluster, enable the Hive token store by setting the value of the Hive Metastore Delegation Token Store property to org.apache.hadoop.hive.thrift.DBTokenStore.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties.

  7. Click Save Changes to commit the changes.
  8. Click the Instances tab.
  9. Click Add Role Instances.
  10. Click the text field under Hive Metastore Server.
  11. Select a host on which to run the additional metastore and click OK.
  12. Click Continue and click Finish.
  13. Check the checkbox next to the new Hive Metastore Server role.
  14. Select Actions for Selected > Start, and click Start to confirm.
  15. Click to display the stale configurations page.
  16. Click Restart Clusterand click Restart Now.
  17. Click Finish after the cluster finishes restarting.

Enabling Hive Metastore High Availability Using the Command Line

To configure the Hive metastore for high availability, you configure each metastore to store its state in a replicated database, then provide the metastore clients with a list of URIs where metastores are available. The client starts with the first URI in the list. If it does not get a response, it randomly picks another URI in the list and attempts to connect. This continues until the client receives a response.

  1. Configure Hive on each of the cluster hosts where you want to run a metastore, following the instructions at Configuring the Hive Metastore.
  2. On the server where the master metastore instance runs, edit the /etc/hive/conf.server/hive-site.xml file, setting the hive.metastore.uris property's value to a list of URIs where a Hive metastore is available for failover.
    <property>
     <name>hive.metastore.uris</name>
     <value>thrift://metastore1.example.com,thrift://metastore2.example.com,thrift://metastore3.example.com</value>
     <description> URI for client to contact metastore server </description>
    </property>
  3. If you use a secure cluster, enable the Hive token store by configuring the value of the hive.cluster.delegation.token.store.class property to org.apache.hadoop.hive.thrift.DBTokenStore.
    <property>
     <name>hive.cluster.delegation.token.store.class</name>
     <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
    </property>
  4. Save your changes and restart each Hive instance.
  5. Connect to each metastore and update it to use a nameservice instead of a NameNode, as a requirement for high availability.
    1. From the command-line, as the Hive user, retrieve the list of URIs representing the filesystem roots:
      hive --service metatool -listFSRoot
    2. Run the following command with the --dry-run option, to be sure that the nameservice is available and configured correctly. This will not change your configuration.
      hive --service metatool -updateLocation nameservice-uri namenode-uri -dryRun
    3. Run the same command again without the --dry-run option to direct the metastore to use the nameservice instead of a NameNode.
      hive --service metatool -updateLocation nameservice-uri namenode-uri 
  6. Test your configuration by stopping your main metastore instance, and then attempting to connect to one of the other metastores from a client. The following is an example of doing this on a RHEL or Fedora system. The example first stops the local metastore, then connects to the metastore on the host metastore2.example.com and runs the SHOW TABLES command.
    $ sudo service hive-metastore stop
    $ /usr/lib/hive/bin/beeline
    beeline> !connect jdbc:hive2://metastore2.example.com:10000 username password org.apache.hive.jdbc.HiveDriver
    0: jdbc:hive2://localhost:10000> SHOW TABLES;
    show tables;
    +-----------+
    | tab_name  |
    +-----------+
    +-----------+
    No rows selected (0.238 seconds)
    0: jdbc:hive2://localhost:10000> 
  7. Restart the local metastore when you have finished testing.
    $ sudo service hive-metastore start