This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.

Upgrading Hive

Upgrade Hive on all the hosts on which it is running: servers and clients.

  Note: To see which version of Hive ships with CDH4, check the Version and Packaging Information. For important information on new and changed components, see the CDH4 Release Notes.

Upgrading Hive from CDH3 to CDH4

  Note:

If you have already performed the steps to uninstall CDH3 and all components, as described under Upgrading from CDH3 to CDH4, you can skip Step 1 below and proceed with installing the new CDH4 version of Hive.

Step 1: Remove Hive

  Warning:

You must make sure no Hive processes are running. If Hive processes are running during the upgrade, the new version will not work correctly.

  1. Exit the Hive console and make sure no Hive scripts are running.
  2. Stop any HiveServer processes that are running. If HiveServer is running as a daemon, use the following command to stop it:
    $ sudo service hive-server stop

    If HiveServer is running from the command line, stop it with <CTRL>-c.

  3. Stop the metastore. If the metastore is running as a daemon, use the following command to stop it:
    $ sudo service hive-metastore stop

    If the metastore is running from the command line, stop it with <CTRL>-c.

  4. Remove Hive:
      Note:

    The following examples show how to uninstall Hive packages on a CDH3 system. Note that CDH3 and CDH4 use different names for the Hive packages: in CDH3 Hive, package names begin with the prefix hadoop-hive, while in CDH4 they begin with the prefix hive. (If you are already running CDH4 and upgrading to the latest version, you do not need to remove Hive: see Upgrading Hive from an Earlier Version of CDH4.)

    To remove Hive on Red Hat-compatible systems:

    $ sudo yum remove hadoop-hive

    To remove Hive on SLES systems:

    $ sudo zypper remove hadoop-hive

    To remove Hive on Ubuntu and Debian systems:

    $ sudo apt-get purge hadoop-hive
      Warning:

    If you are upgrading an Ubuntu or Debian system from CDH3u3 or earlier, you must use apt-get purge (rather than apt-get remove) to make sure the re-install succeeds, but be aware that apt-get purge removes all your configuration data. If you have modified any configuration files, DO NOT PROCEED before backing them up.
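Because apt-get purge deletes configuration files, you may want to copy the Hive configuration directory somewhere safe before removing the package. The following is a minimal sketch, not a Cloudera-provided tool: the function name backup_hive_conf is hypothetical, and it assumes the CDH3 Hive configuration is in /etc/hive/conf (pass a different path if yours is elsewhere). It copies "$src/." rather than "$src" so that the copy follows the directory even when it is a symlink managed by alternatives.

```shell
# Hypothetical helper (POSIX sh): copy a Hive configuration directory to a
# timestamped backup directory next to it and print the backup path.
# Assumption: the configuration is in /etc/hive/conf unless a path is given.
# Run it as root (for example, with sudo) on a real host.
backup_hive_conf() {
  src=${1:-/etc/hive/conf}
  src=${src%/}                                   # drop a trailing slash
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  dest="${src}.backup.$(date +%Y%m%d%H%M%S)"
  mkdir -p "$dest" || return 1
  # "$src/." copies the directory's contents (following $src if it is a
  # symlink); -p preserves permissions and timestamps.
  cp -Rp "$src/." "$dest/" || return 1
  echo "$dest"
}

# Example:
#   backup_hive_conf /etc/hive/conf
```

Review the printed backup directory before you run apt-get purge; the function does not remove or modify the original files.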

Step 2: Install the new Hive version on all hosts (Hive servers and clients)

See Installing Hive.

  Important:

During uninstall, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. During re-install, the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original CDH3 configuration file to the new CDH4 configuration file. In the case of Ubuntu and Debian upgrades, a file will not be installed if there is already a version of that file on the system, and you will be prompted to resolve conflicts; for details, see Automatic handling of configuration files by dpkg.
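On RPM-based systems, one way to find the files the package manager set aside is to search for the .rpmsave suffix and compare each one with the newly installed default. This is a minimal sketch, assuming your Hive configuration is under /etc/hive (pass another directory otherwise); the function name list_rpmsave is hypothetical, and it only reports differences, leaving the merge to you.

```shell
# Hypothetical helper (POSIX sh): for every <file>.rpmsave under a directory,
# print "<file>.rpmsave -> <file>" and, if the new <file> exists, a unified diff
# from the saved CDH3 version to the new CDH4 default.
# Assumption: Hive configuration lives under /etc/hive unless a path is given.
list_rpmsave() {
  find "${1:-/etc/hive}" -name '*.rpmsave' -type f | while IFS= read -r saved; do
    new=${saved%.rpmsave}
    echo "$saved -> $new"
    # diff exits 1 when the files differ; that is expected, so ignore it.
    if [ -f "$new" ]; then diff -u "$saved" "$new" || true; fi
  done
}

# Example:
#   list_rpmsave /etc/hive
```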

Step 3: Configure the Hive Metastore

You must configure the Hive metastore and initialize the service before you start the Hive Console. See Configuring the Hive Metastore for detailed instructions.

Step 4: Upgrade the Metastore Schema

The current version of CDH4 includes changes in the Hive metastore schema. If you have been using Hive 0.9 or earlier, you must upgrade the Hive metastore schema after you install the new version of Hive but before you start Hive. To do this, run the appropriate schema upgrade scripts in /usr/lib/hive/scripts/metastore/upgrade/:

  • Schema upgrade scripts from 0.7 to 0.8 and from 0.8 to 0.9 for Derby, MySQL, and PostgreSQL
  • 0.8 and 0.9 schema scripts for Oracle, but no upgrade scripts (you will need to create your own)
  • Schema upgrade scripts from 0.9 to 0.10 for Derby, MySQL, PostgreSQL, and Oracle
  Note: To upgrade Hive from CDH3 to CDH4, you must upgrade the schema to 0.8, then to 0.9, and then to 0.10.
  Important:
  • Cloudera strongly encourages you to make a backup copy of your metastore database before running the upgrade scripts. You will need this backup copy if you run into problems during the upgrade or need to downgrade to a previous version.
  • You must upgrade the metastore schema before starting Hive after the upgrade. Failure to do so may result in metastore corruption.
  • To run a script, you must first cd to the directory that contains it: /usr/lib/hive/scripts/metastore/upgrade/<database>.

For more information about upgrading the schema, see the README in /usr/lib/hive/scripts/metastore/upgrade/.
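Because the scripts must be applied in order (for a CDH3 upgrade: 0.7 to 0.8, then 0.8 to 0.9, then 0.9 to 0.10), it can help to generate the sequence rather than type it. This sketch assumes the scripts follow the Apache Hive naming pattern upgrade-<from>-to-<to>.<database>.sql (for example, upgrade-0.9.0-to-0.10.0.mysql.sql); confirm the actual names against the README and the directory listing before running anything. The function name upgrade_chain is hypothetical, and it only prints file names: it does not check that they exist (there are no Oracle upgrade scripts before 0.9 to 0.10, for instance).

```shell
# Hypothetical helper (POSIX sh): print, in order, the schema upgrade scripts
# needed to go from a given metastore schema version to 0.10.0.
# Assumption: scripts are named upgrade-<from>-to-<to>.<database>.sql; verify
# this in /usr/lib/hive/scripts/metastore/upgrade/<database>.
upgrade_chain() {
  db=$1
  from=$2
  started=no
  prev=
  for v in 0.7.0 0.8.0 0.9.0 0.10.0; do
    if [ "$started" = yes ]; then
      echo "upgrade-${prev}-to-${v}.${db}.sql"
    fi
    [ "$v" = "$from" ] && started=yes
    prev=$v
  done
  # An unrecognized starting version is reported as an error.
  [ "$started" = yes ] || { echo "unknown schema version: $from" >&2; return 1; }
}

# Example for a CDH3 MySQL metastore (the database name "metastore" is an
# assumption). Run from the script directory, as required above:
#   cd /usr/lib/hive/scripts/metastore/upgrade/mysql
#   for s in $(upgrade_chain mysql 0.7.0); do mysql -u root -p metastore < "$s"; done
```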

Step 5: Configure HiveServer2

HiveServer2 is an improved version of the original HiveServer (HiveServer1). Cloudera recommends using HiveServer2 instead of HiveServer1 as long as you do not depend directly on HiveServer1's Thrift API. Some configuration is required before you initialize HiveServer2; see Configuring HiveServer2 for details.

  Note:

If you need to run HiveServer1

You can continue to run HiveServer1 on CDH4.1 and later if you need it for backward compatibility; for example, you may have existing Perl and Python scripts that use the native HiveServer1 Thrift bindings. You can install and run HiveServer1 and HiveServer2 concurrently on the same system; see Running HiveServer2 and HiveServer Concurrently.

Step 6: Upgrade Scripts, etc., for HiveServer2 (if necessary)

If you have been running HiveServer1, you may need to make some minor modifications to your client-side scripts and applications when you upgrade:

  • Because HiveServer1 does not support concurrent connections, many customers run a dedicated HiveServer1 instance for each client. A single HiveServer2 instance can replace all of them.
  • HiveServer2 uses a different connection URL and driver class for the JDBC driver. If you have existing scripts that use JDBC to communicate with HiveServer1, you can modify them to work with HiveServer2 by changing the JDBC connection URL from jdbc:hive://hostname:port to jdbc:hive2://hostname:port, and by changing the JDBC driver class name from org.apache.hadoop.hive.jdbc.HiveDriver to org.apache.hive.jdbc.HiveDriver.
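If many scripts embed the HiveServer1 connection URL (jdbc:hive://) and driver class (org.apache.hadoop.hive.jdbc.HiveDriver), a stream edit can make both substitutions. This is a minimal sketch: the function name migrate_hive_jdbc is hypothetical, it assumes the old values appear literally in the files, and you should review the converted copy before replacing the original.

```shell
# Hypothetical filter (POSIX sh + sed): read text on stdin and rewrite
# HiveServer1 JDBC settings to their HiveServer2 equivalents on stdout.
#   jdbc:hive://host:port                  -> jdbc:hive2://host:port
#   org.apache.hadoop.hive.jdbc.HiveDriver -> org.apache.hive.jdbc.HiveDriver
# Text that already uses jdbc:hive2:// is left unchanged.
migrate_hive_jdbc() {
  sed -e 's#jdbc:hive://#jdbc:hive2://#g' \
      -e 's#org\.apache\.hadoop\.hive\.jdbc\.HiveDriver#org.apache.hive.jdbc.HiveDriver#g'
}

# Example (report.py is a hypothetical client script): write a converted copy
# and compare it with the original before switching over.
#   migrate_hive_jdbc < report.py > report.hs2.py
#   diff -u report.py report.hs2.py
```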

Step 7: Start the Metastore, HiveServer2, and Beeline

See the instructions for starting the Hive metastore, starting HiveServer2, and using Beeline.

Upgrading Hive from an Earlier Version of CDH4

The instructions that follow assume that you are upgrading Hive as part of an upgrade to CDH4 and have already performed the steps under Upgrading to CDH4.

  Important:

If you are currently running Hive under MRv1, check for the following property and value in /etc/mapred/conf/mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property> 

Remove this property before you proceed; otherwise Hive queries, which spawn MapReduce jobs, will fail with a null pointer exception (NPE).
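To check a host quickly, you can search mapred-site.xml for the property. This sketch assumes the <value> element sits on the line immediately after the <name> element, as in the snippet above; a file laid out differently would need a real XML parser. The function name sets_yarn_framework is hypothetical, and it relies on grep -A, which GNU, BSD, and BusyBox grep support but POSIX does not require.

```shell
# Hypothetical check (POSIX sh + grep -A): succeed (status 0) if the given
# mapred-site.xml sets mapreduce.framework.name to yarn, fail (status 1) otherwise.
# Assumption: <value> is on the line right after <name>; other layouts are missed.
sets_yarn_framework() {
  grep -A1 '<name>mapreduce\.framework\.name</name>' "$1" \
    | grep -q '<value>[[:space:]]*yarn[[:space:]]*</value>'
}

# Example, using the path given above:
#   if sets_yarn_framework /etc/mapred/conf/mapred-site.xml; then
#     echo "remove mapreduce.framework.name before upgrading Hive"
#   fi
```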

To upgrade Hive from an earlier version of CDH4, proceed as follows.

Step 1: Stop all Hive Processes and Daemons

  Warning:

You must make sure no Hive processes are running. If Hive processes are running during the upgrade, the new version will not work correctly.

  1. Exit the Hive console and make sure no Hive scripts are running.
  2. Stop any HiveServer processes that are running. If HiveServer is running as a daemon, use the following command to stop it:
    $ sudo service hive-server stop

    If HiveServer is running from the command line, stop it with <CTRL>-c.

  3. Stop any HiveServer2 processes that are running. If HiveServer2 is running as a daemon, use the following command to stop it:
    $ sudo service hive-server2 stop

    If HiveServer2 is running from the command line, stop it with <CTRL>-c.

  4. Stop the metastore. If the metastore is running as a daemon, use the following command to stop it:
    $ sudo service hive-metastore stop

    If the metastore is running from the command line, stop it with <CTRL>-c.
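Before installing, you can confirm that nothing is still running by scanning the process list for Hive's Java main classes. In this sketch the function name hive_procs is hypothetical, and the class names it matches (org.apache.hadoop.hive.cli.CliDriver for the Hive console, org.apache.hadoop.hive.service.HiveServer, org.apache.hive.service.server.HiveServer2, and org.apache.hadoop.hive.metastore.HiveMetaStore) are assumed to identify these processes; check them against ps output on your own hosts. It reads command lines on standard input, so it can be tested without a live system.

```shell
# Hypothetical filter (POSIX sh + awk): read process command lines on stdin,
# for example from "ps -eo args", and print each Hive process type found,
# once, in a fixed order. Empty output means no matching process was seen.
# Assumption: the Java main classes below identify the Hive processes.
hive_procs() {
  awk '
    /org\.apache\.hadoop\.hive\.cli\.CliDriver/           { found["hive-cli"] = 1 }
    /org\.apache\.hadoop\.hive\.service\.HiveServer/      { found["hive-server"] = 1 }
    /org\.apache\.hive\.service\.server\.HiveServer2/     { found["hive-server2"] = 1 }
    /org\.apache\.hadoop\.hive\.metastore\.HiveMetaStore/ { found["hive-metastore"] = 1 }
    END {
      n = split("hive-cli hive-server hive-server2 hive-metastore", order, " ")
      for (i = 1; i <= n; i++) if (order[i] in found) print order[i]
    }
  '
}

# Example: stop the upgrade if anything is still running.
#   running=$(ps -eo args | hive_procs)
#   [ -z "$running" ] || { echo "still running: $running" >&2; exit 1; }
```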

Step 2: Install the new Hive version on all hosts (Hive servers and clients)

See Installing Hive.

Step 3: Verify that the Hive Metastore is Properly Configured

See Configuring the Hive Metastore for detailed instructions.

Step 4: Upgrade the Metastore Schema

The current version of CDH4 includes changes in the Hive metastore schema. If you have been using Hive 0.9 or earlier, you must upgrade the Hive metastore schema after you install the new version of Hive but before you start Hive. To do this, run the appropriate schema upgrade scripts in /usr/lib/hive/scripts/metastore/upgrade/:

  • Schema upgrade scripts from 0.7 to 0.8 and from 0.8 to 0.9 for Derby, MySQL, and PostgreSQL
  • 0.8 and 0.9 schema scripts for Oracle, but no upgrade scripts (you will need to create your own)
  • Schema upgrade scripts from 0.9 to 0.10 for Derby, MySQL, PostgreSQL, and Oracle
  Important:
  • Cloudera strongly encourages you to make a backup copy of your metastore database before running the upgrade scripts. You will need this backup copy if you run into problems during the upgrade or need to downgrade to a previous version.
  • You must upgrade the metastore schema before starting Hive. Failure to do so may result in metastore corruption.
  • To run a script, you must first cd to the directory that contains it: /usr/lib/hive/scripts/metastore/upgrade/<database>.

For more information about upgrading the schema, see the README in /usr/lib/hive/scripts/metastore/upgrade/.
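How you take the recommended metastore backup depends on the database behind it. As a sketch, the hypothetical function metastore_backup_cmd below prints (rather than runs) a dump command for MySQL or PostgreSQL so you can review it first. The database name metastore and the output file name are assumptions, and you will usually need to add connection options such as a user name. For Derby, Oracle, or anything else it prints nothing on standard output and fails, so use your usual backup procedure for those.

```shell
# Hypothetical helper (POSIX sh): print a command that dumps the metastore
# database to a dated SQL file. It only prints; review it, then run it yourself.
# Assumptions: the database is named "metastore" unless a name is given, and
# connection options (user, host, password) are supplied separately.
metastore_backup_cmd() {
  db_type=$1
  db_name=${2:-metastore}
  out="${db_name}-backup-$(date +%Y%m%d).sql"
  case $db_type in
    mysql)    echo "mysqldump --single-transaction $db_name > $out" ;;
    postgres) echo "pg_dump $db_name > $out" ;;
    *)        echo "no dump command known for '$db_type'" >&2; return 1 ;;
  esac
}

# Example:
#   metastore_backup_cmd mysql
```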

Step 5: Configure HiveServer2

HiveServer2 is an improved version of the original HiveServer (HiveServer1). Cloudera recommends using HiveServer2 instead of HiveServer1 in most cases. Some configuration is required before you initialize HiveServer2; see Configuring HiveServer2 for details.

  Note:

If you need to run HiveServer1

You can continue to run HiveServer1 on CDH4.1 and later if you need it for backward compatibility; for example, you may have existing Perl and Python scripts that use the native HiveServer1 Thrift bindings. You can install and run HiveServer1 and HiveServer2 concurrently on the same systems; see Running HiveServer2 and HiveServer Concurrently.

Step 6: Upgrade Scripts, etc., for HiveServer2 (if necessary)

If you have been running HiveServer1, you may need to make some minor modifications to your client-side scripts and applications when you upgrade:

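  • Because HiveServer1 does not support concurrent connections, many customers run a dedicated HiveServer1 instance for each client. A single HiveServer2 instance can replace all of them.
  • HiveServer2 uses a different connection URL and driver class for the JDBC driver, so scripts that use JDBC must be modified: change the connection URL from jdbc:hive://hostname:port to jdbc:hive2://hostname:port, and the driver class name from org.apache.hadoop.hive.jdbc.HiveDriver to org.apache.hive.jdbc.HiveDriver.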
  • Perl and Python scripts that use the native HiveServer1 Thrift bindings may need to be modified to use the HiveServer2 Thrift bindings.

Step 7: Start the Metastore, HiveServer2, and Beeline

See the instructions for starting the Hive metastore, starting HiveServer2, and using Beeline.

The upgrade is now complete.