This is the documentation for CDH 5.1.0.
Documentation for other versions is available at Cloudera Documentation.

Upgrading Hive

Upgrade Hive on all the hosts on which it is running: servers and clients.

  Note: To see which version of Hive is shipping in CDH 5, check the Version and Packaging Information. For important information on new and changed components, see the CDH 5 Release Notes.

Checklist to Help Ensure Smooth Upgrades

The following best practices for configuring and maintaining Hive will help ensure that upgrades go smoothly.
  • Configure periodic backups of the metastore database. Use mysqldump, or the equivalent for your vendor if you are not using MySQL.
  • Make sure datanucleus.autoCreateSchema is set to false (in all types of database) and datanucleus.fixedDatastore is set to true (for MySQL and Oracle) in all hive-site.xml files. See the configuration instructions for more information about setting the properties in hive-site.xml.

  • Insulate the metastore database from users by running the metastore service in Remote mode. If you do not follow this recommendation, make sure you remove DROP, ALTER, and CREATE privileges from the Hive user configured in hive-site.xml. See Configuring the Hive Metastore for complete instructions for each type of supported database.
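
The periodic backup recommended above can be scripted. The following is a minimal sketch for a MySQL metastore. The database name metastore, the MySQL user hive, and the directory /var/backups are placeholders chosen for illustration, not values taken from this guide.

```shell
# Build a dated dump file name, for example metastore-20140701.sql.
# $1 = database name, $2 = date stamp
backup_name() {
    printf '%s-%s.sql\n' "$1" "$2"
}

# Dump the metastore database to a dated file and print the file's path.
# $1 = database name, $2 = MySQL user, $3 = backup directory
# mysqldump prompts for the password because -p is given without a value.
# --single-transaction takes a consistent snapshot of InnoDB tables.
backup_metastore() {
    out="$3/$(backup_name "$1" "$(date +%Y%m%d)")"
    mysqldump --single-transaction -u "$2" -p "$1" > "$out" && printf '%s\n' "$out"
}

# Hypothetical usage (placeholder names):
#   backup_metastore metastore hive /var/backups
```

A function like this could be called from cron to produce one dump per day. For other database vendors, substitute the vendor's own dump utility.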

Upgrading Hive from CDH 4 to CDH 5

  Note:

If you have already performed the steps to uninstall CDH 4 and all components, as described under Upgrading from CDH 4 to CDH 5, you can skip Step 1 below and proceed with installing the new CDH 5 version of Hive.

Step 1: Remove Hive

  Warning:

You must make sure no Hive processes are running. If Hive processes are running during the upgrade, the new version will not work correctly.

  1. Exit the Hive console and make sure no Hive scripts are running.
  2. Stop any HiveServer processes that are running. If HiveServer is running as a daemon, use the following command to stop it:
    $ sudo service hive-server stop

    If HiveServer is running from the command line, stop it with <CTRL>-c.

  3. Stop the metastore. If the metastore is running as a daemon, use the following command to stop it:
    $ sudo service hive-metastore stop

    If the metastore is running from the command line, stop it with <CTRL>-c.

  4. Remove Hive. To remove Hive on Red Hat-compatible systems:
    $ sudo yum remove hive

    To remove Hive on SLES systems:

    $ sudo zypper remove hive

    To remove Hive on Ubuntu and Debian systems:

    $ sudo apt-get remove hive

Step 2: Install the new Hive version on all hosts (Hive servers and clients)

See Installing Hive.

  Important: Configuration files
  • If you install a newer version of a package that is already on the system, configuration files that you have modified will remain intact.
  • If you uninstall a package, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. If you then re-install the package (probably to install a new version) the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades, you will be prompted if you have made changes to a file for which there is a new version; for details, see Automatic handling of configuration files by dpkg.
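
On RPM-based systems, the saved configuration files described above can be located and compared with their replacements. This is a rough sketch only. It assumes the Hive configuration lives in /etc/hive/conf, which may differ on your hosts, and the function names are made up for this example.

```shell
# List configuration files that the package manager saved aside on removal.
# $1 = configuration directory to search
list_rpmsave() {
    find "$1" -type f -name '*.rpmsave' | sort
}

# Show the differences between each new file and its saved counterpart.
# Lines prefixed with + come from the saved (previously modified) file.
# $1 = configuration directory to search
diff_rpmsave() {
    list_rpmsave "$1" | while IFS= read -r saved; do
        current="${saved%.rpmsave}"
        printf '== %s\n' "$current"
        # diff exits with status 1 when the files differ; that is expected here.
        diff -u "$current" "$saved" || true
    done
}

# Hypothetical usage:
#   diff_rpmsave /etc/hive/conf
```

Review the output and copy the changes you still need into the new files by hand. The sketch does not merge anything automatically.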

Step 3: Configure the Hive Metastore

You must configure the Hive metastore and initialize the service before you start the Hive Console. See Configuring the Hive Metastore for detailed instructions.

Step 4: Upgrade the Metastore Schema

  Important:
  • Cloudera strongly encourages you to make a backup copy of your metastore database before running the upgrade scripts. You will need this backup copy if you run into problems during the upgrade or need to downgrade to a previous version.
  • You must upgrade the metastore schema before starting Hive after the upgrade. Failure to do so may result in metastore corruption.
  • To run a script, first cd to the directory that contains it: /usr/lib/hive/scripts/metastore/upgrade/<database>.

The current version of CDH 5 includes changes in the Hive metastore schema. If you have been using Hive 0.10 or earlier, you must upgrade the Hive metastore schema after you install the new version of Hive but before you start Hive.

With CDH 5, there are two ways to do this: use Hive's schematool, or use the schema upgrade scripts available with the Hive package.

Using schematool (Recommended):

The Hive distribution now includes an offline tool for Hive metastore schema manipulation called schematool. This tool can initialize the metastore schema for the current Hive version. It can also upgrade the schema from an older version to the current one. To do this, use the upgradeSchemaFrom option to specify the version of the schema you are currently using (see table below) and the compulsory dbType option to specify the database you are using. For example:
$ schematool -dbType derby -upgradeSchemaFrom 0.10.0
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting upgrade metastore schema from version 0.10.0 to <new_version>
Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
Completed upgrade-0.10.0-to-0.11.0.derby.sql
Upgrade script upgrade-0.11.0-to-<new_version>.derby.sql
Completed upgrade-0.11.0-to-<new_version>.derby.sql
schemaTool completed

Possible values for the dbType option are mysql, postgres, derby or oracle. The following table lists the Hive versions corresponding to the older CDH releases.

CDH Releases         Hive Version
CDH 3                0.7.0
CDH 4.0              0.8.0
CDH 4.1              0.9.0
CDH 4.2 and later    0.10.0

See Using the Hive Schema Tool for more details on how to use schematool.
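
The version lookup in the table above can be folded into a small helper that prints the matching schematool command line. This is an illustrative sketch. The function names are invented, and the release strings it accepts (such as 4.5 or 3u6) are an assumption about how you record your CDH version.

```shell
# Map a CDH release to the Hive schema version listed in the table above.
# $1 = CDH release, for example 3, 4.0, 4.1, or 4.5
hive_schema_version() {
    case "$1" in
        3|3.*|3u*)  echo 0.7.0 ;;
        4.0|4.0.*)  echo 0.8.0 ;;
        4.1|4.1.*)  echo 0.9.0 ;;
        4.*)        echo 0.10.0 ;;
        *)          echo "unknown CDH release: $1" >&2; return 1 ;;
    esac
}

# Print the schematool command line for a database type and CDH release.
# $1 = dbType (mysql, postgres, derby, or oracle), $2 = CDH release
schematool_command() {
    version=$(hive_schema_version "$2") || return 1
    printf 'schematool -dbType %s -upgradeSchemaFrom %s\n' "$1" "$version"
}

# Example: schematool_command mysql 4.5
```

The helper only prints the command, so you can check it before running it against the metastore.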

Using Schema Upgrade Scripts:

Run the appropriate schema upgrade scripts in /usr/lib/hive/scripts/metastore/upgrade/. The following scripts are provided:

  • Schema upgrade scripts from 0.7 to 0.8 and from 0.8 to 0.9 for Derby, MySQL, and PostgreSQL
  • 0.8 and 0.9 schema scripts for Oracle, but no upgrade scripts (you will need to create your own)
  • Schema upgrade scripts from 0.9 to 0.10 for Derby, MySQL, PostgreSQL and Oracle
  • Schema upgrade scripts from 0.10 to 0.11 for Derby, MySQL, PostgreSQL and Oracle

For more information about upgrading the schema, see the README in /usr/lib/hive/scripts/metastore/upgrade/.
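
As a rough illustration of the manual route, the helper below builds the path of one upgrade script. The file naming pattern is inferred from the schematool output shown earlier (for example upgrade-0.10.0-to-0.11.0.derby.sql) and is assumed to hold for the other databases. Confirm it against the actual directory listing and the README. The MySQL user hive and database metastore in the usage comment are placeholders.

```shell
# Root directory of the schema upgrade scripts, as given in this section.
UPGRADE_ROOT=/usr/lib/hive/scripts/metastore/upgrade

# Print the full path of one upgrade script.
# $1 = database (derby, mysql, postgres, or oracle)
# $2 = schema version to upgrade from, $3 = schema version to upgrade to
upgrade_script() {
    printf '%s/%s/upgrade-%s-to-%s.%s.sql\n' "$UPGRADE_ROOT" "$1" "$2" "$3" "$1"
}

# Hypothetical MySQL usage; cd first, because the scripts must be run from
# the directory that contains them:
#   script=$(upgrade_script mysql 0.10.0 0.11.0)
#   cd "$(dirname "$script")" && mysql -u hive -p metastore < "$(basename "$script")"
```

When upgrading across several versions, run one script per step in ascending order, for example 0.9.0 to 0.10.0 and then 0.10.0 to 0.11.0.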

Step 5: Configure HiveServer2

HiveServer2 is an improved version of the original HiveServer (HiveServer1, no longer supported). Some configuration is required before you initialize HiveServer2; see Configuring HiveServer2 for details.

Step 6: Upgrade Scripts, etc., for HiveServer2 (if necessary)

If you have been running HiveServer1, you may need to make some minor modifications to your client-side scripts and applications when you upgrade:

  • HiveServer1 does not support concurrent connections, so many customers run a dedicated instance of HiveServer1 for each client. These can now be replaced by a single instance of HiveServer2.
  • HiveServer2 uses a different connection URL and driver class for the JDBC driver. If you have existing scripts that use JDBC to communicate with HiveServer1, you can modify these scripts to work with HiveServer2 by changing the JDBC driver URL from jdbc:hive://hostname:port to jdbc:hive2://hostname:port, and by changing the JDBC driver class name from org.apache.hadoop.hive.jdbc.HiveDriver to org.apache.hive.jdbc.HiveDriver.
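
The JDBC changes described above are plain text substitutions, so they can be applied to client scripts with sed. The sketch below rewrites the URL prefix and replaces the HiveServer1 driver class, org.apache.hadoop.hive.jdbc.HiveDriver, with the HiveServer2 class. The function name and the file name in the usage comment are hypothetical.

```shell
# Rewrite HiveServer1 JDBC settings to their HiveServer2 equivalents.
# Reads text on standard input and writes the converted text to standard output.
# URLs that already start with jdbc:hive2:// are left unchanged.
migrate_jdbc() {
    sed -e 's|jdbc:hive://|jdbc:hive2://|g' \
        -e 's|org\.apache\.hadoop\.hive\.jdbc\.HiveDriver|org.apache.hive.jdbc.HiveDriver|g'
}

# Hypothetical usage on a client script named report.sh:
#   migrate_jdbc < report.sh > report.sh.new
```

Writing to a new file rather than editing in place leaves the original available for comparison. Review the result before replacing the original, since the port in existing URLs may also need to change.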

Step 7: Start the Metastore, HiveServer2, and Beeline

See the instructions for starting the metastore, starting HiveServer2, and using Beeline.

Step 8: Upgrade the JDBC driver on the clients

The driver used for CDH 4.x does not work with CDH 5.x. Install the new version, following these instructions.

Upgrading Hive from an Earlier Version of CDH 5

The instructions that follow assume that you are upgrading Hive as part of a CDH 5 upgrade, and have already performed the steps under Upgrading from a CDH 5 Beta Release to the Latest Version.

  Important:
  • If you are currently running Hive under MRv1, check for the following property and value in /etc/mapred/conf/mapred-site.xml:
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property> 
    Remove this property before you proceed; otherwise Hive queries spawned from MapReduce jobs will fail with a null pointer exception (NPE).
  • If you have installed the hive-hcatalog-server package in the past, you must remove it before you proceed; otherwise the upgrade will fail.
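
The mapred-site.xml check above can be automated with a simple text match. This sketch is not an XML parser. It assumes the <value> element directly follows the <name> element, as in the snippet shown, and the function name is made up for this example.

```shell
# Succeed if the given file sets mapreduce.framework.name to yarn.
# $1 = path to mapred-site.xml
# Whitespace is stripped first so the name and value may sit on separate lines.
has_yarn_framework() {
    tr -d ' \t\r\n' < "$1" |
        grep -F -q '<name>mapreduce.framework.name</name><value>yarn</value>'
}

# Hypothetical usage:
#   if has_yarn_framework /etc/mapred/conf/mapred-site.xml; then
#       echo "Remove mapreduce.framework.name before upgrading" >&2
#   fi
```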

To upgrade Hive from an earlier version of CDH 5, proceed as follows.

Step 1: Stop all Hive Processes and Daemons

  Warning:

You must make sure no Hive processes are running. If Hive processes are running during the upgrade, the new version will not work correctly.

  1. Stop any HiveServer processes that are running:
    $ sudo service hive-server stop 
  2. Stop any HiveServer2 processes that are running:
    $ sudo service hive-server2 stop 
  3. Stop the metastore:
    $ sudo service hive-metastore stop 
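
The three stop commands above can be wrapped in a loop when many hosts need the same treatment. This is a sketch only. The DRY_RUN variable is a convention invented for this example that prints the commands instead of running them.

```shell
# Stop the Hive daemons in the order given in the steps above.
# With DRY_RUN=1 the commands are printed instead of executed.
stop_hive_services() {
    for svc in hive-server hive-server2 hive-metastore; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "sudo service $svc stop"
        else
            # A service that is not installed reports an error; keep going.
            sudo service "$svc" stop || echo "could not stop $svc" >&2
        fi
    done
}

# Hypothetical usage:
#   DRY_RUN=1 stop_hive_services   # preview
#   stop_hive_services             # stop the daemons
```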

Step 2: Install the new Hive version on all hosts (Hive servers and clients)

See Installing Hive.

Step 3: Verify that the Hive Metastore is Properly Configured

See Configuring the Hive Metastore for detailed instructions.

Step 4: Start the Metastore, HiveServer2, and Beeline

See the instructions for starting the metastore, starting HiveServer2, and using Beeline.

The upgrade is now complete.