Apache Hive Incompatible Changes and Limitations
CDH 5 includes a new offline tool called schematool; Cloudera recommends you use this tool to upgrade your metastore schema. See Upgrade the Metastore Schema for more information.
Hive upgrade: Upgrading Hive from CDH 4 to CDH 5, or from an earlier CDH 5.x release to CDH 5.2 or later, requires several manual steps. Follow the upgrade guide closely. See Upgrading Hive.
Incompatible changes between CDH 4 and CDH 5:
- The CDH 4 JDBC client is not compatible with CDH 5 HiveServer2. JDBC applications connecting to CDH 5 HiveServer2 require the CDH 5 JDBC client driver. You do not need to recompile applications for this change.
- Because of security and concurrency issues, the original Hive server (HiveServer1) and the Hive command-line interface (CLI) are deprecated in current versions of CDH 5 and will be removed in a future release. Cloudera strongly encourages you to migrate to HiveServer2 and Beeline. For more information, see HiveServer2 and Beeline.
- CDH 5 Hue will not work with HiveServer2 from CDH 4.
- The npath function has been removed.
- Cloudera recommends that custom ObjectInspectors created for use with custom SerDes have a no-argument constructor in addition to their normal constructors, for serialization purposes. See HIVE-5380 for more details.
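The recommendation above exists because Hive instantiates these classes reflectively. The following plain-Java sketch (the inspector classes are hypothetical stand-ins, not Hive classes) shows why reflection-based instantiation, as done by Hadoop's ReflectionUtils.newInstance, fails when a class offers only parameterized constructors:

```java
import java.lang.reflect.Constructor;

// Illustrative only: demonstrates why reflection-based frameworks need a
// no-argument constructor. Both inspector classes are hypothetical.
public class NoArgCtorDemo {

    // Mimics a custom ObjectInspector with only a parameterized
    // constructor; reflection cannot instantiate it.
    static class ParamOnlyInspector {
        ParamOnlyInspector(String typeName) { }
    }

    // Same shape with a no-arg constructor added, as Cloudera recommends.
    static class FixedInspector {
        FixedInspector() { }
        FixedInspector(String typeName) { }
    }

    static boolean reflectivelyInstantiable(Class<?> c) {
        try {
            // Look up and invoke the no-arg constructor, as
            // reflection-based serialization frameworks do.
            Constructor<?> ctor = c.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // getDeclaredConstructor() throws NoSuchMethodException
            // when no no-arg constructor exists.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(reflectivelyInstantiable(ParamOnlyInspector.class)); // false
        System.out.println(reflectivelyInstantiable(FixedInspector.class));     // true
    }
}
```

The same mechanism underlies the HivePassThroughOutputFormat breakage described under CDH 5.4 below: removing an empty default constructor makes ReflectionUtils.newInstance throw NoSuchMethodException.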
- The SerDe interface has changed; custom SerDe modules must be reworked to match the new interface.
- The decimal data type format has changed as of CDH 5 Beta 2 and is not compatible with CDH 4.
- From CDH 5 Beta 2 onwards, the Parquet SerDe is part of the Hive package. The SerDe class name has changed as a result. However, there is a wrapper class for backward compatibility, so any existing Hive tables created with the Parquet SerDe will continue to work with CDH 5 Beta 2 and later Hive versions.
Incompatible changes between any earlier CDH version and CDH 5.4.x:
- CDH 5.2.0 and later clients cannot communicate with CDH 5.1.x and earlier servers. This means that you must upgrade the server before the clients.
- As of CDH 5.2.0, DESCRIBE DATABASE returns additional fields: owner_name and owner_type. The command will continue to behave as expected if you identify the field you're interested in by its (string) name, but could produce unexpected results if you use a numeric index to identify the field(s).
- CDH 5.2.0 implements HIVE-6248, which includes some backward-incompatible changes to the HCatalog API.
- The CDH 5.2 Hive JDBC driver is not wire-compatible with the CDH 5.1 version of HiveServer2. Make sure you upgrade Hive clients and all other Hive hosts in tandem: the server first, and then the clients.
- HiveServer1 is deprecated as of CDH 5.3, and will be removed in a future release of CDH. Users of HiveServer1 should upgrade to HiveServer2 as soon as possible. For more information, see HiveServer2.
- org.apache.hcatalog is deprecated as of CDH 5.3. All client-facing classes were moved from org.apache.hcatalog to org.apache.hive.hcatalog as of CDH 5.0 and the deprecated classes in org.apache.hcatalog will be removed altogether in a future release. If you are still using org.apache.hcatalog, you should move to org.apache.hive.hcatalog immediately.
- Date partition columns: As of Hive 0.13, implemented in CDH 5.2, Hive validates the format of dates in partition columns if they are stored as dates. A partition column containing a date in an invalid format can be neither used nor dropped once you upgrade to CDH 5.2 or higher. To avoid this problem, do one of the following:
- Fix any invalid dates before you upgrade. Hive expects dates in partition columns to be in the form YYYY-MM-DD.
- Store dates in partition columns as strings or integers.
You can identify partition values stored as dates by running a query such as the following against the metastore database; it returns the database name, table name, and partition key value for every partition key of type date:
SELECT "DBS"."NAME", "TBLS"."TBL_NAME", "PARTITION_KEY_VALS"."PART_KEY_VAL"
FROM "PARTITION_KEY_VALS"
  INNER JOIN "PARTITIONS" ON "PARTITION_KEY_VALS"."PART_ID" = "PARTITIONS"."PART_ID"
  INNER JOIN "PARTITION_KEYS" ON "PARTITION_KEYS"."TBL_ID" = "PARTITIONS"."TBL_ID"
  INNER JOIN "TBLS" ON "TBLS"."TBL_ID" = "PARTITIONS"."TBL_ID"
  INNER JOIN "DBS" ON "DBS"."DB_ID" = "TBLS"."DB_ID"
WHERE "PARTITION_KEYS"."INTEGER_IDX" = "PARTITION_KEY_VALS"."INTEGER_IDX"
  AND "PARTITION_KEYS"."PKEY_TYPE" = 'date';
- Decimal precision and scale: As of CDH 5.4, Hive support for decimal precision and scale changes as follows:
- When decimal is used as a type, it means decimal(10, 0) rather than a precision of 38 with a variable scale.
- When Hive is unable to determine the precision and scale of a decimal type (for example, a non-generic user-defined function (UDF) whose evaluate() method returns a decimal), a precision and scale of (38, 18) is assumed. In previous versions, a precision of 38 and a variable scale were assumed. Cloudera recommends you develop generic UDFs instead, and specify exact precision and scale.
- When a decimal value is assigned or cast to a different decimal type, rounding is used to handle cases in which the precision of the value is greater than that of the target decimal type, as long as the integer portion of the value can be preserved. In previous versions, if the value's precision was greater than 38 (the only allowed precision for the decimal type), the value was set to null, regardless of whether the integer portion could be preserved.
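The cast behavior above can be illustrated with a plain-Java sketch using BigDecimal (this is an illustration of the described semantics, not Hive's implementation; castTo is a hypothetical helper):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative sketch of the CDH 5.4 decimal cast semantics: casting to a
// target decimal(precision, scale) rounds the fractional part when it does
// not fit, but yields null when the integer digits cannot be preserved.
public class DecimalCastDemo {

    // Hypothetical helper: adjust v to decimal(precision, scale),
    // or return null if the integer portion would be truncated.
    static BigDecimal castTo(BigDecimal v, int precision, int scale) {
        // Round the fraction to the target scale first.
        BigDecimal rounded = v.setScale(scale, RoundingMode.HALF_UP);
        int integerDigits = rounded.precision() - rounded.scale();
        if (integerDigits > precision - scale) {
            return null; // integer part does not fit: result is null
        }
        return rounded;
    }

    public static void main(String[] args) {
        // Fraction too long for decimal(5, 2): rounded, not nulled.
        System.out.println(castTo(new BigDecimal("123.456"), 5, 2)); // 123.46
        // Integer part too long for decimal(5, 2): null.
        System.out.println(castTo(new BigDecimal("123456.7"), 5, 2)); // null
    }
}
```

In earlier CDH releases, the second case (and any value with precision over 38) would have produced null even when rounding could have preserved the integer portion.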
- Deprecation of HivePassThrough SerDe formats: As of CDH 5.4, HIVE-8910 changes how the storage handler uses the HivePassThroughOutputFormat class. It removes the empty default constructor, which breaks org.apache.hadoop.util.ReflectionUtils.newInstance and throws a NoSuchMethodException. The workaround is to re-create the affected Hive tables without HivePassThrough SerDe formats.