Apache Hive Known Issues

INSERT INTO Overwrites EXTERNAL Table on the Local Filesystem

When EXTERNAL tables are located on the local filesystem (URIs beginning with file://), the INSERT INTO statement overwrites the table data instead of appending to it. Defining EXTERNAL tables on the local filesystem is not a well-documented practice, so its behavior is not well defined and is subject to change.

Affected Versions: CDH 5.10 and higher

Bug: None

Workaround: Change the table location from the local filesystem to an HDFS location.
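
For example, an existing table definition can be repointed at an HDFS directory with ALTER TABLE ... SET LOCATION. This is a minimal sketch; the table name and paths are placeholders, and any data under the old file:// location must be copied to the new location separately:

-- Point the table at an HDFS directory instead of the local filesystem (placeholder names).
ALTER TABLE my_external_table SET LOCATION 'hdfs:///user/hive/warehouse/my_external_table';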

Built-in version() function is not supported

Cloudera does not currently support the built-in version() function.

Affected Versions: Not applicable.

Bug: None

Workaround: None

EXPORT and IMPORT commands fail for tables or partitions with data residing on Amazon S3

The EXPORT and IMPORT commands fail when the data resides on the Amazon S3 filesystem because the default Hive configuration restricts which file systems can be used for these statements.

Bug: None.

Resolution: Use workaround.

Workaround: Add S3 to the list of supported filesystems for EXPORT and IMPORT by setting the following property in HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml in Cloudera Manager (select Hive service > Configuration > HiveServer2):

 
<property>
 <name>hive.exim.uri.scheme.whitelist</name>
 <value>hdfs,pfile,s3a</value>
</property>
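
With the whitelist extended, EXPORT and IMPORT can reference s3a URIs directly. A minimal sketch, assuming a hypothetical table and bucket:

-- Export a table to S3, then import it into a new table (placeholder names).
EXPORT TABLE sales TO 's3a://my-bucket/hive-exports/sales';
IMPORT TABLE sales_copy FROM 's3a://my-bucket/hive-exports/sales';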

Hive queries on MapReduce 1 cannot use Amazon S3 when Cloudera Manager External Account feature is used

Hive queries that read or write data on Amazon S3 and use the Cloudera Manager External Account feature for S3 credential management do not work with MapReduce 1 (MRv1), which is deprecated in CDH.

Bug: None.

Resolution: Use workaround.

Workaround: Migrate your cluster from MRv1 to MRv2. See Migrating from MapReduce (MRv1) to MapReduce (MRv2).

ALTER PARTITION does not work on Amazon S3 or between S3 and HDFS

Cloudera recommends that you do not use ALTER PARTITION on S3 or between S3 and HDFS.

Bug: None.

Hive cannot drop encrypted databases in cascade if trash is enabled

Bug: HIVE-11418.

Workaround: Remove each table using the PURGE keyword (DROP TABLE table PURGE). After all tables are removed, remove the empty database (DROP DATABASE database).
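
For example, for a hypothetical encrypted database named encrypted_db that contains tables t1 and t2:

USE encrypted_db;
DROP TABLE t1 PURGE;
DROP TABLE t2 PURGE;
DROP DATABASE encrypted_db;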

Potential Failure of "alter table <schema>.<table> rename to <schema>.<new_table_name>"

When Hive renames a managed table, it always creates the new table directory under its database directory in order to preserve the database/table hierarchy. If the database was created without an explicit location (which is not strictly required), the renamed table directory is created under the default database location.

Filesystem encryption is typically added to a system whose databases and data already exist. If the default database location is outside the encryption zone that contains the table (or in an unencrypted zone), the ALTER TABLE ... RENAME operation fails, because HDFS cannot move data across encryption zone boundaries.

Affected Version: CDH 5.5 only

Bug: None

Resolution: Use workaround.

Workaround: Use the following statements:

CREATE DATABASE database_encrypted_zone LOCATION '/hdfs/encrypted_path/database_encrypted_zone';
USE database_encrypted_zone;
CREATE TABLE rename_test_table (col1 STRING) LOCATION '/hdfs/encrypted_path/database_encrypted_zone/rename_test';
ALTER TABLE rename_test_table RENAME TO test_rename_table;

The renamed table is created under the default database.

Hive upgrade from CDH 5.0.5 fails on Debian 7.0 if a Sentry 5.0.x release is installed

Upgrading Hive from CDH 5.0.5 to CDH 5.4, 5.3, or 5.2 fails if a Sentry version later than 5.0.4 and earlier than 5.1.0 is installed. You will see an error such as the following:
: error processing
    /var/cache/apt/archives/hive_0.13.1+cdh5.2.0+221-1.cdh5.2.0.p0.32~precise-cdh5.2.0_all.deb
    (--unpack):   trying to overwrite '/usr/lib/hive/lib/commons-lang-2.6.jar', which is also
    in package sentry 1.2.0+cdh5.0.5
This is because of a conflict involving commons-lang-2.6.jar.

Bug: None.

Workaround: Upgrade Sentry first and then upgrade Hive. Upgrading Sentry deletes all the JAR files that Sentry has installed under /usr/lib/hive/lib and installs them under /usr/lib/sentry/lib instead.

Hive ACID is not supported

Hive ACID is an experimental feature and Cloudera does not currently support it.

Hive creates an invalid table if you specify more than one partition with alter table

Hive (in all known versions from 0.7) allows you to configure multiple partitions with a single alter table command, but the configuration it creates is invalid for both Hive and Impala.

Bug: None

Resolution: Use workaround.

Workaround:

Correct results can be obtained by configuring each partition with its own ALTER TABLE statement in either Hive or Impala. For example, the following:
ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808' PARTITION
(dt='2008-08-09', country='us') location '/path/to/us/part080809';
should be replaced with:
ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808';
ALTER TABLE page_view ADD PARTITION (dt='2008-08-09', country='us') location '/path/to/us/part080809';

PostgreSQL 9.0+ requires additional configuration

The Hive metastore will not start if you use a version of PostgreSQL later than 9.0 in the default configuration. You will see output similar to this in the log:
Caused by: javax.jdo.JDODataStoreException: Error executing JDOQL query
"SELECT "THIS"."TBL_NAME" AS NUCORDER0 FROM "TBLS" "THIS" LEFT OUTER JOIN "DBS" "THIS_DATABASE_NAME" ON "THIS"."DB_ID" = "THIS_DATABASE_NAME"."DB_ID" 
WHERE "THIS_DATABASE_NAME"."NAME" = ? AND (LOWER("THIS"."TBL_NAME") LIKE ? ESCAPE '\\' ) ORDER BY NUCORDER0 " : ERROR: invalid escape string
Hint: Escape string must be empty or one character..
NestedThrowables:
org.postgresql.util.PSQLException: ERROR: invalid escape string
 Hint: Escape string must be empty or one character.
 at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
 at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:252)
 at org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:759)
 ... 28 more
Caused by: org.postgresql.util.PSQLException: ERROR: invalid escape string
 Hint: Escape string must be empty or one character.
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096)
 at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829)
 at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:386)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:271)
 at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
 at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
 at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:457)
 at org.datanucleus.store.rdbms.query.legacy.SQLEvaluator.evaluate(SQLEvaluator.java:123)
 at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.performExecute(JDOQLQuery.java:288)
 at org.datanucleus.store.query.Query.executeQuery(Query.java:1657)
 at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
 at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
 at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
 ... 29 more 

The problem is caused by a backward-incompatible change in the default value of the standard_conforming_strings property. Versions up to PostgreSQL 9.0 defaulted to off, but starting with version 9.1 the default is on.

Bug: None

Resolution: Use workaround.

Workaround: As the administrator user, use the following command to turn standard_conforming_strings off:
ALTER DATABASE <hive_db_name> SET standard_conforming_strings = off; 

Queries spawned from MapReduce jobs in MRv1 fail if mapreduce.framework.name is set to yarn

Queries spawned from MapReduce jobs fail in MRv1 with a null pointer exception (NPE) if /etc/mapred/conf/mapred-site.xml has the following:
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property> 

Bug: None

Resolution: Use workaround

Workaround: Remove the mapreduce.framework.name property from mapred-site.xml.

Commands run against an Oracle-backed Metastore may fail

Commands run against an Oracle-backed Metastore fail with error:
javax.jdo.JDODataStoreException Incompatible data type for column TBLS.VIEW_EXPANDED_TEXT : was CLOB (datastore),
but type expected was LONGVARCHAR (metadata). Please check that the type in the datastore and the type specified in the MetaData are consistent.

This error may occur if the metastore is run on top of an Oracle database with the configuration property datanucleus.validateColumns set to true.

Bug: None

Workaround: Set datanucleus.validateColumns=false in the hive-site.xml configuration file.

Hive, Pig, and Sqoop 1 fail in MRv1 tarball installation because /usr/bin/hbase sets HADOOP_MAPRED_HOME to MR2

This problem affects tarball installations only.

Bug: None

Resolution: Use workaround

Workaround: If you are using MRv1, edit the following line in /etc/default/hadoop from
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce 
to
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce 

In addition, /usr/lib/hadoop-mapreduce must not exist in HADOOP_CLASSPATH.

Hive Web Interface not supported

Cloudera no longer supports the Hive Web Interface because of inconsistent upstream maintenance of this project.

Bug: DISTRO-77

Resolution: Use workaround

Workaround: Use Hue and Beeswax instead of the Hive Web Interface.

Hive may need additional configuration to make it work in a Federated HDFS cluster

Hive jobs normally move data from a temporary directory to a warehouse directory during execution. Hive uses /tmp as its temporary directory by default, and users usually configure /user/hive/warehouse/ as the warehouse directory. Under Federated HDFS, /tmp and /user are configured as ViewFS mount tables, and so the Hive job will actually try to move data between two ViewFS mount tables. Federated HDFS does not support this, and the job will fail with the following error:
Failed with exception Renames across Mount points not supported 

Bug: None

Resolution: No software fix planned; use the workaround.

Workaround: Modify /etc/hive/conf/hive-site.xml to allow the temporary directory and warehouse directory to use the same ViewFS mount table. For example, if the warehouse directory is /user/hive/warehouse, add the following property to /etc/hive/conf/hive-site.xml so both directories use the ViewFS mount table for /user.
<property>
 <name>hive.exec.scratchdir</name>
 <value>/user/${user.name}/tmp</value>
</property> 

Cannot create archive partitions with external HAR (Hadoop Archive) tables

ALTER TABLE ... ARCHIVE PARTITION is not supported on external tables.

Bug: None

Workaround: None

Setting hive.optimize.skewjoin to true causes long-running queries to fail

Bug: None

Workaround: None

JDBC - executeUpdate does not return the number of rows modified

Contrary to the documentation, method executeUpdate always returns zero.

Workaround: None

Hive Auth (Grant/Revoke/Show Grant) statements do not support fully qualified table names (default.tab1)

Bug: None

Workaround: Switch to the database before granting privileges on the table.
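
For example, instead of granting on default.tab1, switch to the database first. A minimal sketch; the role name is a placeholder:

USE default;
GRANT SELECT ON TABLE tab1 TO ROLE analyst_role;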

Object types Server and URI are not supported in "SHOW GRANT ROLE roleName on OBJECT objectName"

Bug: None

Workaround: Use SHOW GRANT ROLE roleName to list all privileges granted to the role.

Kerberized HS2 with LDAP Authentication Fails in Multi-domain LDAP Case

In CDH 5.7, Hive introduced a feature to support HS2 with Kerberos plus LDAP authentication, but it broke compatibility with multi-domain LDAP cases in CDH 5.7.x and CDH 5.8.x.

Bug: HIVE-13590.

Workaround: None.

HCatalog Known Issues

Hive's DECIMAL data type cannot be mapped to Pig via HCatalog

HCatalog does not recognize the DECIMAL data type.

Bug: None

Workaround: None

Job submission using WebHCatalog might not work correctly

Bug: None

Resolution: Use workaround.

Workaround: Cloudera recommends using the Oozie REST interface to submit jobs, as it's a more mature and capable tool.

WebHCatalog does not work in a Kerberos-secured Federated cluster

Bug: None

Resolution: None planned.

Workaround: None

With Encrypted HDFS, 'drop database if exists <db_name> cascade' fails

Hive cannot drop encrypted databases in cascade if trash is enabled.

Bug: HIVE-11418

Workaround: Remove each table, using the PURGE keyword (DROP TABLE table PURGE). After all tables are removed, remove the empty database (DROP DATABASE database).

Hive External LDAP Configuration Requires Full Distinguished Name

This problem affects OpenLDAP only.

Due to a change in search and bind authentication, Hive users authenticating to external LDAP without the distinguishedName (dn) attribute may encounter errors.

Bug: HIVE-7193

Workaround: Set the distinguishedName attribute to its full value.

Creating external Hive tables on an empty S3 bucket may result in NullPointerException

This bug occurs only on a completely empty S3 bucket.

Bug: None.

Workaround: Create any file in the bucket first.

Hive on Spark (HoS)

Hive on Spark throws exception for multi-insert with join

A multi-insert combined with a join query in Hive on Spark (HoS) sometimes throws an exception. It occurs only when multiple parts of the resultant operator tree are executed by the same Spark executor.

Bug: HIVE-13300

Workaround: Run inserts one at a time.
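
For example, a multi-insert of the following hypothetical form can be rewritten as separate INSERT statements over the same join (all table and column names are placeholders):

-- Original multi-insert, which can fail on HoS:
FROM (SELECT a.id, a.val, b.category FROM a JOIN b ON a.id = b.id) src
INSERT OVERWRITE TABLE out1 SELECT id, val
INSERT OVERWRITE TABLE out2 SELECT id, category;

-- Workaround: run each insert separately.
INSERT OVERWRITE TABLE out1 SELECT a.id, a.val FROM a JOIN b ON a.id = b.id;
INSERT OVERWRITE TABLE out2 SELECT a.id, b.category FROM a JOIN b ON a.id = b.id;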

NullPointerException when a Spark session is reused to run a mapjoin

Some Hive on Spark (HoS) queries may fail with a NullPointerException if a Spark dependency is not set.

Bug: HIVE-12616

Workaround: Configure Hive to depend on the Spark (on YARN) service in Cloudera Manager.

Large Hive on Spark queries may fail in Spark tasks with ExecutorLostFailure

The root cause is java.lang.OutOfMemoryError: Unable to acquire XX bytes of memory, got 0. Spark executors can OOM because of a failure to correctly spill shuffle data from memory to disk.

Bug: None.

Workaround: Run this query using MapReduce.
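
For example, the execution engine can be switched for the session before rerunning the failing query. hive.execution.engine is a standard Hive property; the specific query is not shown here:

-- Switch this session to the MapReduce execution engine, then rerun the query.
SET hive.execution.engine=mr;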