Issues Fixed in CDH 5.4.x

The following topics describe known issues fixed in CDH 5.4.x, from newest to oldest release.

Issues Fixed in CDH 5.4.11
Issues Fixed in CDH 5.4.10
Issues Fixed in CDH 5.4.9
Issues Fixed in CDH 5.4.8
Issues Fixed in CDH 5.4.7
Issues Fixed in CDH 5.4.5
Issues Fixed in CDH 5.4.4
Issues Fixed in CDH 5.4.3
Issues Fixed in CDH 5.4.2
Issues Fixed in CDH 5.4.1
Issues Fixed in CDH 5.4.0

Issues Fixed in CDH 5.4.11

FLUME-2891 - Revert FLUME-2712 and FLUME-2886
FLUME-2908 - NetcatSource - SocketChannel not closed when session is broken
HADOOP-8436 - NPE In getLocalPathForWrite ( path, conf ) when the required context item is not configured
HADOOP-8437 - getLocalPathForWrite should throw IOException for invalid paths
HADOOP-8751 - NPE in Token.toString() when Token is constructed using null identifier
HADOOP-8934 - Shell command ls should include sort options
HADOOP-10048 - LocalDirAllocator should avoid holding locks while accessing the filesystem
HADOOP-10971 - Add -C flag to make `hadoop fs -ls` print filenames only
HADOOP-11901 - BytesWritable fails to support 2G chunks due to integer overflow
HADOOP-12252 - LocalDirAllocator should not throw NPE with empty string configuration
HADOOP-12269 - Update aws-sdk dependency to 1.10.6
HADOOP-12787 - KMS SPNEGO sequence does not work with WEBHDFS
HADOOP-12841 - Update s3-related properties in core-default.xml.
HADOOP-12901 - Add warning log when KMSClientProvider cannot create a connection to the KMS server.
HADOOP-12972 - Lz4Compressor#getLibraryName returns the wrong version number
HADOOP-13079 - Add -q option to Ls to print ? instead of non-printable characters
HADOOP-13132 - Handle ClassCastException on AuthenticationException in LoadBalancingKMSClientProvider
HADOOP-13155 - Implement TokenRenewer to renew and cancel delegation tokens in KMS
HADOOP-13251 - Authenticate with Kerberos credentials when renewing KMS delegation token
HADOOP-13255 - KMSClientProvider should check and renew tgt when doing delegation token operations
HADOOP-13263 - Reload cached groups in background after expiry.
HADOOP-13457 - Remove hardcoded absolute path for shell executable.
HDFS-4660 - Block corruption can happen during pipeline recovery
HDFS-8211 - DataNode UUID is always null in the JMX counter.
HDFS-8451 - DFSClient probe for encryption testing interprets empty URI property for enabled
HDFS-8496 - Calling stopWriter() with FSDatasetImpl lock held may block other threads
HDFS-8576 - Lease recovery should return true if the lease can be released and the file can be closed
HDFS-8722 - Optimize datanode writes for small writes and flushes
HDFS-9085 - Show renewer information in DelegationTokenIdentifier#toString
HDFS-9220 - Reading small file (< 512 bytes) that is open for append fails due to incorrect checksum
HDFS-9276 - Failed to Update HDFS Delegation Token for long running application in HA mode
HDFS-9466 - TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
HDFS-9589 - Block files which have been hardlinked should be duplicated before the DataNode appends to them
HDFS-9700 - DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for DataTransferProtocol
HDFS-9732 - Improve DelegationTokenIdentifier.toString() for better logging
HDFS-9939 - Increase DecompressorStream skip buffer size
HDFS-9949 - Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared
HDFS-10267 - Extra "synchronized" on FsDatasetImpl#recoverAppend and FsDatasetImpl#recoverClose
HDFS-10360 - DataNode might format directory and lose blocks if current/VERSION is missing.
HDFS-10381 - DataStreamer DataNode exclusion log message should be warning.
MAPREDUCE-4785 - TestMRApp occasionally fails
MAPREDUCE-6580 - Test failure: TestMRJobsWithProfiler
YARN-2871 - TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk
YARN-3727 - Check if the directory exists before using it for localization
YARN-4168 - Fixed a failing test TestLogAggregationService.testLocalFileDeletionOnDiskFull
YARN-4354 - Public resource localization fails with NPE
YARN-4717 - TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
YARN-5048 - DelegationTokenRenewer#skipTokenRenewal might throw NPE
HBASE-6617 - ReplicationSourceManager should be able to track multiple WAL paths (ADDENDUM)
HBASE-11625 - Verifies data before building HFileBlock. Adds HFileBlock.Header class, which contains information about the location of fields. Testing: adds CorruptedFSReaderImpl to TestChecksum.
HBASE-11927 - Use Native Hadoop Library for HFile checksum.
HBASE-14155 - StackOverflowError in reverse scan
HBASE-14359 - HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction
HBASE-14730 - region server needs to log warnings when there are attributes configured for cells with hfile v2
HBASE-14759 - Avoid using Math.abs when selecting SyncRunner in FSHLog
HBASE-15234 - Don't abort ReplicationLogCleaner on ZooKeeper errors
HBASE-15456 - CreateTableProcedure/ModifyTableProcedure needs to fail when there is no family in table descriptor
HBASE-15479 - No more garbage or beware of autoboxing
HBASE-15582 - SnapshotManifestV1 too verbose when there are no regions
HBASE-15707 - ImportTSV bulk output does not support tags with hfile.format.version=3
HBASE-15746 - Remove extra RegionCoprocessor preClose() in RSRpcServices#closeRegion
HBASE-15811 - Batch Get after batch Put does not fetch all Cells. We were not waiting on all executors in a batch to complete. The test for no-more-executors was damaged by the 0.99/0.98.4 fix "HBASE-11403 Fix race conditions around Object#notify".
HBASE-15925 - provide default values for hadoop compat module related properties that match default hadoop profile.
HBASE-16207 - can't restore snapshot without "Admin" permission
HIVE-9499 - hive.limit.query.max.table.partition makes queries fail on non-partitioned tables
HIVE-10048 - JDBC - Support SSL encryption regardless of Authentication mechanism
HIVE-10303 - HIVE-9471 broke forward compatibility of ORC files
HIVE-10685 - Alter table concatenate operator will cause duplicate data
HIVE-10925 - Non-static threadlocals in metastore code can potentially cause memory leak
HIVE-11031 - ORC concatenation of old files can fail while merging column statistics
HIVE-11054 - Handle varchar/char partition columns in vectorization
HIVE-11243 - Changing log level in Utilities.getBaseWork
HIVE-11408 - HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
HIVE-11427 - Location of temporary table for CREATE TABLE SELECT broken by HIVE-7079.
HIVE-11488 - Combine the following JIRAs for "Support sessionId and queryId logging": Add sessionId and queryId info to HS2 log (Aihua Xu, reviewed by Szehon Ho); HIVE-12456: QueryId can't be stored in the configuration of the SessionState since multiple queries can run in a single session
HIVE-11583 - When PTF is used over a large partitions result could be corrupted
HIVE-11747 - Unnecessary error log is shown when executing a "INSERT OVERWRITE LOCAL DIRECTORY" cmd in the embedded mode
HIVE-11827 - STORED AS AVRO fails SELECT COUNT(*) when empty
HIVE-11919 - Hive Union Type Mismatch
HIVE-12354 - MapJoin with double keys is slow on MR
HIVE-12431 - Support timeout for compile lock
HIVE-12481 - Occasionally "Request is a replay" will be thrown from HS2
HIVE-12635 - Hive should return the latest hbase cell timestamp as the row timestamp value
HIVE-12958 - Make embedded Jetty server more configurable
HIVE-13200 - Aggregation functions returning empty rows on partitioned columns
HIVE-13251 - hive can't read the decimal in AVRO file generated from previous version
HIVE-13285 - Orc concatenation may drop old files from moving to final path
HIVE-13286 - Query ID is being reused across queries
HIVE-13462 - HiveResultSetMetaData.getPrecision() fails for NULL columns
HIVE-13527 - Using deprecated APIs in HBase client causes zookeeper connection leaks
HIVE-13570 - Some queries with Union all fail when CBO is off
HIVE-13932 - Hive SMB Map Join with small set of LIMIT failed with NPE
HIVE-13953 - Issues in HiveLockObject equals method
HIVE-13991 - Union All on view fail with no valid permission on underneath table
HIVE-14118 - Make the alter partition exception more meaningful
HUE-3185 - [oozie] Avoid extra API calls for parent information in workflow dashboard
HUE-3185 - Revert "[oozie] Avoid extra API calls for parent information in workflow dashboard"
HUE-3185 - [oozie] Avoid extra API calls for parent information in workflow dashboard
HUE-3437 - [core] PamBackend does not honor ignore_username_case
IMPALA-2378 - check proc mem limit before preparing fragment
IMPALA-2612 - Free local allocations once for every row batch when building hash tables.
IMPALA-2711 - Fix memory leak in Rand().
IMPALA-2722 - Free local allocations per row batch in non-partitioned AGG and HJ
OOZIE-2429 - TestEventGeneration test is unreliable
OOZIE-2466 - Repeated failure of TestMetricsInstrumentation.testSamplers
OOZIE-2486 - TestSLAEventsGetForFilterJPAExecutor is unreliable
SENTRY-780 - HDFS Plugin should not execute path callbacks for views
SENTRY-1184 - Clean up HMSPaths.renameAuthzObject
SENTRY-1292 - Reorder DBModelAction EnumSet
SENTRY-1293 - Avoid converting string permission to Privilege object
SOLR-6631 - DistributedQueue spinning on calling zookeeper getChildren()
SOLR-6820 - Make the number of version buckets used by the UpdateLog configurable, as increasing beyond the default 256 has been shown to help with high volume indexing performance in SolrCloud. Increase the default number of buckets to 65536 instead of 256. Fix numVersionBuckets name attribute in configsets.
SOLR-7332 - Initialize the highest value for all version buckets with the max value from the index or recent updates to avoid unnecessary lookups to the index to check for reordered updates when processing new documents.
SOLR-7587 - TestSpellCheckResponse stalled and never timed out
SOLR-7625 - Version bucket seed not updated after new index is installed on a replica
SOLR-8152 - Overseer Task Processor/Queue can miss responses, leading to timeouts
SOLR-8451 - Fix backport
SOLR-8451 - We should not call method.abort in HttpSolrClient or HttpSolrCall#remoteQuery and HttpSolrCall#remoteQuery should not close streams.
SOLR-8453 - Solr should attempt to consume the request inputstream on errors as we cannot count on the container to do it.
SOLR-8578 - Successful or not, requests are not always fully consumed by Solrj clients and we count on HttpClient or the JVM.
SOLR-8633 - DistributedUpdateProcess processCommit/deleteByQuery calls finish on DUP and SolrCmdDistributor, which violates the lifecycle and can cause bugs.
SOLR-8683 - Tune down stream closed logging
SOLR-8683 - Always consume the full request on the server, not just in the case of an error.
SOLR-8855 - The HDFS BlockDirectory should not clean up its cache on shutdown.
SOLR-8856 - Do not cache merge or 'read once' contexts in the hdfs block cache.
SOLR-8857 - HdfsUpdateLog does not use configured or new default number of version buckets and is hard coded to 256.
SOLR-8869 - Optionally disable printing field cache entries in SolrFieldCacheMBean
SPARK-12087 - Create new JobConf for every batch in saveAsHadoopFiles

Issues Fixed in CDH 5.4.10

CDH 5.4.10 fixes the following issues.

Apache Hadoop
Upstream Issues Fixed

Apache Hadoop

`FSImage` may get corrupted after deleting snapshot

Bug: HDFS-9406

Cloudera Bug: CDH-33224

When deleting a snapshot that contains the last record of a given INode, the fsimage may become corrupt because the create list of the snapshot diff in the previous snapshot and the child list of the parent INodeDirectory are not cleaned.

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.10:

FLUME-2712 - Optional channel errors slows down the Source to Main channel event rate
FLUME-2886 - Optional Channels can cause OOMs
HADOOP-10406 - TestIPC.testIpcWithReaderQueuing may fail
HADOOP-10668 - TestZKFailoverControllerStress#testExpireBackAndForth occasionally fails
HADOOP-11218 - Add TLSv1.1,TLSv1.2 to KMS, HttpFS, SSLFactory
HADOOP-12200 - TestCryptoStreamsWithOpensslAesCtrCryptoCodec should be skipped in non-native profile
HADOOP-12240 - Fix tests requiring native library to be skipped in non-native profile
HADOOP-12280 - Skip unit tests based on Maven profile rather than NativeCodeLoader.isNativeCodeLoaded
HADOOP-12417 - TestWebDelegationToken failing with port in use
HADOOP-12418 - TestRPC.testRPCInterruptedSimple fails intermittently
HADOOP-12464 - Interrupted client may try to fail-over and retry
HADOOP-12468 - Partial group resolution failure should not result in user lockout.
HADOOP-12474 - MiniKMS should use random ports for Jetty server by default
HADOOP-12559 - KMS connection failures should trigger TGT renewal
HADOOP-12604 - Exception may be swallowed in KMSClientProvider.
HADOOP-12605 - Fix intermittent failure of TestIPC.testIpcWithReaderQueuing
HADOOP-12668 - Support excluding weak Ciphers in HttpServer2 through ssl-server.conf
HADOOP-12682 - Fix TestKMS#testKMSRestart* failure
HADOOP-12699 - TestKMS#testKMSProvider intermittently fails during 'test rollover draining'
HADOOP-12715 - TestValueQueue#testgetAtMostPolicyALL fails intermittently
HADOOP-12736 - TestTimedOutTestsListener#testThreadDumpAndDeadlocks sometimes times out
HADOOP-12788 - OpensslAesCtrCryptoCodec should log which random number generator is used
HDFS-6533 - TestBPOfferService#testBasicFunctionalitytest fails intermittently
HDFS-7553 - fix the TestDFSUpgradeWithHA due to BindException
HDFS-8647 - Abstract BlockManager's rack policy into BlockPlacementPolicy
HDFS-9083 - Replication violates block placement policy
HDFS-9092 - Nfs silently drops overlapping write requests and causes data copying to fail
HDFS-9289 - Make DataStreamer#block thread safe and verify genStamp in commitBlock
HDFS-9313 - Possible NullPointerException in BlockManager if no excess replica can be chosen
HDFS-9347 - Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
HDFS-9358 - TestNodeCount#testNodeCount timed out
HDFS-9406 - FSImage may get corrupted after deleting snapshot
HDFS-9445 - Datanode may deadlock while handling a bad volume
HDFS-9688 - Test the effect of nested encryption zones in HDFS downgrade
HDFS-9721 - Allow Delimited PB OIV tool to run upon fsimage that contains INodeReference
MAPREDUCE-6302 - Incorrect headroom can lead to a deadlock between map and reduce allocations
MAPREDUCE-6460 - TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
YARN-2902 - Killing a container that is localizing can orphan resources in the DOWNLOADING state
YARN-4155 - TestLogAggregationService.testLogAggregationServiceWithInterval failing
YARN-4204 - ConcurrentModificationException in FairSchedulerQueueInfo
YARN-4347 - Resource manager fails with Null pointer exception
YARN-4380 - TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
YARN-4393 - Fix intermittent test failure for TestResourceLocalizationService#testFailedDirsResourceRelease
YARN-4573 - Fix test failure in TestRMAppTransitions#testAppRunningKill and testAppKilledKilled
YARN-4613 - Fix test failure in TestClientRMService#testGetClusterNodes
HBASE-14205 - RegionCoprocessorHost System.nanoTime() performance bottleneck
HBASE-14621 - ReplicationLogCleaner stuck on RS crash
HBASE-14923 - VerifyReplication should not mask the exception during result comparison
HBASE-14926 - Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
HBASE-15019 - Replication stuck when HDFS is restarted
HBASE-15031 - Fix merge of MVCC and SequenceID performance regression in branch-1.0
HBASE-15032 - hbase shell scan filter string assumes UTF-8 encoding
HBASE-15035 - bulkloading hfiles with tags that require splits do not preserve tags
HBASE-15052 - Use EnvironmentEdgeManager in ReplicationSource
HBASE-15104 - Occasional failures due to NotServingRegionException in IT tests
HBASE-15157 - Add *PerformanceTest for Append, CheckAnd*
HBASE-15213 - Fix increment performance regression caused by HBASE-8763 on branch-1.0
HIVE-7575 - GetTables thrift call is very slow
HIVE-10213 - MapReduce jobs using dynamic-partitioning fail on commit
HIVE-10514 - Fix MiniCliDriver tests failure
HIVE-11826 - 'hadoop.proxyuser.hive.groups' configuration does not prevent unauthorized user to access metastore
HIVE-11828 - beeline -f fails on scripts with tabs between column type and comment
HIVE-11977 - Hive should handle an external avro table with zero length files present
HIVE-12008 - Hive queries failing when using count(*) on column in view
HIVE-12388 - GetTables cannot get external tables when TABLE type argument is given
HIVE-12505 - Insert overwrite in same encrypted zone silently fails to remove some existing files
HIVE-12566 - Incorrect result returns when using COALESCE in WHERE condition with LEFT JOIN
HIVE-12713 - Miscellaneous improvements in driver compile and execute logging
HIVE-12784 - Group by SemanticException: Invalid column reference
HIVE-12790 - Metastore connection leaks in HiveServer2
HIVE-12795 - Vectorized execution causes ClassCastException
HIVE-12946 - alter table should also add default scheme and authority for the location similar to create table
HIVE-13039 - BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table
HIVE-13065 - Hive throws NPE when writing map type data to a HBase backed table
HUE-3106 - [filebrowser] Add support for full paths in zip file uploads
HUE-3110 - [oozie] Fix bundle submission when coordinator points to multiple bundles
HUE-3180 - [useradmin] Override duplicate username validation message
IMPALA-1702 - Check for duplicate table IDs at the end of analysis (issue not entirely fixed, but now fails gracefully)
IMPALA-2264 - Implicit casts to integers from decimals with higher precision sometimes allowed
IMPALA-2473 - Excessive memory usage by scan nodes
IMPALA-2621 - Fix flaky UNIX_TIMESTAMP() test
IMPALA-2643 - Nested inline view produces incorrect result when referencing the same column implicitly
IMPALA-2765 - AnalysisException: operands of type BOOLEAN and TIMESTAMP are not comparable when OUTER JOIN with CASE statement
IMPALA-2798 - After adding a column to avro table, Impala returns weird result if codegen is enabled.
IMPALA-2861 - Fix flaky scanner test added via IMPALA-2473 backport
IMPALA-2914 - Hit DCHECK Check failed: HasDateOrTime()
IMPALA-3034 - MemTracker leak on PHJ failure to spill
IMPALA-3085 - DataSinks' MemTrackers need to unregister themselves from parent
IMPALA-3093 - ReopenClient() could NULL out 'client_key' causing a crash
IMPALA-3095 - Allow additional Kerberos users to be authorized to access internal APIs
KITE-1114 - fix test
KITE-1114 - Fix missing license header
KITE-1114 - Kite CLI json-import HDFS temp file path not multiuser safe
OOZIE-2413 - Kerberos credentials can expire if the KDC is slow to respond
OOZIE-2428 - TestSLAService, TestSLAEventGeneration flaky tests
OOZIE-2432 - TestPurgeXCommand fails
OOZIE-2435 - TestCoordChangeXCommand is flaky
SENTRY-835 - Drop table leaves a connection open when using MetastoreListener
SENTRY-885 - DB name should be case insensitive in HDFS sync plugin
SENTRY-944 - Setting HDFS rules on Sentry managed hdfs paths should not affect original hdfs rules
SENTRY-953 - External Partitions which are referenced by more than one table can cause some unexpected behavior with Sentry HDFS sync
SENTRY-957 - Exceptions in MetastoreCacheInitializer should probably not prevent HMS from starting up
SENTRY-988 - It's better to let SentryAuthorization setter path always fall through and update HDFS
SENTRY-991 - Roles of Sentry Permission needs to be case insensitive
SENTRY-994 - SentryAuthorizationInfoX should override isSentryManaged
SENTRY-1002 - PathsUpdate.parsePath(path) will throw an NPE when parsing relative paths
SENTRY-1003 - Support "reload" by updating the classpath of Sentry function aux jar path during run time
SENTRY-1008 - Path should be not be updated if the create/drop table/partition event fails
SENTRY-1044 - Tables with non-HDFS locations breaks HMS startup
SOLR-7281 - Add an overseer action to publish an entire node as 'down'
SOLR-8367 - Fix the LeaderInitiatedRecovery 'all replicas participate' fail-safe
SOLR-8371 - Try and prevent too many recovery requests from stacking up and clean up some faulty cancel recovery logic
SOLR-8372 - Canceled recovery can lead to data loss
SOLR-8575 - Addendum to Fix HDFSLogReader replay
SOLR-8575 - Fix HDFSLogReader replay status numbers and a performance bug where we can reopen FSDataInputStream too often
SOLR-8615 - Just like creating cores, we should use multiple threads when closing cores
SOLR-8720 - ZkController#publishAndWaitForDownStates should use #publishNodeAsDown
SOLR-8771 - Multithreaded core shutdown creates executor per core
SQOOP-2847 - Sqoop --incremental + missing parent --target-dir reports success with no data
SQOOP-2422 - Sqoop2: Test TestJSONIntermediateDataFormat is failing on JDK8
ZOOKEEPER-442 - Need a way to remove watches that are no longer of interest

Issues Fixed in CDH 5.4.9

Known Issues Fixed

The following topics describe known issues fixed in CDH 5.4.9.

Apache Commons Collections deserialization vulnerability
Apache HBase

Apache Commons Collections deserialization vulnerability

Cloudera has learned of a potential security vulnerability in a third-party library called the Apache Commons Collections. This library is used in products distributed and supported by Cloudera (“Cloudera Products”), including core Apache Hadoop. The Apache Commons Collections library is also in widespread use beyond the Hadoop ecosystem. At this time, no specific attack vector for this vulnerability has been identified as present in Cloudera Products.

In an abundance of caution, we are currently in the process of incorporating a version of the Apache Commons Collections library with a fix into the Cloudera Products. In most cases, this will require coordination with the projects in the Apache community. One example of this is tracked by HADOOP-12577.

The Apache Commons Collections potential security vulnerability is titled “Arbitrary remote code execution with InvokerTransformer” and is tracked by COLLECTIONS-580. MITRE has not issued a CVE, but related CVE-2015-4852 has been filed for the vulnerability. CERT has issued Vulnerability Note #576313 for this issue.

Releases affected: CDH 5.5.0, CDH 5.4.8 and lower, CDH 5.3.8 and lower, CDH 5.2.8 and lower, CDH 5.1.7 and lower, Cloudera Manager 5.5.0, Cloudera Manager 5.4.8 and lower, Cloudera Manager 5.3.8 and lower, Cloudera Manager 5.2.8 and lower, Cloudera Manager 5.1.6 and lower, Cloudera Manager 5.0.7 and lower, Cloudera Navigator 2.4.0, and Cloudera Navigator 2.3.8 and lower.

Users affected: All

Impact: This potential vulnerability may enable an attacker to execute arbitrary code from a remote machine without requiring authentication.

Immediate action required: Upgrade to the release that matches your deployment: Cloudera Manager 5.5.1 and CDH 5.5.1, Cloudera Manager 5.4.9 and CDH 5.4.9, Cloudera Manager 5.3.9 and CDH 5.3.9, Cloudera Manager 5.2.9 and CDH 5.2.9, Cloudera Manager 5.1.7 and CDH 5.1.7, or Cloudera Manager 5.0.8 and CDH 5.0.8.
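
As a rough aid for auditing a deployment, the following Python sketch flags Apache Commons Collections 3.x jars older than 3.2.2, the version that FLUME-2841 and HADOOP-12577 in this document upgrade to. It is only an illustration, not a Cloudera tool: the function names are hypothetical, and it assumes jars follow the usual `commons-collections-<version>.jar` file naming. It looks only at the 3.x line and ignores any jar whose name does not match that pattern.

```python
import re

# Version carrying the fix on the 3.x line, per FLUME-2841 and HADOOP-12577.
FIXED_VERSION = (3, 2, 2)

# Assumed file naming convention: commons-collections-<major>.<minor>[.<patch>].jar
JAR_PATTERN = re.compile(r"^commons-collections-(\d+)\.(\d+)(?:\.(\d+))?\.jar$")


def parse_version(jar_name):
    """Return the (major, minor, patch) tuple from a jar file name, or None if it does not match."""
    match = JAR_PATTERN.match(jar_name)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch component (for example "3.1") is treated as patch level 0.
    return (int(major), int(minor), int(patch or 0))


def needs_upgrade(jar_name):
    """Return True when the jar is a 3.x Commons Collections build older than 3.2.2."""
    version = parse_version(jar_name)
    if version is None:
        return False
    return version[0] == 3 and version < FIXED_VERSION


if __name__ == "__main__":
    # Hypothetical jar names, for illustration only.
    for name in ["commons-collections-3.2.1.jar", "commons-collections-3.2.2.jar"]:
        print(name, "needs upgrade:", needs_upgrade(name))
```

A file name check like this is only a first pass. Upgrading to one of the releases listed above remains the documented remedy.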

Apache HBase

Data may not be replicated to worker cluster if multiwal multiplicity is set to greater than 1

Bug: HBASE-13703, HBASE-6617, HBASE-14501.

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.9:

FLUME-2841 - Upgrade commons-collections to 3.2.2
HADOOP-7713 - dfs -count -q should label output column
HADOOP-11171 - Enable using a proxy server to connect to S3a
HADOOP-12568 - Update core-default.xml to describe posixGroups support
HADOOP-12577 - Bumped up commons-collections version to 3.2.2 to address a security flaw
HDFS-7785 - Improve diagnostics information for HttpPutFailedException
HDFS-7798 - Checkpointing failure caused by shared KerberosAuthenticator
HDFS-7871 - NameNodeEditLogRoller can keep printing 'Swallowing exception' message
HDFS-7990 - IBR delete ack should not be delayed
HDFS-8646 - Prune cached replicas from DatanodeDescriptor state on replica invalidation
HDFS-9123 - Copying from the root to a subdirectory should be forbidden
HDFS-9250 - Add Precondition check to LocatedBlock#addCachedLoc
HDFS-9273 - ACLs on root directory may be lost after NN restart
HDFS-9332 - Fix Precondition failures from NameNodeEditLogRoller while saving namespace
HDFS-9364 - Unnecessary DNS resolution attempts when creating NameNodeProxies
HDFS-9470 - Encryption zone on root not loaded from fsimage after NN restart
MAPREDUCE-6191 - Improve clearing stale state of Java serialization
MAPREDUCE-6549 - Multibyte delimiters with LineRecordReader cause duplicate records
YARN-4235 - FairScheduler PrimaryGroup does not handle empty groups returned for a user
HBASE-6617 - ReplicationSourceManager should be able to track multiple WAL paths
HBASE-12865 - WALs may be deleted before they are replicated to peers
HBASE-13134 - mutateRow and checkAndMutate APIs don't throw region level exceptions
HBASE-13618 - ReplicationSource is too eager to remove sinks
HBASE-13703 - ReplicateContext should not be a member of ReplicationSource
HBASE-14003 - Work around JDK-8044053
HBASE-14283 - Reverse scan doesn’t work with HFile inline index/bloom blocks
HBASE-14374 - Backport parent 'HBASE-14317 Stuck FSHLog' issue to 1.1
HBASE-14501 - NPE in replication with TDE
HBASE-14533 - Connection Idle time 1 second is too short and the connection is closed too quickly by the ChoreService
HBASE-14547 - Add more debug/trace to zk-procedure
HBASE-14799 - Commons-collections object deserialization remote command execution vulnerability
HBASE-14809 - Grant / revoke Namespace admin permission to group
HIVE-10265 - Hive CLI crashes on != inequality
HIVE-11149 - Sometimes HashMap in PerfLogger.java hangs
HIVE-11616 - DelegationTokenSecretManager reuses the same objectstore, which has concurrency issues
HIVE-12058 - Change hive script to record errors when calling hbase fails
HIVE-12188 - DoAs does not work properly in non-Kerberos secured HS2
HIVE-12189 - The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
HIVE-12250 - ZooKeeper connection leaks in Hive's HBaseHandler
HIVE-12365 - Added resource path is sent to cluster as an empty string when externally removed
HIVE-12378 - Exception on HBaseSerDe.serialize binary field
HIVE-12406 - HIVE-9500 introduced incompatible change to LazySimpleSerDe public interface
HIVE-12418 - HiveHBaseTableInputFormat.getRecordReader() causes ZooKeeper connection leak
HUE-2941 - [hadoop] Cache the active RM HA
HUE-3035 - [beeswax] Optimize sample data query for partitioned tables
IMPALA-1459 - Fix migration/assignment of On-clause predicates inside inline views
IMPALA-1675 - Avoid overflow when adding large intervals to TIMESTAMPs
IMPALA-1746 - QueryExecState does not check for query cancellation or errors
IMPALA-1917 - Do not register aux equivalence predicates with NULL on either side
IMPALA-1949 - Analysis exception when a binary operator contain an IN operator with
IMPALA-2086/IMPALA-2090 - Avoid boost year/month interval logic
IMPALA-2141 - UnionNode::GetNext() does not check for query errors
IMPALA-2252 - Crash (likely race) tearing down BufferedBlockMgr on query failure
IMPALA-2260 - Adding a large hour interval caused an interval overflow
IMPALA-2265 - Sorter was not checking the returned Status of PrepareRead
IMPALA-2273 - Make MAX_PAGE_HEADER_SIZE configurable
IMPALA-2286 - Fix race between ~BufferedBlockMgr() and BufferedBlockMgr::Create()
IMPALA-2344 - Work around IMPALA-2344: fail query with OOM in case block->Pin() fails
IMPALA-2357 - Fix spilling sorts with var-len slots that are NULL or empty
IMPALA-2446 - Fix wrong predicate assignment in outer joins
IMPALA-2533 - Impala throws IllegalStateException when inserting data into a partition
IMPALA-2559 - Fix check failed: sorter_runs_.back()->is_pinned_
IMPALA-2664 - Avoid sending large partition stats objects over thrift
IMPALA-2731 - Refactor MemPool usage in HBase scan node
KITE-1089 - readAvroContainer morphline command should work even if the Avro writer schema of each input file is different
PIG-3641 - Split "otherwise" producing incorrect output when combined with ColumnPruning
SENTRY-565 - Improve performance of filtering Hive SHOW commands
SENTRY-702 - Hive binding should support RELOAD command
SENTRY-936 - getGroup and getUser should always return original hdfs values for paths in prefixes which are not Sentry managed
SENTRY-960 - Blacklist reflect, java_method using hive.server2.builtin.udf.blacklist
SOLR-6443 - Disable test that fails on Jenkins with SolrCore.getOpenCount()==2
SOLR-7049 - LIST Collections API call should be processed directly by the CollectionsHandler instead of the OverseerCollectionProcessor
SOLR-7552 - Support using ZkCredentialsProvider/ZkACLProvider in custom filter
SOLR-7989 - After a new leader is elected, it should ensure its state is ACTIVE if it has already registered with ZK
SOLR-8075 - Leader Initiated Recovery should not stop a leader that participated in an election with all of its replicas from becoming a valid leader
SOLR-8223 - Avoid accidentally swallowing OutOfMemoryError
SOLR-8288 - DistributedUpdateProcessor#doFinish should explicitly check and ensure it does not try to put itself into LIR
SPARK-11484 - [WEBUI] Using proxyBase set by spark AM
SPARK-11652 - [CORE] Remote code execution with InvokerTransformer

Issues Fixed in CDH 5.4.8

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.8:

FLUME-2095 - JMS source with TIBCO
HADOOP-11261 - Set custom endpoint for S3A
HADOOP-12404 - Disable caching for JarURLConnection to avoid sharing JarFile with other users when loading resource from URL in Configuration class
HADOOP-12413 - AccessControlList should avoid calling getGroupNames in isUserInList with empty groups
HDFS-7916 - 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
HDFS-7978 - Add LOG.isDebugEnabled() guard for some LOG.debug()
HDFS-8384 - Allow NN to startup if there are files having a lease but are not under construction
HDFS-8735 - Inotify: All events classes should implement toString() API
HDFS-8860 - Remove unused Replica copyOnWrite code
HDFS-8964 - When validating the edit log, do not read at or beyond the file offset that is being written
HDFS-8965 - Harden edit log reading code against out of memory errors
MAPREDUCE-5918 - LineRecordReader can return the same decompressor to CodecPool multiple times
MAPREDUCE-5948 - org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
MAPREDUCE-6273 - HistoryFileManager should check whether summaryFile exists to avoid FileNotFoundException causing HistoryFileInfo into MOVE_FAILED state
MAPREDUCE-6481 - LineRecordReader may give incomplete record and wrong position/key information for uncompressed input sometimes
MAPREDUCE-6484 - YARN Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled
YARN-2666 - TestFairScheduler.testContinuousScheduling fails intermittently
YARN-3385 - Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes
YARN-3469 - ZKRMStateStore: Avoid setting watches that are not required.
YARN-3943 - Use separate threshold configurations for disk-full detection and disk-not-full detection
HBASE-13217 - Procedure fails due to ZK issue
HBASE-13331 - Exceptions from DFS client can cause CatalogJanitor to delete referenced files
HBASE-13388 - Handling NullPointer in ZKProcedureMemberRpcs while getting ZNode data
HBASE-13933 - DBE's seekBefore with tags corrupts the tag's offset information thus leading to incorrect results
HBASE-14196 - Thrift server idle connection timeout issue
HBASE-14302 - TableSnapshotInputFormat should not create back references when restoring snapshot
HBASE-14347 - Add a switch to DynamicClassLoader to disable it
HBASE-14385 - Close the sockets that are missing in connection closure
HBASE-14394 - Properly close the connection after reading records from table
HBASE-14471 - Thrift - HTTP Error 413 full HEAD if using Kerberos authentication
HBASE-14492 - Increase REST server header buffer size from 8k to 64k
HIVE-5545 - HCatRecord getInteger method returns String when used on Partition columns of type INT
HIVE-8529 - HiveSessionImpl#fetchResults should not try to fetch operation log when hive.server2.logging.operation.enabled is false
HIVE-9867 - Migrate usage of deprecated Calcite methods
HIVE-9984 - JoinReorder's getOutputSize is exponential
HIVE-10021 - "Alter index rebuild" statements submitted through HiveServer2 fail when Sentry is enabled
HIVE-10122 - Hive metastore filter-by-expression is broken for non-partition expressions
HIVE-10421 - DROP TABLE with qualified table name ignores database name when checking partitions
HIVE-10451 - PTF deserializer fails if values are not used in reducer
HIVE-10658 - Insert with values clause may expose data that should be encrypted
HIVE-10980 - Merge of dynamic partitions loads all data to default partition
HIVE-11077 - Part of Exchange partition does not properly populate fields for post/pre execute hooks.
HIVE-11440 - Create Parquet predicate push down (PPD) unit tests and q-tests
HIVE-11504 - Predicate pushing down does not work for float type for Parquet
HIVE-11590 - AvroDeserializer is very chatty
HIVE-11618 - Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG
HIVE-11657 - HIVE-2573 introduces some issues during metastore init (and CLI init)
HIVE-11695 - If user has no permission to create LOCAL DIRECTORY, the Hql does not throw any exception and fails silently
HIVE-11696 - Exception when table-level serde is Parquet while partition-level serde is JSON
HIVE-11712 - Duplicate groupby keys cause ClassCastException
HIVE-11737 - IndexOutOfBounds compiling query with duplicated groupby keys
HIVE-11745 - Alter table Exchange partition with multiple partition_spec is not working
HIVE-11816 - Upgrade groovy to 2.4.4
HIVE-11824 - Insert to local directory causes staging directory to be copied
HIVE-11843 - Add 'sort by c' to Parquet PPD q-tests to avoid different output issues with hadoop-1
HIVE-11891 - Add basic performance logging to metastore calls
HIVE-11926 - Stats annotation might not extract stats for varchar/decimal columns
HIVE-11982 - Some test cases for union all fail with recent changes
HIVE-11995 - Remove repetitively setting permissions in insert/load overwrite partition
HUE-2881 - [oozie] A fork can point to a deleted node
IMPALA-1136 - Support loading Avro tables without an explicit Avro schema
IMPALA-1899 - Cleanup handling of Hive's field schema
IMPALA-2369, IMPALA-2435 - Impala crashes when the sorter hits an OOM error
IMPALA-2130 - Wrong verification of Parquet file version
IMPALA-2161 - Skip \u0000 characters when dealing Avro schemas
IMPALA-2165 - Avoid cardinality 0 in scan nodes of small tables and low selectivity
IMPALA-2168 - Do not try to access streams of repartitioned spilled partition in right-joins
IMPALA-2213 - Make Parquet scanner fail query if the file size metadata is stale
IMPALA-2249 - Avoid allocating StringBuffer > 1GB in ScannerContext::Stream::GetBytesInternal()
IMPALA-2256 - Handle joins with right side of high cardinality and zero materialized slots
IMPALA-2270 - Avoid FnvHash64to32 with empty inputs
IMPALA-2284 - Disallow long (1<<30) strings in group_concat()
IMPALA-2292 - Change the type of timestamp_col to string in the table no_avro_schema.
IMPALA-2314 - LargestSpilledPartition was not checking if partition is closed
IMPALA-2348 - The catalog does not close the connection to HMS during table invalidation
IMPALA-2364 - Wrong DCHECK in PHJ::ProcessProbeBatch
IMPALA-2366 - Check fread return code correctly
IMPALA-2440 - Fix old HJ full outer join with no rows
IMPALA-2477 - Parquet metadata randomly 'appears stale'
IMPALA-2514 - DCHECK on destroying an ExprContext
KITE-1069 - Make zkClientSessionTimeout and zkClientConnectTimeout configurable in SolrLocator
KITE-1074 - Partial updates aka Atomic updates with loadSolr aren't recognized with SolrCloud
MAHOUT-1771 - Cluster dumper omits indices and 0 elements for dense vector or sparse containing 0s
OOZIE-2376 - Default action configs not honored if no <configuration> section in workflow
SENTRY-878 - collect_list missing from HIVE_UDF_WHITE_LIST
SENTRY-884 - Give execute permission by default to paths managed by sentry
SENTRY-893 - Synchronize calls in SentryClient and create sentry client once per request in SimpleDBProvider
SOLR-5776 - Use less TLS/SSL in a test run
SOLR-7109 - Indexing threads stuck during network partition can put leader into down state
SOLR-7844 - Zookeeper session expiry during shard leader election can cause multiple leaders
SOLR-7956 - There are interrupts on shutdown in places that can cause ChannelAlreadyClosed exceptions which prevents proper closing of transaction logs and can poison the IndexWriter and interfere with the HDFS client
SOLR-8046 - HdfsCollectionsAPIDistributedZkTest checks that no transaction logs failed to be opened during the test but does not isolate this to the test and could fail due to other tests
SOLR-8069 - Ensure that only the valid ZooKeeper registered leader can put a replica into Leader Initiated Recovery
SOLR-8075 - Leader Initiated Recovery should not stop a leader that participated in an election with all of its replicas from becoming a valid leader
SOLR-8077 - Replication can still cause index corruption
SOLR-8085 - Fix a variety of issues that can result in replicas getting out of sync
SOLR-8094 - HdfsUpdateLog should not replay buffered documents as a replacement to dropping them
SOLR-8095 - Add enable prop for HDFS Locality Metrics
SOLR-8121 - It looks like ChaosMonkeySafeLeader test can fail with replica inconsistency because waitForThingsToLevelOut can pass while state is still changing.
SPARK-6880 - Spark Shutdowns with NoSuchElementException when running parallel collect on cachedRDD
SQOOP-2597 - Missing method AvroSchemaGenerator.generate()

Published Known Issues Fixed

As a result of the above fixes, the following issues, previously published as Known Issues in CDH 5, are also fixed.

Spurious warning in MRv1 jobs

The mapreduce.client.genericoptionsparser.used property is not correctly checked by JobClient and this leads to a spurious warning.

Cloudera Bug: CDH-9740

Workaround: MapReduce jobs using GenericOptionsParser or implementing Tool can remove the warning by setting this property to true.

Spark Sink requires `spark-assembly.jar` in Flume classpath

In CDH 5.4.0, Flume requires spark-assembly.jar in the Flume classpath to use the Spark Sink. Without this, the sink fails with a dependency issue.

Bug: SPARK-7038

Cloudera Bug: CDH-27210

Workaround: Use the Spark Sink from CDH 5.3 with Spark from CDH 5.4, or add spark-assembly.jar to the FLUME_CLASSPATH.
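
The second option can be sketched as a small addition to `flume-env.sh`. This is a minimal sketch, not a Cloudera-supplied script: the helper name `append_classpath` is hypothetical, and the jar path in the usage comment is a placeholder for wherever spark-assembly.jar lives on your hosts.

```shell
# Append a jar to a colon-separated classpath, skipping duplicates.
# append_classpath is a hypothetical helper, not part of Flume.
append_classpath() {
  current="$1"
  jar="$2"
  case ":$current:" in
    *":$jar:"*) printf '%s\n' "$current" ;;       # jar is already listed
    "::")       printf '%s\n' "$jar" ;;           # classpath was empty
    *)          printf '%s\n' "$current:$jar" ;;  # append to existing entries
  esac
}

# Example use in flume-env.sh (the jar path is an assumed location):
# FLUME_CLASSPATH="$(append_classpath "$FLUME_CLASSPATH" /path/to/spark-assembly.jar)"
# export FLUME_CLASSPATH
```

Checking for an existing entry keeps the variable from growing if the environment file is sourced more than once.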

Streaming incompatibility between Spark 1.2 and 1.3

Applications built as a JAR with dependencies ("uber JAR") must be built for the specific version of Spark running on the cluster.

Cloudera Bug: CDH-26527

Workaround: Rebuild the JAR with the Spark dependencies in pom.xml pointing to the specific version of Spark running on the target cluster.
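
One way to keep the build aligned with the cluster is to read the Spark version on a cluster host and pass it to Maven. The sketch below rests on two assumptions that are not stated in these notes: that `spark-submit --version` prints a banner containing `version X.Y.Z`, and that your pom.xml declares its Spark dependencies with a `${spark.version}` property. The helper name `spark_version_from_banner` is hypothetical.

```shell
# Read `spark-submit --version` output on stdin and print the first version
# string found. The "version X.Y.Z" banner format is an assumption.
# spark_version_from_banner is a hypothetical helper name.
spark_version_from_banner() {
  sed -n 's/.*version \([0-9][^ ]*\).*/\1/p' | head -n 1
}

# Example use on a cluster host (2>&1 in case the banner goes to stderr):
# SPARK_VERSION="$(spark-submit --version 2>&1 | spark_version_from_banner)"
# mvn clean package -Dspark.version="$SPARK_VERSION"
```

If the helper prints nothing, the banner format differs from the assumed one, and the version should be set by hand instead.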

Configuring more than one NT domain does not work in CDH 5.4.0

Trying to add users and groups using the multi-NT domain feature (http://gethue.com/hadoop-tutorial-make-hadoop-more-accessible-by-integrating-multiple-ldap-servers/) produces an error.

Bug: HUE-2665

Cloudera Bug: CDH-26431

Workaround: None.

If Sentry is enabled, the `RELOAD` command cannot be executed in the Hive CLI or Beeline.

Bug: SENTRY-702

Cloudera Bug: CDH-25786

Workaround: None.

Issues Fixed in CDH 5.4.7

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.7:

CRUNCH-525 - Correct (more) accurate default scale factors for built-in MapFn implementations
CRUNCH-527 - Use hash smearing for partitioning
CRUNCH-528 - Improve Pair comparison
CRUNCH-531 - Fix split graph rendering typo.
CRUNCH-535 - call initCredentials on the job
CRUNCH-536 - Refactor CrunchControlledJob.Hook interface and make it client-accessible
CRUNCH-539 - Fix reading WritableComparables bimap
CRUNCH-540 - Make AvroReflectDeepCopier serializable
CRUNCH-543 - Have AvroPathPerKeyTarget handle child directories properly
CRUNCH-544 - Improve performance/serializability of materialized toMap.
CRUNCH-546 - Remove calls to CellUtil.cloneXXX
CRUNCH-547 - Properly handle nullability for Avro union types
CRUNCH-548 - Have the AvroReflectDeepCopier use the class of the source object when constructing new instances instead of the target class
CRUNCH-551 - Make the use of Configuration objects consistent in CrunchInputSplit and CrunchRecordReader
CRUNCH-553 - Fix record drop issue that can occur w/From.formattedFile TableSources
FLUME-1934 - Spooling Directory Source dies on encountering zero-byte files.
FLUME-2753 - Error when specifying empty replace string in Search and Replace Interceptor
HADOOP-12317 - Applications fail on NM restart on some Linux distributions because NM container recovery declares AM container as LOST
HDFS-8806 - Inconsistent metrics: number of missing blocks with replication factor 1 not properly cleared
HDFS-8850 - VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks.
MAPREDUCE-5817 - Mappers get rescheduled on node transition even after all reducers are completed.
MAPREDUCE-6277 - Job can post multiple history files if attempt loses connection to the RM
MAPREDUCE-6439 - AM may fail instead of retrying if RM shuts down during the allocate call.
YARN-2921 - Fix MockRM/MockAM#waitForState sleep too long.
YARN-3823 - Fix mismatch in default values for yarn.scheduler.maximum-allocation-vcores property
YARN-3990 - AsyncDispatcher may be overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
HBASE-13329 - ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray.
HBASE-13437 - ThriftServer leaks ZooKeeper connections
HBASE-13471 - Fix a possible infinite loop in doMiniBatchMutation
HBASE-13684 - Allow mlockagent to be used when not starting as root
HBASE-14162 - Fixing maven target for regenerating thrift classes fails against 0.9.2
HBASE-14354 - Minor improvements for usage of the mlock agent
HIVE-7476 - CTAS does not work properly for s3
HIVE-9327 - CBO (Calcite Return Path): Removing Row Resolvers from ParseContext
HIVE-9512 - HIVE-9327 causing regression in stats annotation
HIVE-9580 - Server returns incorrect result from JOIN ON VARCHAR columns
HIVE-9613 - Left join query plan outputs wrong column when using subquery
HIVE-10085 - Lateral view on top of a view throws RuntimeException
HIVE-10140 - Window boundary is not compared correctly
HIVE-10288 - Cannot call permanent UDFs
HIVE-10319 - Hive CLI startup takes a long time with a large number of databases
HIVE-10719 - Hive metastore failure when alter table rename is attempted.
HIVE-10875 - Select query with view in subquery adds underlying table as direct input
HIVE-10906 - Value based UDAF function without orderby expression throws NPE
HIVE-10911 - Add support for date datatype in the value based windowing function
HIVE-10972 - DummyTxnManager always locks the current database in shared mode, which is incorrect
HIVE-10985 - Value based windowing on timestamp and double can't handle NULL value
HIVE-10996 - Aggregation / Projection over Multi-Join Inner Query producing incorrect results
HIVE-11139 - QTest combine2_hadoop20.q fails when using -Phadoop-1 profile due to
HIVE-11172 - Vectorization wrong results for aggregate query with where clause without group by
HIVE-11203 - Beeline force option does not force execution when errors occurred in a script.
HIVE-11250 - Change in spark.executor.instances (and others) does not take effect after RSC is launched for HS2
HIVE-11255 - get_table_objects_by_name() in HiveMetaStore.java needs to retrieve table objects in multiple batches
HIVE-11258 - The function drop_database_core() of HiveMetaStore.java may not drop all the tables
HIVE-11271 - java.lang.IndexOutOfBoundsException when union all with if function
HIVE-11288 - Avro SerDe InstanceCache returns incorrect schema
HIVE-11333 - ColumnPruner prunes columns of UnionOperator that should be kept
HIVE-11502 - Map side aggregation is extremely slow
HIVE-11604 - Hive returns wrong results in some queries with PTF function
HIVE-11620 - Fix several qtest output order
HUE-2873 - [oozie] Handle TransactionManagementError on workflow dashboard
HUE-2877 - [desktop] Add pyasn1 and ndg_httpsclient to support SSL Server Name Indication
HUE-2880 - [hadoop] Fix uploading large files to a kerberized HTTPFS
HUE-2882 - [oozie] Fix parsing error when workflow job uses Australian timezone
HUE-2883 - [impala] Canceling a query shows an error message
HUE-2885 - [oozie] Java options java-opts not generated correctly in XML
HUE-2893 - [desktop] Backport CherryPy SSL file upload fix
HUE-2903 - [oozie] Fix error with Workflow parameter on rerun
IMPALA-1737 - Substitute an InsertStmt's partition key exprs with the root node's smap.
IMPALA-1756 - Constant filter expressions are not checked for errors and state cleanup is not done before throwing exception.
IMPALA-1898 - Explicit aliases + ordinals analysis bug
IMPALA-1983 - Warn if table stats are potentially corrupt.
IMPALA-1987 - Fix TupleIsNullPredicate to return false if no tuples are nullable.
IMPALA-2088 - Fix planning of empty union operands with analytics.
IMPALA-2089 - Retain eq predicates bound by grouping slots with complex grouping exprs.
IMPALA-2178 - fix Expr::ComputeResultsLayout() logic.
IMPALA-2199 - Row count not set for empty partition when spec is used with compute incremental stats
IMPALA-2201 - Unconditionally update the partition stats and row count.
IMPALA-2203 - Set an InsertStmt's result exprs from the source statement's result exprs.
IMPALA-2216 - Set the output smap of an EmptySetNode produced from an empty inline view.
IMPALA-2239 - update misc.test to match the new .test file format.
IMPALA-2266 - Pass correct child node in 2nd phase merge aggregation.
KITE-1053 - Fix int overflow bug in FS writer.
SENTRY-810 - CTAS without location is not verified properly
SOLR-7135 - Allow the server build.xml 'sync-hack' target to be skipped by specifying a system property.
SOLR-7999 - SolrRequestParserTest#testStreamURL started failing.
SPARK-8606 - Prevent exceptions in RDD.getPreferredLocations() from crashing DAGScheduler
ZOOKEEPER-442 - need a way to remove watches that are no longer of interest

Issues Fixed in CDH 5.4.5

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.5:

CRUNCH-508 - Improve performance of Scala Enumeration counters in Scrunch
CRUNCH-511 - Scrunch product type support should use derived() instead of derivedImmutable()
CRUNCH-514 - AvroDerivedDeepCopier should initialize delegate MapFns
CRUNCH-516 - Scrunch needs some additional null checks
CRUNCH-530 - Fix object reuse bug in GenericRecordToTuple
CRUNCH-542 - Wider tolerance for flaky scrunch PCollectionTest
FLUME-2215 - ResettableFileInputStream can't support ucs-4 character
FLUME-2732 - Make maximum tolerated failures before shutting down and recreating client in AsyncHbaseSink configurable
FLUME-2738 - Async HBase sink FD leak on client shutdown
FLUME-2749 - Kerberos configuration error when using short names in multiple HDFS Sinks
HADOOP-12017 - Hadoop archives command should use configurable replication factor when closing
HADOOP-12103 - Small refactoring of DelegationTokenAuthenticationFilter to allow code sharing
HADOOP-8151 - Error handling in snappy decompressor throws invalid exceptions
HDFS-7501 - TransactionsSinceLastCheckpoint can be negative on SBNs
HDFS-7546 - Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
HDFS-7890 - Improve information on Top users for metrics in RollingWindowsManager and lower log level
HDFS-7894 - Rolling upgrade readiness is not updated in jmx until query command is issued.
HDFS-8072 - Reserved RBW space is not released if client terminates while writing block
HDFS-8337 - Accessing httpfs via webhdfs doesn't work from a jar with kerberos
HDFS-8656 - Preserve compatibility of ClientProtocol#rollingUpgrade after finalization
HDFS-8681 - BlockScanner is incorrectly disabled by default
MAPREDUCE-5965 - Hadoop streaming throws error if list of input files is high.
YARN-3143 - RM Apps REST API can return NPE or entries missing id and other fields
YARN-3453 - Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
YARN-3535 - Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
YARN-3793 - Several NPEs when deleting local files on NM recovery
YARN-3842 - NMProxy should retry on NMNotYetReadyException
HBASE-13342 - Fix incorrect interface annotations
HBASE-13419 - Thrift gateway should propagate text from exception causes.
HBASE-13491 - Fix bug in FuzzyRowFilter#getNextForFuzzyRule
HBASE-13851 - RpcClientImpl.close() can hang with cancelled replica RPCs
HBASE-13885 - ZK watches leaks during snapshots
HBASE-13958 - RESTApiClusterManager calls kill() instead of suspend() and resume()
HBASE-13995 - ServerName is not fully case insensitive
HBASE-14027 - Clean up netty dependencies
HBASE-14045 - Bumping thrift version to 0.9.2.
HBASE-14076 - ResultSerialization and MutationSerialization can throw InvalidProtocolBufferException when serializing a cell larger than 64MB
HIVE-10252 - Make PPD work for Parquet in row group level
HIVE-10270 - Cannot use Decimal constants less than 0.1BD
HIVE-10553 - Remove hardcoded Parquet references from SearchArgumentImpl
HIVE-10706 - Make vectorized_timestamp_funcs test more stable
HIVE-10801 - 'drop view' fails throwing java.lang.NullPointerException
HIVE-10808 - Inner join on Null throwing Cast Exception
HIVE-11150 - Remove wrong warning message related to chgrp
HIVE-11174 - Hive does not treat floating point signed zeros as equal
HIVE-11216 - UDF GenericUDFMapKeys throws NPE when a null map value is passed in
HIVE-11401 - Predicate push down does not work with Parquet when partitions are in the expression
HIVE-6099 - Multi insert does not work properly with distinct count
HIVE-9500 - Support nested structs over 24 levels
HIVE-9665 - Parallel move task optimization causes race condition
HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
HIVE-10437 - NullPointerException on queries where map/reduce is not involved on tables with partitions
HIVE-10895 - ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
HIVE-10976 - Redundant HiveMetaStore connect check in HS2 CLIService start
HIVE-10977 - No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
HIVE-11095 - Fix SerDeUtils bug when Text is reused
HIVE-11100 - Beeline should escape semi-colon in queries
HIVE-11112 - ISO-8859-1 text output has fragments of previous longer rows appended
HIVE-11157 - Hive.get(HiveConf) returns same Hive object to different user sessions
HIVE-11194 - Exchange partition on external tables should fail with error message when target folder already exists
HIVE-11433 - NPE for a multiple inner join query
HIVE-9767 - Fixes in Hive UDF to be usable in Pig
HIVE-10629 - Dropping table in an encrypted zone does not drop warehouse directory
HIVE-10630 - Renaming tables across encryption zones renames table even though the operation throws error
HIVE-10659 - Beeline command which contains semi-colon as a non-command terminator will fail
HIVE-10788 - Change sort_array to support non-primitive types
HIVE-11109 - Replication factor is not properly set in SparkHashTableSinkOperator [Spark Branch]
HIVE-10594 - Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch]
HUE-2618 - [hive] Recent query results show character encoding in view
HUE-2767 - [impala] Issue showing sample data for a table
HUE-2796 - sync_groups_on_login doesn't work with posixGroups
HUE-2807 - [useradmin] Support deleting numeric groups
HUE-2808 - [dbquery] Add row numbers to support default order by
HUE-2813 - [hive] Report when Hue server is down when trying to execute a query
HUE-2814 - Revert pyopenssl 0.13.1
HUE-2835 - Fixed issue with DN's that have weird comma location
HUE-2840 - [useradmin] Fix create home directories for Add/Sync LDAP group
HUE-2849 - [useradmin] Fix exception in Add/Sync LDAP group for undefined group name
IMPALA-1929 - Avoiding a DCHECK of NULL hash table in spilled right joins
IMPALA-2136 - Bug in PrintTColumnValue caused wrong stats for TINYINT partition cols
IMPALA-2133 - Properly unescape string value for HBase filters
IMPALA-2018 - Where clause does not propagate to joins inside nested views
IMPALA-2064 - Add effective_user() builtin
IMPALA-2125 - Make UTC to local TimestampValue conversion faster.
IMPALA-2048 - Impala DML/DDL operations corrupt table metadata leading to Hive query failures
KITE-1014 - Fix support for Hive datasets on Kerberos enabled clusters.
KITE-1015 - Add "replaceValues" morphline command that replaces all matching record field values with a given replacement string
KITE-462 - Oozie jobs do not pass credentials
KITE-976 - DatasetKeyInputFormat/DatasetKeyOutputFormat not setting job configuration before loading dataset
KITE-1030 - readCSV WARN log msg on overly long lines where quoteChar is non-empty should print the whole record seen so far
OOZIE-2268 - Update ActiveMQ version for security and other fixes
OOZIE-2286 - Update Log4j and Log4j-extras to latest 1.2.x release
PIG-4053 - TestMRCompiler succeeds with Sun JDK 1.6 but fails with Sun JDK 1.7
PIG-4338 - Fix test failures with JDK8
PIG-4326 - AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records
SENTRY-695 - Sentry service should read the hadoop group mapping properties from core-site
SENTRY-721 - HDFS Cascading permissions not applied to child file ACLs if a direct grant exists
SENTRY-752 - Sentry service audit log file name format should be consistent
SOLR-7457 - Make DirectoryFactory publishing MBeanInfo extensible
SOLR-7458 - Expose HDFS Block Locality Metrics
SPARK-6480 - histogram() bucket function is wrong in some simple edge cases
SPARK-6954 - ExecutorAllocationManager can end up requesting a negative number of executors
SPARK-7503 - Resources in .sparkStaging directory can't be cleaned up on error
SPARK-7705 - Cleanup of .sparkStaging directory fails if application is killed
SQOOP-2103 - Not able to define Decimal(n,p) data type in map-column-hive option
SQOOP-2149 - Update Kite dependency to 1.0.0
SQOOP-2252 - Add default to Avro Schema
SQOOP-2294 - Change to Avro schema name breaks some use cases
SQOOP-2295 - Hive import with Parquet should append automatically
SQOOP-2327 - Sqoop2: Change package name from Authorization to authorization
SQOOP-2339 - Move sub-directory might fail in append mode
SQOOP-2362 - Add oracle direct mode in list of supported databases
SQOOP-2400 - hive.metastore.sasl.enabled should be set to true for Oozie integration
SQOOP-2406 - Add support for secure mode when importing Parquet files into Hive
SQOOP-2437 - Use hive configuration to connect to secure metastore

Issues Fixed in CDH 5.4.4

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.4:

HIVE-10572 - Improve Hive service test to check empty string
HIVE-9934 - Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to "none", allowing authentication without password
HIVE-10006 - RSC has a memory leak while executing multiple queries.
HUE-2814 - Revert pyopenssl 0.13.1

Published Known Issues Fixed

As a result of the above fixes, the following issues, previously published as Known Issues in CDH 5, are also fixed.

Hue with TLS/SSL Fails to Start in CDH 5.4.3

In CDH 5.4.3, Hue with TLS/SSL fails to start because a pyOpenSSL package is missing in the parcel. This applies to both new installs and upgrades and is not operating-system specific.

Bug: CDH-29076

Release affected: CDH 5.4.3

Release containing the fix: CDH 5.4.4

Workaround:

Download the package:
1. cd /tmp
2. curl -O https://pypi.python.org/packages/source/p/pyOpenSSL/pyOpenSSL-0.13.tar.gz

Determine the Hue installation directory:

Parcels:

export HUE_DIR=/opt/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/hue

Packages:
```
export HUE_DIR=/usr/lib/hue
```

Change to the Hue installation directory:
```
cd $HUE_DIR
```
Do the following, depending on your OS:
- On CentOS/RedHat 6.x:
  1. sudo yum install gcc python-devel openssl-devel
  2. sudo ./build/env/bin/python ./build/env/bin/pip -v install /tmp/pyOpenSSL-0.13.tar.gz
- On Ubuntu 14.04:
  1. sudo apt-get install gcc python-dev python-pip libssl-dev
  2. sudo pip install --target=`pwd`/`ls -d build/env/lib/python*/site-packages` /tmp/pyOpenSSL-0.13.tar.gz
- On other platforms, contact Support for assistance.
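
The directory selection in the steps above can be sketched as a small helper that picks whichever Hue directory exists on the host, followed by a check that the install worked. This is an illustrative sketch only: `first_existing_dir` is a hypothetical helper name, and the final check assumes that pyOpenSSL is importable as the `OpenSSL` module from Hue's bundled Python.

```shell
# Print the first argument that is an existing directory; fail if none is.
# first_existing_dir is a hypothetical helper, not a Hue or Cloudera tool.
first_existing_dir() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Example use, with the parcel and package paths from the steps above:
# HUE_DIR="$(first_existing_dir \
#   /opt/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/hue \
#   /usr/lib/hue)" || { echo "Hue directory not found" >&2; exit 1; }
# export HUE_DIR
#
# After installing pyOpenSSL, confirm that Hue's Python can import it:
# "$HUE_DIR/build/env/bin/python" -c 'import OpenSSL; print(OpenSSL.__version__)'
```

Listing the parcel path first means a parcel installation takes precedence when both directories are present.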

Issues Fixed in CDH 5.4.3

Upgrades to CDH 5.4.1 from Releases Earlier than 5.4.0 May Fail

Problem: Because of a change in the implementation of the NameNode metadata upgrade mechanism, upgrading to CDH 5.4.1 from a version lower than 5.4.0 can take an inordinately long time. In a cluster with NameNode high availability (HA) configured and a large number of edit logs, the upgrade can fail, with errors indicating a timeout in the pre-upgrade step on JournalNodes.

What to do:

To avoid the problem: Do not upgrade to CDH 5.4.1; upgrade to CDH 5.4.2 instead.

If you experience the problem: If you have already started an upgrade and seen it fail, contact Cloudera Support. This problem involves no risk of data loss, and manual recovery is possible.

If you have already completed an upgrade to CDH 5.4.1, or are installing a new cluster: In this case you are not affected and can continue to run CDH 5.4.1.

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.3:

HADOOP-12043 - Display warning if defaultFs is not set when running fs commands.
HADOOP-11969 - ThreadLocal initialization in several classes is not thread safe
HADOOP-11402 - Negative user-to-group cache entries are never cleared for never-again-accessed users
HADOOP-11238 - Update the NameNode's Group Cache in the background when possible
HDFS-8535 - Clarify that dfs usage in dfsadmin -report output includes all block replicas.
HDFS-8486 - DN startup may cause severe data loss
HDFS-7917 - Use file to replace data dirs in test to simulate a disk failure.
HDFS-7833 - DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated.
HDFS-7604 - Track and display failed DataNode storage locations in NameNode.
HDFS-8380 - Always call addStoredBlock on blocks which have been shifted from one storage to another
HDFS-7980 - Incremental BlockReport will dramatically slow down the startup of a namenode
HDFS-8305 - HDFS INotify: the destination field of RenameOp should always end with the file name
YARN-3842 - NMProxy should retry on NMNotYetReadyException
YARN-3467 - Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI
YARN-3762 - FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
YARN-3675 - FairScheduler: RM quits when node removal races with continuous scheduling on the same node
YARN-3491 - PublicLocalizer#addResource is too slow.
MAPREDUCE-6387 - Serialize the recently added Task#encryptedSpillKey field at the end
HBASE-13481 - Master should respect master (old) DNS/bind related configurations
HBASE-13826 - Unable to create table when group acls are appropriately set.
HBASE-13729 - Old hbase.regionserver.global.memstore.upperLimit and hbase.regionserver.global.memstore.lowerLimit properties are ignored if present
HBASE-13789 - ForeignException should not be sent to the client
HBASE-13779 - Calling table.exists() before table.get() ends up with an empty Result
HBASE-13780 - Default to 700 for HDFS root dir permissions for secure deployments
HBASE-13768 - ZooKeeper znodes are bootstrapped with insecure ACLs in a secure configuration
HBASE-13767 - Allow ZKAclReset to set and not just clear ZK ACLs
HBASE-13086 - Show ZK root node on Master WebUI
HBASE-13413 - Create an integration test for Replication
HBASE-13611 - update clover to work for current versions
HIVE-10841 - [WHERE col is not null] does not work sometimes for queries with many JOIN statements
HIVE-10956 - HS2 leaks HMS connections
HIVE-10571 - HiveMetaStoreClient should close existing thrift connection before its reconnect
HIVE-10835 - Concurrency issues in JDBC driver
HIVE-10802 - Table join query with some constant field in select fails
HIVE-10538 - Fix NPE in FileSinkOperator from hashcode mismatch
HIVE-10771 - "separatorChar" has no effect in "CREATE TABLE AS SELECT" statement
HIVE-10732 - Hive JDBC driver does not close operation for metadata queries
HIVE-10151 - insert into A select from B is broken when both A and B are Acid tables and bucketed the same way
HIVE-10483 - insert overwrite partition deadlocks on itself with DbTxnManager
HIVE-10050 - Support overriding memory configuration for AM launched for TempletonControllerJob
HIVE-10242 - ACID: insert overwrite prevents create table command
HIVE-10481 - ACID table update finishes but values not really updated if column names are not all lower case
HIVE-10150 - delete from acidTbl where a in(select a from nonAcidOrcTbl) fails
HIVE-10721 - SparkSessionManagerImpl leaks SparkSessions [Spark Branch]
HIVE-10671 - yarn-cluster mode offers a degraded performance from yarn-client [Spark Branch]
HIVE-10453 - HS2 leaking open file descriptors when using UDFs
HIVE-2573 - Create per-session function registry
HIVE-9520 - Create NEXT_DAY UDF
HIVE-9143 - select user(), current_user()
HIVE-5472 - support a simple scalar which returns the current timestamp
HIVE-10646 - ColumnValue does not handle NULL_TYPE
HUE-2784 - [oozie] Coordinator editor generates wrong Monday cron expression
HUE-2793 - [JB] Fix Mapper & Reducer counts in job page
HUE-2776 - [jb] Fix "View All Tasks" pagination
HUE-2778 - [jobbrowser] Fix "Text Filter" search box in JB "View All Tasks" page
HUE-2767 - [impala] Issue showing sample data for a table
HUE-2754 - [oozie] Sqoop action with variable adds an empty argument
HUE-2743 - [search] Error HTML style leaks in the UI
HUE-2587 - [jb] Kill jobs in accepted state
HUE-2731 - [core] Validate that Hue is running in collect data script
HUE-2687 - [core] Create script to gather hue process info for troubleshooting
HUE-2656 - [tools] Add cron scripts for restart when mem usage is high
HUE-2701 - [oozie] Java action relative jar path results in error on submit
HUE-2703 - [sentry] Make more obvious why a user is not a Sentry admin
HUE-2741 - [home] Hide the document move dialog
HUE-2739 - [metastore] Autocomplete with databases/tables with built in names fails
HUE-2732 - Hue isn't correctly doing add_column migrations with non-blank defaults
IMPALA-1963 - Impala Timestamp ISO-8601 Support.
IMPALA-2043 - skip metadata/testddl.py#test_create_alter_bulk_partition on S3
IMPALA-1968 - Part 1: Improve planner numNodes estimate for remote scans
IMPALA-1730 - reduce scanner thread spinning windows
IMPALA-2002 - Provide way to cache ext data source classes
IMPALA-2008 - Fix wrong warning when insert overwrite to empty table
IMPALA-1381 - Expand set of supported timezones.
IMPALA-1952 - Expand parsing of decimals to include scientific notation
SENTRY-227 - Fix for "Unsupported entity type DUMMYPARTITION"
SOLR-7503 - Recovery after ZK session expiration happens in a single thread for all cores in a node
SPARK-6299 - ClassNotFoundException in standalone mode when running groupByKey with class defined in REPL.
SPARK-5522 - Accelerate the History Server start

Published Known Issues Fixed

Migrations to MySQL fail if multiple Hue users have the same name but different upper/lower case letters

Bug: CDH-24213

Workaround: None.

Issues Fixed in CDH 5.4.2

Upgrades to CDH 5.4.1 from Releases Earlier than 5.4.0 May Fail

What to do:

To avoid the problem: Do not upgrade to CDH 5.4.1; upgrade to CDH 5.4.2 instead.

If you experience the problem: If you have already started an upgrade and seen it fail, contact Cloudera Support. This problem involves no risk of data loss, and manual recovery is possible.

If you have already completed an upgrade to CDH 5.4.1, or are installing a new cluster: In this case you are not affected and can continue to run CDH 5.4.1.

Issues Fixed in CDH 5.4.1

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.1:

HADOOP-11891 - OsSecureRandom should lazily fill its reservoir to avoid opening too many file descriptors.
HADOOP-11802 - DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm
HADOOP-11724 - DistCp throws NPE when the target directory is root.
HDFS-7645 - Rolling upgrade is restoring blocks from trash multiple times which could cause significant and unnecessary block churn.
HDFS-7869 - Inconsistency in the return information while performing rolling upgrade
HDFS-8127 - NameNode Failover during HA upgrade can cause DataNode to finalize upgrade
HDFS-3443 - Fix NPE when NameNode transition to active during startup.
HDFS-7312 - Update DistCp v1 to optionally not use tmp location
HDFS-8292 - Move conditional in fmt_time from dfs-dust.js to status.html
HDFS-6673 - Add delimited format support to PB OIV tool
HDFS-8214 - Secondary NN Web UI shows wrong date for Last Checkpoint
HDFS-7884 - Fix NullPointerException in BlockSender when the generation stamp provided by the client is larger than the one stored in the DataNode
HDFS-4448 - Allow HA NN to start in secure mode with wildcard address configured
HDFS-8070 - Fix issue that Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
HDFS-7915 - The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
HDFS-7931 - DistributedFileSystem should not look for keyProvider in cache if Encryption is disabled
HDFS-7916 - 'reportBadBlocks' from DataNodes to standby Node BPServiceActor goes for infinite loop
HDFS-8099 - Change "DFSInputStream has been closed already" message to debug log level
HDFS-7996 - After swapping a volume, BlockReceiver reports ReplicaNotFoundException
HDFS-7587 - Edit log corruption can happen if append fails with a quota violation
HDFS-7881 - TestHftpFileSystem#testSeek fails
HDFS-7929 - inotify is unable to fetch pre-upgrade edit log segments once upgrade starts
YARN-3363 - Add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
YARN-3485 - FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies
YARN-3464 - Race condition in LocalizerRunner kills localizer before localizing all resources
YARN-3516 - Killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
YARN-3021 - YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
YARN-3241 - FairScheduler handles invalid queue names inconsistently.
YARN-2868 - FairScheduler: Add a metric for measuring latency of allocating first container for an application
YARN-3428 - Add debug logs to capture the resources being localized for a container.
MAPREDUCE-6339 - Job history file is not flushed correctly because isTimerActive flag is not set true when flushTimerTask is scheduled.
MAPREDUCE-5710 - Running distcp with -delete incurs avoidable penalties
MAPREDUCE-6343 - JobConf.parseMaximumHeapSizeMB() fails to parse value greater than 2GB expressed in bytes
MAPREDUCE-6238 - MR2 can't run local jobs with -libjars command options which is a regression from MR1
MAPREDUCE-6076 - Zero map split input length combined with non-zero map split input length may sometimes cause an MR1 job to hang.
HBASE-13374 - Small scanners (with particular configurations) do not return all rows
HBASE-13269 - Limit result array pre-allocation to avoid OOME with large scan caching values
HBASE-13335 - Update ClientSmallScanner and ClientSmallReversedScanner to use serverHasMoreResults context
HBASE-13534 - Change HBase master WebUI to explicitly mention if it is a backup master
HBASE-13111 - truncate_preserve command is failing with undefined method error
HBASE-13430 - HFiles that are in use by a table cloned from a snapshot may be deleted when that snapshot is deleted
HBASE-13546 - NPE on RegionServer status page if all masters are down
HBASE-13350 - Add a debug-warning if we fail HTD checks even if table.sanity.checks is disabled
HBASE-13262 - ResultScanner doesn't return all rows in Scan
HIVE-10452 - Avoid sending Beeline prompt+query to the standard output/error only when in script mode.
HIVE-10541 - Beeline requires newline at the end of each query in a file
HIVE-9625 - Delegation tokens for HMS are not renewed
HIVE-10499 - Ensure Session/ZooKeeperClient instances are closed
HIVE-10312 - SASL.QOP in JDBC URL is ignored for Delegation token Authentication
HIVE-10324 - Hive metatool should take table_param_key to allow for changes to avro serde's schema url key
HIVE-10202 - Beeline outputs prompt+query on standard output when used in non-interactive mode
HIVE-10087 - Beeline's --silent option should suppress query from being echoed when running with -f option
HIVE-10098 - HS2 local task for map join fails in KMS encrypted cluster
HIVE-10146 - Add option to not count session as idle if query is running
HIVE-10108 - Index#getIndexTableName() should return db.index_table_name instead of qualified table name
HIVE-10093 - Unnecessary HMSHandler initialization for default MemoryTokenStore on HS2
HIVE-10085 - Lateral view on top of a view throws RuntimeException
HIVE-10086 - Parquet file using column index access throws error in Hive
HIVE-9839 - HiveServer2 leaks OperationHandle on async queries which fail at compile phase
HIVE-9920 - DROP DATABASE IF EXISTS throws exception if database does not exist
HIVE-10476 - Hive query should fail when it fails to initialize a session in SetSparkReducerParallelism
HIVE-10434 - Cancel connection when remote Spark driver process has failed
HIVE-10473 - Spark client is recreated even if Spark configuration is not changed
HIVE-10291 - Hive on Spark job configuration needs to be logged
HIVE-10143 - HS2 fails to clean up Spark client state on timeout
HIVE-10073 - Runtime exception when querying HBase with Spark
HUE-2723 - [hive] Listing table information in non default DB fails
HUE-2722 - [hive] Query returns wrong number of rows when HiveServer2 returns data not encoded properly
HUE-2713 - [oozie] Deleting a Fork of Fork can break the workflow
HUE-2717 - [oozie] Coordinator editor does not save non-default schedules
HUE-2716 - [pig] Scripts fail on hcat auth with org.apache.hive.hcatalog.pig.HCatLoader()
HUE-2707 - [hive] Allow sample of data on partitioned tables in strict mode
HUE-2720 - [oozie] Intermittent 500s when trying to view oozie workflow history v1
HUE-2712 - [oozie] Creating a fork can error
HUE-2710 - [search] Heatmap select on yelp example errors
HUE-2686 - [impala] Explain button is erroring
HUE-2671 - [core] sync_groups_on_login doesn't work with NT Domain
IMPALA-1519/IMPALA-1946 - Fix wrapping of exprs via a TupleIsNullPredicate with analytics.
IMPALA-1900 - Assign predicates below analytic functions with a compatible partition by clause for partition pruning.
IMPALA-1919 - When out_batch->AtCapacity(), avoid calling ProcessBatch in right joins.
IMPALA-1960 - Illegal reference to non-materialized tuple when query has an empty select-project-join block.
IMPALA-1969 - OpenSSL init must not be called concurrently.
IMPALA-1973 - Fixing crash when uninitialized, empty row is added in HdfsTextScanner due to missing newline at the end of file.
OOZIE-2218 - META-INF directories in the war file have 777 permissions
OOZIE-2170 - Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
SENTRY-699 - Memory leak when running Sentry with HiveServer2
SENTRY-703 - Calls to add_partition fail when passed a Partition object with a null location
SENTRY-696 - Improve Metastoreplugin Cache Initialization time
SENTRY-683 - HDFS service client should ensure the kerberos ticket is valid before new service connection
SOLR-7478 - UpdateLog#close shuts down its executor with interrupts before running close, possibly preventing a clean close.
SOLR-7437 - Make HDFS transaction log replication factor configurable.
SOLR-7338/SOLR-6583 - A reloaded core will never register itself as active after a ZK session expiration.
SPARK-7281 - No option for AM native library path in yarn-client mode.
SPARK-6087 - Provide actionable exception if Kryo buffer is not large enough
SPARK-6868 - Container link broken on Spark UI Executors page when YARN is set to HTTPS_ONLY
SPARK-6506 - python support in yarn cluster mode requires SPARK_HOME to be set
SPARK-6650 - ExecutorAllocationManager never stops
SPARK-6578 - Outbound channel in network library is not thread-safe, can lead to fetch failures
SQOOP-2343 - AsyncSqlRecordWriter stuck if any exception is thrown out in its close method
SQOOP-2286 - Ensure Sqoop generates valid avro column names
SQOOP-2283 - Support usage of --exec and --password-alias
SQOOP-2281 - Set overwrite on kite dataset
SQOOP-2282 - Add validation check for --hive-import and --append
SQOOP-2257 - Import Parquet data into a hive table with --hive-overwrite option does not work
ZOOKEEPER-2146 - BinaryInputArchive readString should check length before allocating memory
ZOOKEEPER-2149 - Log client address when socket connection established

Published Known Issues Fixed

As a result of the above fixes, the following issues, previously published as Known Issues in CDH 5, are also fixed.

Apache Hadoop

NameNode cannot use wildcard address in a secure cluster

In a secure cluster, you cannot use a wildcard for the NameNode's RPC or HTTP bind address. For example, dfs.namenode.http-address must be a real, routable address and port, not 0.0.0.0:<port>. This should affect you only if you are running a secure cluster and your NameNode needs to bind to multiple local addresses.

Bug: HDFS-4448

Cloudera Bug: CDH-9991

Workaround: None

Offline Image Viewer (OIV) tool regression: missing Delimited outputs.

Bugs: HDFS-6673, HDFS-5952

Cloudera Bug: CDH-20259

Severity: Medium

Workaround: Set up dfs.namenode.legacy-oiv-image.dir to an appropriate directory on the secondary NameNode (or standby NameNode in an HA configuration), and use hdfs oiv_legacy to process the legacy format of the OIV fsimage.

Apache HBase

Setting `maxResultSize` Incorrectly On a Scan May Cause Client Data Loss

Scanners may not return all the results from a region if a scan is configured with a maxResultSize limit that could be reached before the caching limit. Results are missed because the scanner jumps to the next region preemptively.

The default value for maxResultSize is Long.MAX_VALUE and the default value of caching is 100, so with the default configuration, the caching limit will always be reached before the maxResultSize and the issue will not appear. If the maxResultSize is configured to any limit that may be reached before the caching limit, the issue may occur.

Bug: HBASE-13262

Severity: Low

Workaround: Never configure a scan with a maxResultSize other than Long.MAX_VALUE (never change it from its default value) because that will ensure that the maxResultSize limit is never reached before the caching limit.
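
To see why the default configuration is unaffected, the following Python sketch models only the comparison between the two limits. It is a toy model under stated assumptions, not HBase code: the function name limit_reached_first and the fixed per-row size are hypothetical and introduced purely for illustration.

```python
# Toy model (not HBase code): decide which scan limit a batch hits first.
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE, the default maxResultSize


def limit_reached_first(caching, max_result_size, row_size_bytes):
    """Return 'caching' or 'maxResultSize', whichever limit a batch of
    equally sized rows reaches first (ties go to 'caching')."""
    # Number of rows needed before the accumulated size reaches
    # max_result_size (ceiling division).
    rows_to_hit_size = -(-max_result_size // row_size_bytes)
    return "caching" if caching <= rows_to_hit_size else "maxResultSize"


# Default configuration: the caching limit (100 rows) is reached first.
print(limit_reached_first(100, LONG_MAX, 1024))        # caching
# A small maxResultSize (10 KB) is reached after 10 rows of 1 KB each.
print(limit_reached_first(100, 10 * 1024, 1024))       # maxResultSize
```

Only the second case corresponds to a configuration in which the issue described above may occur.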

Apache Hive

Hive metatool does not fix Avro schema URL setting in an HDFS HA upgrade

When you upgrade Hive in an HDFS HA configuration, and the avro.schema.url is set in an Avro table's properties instead of the SerDe properties, the metatool will not correct the problem.

Bug: HIVE-10324

Cloudera Bug: CDH-26976

Workaround: Use ALTER TABLE ... SET TBLPROPERTIES to fix the avro.schema.url.

Hive metastore getIndexTableName returns qualified table name

In CDH 5.4.0, getIndexTableName returns a qualified table name such as database_name.index_table_name, whereas in previous releases it returns an unqualified table name, such as index_table_name.

Bug: HIVE-10108

Cloudera Bug: CDH-26508

Workaround: None

HiveServer2 has an unexpected Derby metastore directory in secure clusters

Bug: HIVE-10093

Cloudera Bug: CDH-26453

Workaround: None. Ignore the Derby database.

Apache Oozie

Spark jobs run from the Spark action don't show up in the Spark History Server or properly link to it from the Spark AM

Bug: OOZIE-2170

Cloudera Bug: CDH-25513

Severity: Low

Workaround: Specify these configuration properties in the spark-opts element of your Spark action in the workflow.xml file:

--conf spark.yarn.historyServer.address=http://SPH:18088 --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true

where SPH is the hostname of the Spark History Server and NN is the hostname of the NameNode. You can also find these values in /etc/spark/conf/spark-defaults.conf on the gateway host when Spark is installed from Cloudera Manager.
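
As a minimal sketch of assembling that option string, the Python helper below substitutes the two hostnames into the properties listed in the workaround. The function name spark_history_opts and the example hostnames are hypothetical; the ports and paths are the ones shown above and may differ in your cluster.

```python
def spark_history_opts(sph_host, nn_host):
    """Build the spark-opts value from the Spark History Server and
    NameNode hostnames, using the ports and paths from the workaround."""
    opts = {
        "spark.yarn.historyServer.address": f"http://{sph_host}:18088",
        "spark.eventLog.dir": f"hdfs://{nn_host}:8020/user/spark/applicationHistory",
        "spark.eventLog.enabled": "true",
    }
    # Dicts preserve insertion order, so the options appear as listed.
    return " ".join(f"--conf {key}={value}" for key, value in opts.items())


# Hypothetical hostnames, for illustration only.
print(spark_history_opts("sph.example.com", "nn.example.com"))
```

The resulting string would be placed inside the spark-opts element of the Spark action.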

Apache Sentry

Hive binding should support enforcing URI privilege for transforms

Bug: SENTRY-598

Severity: Medium

Workaround: None.

Issues Fixed in CDH 5.4.0

The following topics describe known issues fixed in CDH 5.4.0.

For the latest Impala fixed issues, see Issues Fixed in CDH 5.4 / Impala 2.2.

Apache Hadoop

HDFS

After upgrade from a release earlier than CDH 5.2.0, storage IDs may no longer be unique

As of CDH 5.2, each storage volume on a DataNode should have its own unique storageID, but in clusters upgraded from CDH 4, or CDH 5 releases earlier than CDH 5.2.0, each volume on a given DataNode shares the same storageID, because the HDFS upgrade does not properly update the IDs to reflect the new naming scheme. This causes problems with load balancing. The problem affects only clusters upgraded from CDH 5.1.x and earlier to CDH 5.2 or later. Clusters that are new as of CDH 5.2.0 or later do not have the problem.

Bug: HDFS-7575

Cloudera Bug: CDH-24155

Severity: Medium

Workaround: Upgrade to a later or patched version of CDH.

Apache Hive

UDF infile() does not accept arguments of type CHAR or VARCHAR

Bug: HIVE-6637

Severity: Low

Workaround: Cast the argument to type String.

Hive's Decimal type cannot be stored in Parquet and Avro

Tables containing decimal columns cannot use Parquet or the Avro storage engine.

Bug: HIVE-6367 and HIVE-5823

Severity: Low

Workaround: Use a different file format.

Apache Oozie

Executing `oozie job -config properties file -dryrun` fails because of a code defect in argument parsing

Bug: OOZIE-1878

Cloudera Bug: CDH-25537

Severity: Low

Workaround: None.

When you use Hive Server 2 from Oozie, Oozie won't collect or print out the Hadoop Job IDs of any jobs launched by Hive Server 2

Cloudera Bug: CDH-12777, CDH-12891

Severity: Low

Workaround: You can get the Hadoop IDs from the Resource Manager or JobTracker.

Cloudera Search

Spark indexer failed if configured to use security.

Spark indexing jobs failed when Kerberos authentication was enabled.

With Search for CDH 5.4 and later, Spark indexing jobs succeed, even when Kerberos authentication is required.

Bug: None.

Severity: Medium.

Workaround: Disable Kerberos authentication or use another indexer.

Mapper-only HBase batch indexer failed if configured to use security.

Attempts to complete an HBase batch indexing job failed when Kerberos authentication was enabled and reducers were set to 0.

With Search for CDH 5.4 and later, mapper-only HBase batch indexer succeeds, even when Kerberos authentication is required.

Bug: None.

Severity: Medium.

Workaround: Either disable Kerberos authentication or use one or more reducers.

Shard splitting support is experimental.

Cloudera anticipated shard splitting to function as expected with Cloudera Search, but this interaction had not been thoroughly tested.

As of the release of Search for CDH 5.4, additional testing of shard splitting has been completed, so this functionality can be safely used.

Cloudera Bug: CDH-11024

Severity: Low

Workaround: Use shard splitting for test and development purposes, but be aware of the risks of using shard splitting in production environments. To avoid using shard splitting, use the source data to create a new index with a new sharding count by re-indexing the data to a new collection. You can do this using the MapReduceIndexerTool.

`TrieDateField` defaulted `OMIT_NORMS` to True.

All primitive field types were intended to omit norms by default with schema version 1.5 or higher. This change was not applied to TrieDateField.

With Search for CDH 5.4, TrieDateField is set to omit norms by default.

Bug: SOLR-6211

Severity: Low

Fields or Types outside `<fields>` or `<types>` tags are silently ignored.

In previous releases, Solr silently ignored definitions such as <fieldType>, <field>, and <copyField> if those definitions were not contained in <fields> or <types> tags.

With Search for CDH 5.4, these tags are no longer required for definitions to be included. The tags are still supported, so either style may be used.

Bug: SOLR-5228

Apache Sentry (incubating)

`INSERT OVERWRITE LOCAL` fails if you use only the Linux pathname

Cloudera Bug: CDH-13732

Severity: Low

Workaround: Prefix the path of the local file with file:// when using INSERT OVERWRITE LOCAL.

`INSERT OVERWRITE` and `CREATE EXTERNAL` commands fail because of HDFS URI permissions

When you use Sentry to secure Hive, and use HDFS URIs in a HiveQL statement, the query will fail with an HDFS permissions error unless you specify the NameNode and port.

Cloudera Bug: CDH-13728

Severity: Low

Workaround: Specify the NameNode and port, where applicable, in the URI; for example, specify hdfs://nn-uri:port/user/warehouse/hive/tab rather than simply /user/warehouse/hive/tab. In a high-availability deployment, specify the value of fs.defaultFS.
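
The workaround can be sketched as a small Python helper that qualifies a bare path before it is placed in a HiveQL statement. This is an illustration under stated assumptions: the function name qualify_hdfs_path is hypothetical, and the port 8020 in the usage line is an example value, not one given by this issue.

```python
from urllib.parse import urlsplit


def qualify_hdfs_path(path, default_fs):
    """Prefix a bare absolute path with the filesystem URI.

    Paths that already carry a scheme (for example hdfs:// or file://)
    are returned unchanged.
    """
    if urlsplit(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# Example NameNode URI; in an HA deployment, pass the filesystem default instead.
print(qualify_hdfs_path("/user/warehouse/hive/tab", "hdfs://nn-uri:8020"))
```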
