Issues Fixed in CDH 5.8.x

The following topics describe issues fixed in CDH 5.8.x, from newest to oldest release. You can also review What's New In CDH 5.8.x or Known Issues in CDH 5.

Issues Fixed in CDH 5.8.5

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.8.5:

  • CRUNCH-592 - Job fails for null ByteBuffer value in Avro tables.
  • FLUME-1899 - Make SpoolDir work with subdirectories
  • FLUME-2171 - Add Interceptor to remove headers from event
  • FLUME-2652 - Documented transaction handling semantics incorrect in developer guide.
  • FLUME-2797 - Use SourceCounter for SyslogTcpSource
  • FLUME-2798 - Malformed Syslog messages can lead to OutOfMemoryException
  • FLUME-2812 - Fix semaphore leak causing java.lang.Error: Maximum permit count exceeded in MemoryChannel
  • FLUME-2844 - SpillableMemoryChannel must start ChannelCounter
  • FLUME-2889 - Fixes to DateTime computations
  • FLUME-2901 - Document Kerberos setup for Kafka channel
  • FLUME-2910 - AsyncHBaseSink: Failure callbacks should log the exception that caused them
  • FLUME-2913 - Don't strip SLF4J from imported classpaths
  • FLUME-2918 - Speed up TaildirSource on directories with many files
  • FLUME-2922 - Sync SequenceFile.Writer before calling hflush
  • FLUME-2923 - Bump asynchbase version to 1.7.0
  • FLUME-2934 - Document new cachePatternMatching option for TaildirSource
  • FLUME-2935 - Bump java target version to 1.7
  • FLUME-2948 - docs: Fix parameters on Replicating Channel Selector example
  • FLUME-2954 - Make raw data appearing in log messages explicit
  • FLUME-2963 - FlumeUserGuide: Fix error in Kafka Source properties table
  • FLUME-2972 - Handle offset migration in the new Kafka Channel
  • FLUME-2975 - docs: Fix NetcatSource example
  • FLUME-2982 - Add localhost escape sequence to HDFS sink
  • FLUME-2983 - Handle offset migration in the new Kafka Source
  • FLUME-2999 - Kafka channel and sink should enable statically assigned partition per event via header
  • FLUME-3020 - Improve HDFS Sink escape sequence substitution
  • FLUME-3027 - Change Kafka Channel to clear offsets map after commit
  • FLUME-3031 - Change sequence source to reset its counter for event body on channel exception
  • FLUME-3049 - Make HDFS sink rotate more reliably in secure mode
  • HADOOP-7930 - Kerberos relogin interval in UserGroupInformation should be configurable
  • HADOOP-8436 - NPE in getLocalPathForWrite(path, conf) when the required context item is not configured
  • HADOOP-8437 - getLocalPathForWrite should throw IOException for invalid paths
  • HADOOP-8934 - Shell command ls should include sort options
  • HADOOP-10048 - LocalDirAllocator should avoid holding locks while accessing the filesystem
  • HADOOP-10300 - Allowed deferred sending of call responses.
  • HADOOP-10971 - Add -C flag to make `hadoop fs -ls` print filenames only
  • HADOOP-11031 - Design Document for Credential Provider API
  • HADOOP-11361 - Fix a race condition in MetricsSourceAdapter.updateJmxCache
  • HADOOP-11400 - GraphiteSink does not reconnect to Graphite after 'broken pipe'
  • HADOOP-11469 - KMS should skip default.key.acl and whitelist.key.acl when loading key acl.
  • HADOOP-11599 - Client#getTimeout should use IPC_CLIENT_PING_DEFAULT when IPC_CLIENT_PING_KEY is not configured
  • HADOOP-11619 - FTPFileSystem should override getDefaultPort.
  • HADOOP-11901 - BytesWritable fails to support 2G chunks due to integer overflow
  • HADOOP-12252 - LocalDirAllocator should not throw NPE with empty string configuration
  • HADOOP-12453 - Support decoding KMS Delegation Token with its own Identifier
  • HADOOP-12483 - Maintain wrapped SASL ordering for postponed IPC responses.
  • HADOOP-12537 - S3A to support Amazon STS temporary credentials
  • HADOOP-12548 - Read s3a creds from a Credential Provider
  • HADOOP-12609 - Fix intermittent failure of TestDecayRpcScheduler.
  • HADOOP-12655 - TestHttpServer.testBindAddress bind port range is wider than expected.
  • HADOOP-12659 - Incorrect usage of config parameters in token manager of KMS
  • HADOOP-12672 - RPC timeout should not override IPC ping interval
  • HADOOP-12723 - S3A: Add ability to plug in any AWSCredentialsProvider
  • HADOOP-12963 - Allow using path style addressing for accessing the s3 endpoint.
  • HADOOP-12973 - Make DU pluggable.
  • HADOOP-12974 - Create a CachingGetSpaceUsed implementation that uses df
  • HADOOP-12975 - Add jitter to CachingGetSpaceUsed's thread
  • HADOOP-13034 - Log message about input options in distcp lacks some items
  • HADOOP-13072 - WindowsGetSpaceUsed constructor should be public
  • HADOOP-13079 - Add -q option to Ls to print ? instead of non-printable characters
  • HADOOP-13132 - Handle ClassCastException on AuthenticationException in LoadBalancingKMSClientProvider
  • HADOOP-13155 - Implement TokenRenewer to renew and cancel delegation tokens in KMS
  • HADOOP-13189 - FairCallQueue makes callQueue larger than the configured capacity
  • HADOOP-13251 - Authenticate with Kerberos credentials when renewing KMS delegation token
  • HADOOP-13255 - KMSClientProvider should check and renew tgt when doing delegation token operations
  • HADOOP-13263 - Reload cached groups in background after expiry.
  • HADOOP-13270 - BZip2CompressionInputStream finds the same compression marker twice in corner case, causing duplicate data blocks
  • HADOOP-13317 - Add logs to KMS server-side to improve supportability
  • HADOOP-13353 - LdapGroupsMapping getPassword shouldn't return null when an IOException is thrown
  • HADOOP-13381 - KMS clients should use KMS Delegation Tokens from current UGI
  • HADOOP-13433 - Race in UGI.reloginFromKeytab
  • HADOOP-13434 - Add bash quoting to Shell class.
  • HADOOP-13437 - KMS should reload whitelist and default key ACLs when hot-reloading
  • HADOOP-13457 - Remove hardcoded absolute path for shell executable.
  • HADOOP-13487 - Hadoop KMS should load old delegation tokens from ZooKeeper on startup
  • HADOOP-13503 - Improve SaslRpcClient failure logging
  • HADOOP-13526 - Add detailed logging in KMS for the authentication failure of proxy user
  • HADOOP-13558 - UserGroupInformation created from a Subject incorrectly tries to renew the Kerberos ticket
  • HADOOP-13579 - Fix source-level compatibility after HADOOP-11252
  • HADOOP-13590 - Retry until TGT expires even if the UGI renewal thread encountered exception.
  • HADOOP-13627 - Have an explicit KerberosAuthException for UGI to throw, text from public constants
  • HADOOP-13638 - KMS should set UGI's Configuration object properly
  • HADOOP-13641 - Update UGI#spawnAutoRenewalThreadForUserCreds to reduce indentation
  • HADOOP-13669 - Addendum patch 2 for KMS Server should log exceptions before throwing.
  • HADOOP-13693 - Remove the message about HTTP OPTIONS in SPNEGO initialization message from kms audit log.
  • HADOOP-13749 - KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
  • HADOOP-13805 - UGI.getCurrentUser() fails if user does not have a keytab associated
  • HADOOP-13838 - KMSTokenRenewer should close providers
  • HADOOP-13953 - Make FTPFileSystem's data connection mode and transfer mode configurable
  • HADOOP-14003 - Make additional KMS tomcat settings configurable
  • HADOOP-14104 - Client should always ask namenode for kms provider path
  • HADOOP-14195 - CredentialProviderFactory$getProviders is not thread-safe
  • HDFS-4176 - EditLogTailer should call rollEdits with a timeout.
  • HDFS-4210 - Throw helpful exception when DNS entry for JournalNode cannot be resolved
  • HDFS-6434 - Default permission for creating file should be 644 for WebHdfs/HttpFS
  • HDFS-6962 - ACLs inheritance conflict with umaskmode
  • HDFS-7413 - Some unit tests should use NameNodeProtocols instead of FSNameSystem
  • HDFS-7415 - Move FSNameSystem.resolvePath() to FSDirectory
  • HDFS-7420 - Delegate permission checks to FSDirectory
  • HDFS-7463 - Simplify FSNamesystem#getBlockLocationsUpdateTimes
  • HDFS-7478 - Move org.apache.hadoop.hdfs.server.namenode.NNConf to FSNamesystem
  • HDFS-7517 - Remove redundant non-null checks in FSNamesystem#getBlockLocations
  • HDFS-7597 - DelegationTokenIdentifier should cache the TokenIdentifier to UGI mapping
  • HDFS-7964 - Add support for async edit logging
  • HDFS-8224 - Schedule a block for scanning if its metadata file is corrupt
  • HDFS-8269 - getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
  • HDFS-8581 - ContentSummary on / skips further counts on yielding lock
  • HDFS-8709 - Clarify automatic sync in FSEditLog#logEdit.
  • HDFS-8829 - Make SO_RCVBUF and SO_SNDBUF size configurable for DataTransferProtocol sockets and allow configuring auto-tuning
  • HDFS-8897 - Balancer should handle fs.defaultFS trailing slash in HA
  • HDFS-9038 - DFS reserved space is erroneously counted towards non-DFS used.
  • HDFS-9085 - Show renewer information in DelegationTokenIdentifier#toString
  • HDFS-9137 - DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded.
  • HDFS-9141 - Thread leak in Datanode#refreshVolumes.
  • HDFS-9259 - Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario.
  • HDFS-9276 - Failed to Update HDFS Delegation Token for long running application in HA mode
  • HDFS-9365 - Balancer does not work with the HDFS-6376 HA setup.
  • HDFS-9461 - DiskBalancer: Add Report Command
  • HDFS-9530 - ReservedSpace is not cleared for abandoned Blocks
  • HDFS-9601 - NNThroughputBenchmark.BlockReportStats should handle NotReplicatedYetException on adding block.
  • HDFS-9630 - DistCp minor refactoring and clean up
  • HDFS-9638 - Improve DistCp Help and documentation.
  • HDFS-9700 - DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for DataTransferProtocol
  • HDFS-9732 - Improve DelegationTokenIdentifier.toString() for better logging
  • HDFS-9781 - FsDatasetImpl#getBlockReports can occasionally throw NullPointerException
  • HDFS-9805 - Add server-side configuration for enabling TCP_NODELAY for DataTransferProtocol and default it to true
  • HDFS-9820 - Improve distcp to support efficient restore to an earlier snapshot
  • HDFS-9906 - Remove spammy log spew when a datanode is restarted.
  • HDFS-9939 - Increase DecompressorStream skip buffer size
  • HDFS-9958 - BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages
  • HDFS-10216 - Distcp -diff throws exception when handling relative path
  • HDFS-10270 - TestJMXGet:testNameNode() fails
  • HDFS-10298 - Document the usage of distcp -diff option
  • HDFS-10312 - Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction
  • HDFS-10313 - Distcp need to enforce the order of snapshot names passed to -diff.
  • HDFS-10336 - TestBalancer failing intermittently because of not resetting UserGroupInformation completely
  • HDFS-10381 - DataStreamer DataNode exclusion log message should be warning.
  • HDFS-10397 - Distcp should ignore -delete option if -diff option is provided instead of exiting
  • HDFS-10403 - DiskBalancer: Add cancel command
  • HDFS-10457 - DataNode should not auto-format block pool directory if VERSION is missing.
  • HDFS-10481 - HTTPFS server should correctly impersonate as end user to open file
  • HDFS-10500 - DiskBalancer: Print out information when a plan is not generated
  • HDFS-10501 - DiskBalancer: Use the default datanode port if port is not provided
  • HDFS-10512 - VolumeScanner may terminate due to NPE in DataNode.reportBadBlocks
  • HDFS-10516 - Fix bug when warming up EDEK cache of more than one encryption zone
  • HDFS-10517 - DiskBalancer: Support help command
  • HDFS-10525 - Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
  • HDFS-10541 - DiskBalancer: When no actions in plan, error message says "Plan was generated more than 24 hours ago"
  • HDFS-10544 - Balancer doesn't work with IPFailoverProxyProvider.
  • HDFS-10552 - DiskBalancer "-query" results in NPE if no plan for the node
  • HDFS-10556 - DistCpOptions should be validated automatically
  • HDFS-10559 - DiskBalancer: Use SHA1 for Plan ID
  • HDFS-10567 - Improve plan command help message
  • HDFS-10588 - False alarm in datanode log - ERROR - Disk Balancer is not enabled
  • HDFS-10598 - DiskBalancer does not execute multi-steps plan
  • HDFS-10600 - PlanCommand#getThrsholdPercentage should not use throughput value.
  • HDFS-10609 - Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications
  • HDFS-10641 - TestBlockManager#testBlockReportQueueing fails intermittently.
  • HDFS-10643 - Namenode should use loginUser(hdfs) to generateEncryptedKey
  • HDFS-10681 - DiskBalancer: query command should report Plan file path apart from PlanID.
  • HDFS-10715 - NPE when applying AvailableSpaceBlockPlacementPolicy
  • HDFS-10722 - Fix race condition in TestEditLog#testBatchedSyncWithClosedLogs
  • HDFS-10760 - DataXceiver#run() should not log InvalidToken exception as an error
  • HDFS-10763 - Open files can leak permanently due to inconsistent lease update
  • HDFS-10822 - Log DataNodes in the write pipeline.
  • HDFS-10879 - TestEncryptionZonesWithKMS#testReadWrite fails intermittently
  • HDFS-10963 - Reduce log level when network topology cannot find enough datanodes
  • HDFS-11012 - Unnecessary INFO logging on DFSClients for InvalidToken
  • HDFS-11040 - Add documentation for HDFS-9820 distcp improvement
  • HDFS-11056 - Concurrent append and read operations lead to checksum error
  • HDFS-11160 - VolumeScanner reports write-in-progress replicas as corrupt incorrectly
  • HDFS-11229 - HDFS-11056 failed to close meta file
  • HDFS-11275 - Check groupEntryIndex and throw a helpful exception on failures when removing ACL.
  • HDFS-11292 - log lastWrittenTxId etc info in logSyncAll
  • HDFS-11306 - Print remaining edit logs from buffer if edit log can't be rolled
  • HDFS-11363 - Need more diagnosis info when seeing Slow waitForAckedSeqno.
  • HDFS-11379 - DFSInputStream may infinite loop requesting block locations
  • HDFS-11689 - New exception thrown by DFSClient#isHDFSEncryptionEnabled broke hacky hive code
  • MAPREDUCE-4784 - TestRecovery occasionally fails
  • MAPREDUCE-6172 - TestDbClasses timeouts are too aggressive
  • MAPREDUCE-6359 - In RM HA setup, Cluster tab links populated with AM hostname instead of RM
  • MAPREDUCE-6442 - Stack trace is missing when error occurs in client protocol provider's constructor
  • MAPREDUCE-6473 - Job submission can take a long time during Cluster initialization
  • MAPREDUCE-6571 - JobEndNotification info logs are missing in AM container syslog
  • MAPREDUCE-6628 - Potential memory leak in CryptoOutputStream
  • MAPREDUCE-6633 - AM should retry map attempts if the reduce task encounters compression-related errors
  • MAPREDUCE-6641 - TestTaskAttempt fails in trunk
  • MAPREDUCE-6670 - TestJobListCache#testEviction sometimes fails on Windows with timeout
  • MAPREDUCE-6680 - JHS UserLogDir scan algorithm could sometimes skip a directory with an update in CloudFS (Azure FileSystem, S3, etc.)
  • MAPREDUCE-6718 - add progress log to JHS during startup
  • MAPREDUCE-6728 - Give fetchers hint when ShuffleHandler rejects a shuffling connection
  • MAPREDUCE-6738 - TestJobListCache.testAddExisting failed intermittently in slow VM testbed
  • MAPREDUCE-6761 - Regression when handling providers - invalid configuration ServiceConfiguration causes Cluster initialization failure
  • MAPREDUCE-6763 - Shuffle server listen queue is too small
  • MAPREDUCE-6771 - RMContainerAllocator sends container diagnostics event after corresponding completion event
  • MAPREDUCE-6798 - Fix intermittent failure of TestJobHistoryParsing.testJobHistoryMethods
  • MAPREDUCE-6817 - The format of job start time in JHS is different from those of submit and finish time.
  • MAPREDUCE-6839 - TestRecovery.testCrashed failed
  • YARN-2306 - Add test for leakage of reservation metrics in fair scheduler.
  • YARN-2336 - Fair scheduler's REST API returns a missing '[' bracket JSON for deep queue tree
  • YARN-2605 - [RM HA] Rest api endpoints doing redirect incorrectly.
  • YARN-2977 - Fixed intermittent TestNMClient failure.
  • YARN-3251 - Fixed a deadlock in CapacityScheduler when computing absoluteMaxAvailableCapacity in LeafQueue
  • YARN-3601 - Fix UT TestRMFailover.testRMWebAppRedirect
  • YARN-3654 - ContainerLogsPage web UI should not have meta-refresh
  • YARN-3722 - Merge multiple TestWebAppUtils into o.a.h.yarn.webapp.util.TestWebAppUtils
  • YARN-3933 - FairScheduler: Multiple calls to completedContainer are not safe.
  • YARN-3957 - FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500.
  • YARN-4004 - container-executor should print output of docker logs if the docker container exits with non-0 exit status
  • YARN-4017 - container-executor overuses PATH_MAX
  • YARN-4092 - Addendum. Fixed UI redirection to print useful messages when both RMs are in standby mode
  • YARN-4245 - Generalize config file handling in container-executor
  • YARN-4255 - container-executor does not clean up docker operation command files
  • YARN-4363 - In TestFairScheduler, testcase should not create FairScheduler redundantly.
  • YARN-4411 - RMAppAttemptImpl#createApplicationAttemptReport throws IllegalArgumentException
  • YARN-4459 - container-executor should only kill process groups
  • YARN-4544 - All the log messages about rolling monitoring interval are shown with WARN level
  • YARN-4555 - TestDefaultContainerExecutor#testContainerLaunchError fails on non-English locale environment
  • YARN-4556 - TestFifoScheduler.testResourceOverCommit fails
  • YARN-4820 - ResourceManager web redirects in HA mode drops query parameters
  • YARN-4866 - FairScheduler: AMs can consume all vcores leading to a livelock when using FAIR policy.
  • YARN-4878 - Expose scheduling policy and max running apps over JMX for Yarn queues.
  • YARN-4940 - yarn node -list -all failed if RM start with decommissioned node
  • YARN-4989 - TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently
  • YARN-5001 - Aggregated Logs root directory is created with wrong group if nonexistent
  • YARN-5048 - DelegationTokenRenewer#skipTokenRenewal may throw NPE
  • YARN-5077 - Fix FSLeafQueue#getFairShare() for queues with zero fairshare.
  • YARN-5107 - TestContainerMetrics fails.
  • YARN-5136 - Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
  • YARN-5246 - NMWebAppFilter web redirects drop query parameters
  • YARN-5272 - Handle queue names consistently in FairScheduler.
  • YARN-5608 - TestAMRMClient.setup() fails with ArrayOutOfBoundsException
  • YARN-5704 - Provide config knobs to control enabling/disabling new/work in progress features in container-executor
  • YARN-5752 - TestLocalResourcesTrackerImpl#testLocalResourceCache times out
  • YARN-5837 - NPE when getting node status of a decommissioned node after an RM restart
  • YARN-5859 - TestResourceLocalizationService#testParallelDownloadAttemptsForPublicResource sometimes fails
  • YARN-5862 - TestDiskFailures.testLocalDirsFailures failed
  • YARN-5890 - FairScheduler should log information about AM-resource-usage and max-AM-share for queues
  • YARN-5920 - Fix deadlock in TestRMHA.testTransitionedToStandbyShouldNotHang
  • YARN-6042 - Dump scheduler and queue state information into FairScheduler DEBUG log.
  • YARN-6151 - FS preemption does not consider child queues over fairshare if the parent is under.
  • YARN-6175 - FairScheduler: Negative vcore for resource needed to preempt.
  • YARN-6264 - AM not launched when a single vcore is available on the cluster.
  • YARN-6359 - TestRM#testApplicationKillAtAcceptedState fails rarely due to race condition
  • YARN-6360 - Prevent FS state dump logger from cramming other log files
  • YARN-6453 - fairscheduler-statedump.log gets generated regardless of service
  • HBASE-12949 - Scanner can be stuck in infinite loop if the HFile is corrupted
  • HBASE-14644 - Region in transition metric is broken
  • HBASE-14818 - user_permission does not list namespace permissions
  • HBASE-14963 - Remove use of Guava Stopwatch from HBase client code
  • HBASE-15125 - HBaseFsck's adoptHdfsOrphan function creates region with wrong end key boundary
  • HBASE-15324 - Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split
  • HBASE-15328 - sanity check the redirect used to send master info requests to the embedded regionserver.
  • HBASE-15378 - Scanner cannot handle heartbeat message with no results
  • HBASE-15430 - Failed taking snapshot - Manifest proto-message too large
  • HBASE-15465 - userPermission returned by getUserPermission() for the selected namespace does not have namespace set
  • HBASE-15496 - Throw RowTooBigException only for user scan/get
  • HBASE-15587 - FSTableDescriptors.getDescriptor() logs stack trace erroneously
  • HBASE-15613 - TestNamespaceCommand times out
  • HBASE-15621 - Suppress HBase SnapshotHFile cleaner error messages when a snapshot is going on
  • HBASE-15683 - Min latency in latency histograms are emitted as Long.MAX_VALUE
  • HBASE-15698 - Increment TimeRange not serialized to server
  • HBASE-15746 - Remove extra RegionCoprocessor preClose() in RSRpcServices#closeRegion
  • HBASE-15808 - Reduce potential bulk load intermediate space usage and waste
  • HBASE-15856 - Don't cache unresolved addresses for connections
  • HBASE-15873 - ACL for snapshot restore / clone is not enforced
  • HBASE-15925 - provide default values for hadoop compat module related properties that match default hadoop profile.
  • HBASE-15931 - Add log for long-running tasks in AsyncProcess
  • HBASE-15955 - Disable action in CatalogJanitor#setEnabled should wait for active cleanup scan to finish
  • HBASE-16032 - Possible memory leak in StoreScanner
  • HBASE-16056 - Procedure v2 - fix master crash for FileNotFound
  • HBASE-16062 - Improper error handling in WAL Reader/Writer creation
  • HBASE-16093 - Fix splits failed before creating daughter regions leave meta inconsistent
  • HBASE-16135 - PeerClusterZnode under rs of removed peer may never be deleted
  • HBASE-16146 - Counters are expensive...
  • HBASE-16172 - Unify the retry logic in ScannerCallableWithReplicas and RpcRetryingCallerWithReadReplicas
  • HBASE-16194 - Should count in MSLAB chunk allocation into heap size change when adding duplicate cells
  • HBASE-16195 - Should not add chunk into chunkQueue if not using chunk pool in HeapMemStoreLAB
  • HBASE-16207 - can't restore snapshot without "Admin" permission
  • HBASE-16227 - [Shell] Column value formatter not working in scans.
  • HBASE-16238 - It's useless to catch SESSIONEXPIRED exception and retry in RecoverableZooKeeper
  • HBASE-16270 - Handle duplicate clearing of snapshot in region replicas
  • HBASE-16284 - Unauthorized client can shutdown the cluster
  • HBASE-16288 - HFile intermediate block level indexes might recurse forever creating multi TB files
  • HBASE-16294 - hbck reporting "No HDFS region dir found" for replicas
  • HBASE-16304 - HRegion#RegionScannerImpl#handleFileNotFoundException may lead to deadlock when trying to obtain write lock on updatesLock
  • HBASE-16317 - revert all ESAPI changes
  • HBASE-16319 - Fix TestCacheOnWrite after HBASE-16288
  • HBASE-16321 - ensure no findbugs-jsr305
  • HBASE-16340 - exclude Xerces implementation jars from coming in transitively.
  • HBASE-16345 - RpcRetryingCallerWithReadReplicas#call() should catch some RegionServer Exceptions
  • HBASE-16350 - Undo server abort from HBASE-14968
  • HBASE-16360 - TableMapReduceUtil addHBaseDependencyJars has the wrong class name for PrefixTreeCodec
  • HBASE-16429 - FSHLog: deadlock if rollWriter called when ring buffer filled with appends
  • HBASE-16460 - Can't rebuild the BucketAllocator's data structures when BucketCache uses FileIOEngine
  • HBASE-16604 - Scanner retries on IOException can cause the scans to miss data
  • HBASE-16662 - Fix open POODLE vulnerabilities
  • HBASE-16699 - Overflows in AverageIntervalRateLimiter's refill() and getWaitInterval()
  • HBASE-16721 - Concurrency issue in WAL unflushed seqId tracking
  • HBASE-16767 - Mob compaction needs to clean up files in /hbase/mobdir/.tmp and /hbase/mobdir/.tmp/.bulkload when running into IO exceptions
  • HBASE-16807 - RegionServer will fail to report new active HMaster until HMaster/RegionServer failover.
  • HBASE-16824 - Writer.flush() can be called on already closed streams in WAL roll
  • HBASE-16841 - Data loss in MOB files after cloning a snapshot and deleting that snapshot
  • HBASE-16931 - Setting cell's seqId to zero in compaction flow might cause RS down.
  • HBASE-16960 - RegionServer hang when aborting
  • HBASE-17020 - keylen in midkey() is not computed correctly
  • HBASE-17023 - Region left unassigned due to AM and SSH each thinking others would do the assignment work
  • HBASE-17044 - Fix merge failed before creating merged region leaves meta inconsistent
  • HBASE-17058 - Lower epsilon used for jitter verification from HBASE-15324
  • HBASE-17069 - RegionServer writes invalid META entries for split daughters in some circumstances
  • HBASE-17072 - CPU usage starts to climb up to 90-100% when using G1GC; purge ThreadLocal usage
  • HBASE-17206 - FSHLog may roll a new writer successfully with unflushed entries
  • HBASE-17241 - Avoid compacting already compacted mob files with _del files
  • HBASE-17265 - Region left unassigned in master failover when region failed to open
  • HBASE-17275 - Assign timeout may cause region to be unassigned forever
  • HBASE-17328 - Properly dispose of looped replication peers
  • HBASE-17381 - ReplicationSourceWorkerThread can die due to unhandled exceptions
  • HBASE-17409 - Limit jsonp callback name to prevent xss
  • HBASE-17452 - Failed taking snapshot - region Manifest proto-message too large
  • HBASE-17522 - Handle JVM throwing runtime exceptions when we ask for details on heap usage the same as a correctly returned 'undefined'.
  • HBASE-17558 - ZK dumping jsp should escape HTML.
  • HBASE-17561 - table status page should escape values that may contain arbitrary characters.
  • HBASE-17675 - ReplicationEndpoint should choose new sinks if a SaslException occurs
  • HBASE-17717 - Explicitly use "sasl" ACL scheme for hbase superuser
  • HIVE-6758 - Beeline doesn't work with -e option when started in background
  • HIVE-7443 - Fix HiveConnection to communicate with Kerberized Hive JDBC server and alternative JDKs
  • HIVE-7723 - Explain plan for complex query with lots of partitions is slow due to inefficient collection used to find a matching ReadEntity
  • HIVE-10007 - Support qualified table name in analyze table compute statistics for columns
  • HIVE-10384 - RetryingMetaStoreClient does not retry wrapped TTransportExceptions
  • HIVE-10728 - deprecate unix_timestamp(void) and make it deterministic
  • HIVE-10965 - direct SQL for stats fails in 0-column case
  • HIVE-11028 - Tez: table self join and join with another table fails with IndexOutOfBoundsException
  • HIVE-11141 - Improve RuleRegExp when the Expression node stack gets huge
  • HIVE-11243 - Changing log level in Utilities.getBaseWork
  • HIVE-11375 - Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
  • HIVE-11428 - Performance: Struct IN() clauses are extremely slow
  • HIVE-11432 - Hive macro give same result for different arguments
  • HIVE-11487 - Add getNumPartitionsByFilter api in metastore api
  • HIVE-11594 - Analyze Table for column names with embedded spaces
  • HIVE-11671 - Optimize RuleRegExp in DPP codepath
  • HIVE-11717 - nohup mode is not supported for new hive
  • HIVE-11747 - Unnecessary error log is shown when executing a "INSERT OVERWRITE LOCAL DIRECTORY" cmd in the embedded mode
  • HIVE-11827 - STORED AS AVRO fails SELECT COUNT(*) when empty
  • HIVE-11842 - Improve RuleRegExp by caching some internal data structures
  • HIVE-11849 - NPE in HiveHBaseTableSnapshotInputFormat in query with just count(*)
  • HIVE-11901 - StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
  • HIVE-11980 - Follow up on HIVE-11696, exception is thrown from CTAS from the table with table-level serde is Parquet while partition-level serde is JSON
  • HIVE-12077 - MSCK Repair table should fix partitions in batches
  • HIVE-12083 - HIVE-10965 introduces thrift error if partNames or colNames are empty
  • HIVE-12179 - Add option to not add spark-assembly.jar to Hive classpath
  • HIVE-12277 - Hive macro results on macro_duplicate.q different after adding ORDER BY
  • HIVE-12349 - NPE in ORC SARG for IS NULL queries on Timestamp and Date columns
  • HIVE-12465 - Hive might produce wrong results when (outer) joins are merged
  • HIVE-12475 - Parquet schema evolution within array<struct<>> doesn't work
  • HIVE-12556 - Ctrl-C in beeline doesn't kill Tez query on HS2
  • HIVE-12619 - Switching the field order within an array of structs causes the query to fail
  • HIVE-12635 - Hive should return the latest hbase cell timestamp as the row timestamp value
  • HIVE-12768 - Thread safety: binary sortable serde decimal deserialization
  • HIVE-12780 - Fix the output of the history command in Beeline
  • HIVE-12785 - View with union type and UDF to the struct is broken
  • HIVE-12834 - Fix to accept the arrow keys in BeeLine CLI
  • HIVE-12891 - Hive fails when java.io.tmpdir is set to a relative location
  • HIVE-12976 - MetaStoreDirectSql doesn't batch IN lists in all cases
  • HIVE-13043 - Reload function has no impact to function registry
  • HIVE-13058 - Add session and operation_log directory deletion messages
  • HIVE-13090 - Hive metastore crashes on NPE with ZooKeeperTokenStore
  • HIVE-13129 - CliService leaks HMS connection
  • HIVE-13149 - Remove some unnecessary HMS connections from HS2
  • HIVE-13198 - Authorization issues with cascading views
  • HIVE-13237 - Select parquet struct field with upper case throws NPE
  • HIVE-13240 - GroupByOperator: Drop the hash aggregates when closing operator
  • HIVE-13372 - Hive Macro overwritten when multiple macros are used in one column
  • HIVE-13381 - Timestamp & date should have higher precedence in the type hierarchy than string group
  • HIVE-13429 - Tool to remove dangling scratch dir
  • HIVE-13462 - HiveResultSetMetaData.getPrecision() fails for NULL columns
  • HIVE-13539 - HiveHFileOutputFormat searching the wrong directory for HFiles
  • HIVE-13590 - Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
  • HIVE-13620 - Merge llap branch work to master
  • HIVE-13625 - Hive Prepared Statement when executed with escape characters in parameter fails
  • HIVE-13645 - Beeline needs null-guard around hiveVars and hiveConfVars read
  • HIVE-13704 - Call DistCp.run() instead of DistCp.execute()
  • HIVE-13736 - View's input/output formats are TEXT by default.
  • HIVE-13749 - Memory leak in Hive Metastore
  • HIVE-13864 - Beeline ignores the command that follows a semicolon and comment
  • HIVE-13866 - flatten callstack for directSQL errors
  • HIVE-13884 - Disallow queries in HMS fetching more than a configured number of partitions
  • HIVE-13895 - HoS start-up overhead in yarn-client mode
  • HIVE-13932 - Hive SMB Map Join with small set of LIMIT failed with NPE
  • HIVE-13936 - Add streaming support for row_number
  • HIVE-13953 - Issues in HiveLockObject equals method
  • HIVE-13991 - Union All on view fail with no valid permission on underneath table
  • HIVE-13997 - Insert overwrite directory doesn't overwrite existing files
  • HIVE-14006 - Hive query with UNION ALL fails with ArrayIndexOutOfBoundsException.
  • HIVE-14015 - SMB MapJoin failed for Hive on Spark when kerberized
  • HIVE-14037 - java.lang.ClassNotFoundException for the jar in hive.reloadable.aux.jars.path in mapreduce
  • HIVE-14055 - directSql - getting the number of partitions is broken
  • HIVE-14098 - Logging task properties, and environment variables might contain passwords
  • HIVE-14118 - Make the alter partition exception more meaningful
  • HIVE-14137 - Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
  • HIVE-14142 - java.lang.ClassNotFoundException for the jar in hive.reloadable.aux.jars.path for Hive on Spark
  • HIVE-14173 - NPE was thrown after enabling directsql in the middle of session
  • HIVE-14187 - JDOPersistenceManager objects remain cached if MetaStoreClient#close is not called
  • HIVE-14198 - Refactor aux jar related code to make them more consistent
  • HIVE-14209 - Add some logging info for session and operation management
  • HIVE-14210 - ExecDriver should call jobclient.close() to trigger cleanup
  • HIVE-14296 - Session count is not decremented when HS2 clients do not shutdown cleanly.
  • HIVE-14342 - Beeline output is garbled when executed from a remote shell
  • HIVE-14383 - SparkClientImpl should pass principal and keytab to spark-submit instead of calling kinit explicitly
  • HIVE-14421 - FS.deleteOnExit holds references to _tmp_space.db files.
  • HIVE-14229 - The jars in hive.aux.jar.paths are not added to session classpath
  • HIVE-14436 - Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine
  • HIVE-14457 - Partitions in encryption zone are still trashed though an exception is returned
  • HIVE-14519 - Multi insert query bug
  • HIVE-14538 - beeline throws exceptions with parsing hive config when using !sh statement
  • HIVE-14693 - Some partitions will be left out when the partition number is a multiple of the option hive.msck.repair.batch.size
  • HIVE-14697 - Can not access kerberized HS2 Web UI
  • HIVE-14715 - Hive throws NumberFormatException with query with Null value
  • HIVE-14743 - ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs
  • HIVE-14762 - Add logging while removing scratch space
  • HIVE-14764 - Enabling "hive.metastore.metrics.enabled" throws OOM in HiveMetastore
  • HIVE-14784 - Operation logs are disabled automatically if the parent directory does not exist.
  • HIVE-14799 - Query operations are not thread safe during cancellation
  • HIVE-14805 - Subquery inside a view will have the object in the subquery as the direct input
  • HIVE-14817 - Shutdown the SessionManager timeoutChecker thread properly upon shutdown.
  • HIVE-14819 - FunctionInfo for permanent functions shows TEMPORARY FunctionType
  • HIVE-14820 - RPC server for spark inside HS2 is not getting server address properly
  • HIVE-15054 - Hive insertion query execution fails on Hive on Spark
  • HIVE-15061 - Metastore types are sometimes case sensitive
  • HIVE-15090 - Temporary DB failure can stop ExpiredTokenRemover thread
  • HIVE-15231 - query on view with CTE and alias fails with table not found error
  • HIVE-15291 - Comparison of timestamp fails if only date part is provided
  • HIVE-15338 - Wrong result from non-vectorized DATEDIFF with scalar parameter of type DATE/TIMESTAMP
  • HIVE-15346 - "values temp table" should not be an input
  • HIVE-15410 - WebHCat supports get/set table property with its name containing period and hyphen
  • HIVE-15517 - NOT (x <=> y) returns NULL if x or y is NULL
  • HIVE-15551 - memory leak in directsql for mysql+bonecp specific initialization
  • HIVE-15572 - Improve the response time for query canceling when it happens during acquiring locks
  • HIVE-15735 - In some cases, view objects inside a view do not have parents.
  • HIVE-15782 - query on parquet table returns incorrect result when hive.optimize.index.filter is set to true
  • HIVE-15872 - The PERCENTILE_APPROX UDAF does not work with empty set
  • HIVE-15997 - Resource leaks when query is cancelled
  • HIVE-16047 - Shouldn't try to get KeyProvider unless encryption is enabled
  • HIVE-16156 - FileSinkOperator should delete existing output target when renaming
  • HIVE-16175 - Possible race condition in InstanceCache
  • HIVE-16394 - HoS does not support queue name change in middle of session
  • HUE-3065 - [oozie] Sub-workflow submitted from coordinator gets parent workflow graph
  • HUE-3079 - [oozie] Some links of a Fork can point to deleted nodes
  • HUE-4147 - [useradmin] Ignore (objectclass=*) filter when searching for LDAP users
  • HUE-4386 - [oozie] Remove oozie.coord.application.path from properties when rerunning workflow
  • HUE-4462 - [oozie] Fix deployement_dir for the bundle in oozie example fixtures
  • HUE-4706 - [useradmin] update AuthenticationForm to allow activated users to login
  • HUE-4921 - [core] Skip idle session timeout relogin popup on running jb jobs call when idle session timeout is disabled
  • HUE-4941 - [jobbrowser] Unable to kill jobs with Resource Manager HA enabled
  • HUE-4969 - [security] Can't type any / in the HDFS ACLs path input
  • HUE-5158 - [editor] Older queries after upgrade do not provide direct save
  • HUE-5390 - [editor] Improve import testing of beeswax queries to notebook format
  • HUE-5659 - [home] Ignore history dependencies when importing document from different cluster
  • HUE-5679 - [yarn] Reset API_CACHE on logout
  • HUE-5714 - [yarn] Fix unittest for MR API Cache
  • HUE-6090 - [search] Typing in the search bar always redirect to the end of the input
  • HUE-6131 - [metastore] No information surfaced when LOAD data from Create table from file fails
  • HUE-6133 - [editor] Horizontal scrollbar can be hidden under the first fixed column
  • HUE-6144 - [editor] Make it possible to turn autocomplete on or off
  • HUE-6197 - [editor] Enable scrolling past the end of the editor
  • HUE-6228 - [editor] API for progress status and truncating warning when direct downloading results as Excel
  • IMPALA-1346, IMPALA-1590, IMPALA-2344: fix sorter buffer mgmt when spilling
  • IMPALA-1619, IMPALA-3018: Address various small memory allocation related bugs
  • IMPALA-1619 - Support 64-bit allocations.
  • IMPALA-1657 - Rework detection and reporting of corrupt table stats.
  • IMPALA-2864 - Ensure that client connections are closed after a failed Open()
  • IMPALA-3018 - Don't return NULL on zero length allocations.
  • IMPALA-3159 - impala-shell does not accept wildcard or SAN certificates
  • IMPALA-3167 - Fix assignment of WHERE conjunct through grouping agg + OJ.
  • IMPALA-3314 - Fix Avro schema loading for partitioned tables.
  • IMPALA-3344 - Simplify sorter and document/enforce invariants.
  • IMPALA-3441, IMPALA-3659: check for malformed Avro data
  • IMPALA-3499 - Split catalog update
  • IMPALA-3552 - Make incremental stats max serialized size configurable
  • IMPALA-3575 - Add retry to backend connection request and rpc timeout
  • IMPALA-3628 - Fix cancellation from shell when security is enabled
  • IMPALA-3633 - cancel fragment if coordinator is gone
  • IMPALA-3646 - Handle corrupt RLE literal or repeat counts of 0.
  • IMPALA-3670 - fix sorter buffer mgmt bugs
  • IMPALA-3678 - Fix migration of predicates into union operands with an order by + limit.
  • IMPALA-3680 - Cleanup the scan range state after failed hdfs cache reads
  • IMPALA-3682 - Don't retry unrecoverable socket creation errors
  • IMPALA-3687 - Prefer Avro field name during schema reconciliation
  • IMPALA-3711 - Remove unnecessary privilege checks in getDbsMetadata()
  • IMPALA-3732 - handle string length overflow in avro files
  • IMPALA-3745 - parquet invalid data handling
  • IMPALA-3751 - fix clang build errors and warnings
  • IMPALA-3754 - fix TestParquet.test_corrupt_rle_counts flakiness
  • IMPALA-3776 - fix 'describe formatted' for Avro tables
  • IMPALA-3820 - Handle linkage errors while loading Java UDFs in Catalog
  • IMPALA-3861 - Replace BetweenPredicates with their equivalent CompoundPredicate.
  • IMPALA-3875 - Thrift threaded server hang in some cases
  • IMPALA-3884 - Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
  • IMPALA-3915 - Register privilege and audit requests when analyzing resolved table refs.
  • IMPALA-3930, IMPALA-2570: Fix shuffle insert hint with constant partition exprs.
  • IMPALA-3940 - Fix getting column stats through views.
  • IMPALA-3949 - Log the error message in FileSystemUtil.copyToLocal()
  • IMPALA-3964 - Fix crash when a count(*) is performed on a nested collection.
  • IMPALA-3965 - TSSLSocketWithWildcardSAN.py not exported as part of impala-shell build lib
  • IMPALA-3983, IMPALA-3974: Delete function jar resources after load
  • IMPALA-4019 - initialize member variables in HdfsTableSink
  • IMPALA-4020 - Handle external conflicting changes to HMS gracefully
  • IMPALA-4037, IMPALA-4038: fix locking during query cancellation
  • IMPALA-4049 - fix empty batch handling NLJ build side
  • IMPALA-4076 - Fix runtime filter sort compare method
  • IMPALA-4099 - Fix the error message while loading UDFs with no JARs
  • IMPALA-4120 - Incorrect results with LEAD() analytic function
  • IMPALA-4135 - Thrift threaded server times-out connections during high load
  • IMPALA-4153 - Fix count(*) on all blank('') columns - test
  • IMPALA-4170 - Fix identifier quoting in COMPUTE INCREMENTAL STATS.
  • IMPALA-4180 - Synchronize accesses to RuntimeState::reader_contexts_
  • IMPALA-4196 - Cross compile bit-byte-functions
  • IMPALA-4223 - Handle truncated file read from HDFS cache
  • IMPALA-4237 - Fix materialization of 4 byte decimals in data source scan node.
  • IMPALA-4246 - SleepForMs() utility function has undefined behavior for > 1s
  • IMPALA-4260 - Alter table add column drops all the column stats
  • IMPALA-4263 - Fix wrong omission of agg/analytic hash exchanges.
  • IMPALA-4266 - Java udf returning string can give incorrect results
  • IMPALA-4282 - Remove max length check for type strings.
  • IMPALA-4293 - query profile should include error log
  • IMPALA-4295 - XFAIL wildcard SSL test
  • IMPALA-4336 - Cast exprs after unnesting union operands.
  • IMPALA-4363 - Add Parquet timestamp validation
  • IMPALA-4383 - Ensure plan fragment report thread is always started
  • IMPALA-4391 - fix dropped statuses in scanners
  • IMPALA-4423 - Correct but conservative implementation of Subquery.equals().
  • IMPALA-4433 - Always generate testdata using the same time zone setting
  • IMPALA-4449 - Revisit table locking pattern in the catalog. This commit fixes an issue where multiple long-running operations on the same catalog object (e.g. table) can block other catalog operations from making progress.
  • IMPALA-4488 - HS2 GetOperationStatus() should keep session alive
  • IMPALA-4518 - CopyStringVal() doesn't copy null string
  • IMPALA-4539 - fix bug when scratch batch references I/O buffers
  • IMPALA-4550 - Fix CastExpr analysis for substituted slots
  • IMPALA-4579 - SHOW CREATE VIEW fails for view containing a subquery
  • IMPALA-4765 - Avoid using several loading threads on one table.
  • IMPALA-4767 - Workaround for HIVE-15653 to preserve table stats.
  • IMPALA-4779, IMPALA-4780: Fix conditional functions built-in and Timestamp bounds
  • IMPALA-4787 - Optimize APPX_MEDIAN() memory usage
  • IMPALA-4916 - Fix maintenance of set of item sets in DisjointSet.
  • IMPALA-4995 - Fix integer overflow in TopNNode::PrepareForOutput
  • IMPALA-4997 - Fix overflows in Sorter::TupleIterator
  • IMPALA-5005 - Don't allow server to send SASL COMPLETE msg out of order
  • IMPALA-5088 - Fix heap buffer overflow
  • IMPALA-5253 - Use appropriate transport for StatestoreSubscriber
  • OOZIE-1814 - Oozie should mask any passwords in logs and REST interfaces
  • OOZIE-2068 - Configuration as part of sharelib
  • OOZIE-2194 - oozie job -kill doesn't work with spark action
  • OOZIE-2243 - Kill Command does not kill the child job for java action
  • OOZIE-2314 - Unable to kill old instance child job by workflow or coord rerun by Launcher
  • OOZIE-2329 - Make handling yarn restarts configurable
  • OOZIE-2345 - Parallel job submission for forked actions
  • OOZIE-2347 - Remove unnecessary new Configuration()/new jobConf() calls from oozie
  • OOZIE-2436 - Fork/join workflow fails with oozie.action.yarn.tag must not be null
  • OOZIE-2504 - Create a log4j.properties under HADOOP_CONF_DIR in Shell Action
  • OOZIE-2533 - Patch-1550 - workaround for
  • OOZIE-2555 - Oozie SSL enable setup does not return port for admin -servers
  • OOZIE-2567 - HCat connection is not closed while getting hcat cred
  • OOZIE-2584 - Eliminate Thread.sleep() calls in TestMemoryLocks
  • OOZIE-2589 - CompletedActionXCommand is hardcoded to wrong priority
  • OOZIE-2649 - Can't override sub-workflow configuration property if defined in parent workflow XML
  • OOZIE-2656 - OozieShareLibCLI uses op system username instead of Kerberos to upload jars
  • OOZIE-2678 - Oozie job -kill doesn't work with tez jobs
  • OOZIE-2739 - Remove property expansion pattern from ShellMain's log4j properties content
  • OOZIE-2742 - Unable to kill applications based on tag
  • OOZIE-2777 - Config-default.xml longer than 64k results in java.io.UTFDataFormatException
  • OOZIE-2818 - Can't overwrite oozie.action.max.output.data on a per-workflow basis
  • PIG-3807 - Pig creates wrong schema after dereferencing nested tuple fields with sorts
  • PIG-3818 - PIG-2499 is accidently reverted
  • PIG-3970 - Increase PermGen size, tests ran out of memory
  • PIG-4052 - TestJobControlSleep, TestInvokerSpeed are unreliable
  • SENTRY-1201 - Sentry ignores database prefix for MSCK statement
  • SENTRY-1265 - Sentry service should not require a TGT as it is not talking to other kerberos services as a client
  • SENTRY-1311 - Improve usability of URI privileges by supporting mixed use of URIs with and without scheme
  • SENTRY-1313 - Database prefix is not honoured when executing grant statement
  • SENTRY-1345 - ACLS on table folder disappear after insert for unpartitioned tables
  • SENTRY-1520 - Provide mechanism for triggering HMS full snapshot
  • SOLR-5776 - Backport: Enabled SSL tests can easily exhaust random generator entropy and block. Set the server side to SHA1PRNG as in Steve's original patch. Use less SSL in a test run. Refactor SSLConfig so that SSLTestConfig can provide SSLContexts using a NullSecureRandom to prevent SSL tests from blocking on entropy-starved machines. Alternate (pseudo random) NullSecureRandom for Constants.SUN_OS. Replace NullSecureRandom with NotSecurePsuedoRandom.
  • SOLR-6295 - Fix child filter query creation to never match parent docs in SolrExampleTests
  • SOLR-7280 - Missing test resources
  • SOLR-7280 - Backport: Load cores in sorted order and tweak coreLoadThread counts to improve cluster stability on restarts
  • SOLR-7866 - Harden code to prevent an unhandled NPE when trying to determine the max value of the version field.
  • SOLR-9091 - ZkController#publishAndWaitForDownStates logic is inefficient
  • SOLR-9236 - AutoAddReplicas will append an extra /tlog to the update log location on replica failover.
  • SOLR-9284 - The HDFS BlockDirectoryCache should not let its keysToRelease or names maps grow indefinitely.
  • SOLR-9310, SOLR-9524
  • SOLR-9330 - Fix AlreadyClosedException on admin/mbeans?stats=true
  • SOLR-9699, SOLR-4668: fix exception from core status in parallel with core reload
  • SOLR-9819 - Upgrade Apache commons-fileupload to 1.3.2, fixing a security vulnerability
  • SOLR-9848 - Lower solr.cloud.wait-for-updates-with-stale-state-pause back down from 7 seconds.
  • SOLR-9859 - Backport: replication.properties cannot be updated after being written and neither replication.properties nor index.properties are durable in the face of a crash. Don't log error on NoSuchFileException
  • SOLR-9901 - backport of SOLR-9899 Implement move in HdfsDirectoryFactory. SOLR-9899: StandardDirectoryFactory should use optimizations for all FilterDirectorys not just NRTCachingDirectory.
  • SOLR-10031 - Validation of filename params in ReplicationHandler
  • SOLR-10114 - backport of SOLR-9941 - Reordered delete-by-query can delete or omit child documents
  • SOLR-10119 - TestReplicationHandler assertion fixes part of
  • SOLR-10121, SOLR-10116: BlockCache corruption with high concurrency
  • SOLR-10338 - Backport: Configure SecureRandom non blocking for tests.
  • SPARK-8428 - [SPARK-13850] Fix integer overflows in TimSort
  • SPARK-12009 - [YARN] Avoid re-allocating YARN containers while the driver wants to stop all executors
  • SPARK-12241 - [YARN] Improve failure reporting in Yarn client obtainTokenForHBase()
  • SPARK-12339 - [SPARK-11206][WEBUI] Added a null check that was removed in
  • SPARK-12392 - [CORE] Optimize a location order of broadcast blocks by considering preferred local hosts
  • SPARK-12523 - [YARN] Support long-running of the Spark On HBase and hive meta store.
  • SPARK-12941 - [SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype
  • SPARK-12966 - [SQL] ArrayType(DecimalType) support in Postgres JDBC
  • SPARK-13112 - [CORE] Make sure RegisterExecutorResponse arrive before LaunchTask
  • SPARK-13242 - [SQL] codegen fallback in case-when if there are many branches
  • SPARK-13328 - [CORE] Poor read performance for broadcast variables with dynamic resource allocation
  • SPARK-13566 - [CORE] Avoid deadlock between BlockManager and Executor Thread
  • SPARK-13958 - Executor OOM due to unbounded growth of pointer array
  • SPARK-14204 - [SQL] register driverClass rather than user-specified class
  • SPARK-14391 - [LAUNCHER] Fix launcher communication test, take 2.
  • SPARK-14963 - [MINOR][YARN] Fix typo in YarnShuffleService recovery file name. Using recoveryPath if NM recovery is enabled
  • SPARK-15165 - [SPARK-15205] [SQL] Introduce place holder for comments in generated code
  • SPARK-16044 - [SQL] Backport input_file_name() for data source based on NewHadoopRDD to branch 1.6
  • SPARK-16106 - [CORE] TaskSchedulerImpl should properly track executors added to existing hosts
  • SPARK-16230 - [CORE] CoarseGrainedExecutorBackend to self kill if there is an exception while creating an Executor
  • SPARK-16505 - [YARN] Optionally propagate error during shuffle service startup.
  • SPARK-16625 - [SQL] General data types to be mapped to Oracle
  • SPARK-16711 - YarnShuffleService doesn't re-init properly on YARN rolling upgrade
  • SPARK-16873 - [CORE] Fix SpillReader NPE when spillFile has no data
  • SPARK-17171 - [WEB UI] DAG will list all partitions in the graph
  • SPARK-17245 - [SQL][BRANCH-1.6] Do not rely on Hive's session state to retrieve HiveConf
  • SPARK-17433 - YarnShuffleService doesn't handle moving credentials levelDb
  • SPARK-17465 - [SPARK CORE] Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
  • SPARK-17611 - [YARN][TEST] Make shuffle service test really test auth.
  • SPARK-17644 - [CORE] Do not add failedStages when abortStage for fetch failure
  • SPARK-17696 - [SPARK-12330][CORE] Partial backport of to branch-1.6.
  • SPARK-18750 - [YARN] Avoid using "mapValues" when allocating containers.
  • SPARK-19178 - [SQL][Backport-to-1.6] convert string of large numbers to int should return null
  • SPARK-19263 - DAGScheduler should avoid sending conflicting task set.
  • SPARK-19537 - Move pendingPartitions to ShuffleMapStage.
  • SQOOP-2349 - Add command line option for setting transaction isolation levels for metadata queries
  • SQOOP-2561 - Special Character removal from Column name as avro data results in duplicate column and fails the import
  • SQOOP-2846 - Sqoop Export with update-key failing for avro data file
  • SQOOP-2884 - Document --temporary-rootdir
  • SQOOP-2896 - Sqoop exec job fails with SQLException Access denied for user
  • SQOOP-2906 - Optimization of AvroUtil.toAvroIdentifier
  • SQOOP-2909 - Oracle related ImportTest fails after SQOOP-2737
  • SQOOP-2911 - Fix failing HCatalogExportTest caused by SQOOP-2863
  • SQOOP-2915 - Fixing Oracle related unit tests
  • SQOOP-2920 - sqoop performance deteriorates significantly on wide datasets; sqoop 100% on cpu
  • SQOOP-2950 - Sqoop trunk has consistent UT failures - need fixing
  • SQOOP-2952 - Fixing bug
  • SQOOP-2971 - OraOop does not close connections properly
  • SQOOP-2983 - OraOop export has degraded performance with wide tables
  • SQOOP-2986 - Add validation check for --hive-import and --incremental lastmodified
  • SQOOP-2990 - Sqoop(oracle) export [updateTableToOracle] with "--update-mode allowinsert" : app fails with java.sql.SQLException: Missing IN or OUT parameter at index
  • SQOOP-2995 - backward incompatibility introduced by Custom Tool options
  • SQOOP-2999 - Sqoop ClassNotFoundException (org.apache.commons.lang3.StringUtils) is thrown when executing Oracle direct import map task
  • SQOOP-3013 - Configuration "tmpjars" is not checked for empty strings before passing to MR
  • SQOOP-3021 - ClassWriter fails if a column name contains a backslash character
  • SQOOP-3028 - Include stack trace in the logging of exceptions in ExportTool
  • SQOOP-3034 - HBase import should fail fast if using anything other than as-textfile
  • SQOOP-3053 - Create a cmd line argument for sqoop.throwOnError and use it through SqoopOptions
  • SQOOP-3055 - Fixing MySQL tests failing due to ignored test inputs/configuration
  • SQOOP-3057 - Fixing 3rd party Oracle tests failing due to invalid case of column names
  • SQOOP-3066 - Introduce an option + env variable to enable/disable SQOOP-2737 feature
  • SQOOP-3068 - Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround
  • SQOOP-3069 - Get OracleExportTest#testUpsertTestExport in line with SQOOP-3066
  • SQOOP-3071 - Fix OracleManager to apply localTimeZone correctly in case of Date objects too
  • SQOOP-3072 - Reenable escaping in ImportTest#testProductWithWhiteSpaceImport for proper execution
  • SQOOP-3081 - use OracleEscapeUtils.escapeIdentifier in OracleUpsertOutputFormat instead of inline appending quotes
  • SQOOP-3123 - Introduce escaping logic for column mapping parameters (the same logic Sqoop already uses for DB column names), so that special column names (e.g. containing the '#' character) and the mappings related to those columns can use the same format (thus not confusing end users); this also eliminates the related Avro format clashing issues.
  • SQOOP-3124 - Fix ordering in column list query of PostgreSQL connector to reflect the logical order instead of adhoc ordering
  • SQOOP-3159 - Sqoop (export + --table) with Oracle table_name having '$' fails with error

Issues Fixed in CDH 5.8.4

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.8.4:

  • AVRO-1943 - Unreliable test: TestNettyServerWithCompression.testConnectionsCount
  • CRUNCH-592 - Job fails for null ByteBuffer value in Avro tables
  • FLUME-2171 - Add Interceptor to remove headers from event
  • FLUME-2812 - Fix semaphore leak causing java.lang.Error: Maximum permit count exceeded in MemoryChannel
  • FLUME-2889 - Fixes to DateTime computations
  • FLUME-2997 - Fix unreliable test in SpillableMemoryChannel
  • FLUME-2999 - Kafka channel and sink should enable statically assigned partition per event via header
  • FLUME-3002 - Fix tests in TestBucketWriter
  • FLUME-3003 - Fix unreliable testSourceCounter in TestSyslogUdpSource
  • FLUME-3020 - Improve HDFS Sink escape sequence substitution
  • FLUME-3027 - Change Kafka Channel to clear offsets map after commit
  • FLUME-3031 - Change sequence source to reset its counter for event body on channel exception
  • HADOOP-7930 - Kerberos relogin interval in UserGroupInformation should be configurable
  • HADOOP-10300 - Deferred sending of call responses allowed
  • HADOOP-11031 - Design Document for Credential Provider API
  • HADOOP-12453 - Support decoding KMS Delegation Token with its own Identifier
  • HADOOP-12483 - Maintain wrapped SASL ordering for postponed IPC responses
  • HADOOP-12537 - S3A to support Amazon STS temporary credentials
  • HADOOP-12655 - TestHttpServer.testBindAddress bind port range is wider than expected
  • HADOOP-12723 - S3A to add ability to plug in any AWSCredentialsProvider
  • HADOOP-12973 - Make DU pluggable
  • HADOOP-12974 - Create a CachingGetSpaceUsed implementation that uses df
  • HADOOP-12975 - Add jitter to CachingGetSpaceUsed's thread
  • HADOOP-13034 - Log message about input options in distcp lacks some items
  • HADOOP-13072 - WindowsGetSpaceUsed constructor should be public
  • HADOOP-13317 - Add logs to KMS server-side to improve supportability
  • HADOOP-13590 - Retry until TGT expires even if the UGI renewal thread encountered exception
  • HADOOP-13641 - Update UGI#spawnAutoRenewalThreadForUserCreds to reduce indentation
  • HADOOP-13669 - Addendum patch 2 for KMS Server should log exceptions before throwing
  • HADOOP-13669 - Addendum patch for KMS Server should log exceptions before throwing
  • HADOOP-13669 - KMS Server should log exceptions before throwing
  • HADOOP-13693 - Remove the message about HTTP OPTIONS in SPNEGO initialization message from kms audit log
  • HADOOP-13838 - KMSTokenRenewer should close providers
  • HDFS-4176 - EditLogTailer should call rollEdits with a timeout
  • HDFS-6962 - ACLs inheritance conflict with umaskmode
  • HDFS-7413 - Some unit tests should use NameNodeProtocols instead of FSNameSystem
  • HDFS-7964 - Add support for async edit logging
  • HDFS-8709 - Clarify automatic sync in FSEditLog#logEdit
  • HDFS-9038 - DFS reserved space is erroneously counted towards non-DFS used
  • HDFS-9630 - DistCp minor refactoring and clean up
  • HDFS-9638 - Improve DistCp Help and documentation
  • HDFS-9820 - Improve distcp to support efficient restore to an earlier snapshot
  • HDFS-10216 - Distcp -diff throws exception when handling relative path
  • HDFS-10298 - Document the usage of distcp -diff option
  • HDFS-10312 - Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction
  • HDFS-10313 - Distcp needs to enforce the order of snapshot names passed to -diff
  • HDFS-10336 - TestBalancer failing intermittently because of not resetting UserGroupInformation completely
  • HDFS-10397 - Distcp should ignore -delete option if -diff option is provided instead of exiting
  • HDFS-10556 - DistCpOptions should be validated automatically
  • HDFS-10609 - Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications
  • HDFS-10652 - Add a unit test for HDFS-4660
  • HDFS-10722 - Fix race condition in TestEditLog#testBatchedSyncWithClosedLogs
  • HDFS-10760 - DataXceiver#run() should not log InvalidToken exception as an error
  • HDFS-10763 - Open files can leak permanently due to inconsistent lease update
  • HDFS-11012 - Unnecessary INFO logging on DFSClients for InvalidToken
  • HDFS-11040 - Add documentation for HDFS-9820 distcp improvement
  • HDFS-11056 - Concurrent append and read operations lead to checksum error
  • HDFS-11160 - VolumeScanner reports write-in-progress replicas as corrupt incorrectly
  • HDFS-11229 - HDFS-11056 failed to close meta file
  • HDFS-11275 - Check groupEntryIndex and throw a helpful exception on failures when removing ACL
  • MAPREDUCE-6571 - JobEndNotification info logs are missing in AM container syslog
  • MAPREDUCE-6633 - AM should retry map attempts if the reduce task encounters compression related errors
  • MAPREDUCE-6728 - Give fetchers hint when ShuffleHandler rejects a shuffling connection
  • MAPREDUCE-6763 - Shuffle server listen queue is too small
  • MAPREDUCE-6798 - Fix intermittent failure of TestJobHistoryParsing.testJobHistoryMethods
  • MAPREDUCE-6801 - Fix unreliable TestKill.testKillJob
  • MAPREDUCE-6817 - The format of job start time in JHS is different from submit and finish times
  • YARN-3601 - Fix UT TestRMFailover.testRMWebAppRedirect
  • YARN-3654 - ContainerLogsPage web UI should not have meta-refresh
  • YARN-3722 - Merge multiple TestWebAppUtils into o.a.h.yarn.webapp.util.TestWebAppUtils
  • YARN-4004 - Container-executor should print output of docker logs if the docker container exits with non-0 exit status
  • YARN-4017 - Container-executor overuses PATH_MAX
  • YARN-4092 - Fixed UI redirection to print useful messages when both RMs are in standby mode (Addendum)
  • YARN-4245 - Generalize config file handling in container-executor
  • YARN-4255 - Container-executor does not clean up docker operation command files
  • YARN-4544 - All the log messages about rolling monitoring interval are shown with WARN level
  • YARN-4556 - TestFifoScheduler.testResourceOverCommit fails
  • YARN-4820 - ResourceManager web redirects in HA mode drops query parameters
  • YARN-5001 - Aggregated Logs root directory is created with wrong group if nonexistent
  • YARN-5136 - Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
  • YARN-5246 - NMWebAppFilter web redirects drop query parameters
  • YARN-5704 - Provide config knobs to control enabling/disabling new/work in progress features in container-executor
  • YARN-5837 - NPE when getting node status of a decommissioned node after an RM restart
  • YARN-5862 - TestDiskFailures.testLocalDirsFailures failed
  • YARN-5890 - FairScheduler should log information about AM-resource-usage and max-AM-share for queues
  • HBASE-15324 - Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split
  • HBASE-15430 - Failed taking snapshot - Manifest proto-message too large
  • HBASE-16146 - Counter performance is expensive
  • HBASE-16172 - Unify the retry logic in ScannerCallableWithReplicas and RpcRetryingCallerWithReadReplicas
  • HBASE-16270 - Handle duplicate clearing of snapshot in region replicas
  • HBASE-16345 - RpcRetryingCallerWithReadReplicas#call() should catch some RegionServer Exceptions
  • HBASE-16824 - Writer.flush() can be called on already closed streams in WAL roll
  • HBASE-16841 - Data loss in MOB files after cloning a snapshot and deleting that snapshot
  • HBASE-17058 - Lower epsilon used for jitter verification from HBASE-15324
  • HBASE-17072 - CPU usage starts to climb up to 90-100% when using G1GC
  • HBASE-17241 - Avoid compacting already compacted mob files with _del files
  • HBASE-17452 - Failed taking snapshot - region Manifest proto-message too large
  • HIVE-10384 - Backport: RetryingMetaStoreClient does not retry wrapped TTransportExceptions
  • HIVE-11849 - NPE in HiveHBaseTableSnapshotInputFormat in query with just count(*)
  • HIVE-12077 - MSCK Repair table should fix partitions in batches
  • HIVE-12465 - Hive might produce wrong results when (outer) joins are merged
  • HIVE-12619 - Switching the field order within an array of structs causes the query to fail
  • HIVE-12780 - Fix the output of the history command in Beeline
  • HIVE-12789 - Fix output twice in the history command of Beeline
  • HIVE-12891 - Hive fails when java.io.tmpdir is set to a relative location
  • HIVE-12976 - MetaStoreDirectSql doesn't batch IN lists in all cases
  • HIVE-13129 - CliService leaks HMS connection
  • HIVE-13149 - Remove some unnecessary HMS connections from HS2
  • HIVE-13381 - Timestamp and date should have higher precedence in the type hierarchy than string group
  • HIVE-13429 - Tool to remove dangling scratch dir
  • HIVE-13539 - HiveHFileOutputFormat searching the wrong directory for HFiles
  • HIVE-13866 - Flatten callstack for directSQL errors
  • HIVE-13895 - HoS start-up overhead in yarn-client mode
  • HIVE-13997 - Insert overwrite directory doesn't overwrite existing files
  • HIVE-14137 - Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
  • HIVE-14173 - NPE was thrown after enabling direct.sql in the middle of session
  • HIVE-14421 - FS.deleteOnExit holds references to _tmp_space.db files
  • HIVE-14762 - Add logging while removing scratch space
  • HIVE-14799 - Query operations are not thread safe during cancellation
  • HIVE-14817 - Shutdown the SessionManager timeoutChecker thread properly upon shutdown
  • HIVE-15054 - Hive insertion query execution fails on Hive on Spark
  • HIVE-15061 - Metastore types are sometimes case sensitive
  • HIVE-15090 - Temporary DB failure can stop ExpiredTokenRemover thread
  • HIVE-15231 - Query on view with CTE and alias fails with "table not found error"
  • HIVE-15291 - Comparison of timestamp fails if only date part is provided
  • HIVE-15410 - WebHCat supports get/set table property with its name containing period and hyphen
  • HIVE-15551 - Memory leak in directsql for MySQL with BoneCP specific initialization
  • HUE-4466 - [security] deliver csrftoken cookie with secure bit set if possible.
  • HUE-4546 - Auto-strip invalid characters from name field of converted docs
  • HUE-4747 - [editor] Download form should be submitted to a new tab otherwise the snippet gets closed
  • HUE-5028 - [oozie] User can't edit shared WF with modify permissions in new editor mode
  • HUE-5050 - [core] Logout fails for local login when multiple backends are used
  • HUE-5154 - [oozie] Creating a new Oozie workflow throws server error 500
  • HUE-5161 - [security] Speed up roles rendering
  • HUE-5163 - [security] Speed up initial page rendering
  • HUE-5166 - [impala] Handle empty session properties during upgrades
  • HUE-5218 - [search] Validate dashboard sharing works
  • HUE-5295 - [desktop] Avoid microsecond comparison for last_modified field, because MySQL < 5.6 doesn't support microsecond precision (see https://code.djangoproject.com/ticket/19716)
  • HUE-5295 - [desktop] Do not change the last_modified field when migrating history queries
  • HUE-5305 - [home] Fix empty share document modal and improve sharing UX
  • HUE-5310 - [search] Use Doc2 modal in search_controller
  • HUE-5476 - [core] Fix TTL is_idle middleware check
  • HUE-5482 - [home] Handle multiple home/trash directories by merging them into one.
  • IMPALA-1702 - "invalidate metadata" can cause duplicate TableIds
  • IMPALA-3167 - Fix assignment of WHERE clause predicate through grouping aggregate and outer join
  • IMPALA-3314 - Fix Avro schema loading for partitioned tables
  • IMPALA-3552 - Make incremental stats max serialized size configurable
  • IMPALA-3575 - Add retry to backend connection request and rpc timeout
  • IMPALA-3682 - Don't retry unrecoverable socket creation errors
  • IMPALA-3875 - Thrift threaded server hang in some cases
  • IMPALA-3884 - Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
  • IMPALA-3949 - Log the error message in FileSystemUtil.copyToLocal()
  • IMPALA-3964 - Fix crash when a count(*) is performed on a nested collection.
  • IMPALA-3983 - Delete function jar resources after load
  • IMPALA-4037 - ChildQuery::Cancel() appears to violate lock ordering
  • IMPALA-4038 - Fix locking during query cancellation
  • IMPALA-4076 - Fix runtime filter sort compare method
  • IMPALA-4099 - Fix the error message while loading UDFs with no JARs
  • IMPALA-4120 - Incorrect results with LEAD() analytic function
  • IMPALA-4153 - Fix count(*) on all blank('') columns - test
  • IMPALA-4223 - Handle truncated file read from HDFS cache
  • IMPALA-4246 - SleepForMs() utility function has undefined behavior for > 1s
  • IMPALA-4336 - Cast expressions after unnesting union operands
  • IMPALA-4363 - Add Parquet timestamp validation
  • IMPALA-4391 - Fix dropped statuses in scanners
  • IMPALA-4423 - Correct but conservative implementation of Subquery.equals()
  • IMPALA-4433 - Always generate test data using the same time zone setting
  • IMPALA-4449 - Revisit table locking pattern in the catalog. Fixes an issue where multiple long-running operations on the same catalog object (for example, a table) can block other catalog operations from making progress
  • IMPALA-4550 - Fix CastExpr analysis for substituted slots
  • LUCENE-5889 - AnalyzingInfixSuggester should expose commit()
  • LUCENE-7564 - AnalyzingInfixSuggester should close its IndexWriter by default at the end of build()
  • PIG-3818 - PIG-2499 is accidentally reverted
  • PIG-5025 - Fix unreliable test failures in TestLoad.java
  • SENTRY-1265 - Sentry service should not require a TGT as it is not talking to other Kerberos services as a client
  • SENTRY-1313 - Database prefix is not honoured when executing grant statement
  • SPARK-12241 - [YARN] Improve failure reporting in Yarn client obtainTokenForHBase()
  • SPARK-12523 - [YARN] Support long-running of the Spark On HBase and hive meta store.
  • SPARK-12966 - [SQL] ArrayType(DecimalType) support in Postgres JDBC
  • SPARK-13566 - [CORE] Avoid deadlock between BlockManager and Executor Thread
  • SPARK-13958 - Executor OOM due to unbounded growth of pointer array in…
  • SPARK-14204 - [SQL] register driverClass rather than user-specified class
  • SPARK-16044 - [SQL] Backport input_file_name() for data source based on NewHadoopRDD to branch 1.6
  • SPARK-17245 - [SQL][BRANCH-1.6] Do not rely on Hive's session state to retrieve HiveConf
  • SPARK-17465 - [SPARK CORE] Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
  • SQOOP-2349 - Add command line option for setting transaction isolation levels for metadata queries
  • SQOOP-2884 - Document argument overriding --temporary directory
  • SQOOP-2909 - Oracle related ImportTest fails after SQOOP-2737
  • SQOOP-2911 - Fix failing HCatalogExportTest caused by SQOOP-2863
  • SQOOP-2915 - Fixing Oracle related unit tests
  • SQOOP-2950 - Fix Sqoop trunk consistent UT failures
  • SQOOP-2952 - Row key not added into column family using --hbase-bulkload
  • SQOOP-2983 - OraOOP export has degraded performance with wide tables
  • SQOOP-2986 - Add validation check for --hive-import and --incremental lastmodified
  • SQOOP-2990 - Sqoop(oracle) export [updateTableToOracle] with "--update-mode allowinsert" : app fails with java.sql.SQLException: Missing IN or OUT parameter at index
  • SQOOP-3013 - Configuration "tmpjars" is not checked for empty strings before passing to MR
  • SQOOP-3028 - Include stack trace in the logging of exceptions in ExportTool
  • SQOOP-3034 - HBase import should fail fast if using anything other than as-textfile
  • SQOOP-3053 - Create a cmd line argument for sqoop.throwOnError and use it through SqoopOptions
  • SQOOP-3055 - Fixing MySQL tests failing due to ignored test inputs/configuration
  • SQOOP-3057 - Fixing 3rd party Oracle tests failing due to invalid case of column names
  • SQOOP-3066 - Introduce an option + env variable to enable/disable SQOOP-2737 feature
  • SQOOP-3069 - Get OracleExportTest#testUpsertTestExport in line with SQOOP-3066
  • SQOOP-3071 - Fix OracleManager to apply localTimeZone correctly in case of Date objects too
  • SQOOP-3072 - Re-enable escaping in ImportTest#testProductWithWhiteSpaceImport for proper execution
  • SQOOP-3081 - Use OracleEscapeUtils.escapeIdentifier in OracleUpsertOutputFormat instead of inline appending quotes
  • SQOOP-3124 - Fix ordering in column list query of PostgreSQL connector to reflect logical order (rather than ad hoc ordering)

Issues Fixed in CDH 5.8.3

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.8.3:

  • FLUME-2797 - Use SourceCounter for SyslogTcpSource
  • FLUME-2844 - SpillableMemoryChannel must start ChannelCounter
  • HADOOP-12548 - Read s3a credentials from a Credential Provider
  • HADOOP-13353 - LdapGroupsMapping getPassword should not return null when an IOException is thrown
  • HADOOP-13526 - Add detailed logging in KMS for the authentication failure of proxy user
  • HADOOP-13558 - UserGroupInformation created from a Subject incorrectly tries to renew the Kerberos ticket
  • HADOOP-13579 - Fix source-level compatibility after HADOOP-11252
  • HADOOP-13638 - KMS should set UGI's Configuration object properly
  • HDFS-7415 - Move FSNameSystem.resolvePath() to FSDirectory
  • HDFS-7420 - Delegate permission checks to FSDirectory
  • HDFS-7463 - Simplify FSNamesystem#getBlockLocationsUpdateTimes
  • HDFS-7478 - Move org.apache.hadoop.hdfs.server.namenode.NNConf to FSNamesystem
  • HDFS-7517 - Remove redundant non-null checks in FSNamesystem#getBlockLocations
  • HDFS-8224 - Schedule a block for scanning if its metadata file is corrupt
  • HDFS-8269 - getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
  • HDFS-9601 - NNThroughputBenchmark.BlockReportStats should handle NotReplicatedYetException on adding block.
  • HDFS-9781 - FsDatasetImpl#getBlockReports can occasionally throw NullPointerException
  • HDFS-10641 - TestBlockManager#testBlockReportQueueing fails intermittently
  • HDFS-10879 - TestEncryptionZonesWithKMS#testReadWrite fails intermittently
  • HDFS-10962 - TestRequestHedgingProxyProvider fails intermittently
  • HDFS-10963 - Reduce log level when network topology cannot find enough datanodes
  • MAPREDUCE-6628 - Potential memory leak in CryptoOutputStream
  • MAPREDUCE-6641 - TestTaskAttempt fails in trunk
  • MAPREDUCE-6718 - Add progress log to JHS during startup
  • MAPREDUCE-6771 - RMContainerAllocator sends container diagnostics event after corresponding completion event
  • YARN-4940 - yarn node -list -all fails if RM starts with decommissioned node
  • HBASE-15856 - Do not cache unresolved addresses for connections
  • HBASE-16294 - hbck reporting "No HDFS region dir found" for replicas
  • HBASE-16699 - Overflows in AverageIntervalRateLimiter's refill() and getWaitInterval()
  • HBASE-16767 - Mob compaction needs to clean up files in /hbase/mobdir/.tmp and /hbase/mobdir/.tmp/.bulkload when running into IO exceptions
  • HIVE-10965 - Direct SQL for stats fails in 0-column case
  • HIVE-12083 - HIVE-10965 introduces thrift error if partNames or colNames are empty
  • HIVE-12475 - Parquet schema evolution within array<struct<>> does not work
  • HIVE-12785 - View with union type and UDF to the struct is broken
  • HIVE-13058 - Add session and operation_log directory deletion messages
  • HIVE-13198 - Authorization issues with cascading views
  • HIVE-13237 - Select parquet struct field with upper case throws NPE
  • HIVE-13620 - Merge llap branch work to master
  • HIVE-13625 - Hive Prepared Statement when executed with escape characters in parameter fails
  • HIVE-13645 - Beeline needs null-guard around hiveVars and hiveConfVars read
  • HIVE-14296 - Session count is not decremented when HS2 clients do not shutdown cleanly
  • HIVE-14383 - SparkClientImpl should pass principal and keytab to spark-submit instead of calling kinit explicitly
  • HIVE-14715 - Hive throws NumberFormatException with query with Null value
  • HIVE-14743 - ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs
  • HIVE-14784 - Operation logs are disabled automatically if the parent directory does not exist.
  • HIVE-14805 - Subquery inside a view will have the object in the subquery as the direct input
  • HUE-4064 - Format creation and update date on the table details popover
  • HUE-4138 - Last modified time of a saved query is not in the correct timezone
  • HUE-4141 - Graph breaks for external workflows when there is more than one kill node
  • HUE-4804 - Download function of HTML widget breaks the display
  • HUE-4809 - Add truststore parameters only if SSL is turned on
  • HUE-4809 - Only add truststore paths when they actually exist
  • HUE-4810 - Fix tests by setting data to valid JSON type
  • HUE-4871 - An unprivileged user can enumerate users
  • HUE-4891 - An unprivileged user can list document items
  • HUE-4916 - Truncate last name to 30 chars on ldap import
  • HUE-4968 - Remove access to /oozie/import_workflow when v2 is enabled
  • HUE-4994 - Consider default path for decision nodes in dashboard graph
  • HUE-5041 - Hue export large file to HDFS does not work on non-default database
  • IMPALA-1619 - Support 64-bit allocations
  • IMPALA-3687 - Prefer Avro field name during schema reconciliation
  • IMPALA-3751 - Fix clang build errors and warnings
  • IMPALA-4135 - Thrift threaded server times-out connections during high load
  • IMPALA-4170 - Fix identifier quoting in COMPUTE INCREMENTAL STATS
  • IMPALA-4180 - Synchronize accesses to RuntimeState::reader_contexts_
  • IMPALA-4196 - Cross compile bit-byte-functions
  • IMPALA-4237 - Fix materialization of 4-byte decimals in data source scan node
  • OOZIE-1814 - Oozie should mask any passwords in logs and REST interfaces
  • SOLR-9310 - PeerSync fails on a node restart due to IndexFingerPrint mismatch
  • SPARK-12009 - Avoid reallocating YARN container when driver wants to stop all Executors
  • SPARK-12392 - Optimize a location order of broadcast blocks by considering preferred local hosts
  • SPARK-12941 - Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype
  • SPARK-13328 - Poor read performance for broadcast variables with dynamic resource allocation
  • SPARK-16625 - General data types to be mapped to Oracle
  • SPARK-16711 - YarnShuffleService doesn't re-init properly on YARN rolling upgrade
  • SPARK-17171 - DAG will list all partitions in the graph
  • SPARK-17433 - YarnShuffleService doesn't handle moving credentials levelDb
  • SPARK-17611 - Make shuffle service test really test authentication
  • SPARK-17644 - Do not add failedStages when abortStage for fetch failure
  • SPARK-17696 - Partial backport of to branch-1.6.
  • SQOOP-3021 - ClassWriter fails if a column name contains a backslash character

Issues Fixed in CDH 5.8.2

Kerberized HS2 with LDAP authentication fails in a multi-domain LDAP case

In CDH 5.7, Hive introduced a feature to support HS2 with Kerberos plus LDAP authentication, but the feature broke compatibility with multi-domain LDAP configurations in CDH 5.7.x and CDH 5.8.x.

Affected Versions: CDH 5.7.1, CDH 5.8.0, and CDH 5.8.1

Fixed in Versions: CDH 5.7.2 and higher, CDH 5.8.2 and higher

Bug: HIVE-13590.

Workaround: None.

Apache Oozie

Oozie Web Console returns 500 error when Oozie server runs on JDK 8u75 or higher

Bug: OOZIE-2533

The Oozie Web Console returns a 500 error when the Oozie server is running on JDK 8u75 or higher. The Oozie server still functions, and you can use the Oozie command line, REST API, Java API, or the Hue Oozie Dashboard to review the status of your jobs.
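
As an illustration of the REST API route, the following is a minimal Python sketch that lists workflow jobs without going through the Web Console. It is not taken from the Oozie documentation: the host name oozie.example.com is a placeholder, port 11000 is the usual Oozie default, the v1 jobs endpoint and the response fields (workflows, id, appName, status) are assumptions about the Oozie Web Services API, and the cluster is assumed to be unsecured (no Kerberos/SPNEGO).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def jobs_url(base_url, status=None, length=10):
    """Build a URL for the (assumed) Oozie Web Services v1 'jobs' endpoint."""
    params = {"jobtype": "wf", "len": length}
    if status:
        # Oozie filters take the form name=value; urlencode escapes the '='.
        params["filter"] = "status=" + status
    return base_url.rstrip("/") + "/v1/jobs?" + urlencode(params)


def summarize(payload):
    """Return (id, appName, status) tuples from a jobs response body."""
    doc = json.loads(payload)
    return [(wf.get("id"), wf.get("appName"), wf.get("status"))
            for wf in doc.get("workflows", [])]


def fetch_jobs(base_url, status=None):
    """Query a live server; assumes an unsecured cluster (no SPNEGO)."""
    with urlopen(jobs_url(base_url, status)) as resp:
        return summarize(resp.read().decode("utf-8"))
```

With a reachable server, fetch_jobs("http://oozie.example.com:11000/oozie", "RUNNING") would return one tuple per running workflow. On a Kerberized cluster the request needs SPNEGO authentication, which this sketch does not handle.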

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.8.2:

  • FLUME-1899 - Make SpoolDir work with subdirectories
  • FLUME-2652 - Documented transaction handling semantics incorrect in developer guide.
  • FLUME-2901 - Document Kerberos setup for Kafka channel
  • FLUME-2910 - AsyncHBaseSink: Failure callbacks should log the exception that caused them
  • FLUME-2913 - Don't strip SLF4J from imported classpaths
  • FLUME-2918 - Speed up TaildirSource on directories with many files
  • FLUME-2922 - Sync SequenceFile.Writer before calling hflush
  • FLUME-2923 - Bump asynchbase version to 1.7.0
  • FLUME-2934 - Document new cachePatternMatching option for TaildirSource
  • FLUME-2935 - Bump java target version to 1.7
  • FLUME-2948 - docs: Fix parameters on Replicating Channel Selector example
  • FLUME-2954 - Make raw data appearing in log messages explicit
  • FLUME-2963 - FlumeUserGuide: Fix error in Kafka Source properties table
  • FLUME-2972 - Handle offset migration in the new Kafka Channel
  • FLUME-2975 - docs: Fix NetcatSource example
  • FLUME-2982 - Add localhost escape sequence to HDFS sink
  • FLUME-2983 - Handle offset migration in the new Kafka Source
  • HADOOP-8436 - NPE In getLocalPathForWrite ( path, conf ) when the required context item is not configured
  • HADOOP-8437 - getLocalPathForWrite should throw IOException for invalid paths
  • HADOOP-8934 - Shell command ls should include sort options
  • HADOOP-10048 - LocalDirAllocator should avoid holding locks while accessing the filesystem
  • HADOOP-10971 - Add -C flag to make `hadoop fs -ls` print filenames only
  • HDFS-10512 - VolumeScanner can terminate due to NPE in DataNode.reportBadBlocks.
  • HADOOP-11361 - Fix a race condition in MetricsSourceAdapter.updateJmxCache.
  • HADOOP-11469 - KMS should skip default.key.acl and whitelist.key.acl when loading key acl.
  • HADOOP-11901 - BytesWritable fails to support 2G chunks due to integer overflow
  • HADOOP-12252 - LocalDirAllocator should not throw NPE with empty string configuration
  • HADOOP-12609 - Fix intermittent failure of TestDecayRpcScheduler.
  • HADOOP-12659 - Incorrect usage of config parameters in token manager of KMS
  • HADOOP-12963 - Allow using path style addressing for accessing the s3 endpoint.
  • HADOOP-13079 - Add -q option to Ls to print ? instead of non-printable characters
  • HADOOP-13132 - Handle ClassCastException on AuthenticationException in LoadBalancingKMSClientProvider
  • HADOOP-13155 - Implement TokenRenewer to renew and cancel delegation tokens in KMS
  • HADOOP-13251 - Authenticate with Kerberos credentials when renewing KMS delegation token
  • HADOOP-13255 - KMSClientProvider should check and renew tgt when doing delegation token operations.
  • HADOOP-13263 - Reload cached groups in background after expiry.
  • HADOOP-13270 - BZip2CompressionInputStream finds the same compression marker twice in corner case, causing duplicate data blocks
  • HADOOP-13381 - KMS clients should use KMS Delegation Tokens from current UGI
  • HADOOP-13437 - KMS should reload whitelist and default key ACLs when hot-reloading
  • HADOOP-13457 - Remove hardcoded absolute path for shell executable.
  • HADOOP-13487 - Hadoop KMS should load old delegation tokens from Zookeeper on startup
  • HDFS-4210 - Throw helpful exception when DNS entry for JournalNode cannot be resolved
  • HDFS-6434 - Default permission for creating file should be 644 for WebHdfs/HttpFS
  • HDFS-7597 - DelegationTokenIdentifier should cache the TokenIdentifier to UGI mapping
  • HDFS-8581 - ContentSummary on / skips further counts on yielding lock
  • HDFS-8829 - Make SO_RCVBUF and SO_SNDBUF size configurable for DataTransferProtocol sockets and allow configuring auto-tuning
  • HDFS-8897 - Balancer should handle fs.defaultFS trailing slash in HA
  • HDFS-9085 - Show renewer information in DelegationTokenIdentifier#toString
  • HDFS-9137 - DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded.
  • HDFS-9141 - Thread leak in Datanode#refreshVolumes.
  • HDFS-9259 - Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario.
  • HDFS-9276 - Failed to Update HDFS Delegation Token for long running application in HA mode
  • HDFS-9365 - Balancer does not work with the HDFS-6376 HA setup.
  • HDFS-9461 - DiskBalancer: Add Report Command
  • HDFS-9466 - TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is unreliable
  • HDFS-9700 - DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for DataTransferProtocol
  • HDFS-9732 - Improve DelegationTokenIdentifier.toString() for better logging
  • HDFS-9805 - Add server-side configuration for enabling TCP_NODELAY for DataTransferProtocol and default it to true
  • HDFS-9906 - Remove spammy log spew when a datanode is restarted.
  • HDFS-9939 - Increase DecompressorStream skip buffer size
  • HDFS-9958 - BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages
  • HDFS-10270 - TestJMXGet:testNameNode() fails
  • HDFS-10381 - DataStreamer DataNode exclusion log message should be warning.
  • HDFS-10403 - DiskBalancer: Add cancel command
  • HDFS-10457 - DataNode should not auto-format block pool directory if VERSION is missing.
  • HDFS-10481 - HTTPFS server should correctly impersonate as end user to open file
  • HDFS-10500 - DiskBalancer: Print out information when a plan is not generated
  • HDFS-10501 - DiskBalancer: Use the default datanode port if port is not provided
  • HDFS-10516 - Fix bug when warming up EDEK cache of more than one encryption zone
  • HDFS-10517 - DiskBalancer: Support help command
  • HDFS-10525 - Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
  • HDFS-10541 - DiskBalancer: When no actions in plan, error message says "Plan was generated more than 24 hours ago"
  • HDFS-10544 - Balancer doesn't work with IPFailoverProxyProvider.
  • HDFS-10552 - DiskBalancer "-query" results in NPE if no plan for the node
  • HDFS-10559 - DiskBalancer: Use SHA1 for Plan ID
  • HDFS-10567 - Improve plan command help message
  • HDFS-10588 - False alarm in datanode log - ERROR - Disk Balancer is not enabled
  • HDFS-10598 - DiskBalancer does not execute multi-steps plan
  • HDFS-10600 - PlanCommand#getThresholdPercentage should not use throughput value.
  • HDFS-10643 - Namenode should use loginUser(hdfs) to generateEncryptedKey
  • HDFS-10681 - DiskBalancer: query command should report Plan file path apart from PlanID.
  • HDFS-10822 - Log DataNodes in the write pipeline.
  • MAPREDUCE-4784 - TestRecovery occasionally fails
  • MAPREDUCE-6359 - In RM HA setup, Cluster tab links populated with AM hostname instead of RM
  • MAPREDUCE-6442 - Stack trace is missing when error occurs in client protocol provider's constructor
  • MAPREDUCE-6473 - Job submission can take a long time during Cluster initialization
  • MAPREDUCE-6670 - TestJobListCache#testEviction sometimes fails on Windows with timeout
  • MAPREDUCE-6680 - JHS UserLogDir scan algorithm sometimes could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)
  • MAPREDUCE-6738 - TestJobListCache.testAddExisting failed intermittently in slow VM testbed
  • MAPREDUCE-6761 - Regression when handling providers - invalid configuration ServiceConfiguration causes Cluster initialization failure
  • YARN-2605 - [RM HA] Rest api endpoints doing redirect incorrectly.
  • YARN-2977 - Fixed intermittent TestNMClient failure.
  • YARN-4411 - RMAppAttemptImpl#createApplicationAttemptReport throws IllegalArgumentException
  • YARN-4459 - container-executor should only kill process groups
  • YARN-4866 - FairScheduler: AMs can consume all vcores leading to a livelock when using FAIR policy.
  • YARN-4878 - Expose scheduling policy and max running apps over JMX for Yarn queues.
  • YARN-4989 - TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently
  • YARN-5048 - DelegationTokenRenewer#skipTokenRenewal may throw NPE
  • YARN-5077 - Fix FSLeafQueue#getFairShare() for queues with zero fairshare.
  • YARN-5107 - TestContainerMetrics fails.
  • YARN-5272 - Handle queue names consistently in FairScheduler.
  • YARN-5608 - TestAMRMClient.setup() fails with ArrayOutOfBoundsException
  • HBASE-14644 - Region in transition metric is broken -- addendum
  • HBASE-14644 - Region in transition metric is broken
  • HBASE-14818 - user_permission does not list namespace permissions
  • HBASE-14963 - Remove use of Guava Stopwatch from HBase client code
  • HBASE-15465 - userPermission returned by getUserPermission() for the selected namespace does not have namespace set
  • HBASE-15496 - Throw RowTooBigException only for user scan/get
  • HBASE-15621 - Suppress HBase SnapshotHFile cleaner error messages when a snapshot is going on
  • HBASE-15683 - Min latency in latency histograms are emitted as Long.MAX_VALUE
  • HBASE-15698 - Increment TimeRange not serialized to server
  • HBASE-15746 - Remove extra RegionCoprocessor preClose() in RSRpcServices#closeRegion
  • HBASE-15808 - Reduce potential bulk load intermediate space usage and waste
  • HBASE-15872 - Split TestWALProcedureStore
  • HBASE-15873 - ACL for snapshot restore / clone is not enforced
  • HBASE-15925 - provide default values for hadoop compat module related properties that match default hadoop profile.
  • HBASE-16034 - Fix ProcedureTestingUtility#LoadCounter.setMaxProcId()
  • HBASE-16056 - Procedure v2 - fix master crash for FileNotFound
  • HBASE-16093 - Fix splits failed before creating daughter regions leave meta inconsistent
  • HBASE-16135 - PeerClusterZnode under rs of removed peer may never be deleted
  • HBASE-16194 - Should count in MSLAB chunk allocation into heap size change when adding duplicate cells
  • HBASE-16195 - Should not add chunk into chunkQueue if not using chunk pool in HeapMemStoreLAB
  • HBASE-16207 - can't restore snapshot without "Admin" permission
  • HBASE-16227 - [Shell] Column value formatter not working in scans.
  • HBASE-16284 - Unauthorized client can shutdown the cluster
  • HBASE-16288 - HFile intermediate block level indexes might recurse forever creating multi TB files.
  • HBASE-16319 - Fix TestCacheOnWrite after HBASE-16288.
  • HBASE-16317 - revert all ESAPI changes
  • HBASE-16318 - fail build while rendering velocity template if dependency license isn't in whitelist.
  • HBASE-16318 - consistently use the correct name for 'Apache License, Version 2.0'
  • HBASE-16321 - ensure no findbugs-jsr305
  • HBASE-16340 - exclude Xerces implementation jars from coming in transitively.
  • HBASE-16360 - TableMapReduceUtil addHBaseDependencyJars has the wrong class name for PrefixTreeCodec
  • HIVE-7443 - Fix HiveConnection to communicate with Kerberized Hive JDBC server and alternative JDKs
  • HIVE-10007 - Support qualified table name in analyze table compute statistics for columns
  • HIVE-10728 - deprecate unix_timestamp(void) and make it deterministic
  • HIVE-11243 - Changing log level in Utilities.getBaseWork
  • HIVE-11432 - Hive macro gives same result for different arguments
  • HIVE-11487 - Add getNumPartitionsByFilter api in metastore api
  • HIVE-11747 - Unnecessary error log is shown when executing a "INSERT OVERWRITE LOCAL DIRECTORY" cmd in the embedded mode
  • HIVE-11827 - STORED AS AVRO fails SELECT COUNT(*) when empty
  • HIVE-11901 - StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
  • HIVE-11980 - Follow up on HIVE-11696, exception is thrown from CTAS from the table with table-level serde is Parquet while partition-level serde is JSON
  • HIVE-12277 - Hive macro results on macro_duplicate.q different after adding ORDER BY
  • HIVE-12556 - Ctrl-C in beeline doesn't kill Tez query on HS2
  • HIVE-12635 - Hive should return the latest hbase cell timestamp as the row timestamp value
  • HIVE-13043 - Reload function has no impact to function registry
  • HIVE-13090 - Hive metastore crashes on NPE with ZooKeeperTokenStore
  • HIVE-13372 - Hive Macro overwritten when multiple macros are used in one column
  • HIVE-13462 - HiveResultSetMetaData.getPrecision() fails for NULL columns
  • HIVE-13590 - Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
  • HIVE-13704 - Call DistCp.run() instead of DistCp.execute()
  • HIVE-13736 - View's input/output formats are TEXT by default.
  • HIVE-13749 - Memory leak in Hive Metastore
  • HIVE-13884 - Disallow queries in HMS fetching more than a configured number of partitions
  • HIVE-13932 - Hive SMB Map Join with small set of LIMIT failed with NPE
  • HIVE-13953 - Issues in HiveLockObject equals method
  • HIVE-13991 - Union All on view fail with no valid permission on underneath table
  • HIVE-14006 - Hive query with UNION ALL fails with ArrayIndexOutOfBoundsException.
  • HIVE-14015 - SMB MapJoin failed for Hive on Spark when kerberized
  • HIVE-14055 - directSql - getting the number of partitions is broken
  • HIVE-14098 - Logging task properties, and environment variables might contain passwords
  • HIVE-14118 - Make the alter partition exception more meaningful
  • HIVE-14187 - JDOPersistenceManager objects remain cached if MetaStoreClient#close is not called
  • HIVE-14209 - Add some logging info for session and operation management
  • HIVE-14436 - Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine
  • HIVE-14457 - Partitions in encryption zone are still trashed though an exception is returned
  • HIVE-14519 - Multi insert query bug
  • HIVE-14538 - beeline throws exceptions with parsing hive config when using !sh statement
  • HIVE-14697 - Can not access kerberized HS2 Web UI
  • HUE-2689 - Sub-workflow submitted from coordinator gets parent workflow graph
  • HUE-2971 - Some links of a Fork can point to deleted nodes
  • HUE-3842 - HTTP 500 while emptying Hue 3.9 trash directory
  • HUE-3908 - [useradmin] Ignore (objectclass=*) filter when searching for LDAP users
  • HUE-3988 - Support schemaless collections
  • HUE-3999 - list_oozie_workflow page shouldn't break in case of bad JSON from Oozie
  • HUE-4005 - Remove oozie.coord.application.path from properties when rerunning workflow
  • HUE-4006 - Create new deployment directory when coordinator or bundle is copied
  • HUE-4007 - Fix deployment_dir for the bundle in oozie example fixtures
  • HUE-4019 - Always fetch the logs on check status
  • HUE-4019 - Do not blank error on query with good syntax but invalid query
  • HUE-4021 - [libsolr] Allow customization of the Solr path in ZooKeeper
  • HUE-4023 - [useradmin] update AuthenticationForm to allow activated users to login
  • HUE-4078 - Drag & Drop hive queries shows queries from the trash
  • HUE-4087 - Unable to kill jobs with Resource Manager HA enabled
  • HUE-4092 - Can't type any / in the HDFS ACLs path input
  • HUE-4119 - Change list jobs call to POST
  • HUE-4129 - Long running query getting terminated when leaving the editor
  • HUE-4134 - [liboozie] Avoid logging truststore credentials
  • HUE-4145 - Older queries after upgrade do not provide direct save
  • HUE-4146 - Older saved queries default to 'default' DB
  • HUE-4148 - Improve import testing of beeswax queries to notebook format
  • HUE-4153 - Report last seen progress when running impala query
  • HUE-4164 - The ApiHelper should treat any negative status in the response as an error
  • HUE-4177 - Horizontal scroll in FF (Chrome fine) with touch pad is extremely slow
  • HUE-4201 - Add warning about max limit of cells before truncation in the export / download query result
  • HUE-4202 - Enable offset param for fetching jobbrowser logs
  • HUE-4215 - Reset API_CACHE on logout
  • HUE-4224 - 'Did you know' on home page is gone
  • HUE-4227 - Fix unittest for MR API Cache
  • HUE-4238 - Ignore history docs in find_jobs_with_no_doc during sync documents
  • HUE-4252 - Handle 307 redirect from YARN upon standby failover
  • HUE-4253 - Prompt for variables just once per variable name
  • HUE-4258 - Close and pool Spark History Server connections
  • HUE-4265 - Bring back the show preview in the assist
  • HUE-4300 - Avoid double file listing call on folder search
  • HUE-4321 - Batch submit of SQL should USE the correct DB
  • HUE-4333 - Properly reset API_CACHE on failover
  • HUE-4346 - Query History disappeared after upgrade to 3.10
  • HUE-4353 - Typing in the search bar always redirect to the end of the input
  • HUE-4362 - List more oozie workflow parameters on the workflow dashboard page
  • HUE-4364 - Handle files with carriage return in create table from a file
  • HUE-4365 - No information surfaced when LOAD data from Create table from file fails
  • HUE-4375 - Horizontal scrollbar can be hidden under the first fixed column
  • HUE-4383 - Trashed queries are showing up in the list of saved queries
  • HUE-4406 - Fails to start if Hive/Impala Not Installed
  • HUE-4409 - Main right scrollbar does not scroll when on the very right of the screen
  • HUE-4411 - Enable scrolling past the end of the editor
  • HUE-4412 - Errors should scroll to the line AND the column too
  • HUE-4477 - Select All is not filtering out the non visible roles from the selection
  • HUE-4493 - Fix sync-workflow action when Workflow includes sub-workflow
  • HUE-4515 - Remove oozie.bundle.application.path from properties when rerunning workflow
  • HUE-4533 - Disable password reveal on IE
  • HUE-4537 - Fix database_logging in hue config so it logs debug database messages
  • HUE-4541 - fixing Hue job browser - Kerberos mutual authentication error in Hue
  • HUE-4564 - Log stderr on failure to coerce password from script
  • HUE-4616 - Only select the snippet DB when executing the first statement
  • HUE-4635 - Fix duration on jobs page for running jobs
  • HUE-4662 - fixing Hue - Wildcard Certificates not supported
  • HUE-4700 - Protect against setting XSS in old editor
  • HUE-4738 - Use Concurrency and Throttle values set in coordinator settings
  • HUE-4739 - fixed Jobbrowser tests which were failing after resource manager pool change
  • HUE-4766 - Replace illegal characters on CSV downloads
  • HUE-4781 - Fix export to hdfs to use download_cell_limit from beeswax.conf
  • HUE-4801 - When importing oozie documents and remapping UUIDs, data should be updated accordingly
  • HUE-4808 - Don't show the edit link for sub workflows when submitted outside Hue
  • IMPALA-1346, IMPALA-1590, IMPALA-2344 - fix sorter buffer mgmt when spilling
  • IMPALA-3159 - impala-shell does not accept wildcard or SAN certificates
  • IMPALA-3344 - Simplify sorter and document/enforce invariants.
  • IMPALA-3441, IMPALA-3659 - check for malformed Avro data
  • IMPALA-3499 - Split catalog update.
  • IMPALA-3628 - Fix cancellation from shell when security is enabled
  • IMPALA-3633 - cancel fragment if coordinator is gone
  • IMPALA-3646 - Handle corrupt RLE literal or repeat counts of 0.
  • IMPALA-3670 - fix sorter buffer mgmt bugs
  • IMPALA-3678 - Fix migration of predicates into union operands with an order by + limit.
  • IMPALA-3680 - Cleanup the scan range state after failed hdfs cache reads
  • IMPALA-3711 - Remove unnecessary privilege checks in getDbsMetadata().
  • IMPALA-3732 - handle string length overflow in avro files
  • IMPALA-3745 - parquet invalid data handling
  • IMPALA-3754 - fix TestParquet.test_corrupt_rle_counts flakiness
  • IMPALA-3772 - Fix admission control stress test.
  • IMPALA-3776 - fix 'describe formatted' for Avro tables
  • IMPALA-3820 - Handle linkage errors while loading Java UDFs in Catalog
  • IMPALA-3861 - Replace BetweenPredicates with their equivalent CompoundPredicate.
  • IMPALA-3915 - Register privilege and audit requests when analyzing resolved table refs.
  • IMPALA-3930 - Fix shuffle insert hint with constant partition exprs.
  • IMPALA-3940 - Fix getting column stats through views.
  • IMPALA-3965 - TSSLSocketWithWildcardSAN.py not exported as part of impala-shell build lib
  • IMPALA-4020 - Handle external conflicting changes to HMS gracefully
  • IMPALA-4049 - fix empty batch handling NLJ build side
  • OOZIE-2068 - Configuration as part of sharelib
  • OOZIE-2314 - Unable to kill old instance child job by workflow or coord rerun by Launcher
  • OOZIE-2329 - Make handling yarn restarts configurable
  • OOZIE-2345 - Parallel job submission for forked actions
  • OOZIE-2347 - Amend: Remove unnecessary new Configuration()/new jobConf() calls from oozie
  • OOZIE-2347 - Amendments patch to: Remove unnecessary new Configuration()/new jobConf() calls from oozie
  • OOZIE-2347 - Remove unnecessary new Configuration()/new jobConf() calls from oozie
  • OOZIE-2436 - Fork/join workflow fails with oozie.action.yarn.tag must not be null
  • OOZIE-2504 - Create a log4j.properties under HADOOP_CONF_DIR in Shell Action
  • OOZIE-2533 - Patch-1550 - workaround for
  • OOZIE-2555 - Oozie SSL enable setup does not return port for admin -servers
  • OOZIE-2567 - HCat connection is not closed while getting hcat cred
  • OOZIE-2589 - CompletedActionXCommand is hardcoded to wrong priority
  • OOZIE-2649 - Can't override sub-workflow configuration property if defined in parent workflow XML
  • OOZIE-2656 - OozieShareLibCLI uses op system username instead of Kerberos to upload jars
  • PIG-3807 - Pig creates wrong schema after dereferencing nested tuple fields with sorts
  • SENTRY-1201 - Sentry ignores database prefix for MSCK statement
  • SENTRY-1311 - Improve usability of URI privileges by supporting mixed use of URIs with and without scheme
  • SENTRY-1320 - Queries of the form TRUNCATE TABLE db_name.table_name; no longer fail. The precondition checks allow two child nodes
  • SENTRY-1345 - Revert "ACLS on table folder disappear after insert for unpartitioned tables (Sravya Tirukkovalur, Reviewed by: Hao Hao and Anne Yu)"
  • SENTRY-1345 - ACLS on table folder disappear after insert for unpartitioned tables
  • SOLR-6295 - Fix child filter query creation to never match parent docs in SolrExampleTests
  • SOLR-7280 - Missing test resources
  • SOLR-7280 - Backport: Load cores in sorted order and tweak coreLoadThread counts to improve cluster stability on restarts
  • SOLR-7866 - Harden code to prevent an unhandled NPE when trying to determine the max value of the version field.
  • SOLR-9091 - ZkController#publishAndWaitForDownStates logic is inefficient
  • SOLR-9236 - AutoAddReplicas will append an extra /tlog to the update log location on replica failover.
  • SPARK-8428 - Fix integer overflows in TimSort
  • SPARK-12339 - Added a null check that was removed in
  • SPARK-13242 - codegen fallback in case-when if there are many branches
  • SPARK-14391 - Fix launcher communication test, take 2.
  • SPARK-14963 - Fix typo in YarnShuffleService recovery file name
  • SPARK-14963 - Using recoveryPath if NM recovery is enabled
  • SPARK-15165 - Introduce place holder for comments in generated code
  • SPARK-16106 - TaskSchedulerImpl should properly track executors added to existing hosts
  • SPARK-16505 - Optionally propagate error during shuffle service startup.
  • SQOOP-2561 - Special Character removal from Column name as avro data results in duplicate column and fails the import
  • SQOOP-2846 - Sqoop Export with update-key failing for avro data file
  • SQOOP-2906 - Optimization of AvroUtil.toAvroIdentifier
  • SQOOP-2920 - sqoop performance deteriorates significantly on wide datasets; sqoop 100% on cpu
  • SQOOP-2971 - OraOop does not close connections properly
  • SQOOP-2995 - Backward incompatibility introduced by Custom Tool options.
  • SQOOP-2999 - Sqoop ClassNotFoundException (org.apache.commons.lang3.StringUtils) is thrown when executing Oracle direct import map task

Issues Fixed in CDH 5.8.0

Apache Flume

Flume fully compatible with Kafka 2.x

In release CDH 5.8.0, Flume is fully compatible with Kafka 2.x, including support for security features.

Apache HBase

Premature EOF detected in a WAL During Replication

Cloudera Bug: CDH-38113

While parsing a write-ahead log (WAL) during replication, an InvalidProtobufException can occur when reading the source RegionServer WAL if EOF (end-of-file) is incorrectly detected before the actual end of the file. HBase stops reading the WAL at the detected EOF and does not parse any bytes that follow it, causing data loss.

To work around this problem, Cloudera has patched HBase. HBase in CDH 5.8.0 and higher detects whether unparsed bytes exist after the EOF, and if so, the WAL is reset and re-read from the beginning, to attempt a clean read-through.

In testing, a single reset has been sufficient to work around observed data loss. However, the above change will retry a given WAL file indefinitely. On each attempt, a log message such as this will be emitted at the WARN level:
Processing end of WAL file '{}'. At position {}, which is too far away from
reported file length {}. Restarting WAL reading
Additional log details are emitted at the TRACE level about file offsets seen while handling recoverable errors.
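
The reset-and-re-read behavior described above can be sketched as follows. This is an illustration only, not the HBase patch itself: read_pass is a hypothetical callable that performs one full pass over the file and returns the parsed entries plus the position where parsing stopped, and max_attempts is a hypothetical safety limit added for the sketch (the text above says the real change retries indefinitely).

```python
import logging

log = logging.getLogger("wal_sketch")


def read_wal_with_reset(read_pass, reported_length, max_attempts=None):
    """Re-read a WAL from the beginning until parsing reaches the reported length.

    read_pass: hypothetical callable; one call is one pass over the file and
        returns (entries, position_where_parsing_stopped).
    reported_length: file length reported by the filesystem.
    max_attempts: hypothetical limit; None means retry without bound.
    Returns (entries, number_of_attempts).
    """
    attempt = 0
    while True:
        attempt += 1
        entries, position = read_pass()
        if position >= reported_length:
            # No unparsed bytes remain after the detected EOF: clean read-through.
            return entries, attempt
        # Unparsed bytes exist after the detected EOF: warn and start over.
        log.warning(
            "Processing end of WAL file. At position %d, which is too far away "
            "from reported file length %d. Restarting WAL reading",
            position,
            reported_length,
        )
        if max_attempts is not None and attempt >= max_attempts:
            raise IOError("unparsed bytes remain after %d attempts" % attempt)
```

In this sketch, a pass that stops early is discarded entirely rather than resumed, which matches the "re-read from the beginning" behavior in the description.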

Batch Get after Batch Put Does Not Fetch All Cells

Bug: HBASE-15811

Cloudera Bug: CDH-40344

A batch Get after a batch Put could fail to fetch cells that were written by the Put, resulting in a "read-your-writes" failure. This bug was exacerbated by high load on the client.

Read Replica Failure For PUT Operation During Region Transition

Cloudera Bug: CDH-39758

The patch for HBASE-10794, applied in CDH 5.4.4, introduced a bug: if the primary RegionServer became unavailable for any reason (even a graceful shutdown) while a client was performing PUTs on that region, subsequent PUTs would fail.

Latency Metrics Inaccurate for MultiGet Operations

Bug: HBASE-15673

Cloudera Bug: CDH-39422

Latency values are written after each row is processed. However, if MultiGet is enabled, some rows are not counted in the metrics. This causes the metrics for the 50th, 75th, and 90th percentiles to be reported as 0.

Inconsistent Behavior Among DeleteColumnFamilyProcedure, CreateTableProcedure, and ModifyTableProcedure

Bug: HBASE-15456

Cloudera Bug: CDH-12345

If there is only one family in the table, DeleteColumnFamilyProcedure will fail. When hbase.table.sanity.checks is set to false, the HMaster logs a warning, but CreateTableProcedure and ModifyTableProcedure will now fail, where before they logged a warning, but succeeded. This makes the behavior of all three methods consistent.

Failed hbase-spark Bulk Loads Leave Files Behind

Bug: HBASE-15271

Cloudera Bug: CDH-38145

When using the bulk load helper provided by the hbase-spark module, output files are now written into temporary files and only made available when the executor has successfully completed. Previously, failed executors would leave files behind. Subsequent bulk load commands picked up these files, which caused spurious copies of some cells to be written.
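
The write-to-temporary-then-publish pattern behind this fix can be sketched on a local filesystem as follows. This is an analogy under stated assumptions, not the hbase-spark implementation: the function name write_then_publish and the ".tmp-" prefix are hypothetical, and the real module writes to HDFS rather than to local disk.

```python
import os
import tempfile


def write_then_publish(final_path, chunks):
    """Write chunks to a temporary file, then publish it only on success.

    final_path: where the finished file should appear.
    chunks: iterable of bytes objects; it may raise partway through,
        which stands in for a failed executor.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        # Rename within the same directory so the file appears all at once.
        os.replace(tmp_path, final_path)
    except BaseException:
        # A failed writer leaves nothing behind for a later load to pick up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the final name only ever refers to a completed file, a later load that scans the output directory cannot pick up partial output from a failed writer.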

Apache Hive

HIVE-13217, CDH-30121: Replication for HoS MapJoin small file needs to respect dfs.replication.max

HIVE-13039, CDH-37322: BETWEEN predicate is not functioning correctly with predicate PUSHDOWN on Parquet table

HIVE-13065, CDH-37409: Hive throws NullPointerException (NPE) when writing map type data to an HBase-backed table

HIVE-13160, CDH-37847: HS2 unable to load UDFs on startup when HMS is not ready

HIVE-13243, CDH-38477: Hive DROP TABLE on encryption zone fails for external tables

HIVE-13302, CDH-38581: Direct SQL: CAST to DATE doesn't work on Oracle

HIVE-13115, CDH-38612: MetaStore Direct SQL getPartitions() call fails when the columns schemas for a partition are NULL

HIVE-10303, CDH-38685: HIVE-9471 broke forward compatibility of ORC files

HIVE-12706, CDH-39108: Incorrect output from from_utc_timestamp() / to_utc_timestamp() when local timezone has DST

HIVE-10685, CDH-39187: ALTER TABLE concatenate operator will cause duplicate data

HIVE-13500: Launching big queries fails with OutOfMemoryException

HIVE-13527, CDH-39616: Using deprecated APIs in HBase client causes ZooKeeper connection leaks.

HIVE-12517, CDH-39722: Beeline's use of failed connection(s) causes failures and leaks.

HIVE-13632, CDH-39911: Hive failing on INSERT empty array into parquet table

HIVE-13285, CDH-39951: Orc concatenation may drop old files from moving to final path

HIVE-13836, CDH-40070: DbNotifications giving an error = Invalid state. Transaction has already started

HIVE-9499, CDH-40478: hive.limit.query.max.table.partition makes queries fail on non-partitioned tables

HIVE-13462, CDH-41031: HiveResultSetMetaData.getPrecision() fails for NULL columns

HIVE-11408, CDH-32685: HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

HIVE-12481, CDH-36124: Occasionally "Request is a replay" will be thrown from HS2

HIVE-10698, CDH-37320: Query on view results fails with "table not found error" if view is created with subquery alias (CTE)

HIVE-12941, CDH-37979: Unexpected result when using MIN() on struct with NULL in first field

HIVE-13200, CDH-38046: Aggregation functions returning empty rows on partitioned columns

HIVE-11054, CDH-38517: Read error : Partition Varchar column cannot be cast to string

HIVE-13401, CDH-39009: Kerberized HS2 with LDAP auth enabled fails kerberos/delegation token authentication

HIVE-13570: Some queries with UNION all fail when CBO is off

HIVE-11369, CDH-39715: MapJoins in HiveServer2 fail when jmxremote is used

HIVE-13261, CDH-40042: Can not compute column stats for partition when schema evolves

With Sentry enabled, only Hive admin users have access to YARN job logs

As a prerequisite of enabling Sentry, Hive impersonation is turned off, which means all YARN jobs are submitted to the Hive job queue, and are run as the hive user. This is an issue because the YARN History Server now has to block users from accessing logs for their own jobs, since their own usernames are not associated with the jobs. As a result, end users cannot access any job logs unless they can get sudo access to the cluster as the hdfs, hive or other admin users.

In CDH 5.8 (and higher), Hive overrides the default configuration, mapred.job.queuename, and places incoming jobs into the connected user's job queue, even though the submitting user remains hive. Hive obtains the relevant queue/username information for each job by using YARN's fair-scheduler.xml file.

Hue

Cannot query the customers table in Hue

Bug: HUE-3040

Cloudera Bug: CDH-33974

To query the customers table, users must re-create the parquet data for compatibility.

Cloudera Distribution of Apache Kafka

CDH 5.7 is not compatible with Cloudera Distribution of Apache Kafka 1.x

Cloudera Distribution of Apache Kafka 1.x is compatible with CDH 5.4+.

Apache Oozie

PySpark does not work from the Oozie Spark Action

Bug: OOZIE-2482

Cloudera Bug: CDH-36349

The Spark Action would typically fail with a message like, "key not found: SPARK_HOME," but other error messages were possible. After the fix, the Spark Action has the necessary changes to successfully run PySpark jobs. See Oozie Spark Action Extension for more details and an example. Cloudera makes the PySpark dependencies available.

Apache Sentry

Security

Sentry does not check privileges on the URI used for the CREATE INDEX LOCATION '/path' command

Bug: SENTRY-1231

The CREATE INDEX LOCATION '/path' command would succeed even if a user did not have the required URI privileges for the /path.

Upgraded libthrift to version 0.9.3 due to a security vulnerability

Cloudera Bug: CDH-40034

For details on the security vulnerability in the Apache Thrift client libraries, see THRIFT-3231.

Hive Binding

INSERT OVERWRITE DIRECTORY command does not work correctly

Bug: SENTRY-922

The INSERT OVERWRITE DIRECTORY command would write table data into an HDFS directory (hdfs://path/), even if privileges were granted only for the local directory (file://path/).
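
Why the scheme matters can be shown with a minimal sketch of a scheme-aware URI check. This is not Sentry's code: the function uri_privilege_covers and the example paths are hypothetical, and the sketch only shows that a grant on a file:// location should not match an hdfs:// target with the same path.

```python
from urllib.parse import urlparse


def uri_privilege_covers(granted_uri, requested_uri):
    """Return True if requested_uri is at or below granted_uri.

    Scheme and authority must match exactly, so a grant on a local
    file:// path never covers an hdfs:// path with the same name.
    """
    granted = urlparse(granted_uri)
    requested = urlparse(requested_uri)
    if granted.scheme != requested.scheme or granted.netloc != requested.netloc:
        return False
    # Compare with trailing slashes so /data does not match /database.
    granted_path = granted.path.rstrip("/") + "/"
    requested_path = requested.path.rstrip("/") + "/"
    return requested_path.startswith(granted_path)
```

For example, a grant on file:///data/out covers file:///data/out/part-0 but not hdfs://namenode/data/out, where "namenode" is a placeholder host.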

INSERT INTO no longer requires URI privilege on partition locations

Bug: SENTRY-1095

The INSERT INTO Hive command adds location information to the partition description. Usually if location information is included, you must ensure that the user has privileges on the corresponding URI. However, in this case, since the partition locations are under the table directory and can be easily generated, these requirements have been relaxed.

Change default value of sentry.hive.server

Bug: SENTRY-1112

The default value for sentry.hive.server was changed from server1 to an empty string.

Sentry Service

Sentry's Oracle upgrade scripts fail with ORA-00955

Bug: SENTRY-1066

Sentry upgrade scripts for Oracle would fail with error ORA-00955 because, during the upgrade, the script inadvertently created an index with the same name as the constraint being dropped. The script now runs DROP INDEX before it adds the constraint again, and completes the schema upgrade successfully.

grantServerPrivilege() and revokeServerPrivilege() should treat '*' and 'ALL' as synonyms

Bug: SENTRY-1252

The grantServerPrivilege() and revokeServerPrivilege() methods should treat * and ALL as synonyms when an action is not explicitly specified. Previously, if grantServerPrivilege() was called without an action, and followed up with a revokeServerPrivilege() invocation with an action such as ALL, the server-level privilege would not be revoked. This fix only applies to privileges that are granted after upgrading to CDH 5.8.
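
The synonym handling described above amounts to normalizing the action before a privilege is stored or looked up. The following is a minimal sketch under stated assumptions, not Sentry's implementation: normalize_action and the toy ServerPrivileges store are hypothetical names, and choosing "*" as the canonical form is an assumption.

```python
ALL_SYNONYMS = {"*", "ALL"}


def normalize_action(action=None):
    """Map a missing action, '*', or 'ALL' (any case) to one canonical value."""
    if action is None or action.strip() == "":
        return "*"
    action = action.strip().upper()
    return "*" if action in ALL_SYNONYMS else action


class ServerPrivileges:
    """Toy in-memory store keyed by (role, server, normalized action)."""

    def __init__(self):
        self._granted = set()

    def grant(self, role, server, action=None):
        self._granted.add((role, server, normalize_action(action)))

    def revoke(self, role, server, action=None):
        self._granted.discard((role, server, normalize_action(action)))

    def has(self, role, server, action=None):
        return (role, server, normalize_action(action)) in self._granted
```

With this normalization, a grant made without an action and a later revoke with ALL refer to the same stored entry, so the revoke takes effect.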

Sentry Debugging

Error in Hive Metastore Plugin (renameAuthzObject) log messages

Bug: SENTRY-1169

The renameAuthzObject plugin printed log messages with old path names in place of new path names.

Apache ZooKeeper

Upgrade Netty Due to Security Vulnerabilities

Bug: ZOOKEEPER-2450

Cloudera Bug: CDH-39988

Netty was upgraded from version 3.2.2 to 3.10.5 to resolve security vulnerabilities.

Fix Privacy Violation in Login.java

Bug: ZOOKEEPER-2405

Cloudera Bug: CDH-36673

In Login.java, getTGT() was logging confidential information in DEBUG mode. After the fix, only principals are logged.