Issues Fixed in CDH 5.8.x

The following topics describe issues fixed in CDH 5.8.x, from newest to oldest release. You can also review What's New In CDH 5.8.x or Known Issues in CDH 5.

Issues Fixed in CDH 5.8.4

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.8.4:

  • AVRO-1943 - Unreliable test: TestNettyServerWithCompression.testConnectionsCount
  • CRUNCH-592 - Job fails for null ByteBuffer value in Avro tables
  • FLUME-2171 - Add Interceptor to remove headers from event
  • FLUME-2812 - Fix semaphore leak causing java.lang.Error: Maximum permit count exceeded in MemoryChannel
  • FLUME-2889 - Fixes to DateTime computations
  • FLUME-2997 - Fix unreliable test in SpillableMemoryChannel
  • FLUME-2999 - Kafka channel and sink should enable statically assigned partition per event via header
  • FLUME-3002 - Fix tests in TestBucketWriter
  • FLUME-3003 - Fix unreliable testSourceCounter in TestSyslogUdpSource
  • FLUME-3020 - Improve HDFS Sink escape sequence substitution
  • FLUME-3027 - Change Kafka Channel to clear offsets map after commit
  • FLUME-3031 - Change sequence source to reset its counter for event body on channel exception
  • HADOOP-7930 - Kerberos relogin interval in UserGroupInformation should be configurable
  • HADOOP-10300 - Deferred sending of call responses allowed
  • HADOOP-11031 - Design Document for Credential Provider API
  • HADOOP-12453 - Support decoding KMS Delegation Token with its own Identifier
  • HADOOP-12483 - Maintain wrapped SASL ordering for postponed IPC responses
  • HADOOP-12537 - S3A to support Amazon STS temporary credentials
  • HADOOP-12655 - TestHttpServer.testBindAddress bind port range is wider than expected
  • HADOOP-12723 - S3A to add ability to plug in any AWSCredentialsProvider
  • HADOOP-12973 - Make DU pluggable
  • HADOOP-12974 - Create a CachingGetSpaceUsed implementation that uses df
  • HADOOP-12975 - Add jitter to CachingGetSpaceUsed's thread
  • HADOOP-13034 - Log message about input options in distcp lacks some items
  • HADOOP-13072 - WindowsGetSpaceUsed constructor should be public
  • HADOOP-13317 - Add logs to KMS server-side to improve supportability
  • HADOOP-13590 - Retry until TGT expires even if the UGI renewal thread encountered exception
  • HADOOP-13641 - Update UGI#spawnAutoRenewalThreadForUserCreds to reduce indentation
  • HADOOP-13669 - Addendum patch 2 for KMS Server should log exceptions before throwing
  • HADOOP-13669 - Addendum patch for KMS Server should log exceptions before throwing
  • HADOOP-13669 - KMS Server should log exceptions before throwing
  • HADOOP-13693 - Remove the message about HTTP OPTIONS in SPNEGO initialization message from kms audit log
  • HADOOP-13838 - KMSTokenRenewer should close providers
  • HDFS-4176 - EditLogTailer should call rollEdits with a timeout
  • HDFS-6962 - ACLs inheritance conflict with umaskmode
  • HDFS-7413 - Some unit tests should use NameNodeProtocols instead of FSNameSystem
  • HDFS-7964 - Add support for async edit logging
  • HDFS-8709 - Clarify automatic sync in FSEditLog#logEdit
  • HDFS-9038 - DFS reserved space is erroneously counted towards non-DFS used
  • HDFS-9630 - DistCp minor refactoring and clean up
  • HDFS-9638 - Improve DistCp Help and documentation
  • HDFS-9820 - Improve distcp to support efficient restore to an earlier snapshot
  • HDFS-10216 - Distcp -diff throws exception when handling relative path
  • HDFS-10298 - Document the usage of distcp -diff option
  • HDFS-10312 - Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction
  • HDFS-10313 - Distcp needs to enforce the order of snapshot names passed to -diff
  • HDFS-10336 - TestBalancer failing intermittently because of not resetting UserGroupInformation completely
  • HDFS-10397 - Distcp should ignore -delete option if -diff option is provided instead of exiting
  • HDFS-10556 - DistCpOptions should be validated automatically
  • HDFS-10609 - Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications
  • HDFS-10652 - Add a unit test for HDFS-4660
  • HDFS-10722 - Fix race condition in TestEditLog#testBatchedSyncWithClosedLogs
  • HDFS-10760 - DataXceiver#run() should not log InvalidToken exception as an error
  • HDFS-10763 - Open files can leak permanently due to inconsistent lease update
  • HDFS-11012 - Unnecessary INFO logging on DFSClients for InvalidToken
  • HDFS-11040 - Add documentation for HDFS-9820 distcp improvement
  • HDFS-11056 - Concurrent append and read operations lead to checksum error
  • HDFS-11160 - VolumeScanner reports write-in-progress replicas as corrupt incorrectly
  • HDFS-11229 - HDFS-11056 failed to close meta file
  • HDFS-11275 - Check groupEntryIndex and throw a helpful exception on failures when removing ACL
  • MAPREDUCE-6571 - JobEndNotification info logs are missing in AM container syslog
  • MAPREDUCE-6633 - AM should retry map attempts if the reduce task encounters compression related errors
  • MAPREDUCE-6728 - Give fetchers hint when ShuffleHandler rejects a shuffling connection
  • MAPREDUCE-6763 - Shuffle server listen queue is too small
  • MAPREDUCE-6798 - Fix intermittent failure of TestJobHistoryParsing.testJobHistoryMethods
  • MAPREDUCE-6801 - Fix unreliable TestKill.testKillJob
  • MAPREDUCE-6817 - The format of job start time in JHS is different from submit and finish times
  • YARN-3601 - Fix UT TestRMFailover.testRMWebAppRedirect
  • YARN-3654 - ContainerLogsPage web UI should not have meta-refresh
  • YARN-3722 - Merge multiple TestWebAppUtils into o.a.h.yarn.webapp.util.TestWebAppUtils
  • YARN-4004 - Container-executor should print output of docker logs if the docker container exits with non-0 exit status
  • YARN-4017 - Container-executor overuses PATH_MAX
  • YARN-4092 - Fixed UI redirection to print useful messages when both RMs are in standby mode (Addendum)
  • YARN-4245 - Generalize config file handling in container-executor
  • YARN-4255 - Container-executor does not clean up docker operation command files
  • YARN-4544 - All the log messages about rolling monitoring interval are shown with WARN level
  • YARN-4556 - TestFifoScheduler.testResourceOverCommit fails
  • YARN-4820 - ResourceManager web redirects in HA mode drops query parameters
  • YARN-5001 - Aggregated Logs root directory is created with wrong group if nonexistent
  • YARN-5136 - Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
  • YARN-5246 - NMWebAppFilter web redirects drop query parameters
  • YARN-5704 - Provide config knobs to control enabling/disabling new/work in progress features in container-executor
  • YARN-5837 - NPE when getting node status of a decommissioned node after an RM restart
  • YARN-5862 - TestDiskFailures.testLocalDirsFailures failed
  • YARN-5890 - FairScheduler should log information about AM-resource-usage and max-AM-share for queues
  • HBASE-15324 - Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split
  • HBASE-15430 - Failed taking snapshot - Manifest proto-message too large
  • HBASE-16146 - Counter performance is expensive
  • HBASE-16172 - Unify the retry logic in ScannerCallableWithReplicas and RpcRetryingCallerWithReadReplicas
  • HBASE-16270 - Handle duplicate clearing of snapshot in region replicas
  • HBASE-16345 - RpcRetryingCallerWithReadReplicas#call() should catch some RegionServer Exceptions
  • HBASE-16824 - Writer.flush() can be called on already closed streams in WAL roll
  • HBASE-16841 - Data loss in MOB files after cloning a snapshot and deleting that snapshot
  • HBASE-17058 - Lower epsilon used for jitter verification from HBASE-15324
  • HBASE-17072 - CPU usage starts to climb up to 90-100% when using G1GC
  • HBASE-17241 - Avoid compacting already compacted mob files with _del files
  • HBASE-17452 - Failed taking snapshot - region Manifest proto-message too large
  • HIVE-10384 - RetryingMetaStoreClient does not retry wrapped TTransportExceptions
  • HIVE-11849 - NPE in HiveHBaseTableSnapshotInputFormat in query with just count(*)
  • HIVE-12077 - MSCK Repair table should fix partitions in batches
  • HIVE-12465 - Hive might produce wrong results when (outer) joins are merged
  • HIVE-12619 - Switching the field order within an array of structs causes the query to fail
  • HIVE-12757 - Fix TestCodahaleMetrics#testFileReporting
  • HIVE-12780 - Fix the output of the history command in Beeline
  • HIVE-12789 - Fix output twice in the history command of Beeline
  • HIVE-12891 - Hive fails when java.io.tmpdir is set to a relative location
  • HIVE-12976 - MetaStoreDirectSql doesn't batch IN lists in all cases
  • HIVE-13051 - Deadline class has numerous issues
  • HIVE-13129 - CliService leaks HMS connection
  • HIVE-13149 - Remove some unnecessary HMS connections from HS2
  • HIVE-13381 - Timestamp and date should have higher precedence in the type hierarchy than the string group
  • HIVE-13429 - Tool to remove dangling scratch dir
  • HIVE-13539 - HiveHFileOutputFormat searching the wrong directory for HFiles
  • HIVE-13866 - Flatten callstack for directSQL errors
  • HIVE-13895 - HoS start-up overhead in yarn-client mode
  • HIVE-13997 - Insert overwrite directory doesn't overwrite existing files
  • HIVE-14137 - Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
  • HIVE-14173 - NPE was thrown after enabling direct.sql in the middle of session
  • HIVE-14313 - Test failure TestMetaStoreMetrics.testConnections
  • HIVE-14421 - FS.deleteOnExit holds references to _tmp_space.db files
  • HIVE-14762 - Add logging while removing scratch space
  • HIVE-14799 - Query operations are not thread safe during cancellation
  • HIVE-14810 - Fix failing test: TestMetaStoreMetrics.testMetaDataCounts
  • HIVE-14817 - Shutdown the SessionManager timeoutChecker thread properly upon shutdown
  • HIVE-14839 - Improve the stability of TestSessionManagerMetrics
  • HIVE-14960 - Improve the stability of TestNotificationListener
  • HIVE-15054 - Hive insertion query execution fails on Hive on Spark
  • HIVE-15061 - Metastore types are sometimes case sensitive
  • HIVE-15090 - Temporary DB failure can stop ExpiredTokenRemover thread
  • HIVE-15231 - Query on view with CTE and alias fails with "table not found error"
  • HIVE-15291 - Comparison of timestamp fails if only date part is provided
  • HIVE-15410 - WebHCat supports get/set table property with its name containing period and hyphen
  • HIVE-15551 - Memory leak in directsql for MySQL with BoneCP specific initialization
  • HUE-4466 - [security] deliver csrftoken cookie with secure bit set if possible.
  • HUE-4546 - Auto-strip invalid characters from name field of converted docs
  • HUE-4747 - [editor] Download form should be submitted to a new tab otherwise the snippet gets closed
  • HUE-5028 - [oozie] User can't edit shared WF with modify permissions in new editor mode
  • HUE-5050 - [core] Logout fails for local login when multiple backends are used
  • HUE-5154 - [oozie] Creating a new Oozie workflow throws server error 500
  • HUE-5161 - [security] Speed up roles rendering
  • HUE-5163 - [security] Speed up initial page rendering
  • HUE-5166 - [impala] Handle empty session properties during upgrades
  • HUE-5218 - [search] Validate dashboard sharing works
  • HUE-5295 - [desktop] Avoid microsecond comparison for last_modified field (MySQL < 5.6 doesn't support microsecond precision; see https://code.djangoproject.com/ticket/19716)
  • HUE-5295 - [desktop] Do not change the last_modified field when migrating history queries
  • HUE-5305 - [home] Fix empty share document modal and improve sharing UX
  • HUE-5310 - [search] Use Doc2 modal in search_controller
  • HUE-5476 - [core] Fix TTL is_idle middleware check
  • HUE-5482 - [home] Handle multiple home/trash directories by merging them into one.
  • IMPALA-1702 - "invalidate metadata" can cause duplicate TableIds
  • IMPALA-3167 - Fix assignment of WHERE clause predicate through grouping aggregate and outer join
  • IMPALA-3314 - Fix Avro schema loading for partitioned tables
  • IMPALA-3552 - Make incremental stats max serialized size configurable
  • IMPALA-3575 - Add retry to backend connection request and rpc timeout
  • IMPALA-3682 - Don't retry unrecoverable socket creation errors
  • IMPALA-3875 - Thrift threaded server hang in some cases
  • IMPALA-3884 - Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
  • IMPALA-3949 - Log the error message in FileSystemUtil.copyToLocal()
  • IMPALA-3964 - Fix crash when a count(*) is performed on a nested collection.
  • IMPALA-3983 - Delete function jar resources after load
  • IMPALA-4037 - ChildQuery::Cancel() appears to violate lock ordering
  • IMPALA-4038 - Fix locking during query cancellation
  • IMPALA-4076 - Fix runtime filter sort compare method
  • IMPALA-4099 - Fix the error message while loading UDFs with no JARs
  • IMPALA-4120 - Incorrect results with LEAD() analytic function
  • IMPALA-4153 - Fix count(*) on all blank('') columns - test
  • IMPALA-4223 - Handle truncated file read from HDFS cache
  • IMPALA-4246 - SleepForMs() utility function has undefined behavior for > 1s
  • IMPALA-4336 - Cast expressions after unnesting union operands
  • IMPALA-4363 - Add Parquet timestamp validation
  • IMPALA-4391 - Fix dropped statuses in scanners
  • IMPALA-4423 - Correct but conservative implementation of Subquery.equals()
  • IMPALA-4433 - Always generate test data using the same time zone setting
  • IMPALA-4449 - Revisit table locking pattern in the catalog. Fixes an issue where multiple long-running operations on the same catalog object (for example, a table) can block other catalog operations from making progress
  • IMPALA-4550 - Fix CastExpr analysis for substituted slots
  • LUCENE-5889 - AnalyzingInfixSuggester should expose commit()
  • LUCENE-7564 - AnalyzingInfixSuggester should close its IndexWriter by default at the end of build()
  • PIG-3818 - PIG-2499 is accidentally reverted
  • PIG-5025 - Fix unreliable test failures in TestLoad.java
  • SENTRY-858 - Add a test case for database prefix not honored when executing grant statement
  • SENTRY-1265 - Sentry service should not require a TGT as it is not talking to other Kerberos services as a client
  • SENTRY-1313 - Database prefix is not honoured when executing grant statement
  • SPARK-12241 - [YARN] Improve failure reporting in Yarn client obtainTokenForHBase()
  • SPARK-12523 - [YARN] Support long-running of the Spark On HBase and hive meta store.
  • SPARK-12966 - [SQL] ArrayType(DecimalType) support in Postgres JDBC
  • SPARK-13566 - [CORE] Avoid deadlock between BlockManager and Executor Thread
  • SPARK-13958 - Executor OOM due to unbounded growth of pointer array in…
  • SPARK-14204 - [SQL] register driverClass rather than user-specified class
  • SPARK-16044 - [SQL] Backport input_file_name() for data source based on NewHadoopRDD to branch 1.6
  • SPARK-17245 - [SQL][BRANCH-1.6] Do not rely on Hive's session state to retrieve HiveConf
  • SPARK-17465 - [SPARK CORE] Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
  • SQOOP-2349 - Add command line option for setting transaction isolation levels for metadata queries
  • SQOOP-2884 - Document argument overriding --temporary directory
  • SQOOP-2909 - Oracle related ImportTest fails after SQOOP-2737
  • SQOOP-2911 - Fix failing HCatalogExportTest caused by SQOOP-2863
  • SQOOP-2915 - Fixing Oracle related unit tests
  • SQOOP-2950 - Fix Sqoop trunk consistent UT failures
  • SQOOP-2952 - Row key not added into column family using --hbase-bulkload
  • SQOOP-2983 - OraOOP export has degraded performance with wide tables
  • SQOOP-2986 - Add validation check for --hive-import and --incremental lastmodified
  • SQOOP-2990 - Sqoop(oracle) export [updateTableToOracle] with "--update-mode allowinsert" : app fails with java.sql.SQLException: Missing IN or OUT parameter at index
  • SQOOP-3013 - Configuration "tmpjars" is not checked for empty strings before passing to MR
  • SQOOP-3028 - Include stack trace in the logging of exceptions in ExportTool
  • SQOOP-3034 - HBase import should fail fast if using anything other than as-textfile
  • SQOOP-3053 - Create a cmd line argument for sqoop.throwOnError and use it through SqoopOptions
  • SQOOP-3055 - Fixing MySQL tests failing due to ignored test inputs/configuration
  • SQOOP-3057 - Fixing 3rd party Oracle tests failing due to invalid case of column names
  • SQOOP-3066 - Introduce an option + env variable to enable/disable SQOOP-2737 feature
  • SQOOP-3069 - Get OracleExportTest#testUpsertTestExport in line with SQOOP-3066
  • SQOOP-3071 - Fix OracleManager to apply localTimeZone correctly in case of Date objects too
  • SQOOP-3072 - Re-enable escaping in ImportTest#testProductWithWhiteSpaceImport for proper execution
  • SQOOP-3081 - Use OracleEscapeUtils.escapeIdentifier in OracleUpsertOutputFormat instead of inline appending quotes
  • SQOOP-3124 - Fix ordering in column list query of PostgreSQL connector to reflect logical order (rather than ad hoc ordering)

Issues Fixed in CDH 5.8.3

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.8.3:

  • FLUME-2797 - Use SourceCounter for SyslogTcpSource
  • FLUME-2844 - SpillableMemoryChannel must start ChannelCounter
  • HADOOP-12548 - Read s3a credentials from a Credential Provider
  • HADOOP-13353 - LdapGroupsMapping getPassword should not return null when an IOException is thrown
  • HADOOP-13526 - Add detailed logging in KMS for the authentication failure of proxy user
  • HADOOP-13558 - UserGroupInformation created from a Subject incorrectly tries to renew the Kerberos ticket
  • HADOOP-13579 - Fix source-level compatibility after HADOOP-11252
  • HADOOP-13638 - KMS should set UGI's Configuration object properly
  • HDFS-7415 - Move FSNameSystem.resolvePath() to FSDirectory
  • HDFS-7420 - Delegate permission checks to FSDirectory
  • HDFS-7463 - Simplify FSNamesystem#getBlockLocationsUpdateTimes
  • HDFS-7478 - Move org.apache.hadoop.hdfs.server.namenode.NNConf to FSNamesystem
  • HDFS-7517 - Remove redundant non-null checks in FSNamesystem#getBlockLocations
  • HDFS-8224 - Schedule a block for scanning if its metadata file is corrupt
  • HDFS-8269 - getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
  • HDFS-9601 - NNThroughputBenchmark.BlockReportStats should handle NotReplicatedYetException on adding block.
  • HDFS-9781 - FsDatasetImpl#getBlockReports can occasionally throw NullPointerException
  • HDFS-10641 - TestBlockManager#testBlockReportQueueing fails intermittently
  • HDFS-10879 - TestEncryptionZonesWithKMS#testReadWrite fails intermittently
  • HDFS-10962 - TestRequestHedgingProxyProvider fails intermittently
  • HDFS-10963 - Reduce log level when network topology cannot find enough datanodes
  • MAPREDUCE-6628 - Potential memory leak in CryptoOutputStream
  • MAPREDUCE-6641 - TestTaskAttempt fails in trunk
  • MAPREDUCE-6718 - Add progress log to JHS during startup
  • MAPREDUCE-6771 - RMContainerAllocator sends container diagnostics event after corresponding completion event
  • YARN-4940 - yarn node -list -all fails if RM starts with decommissioned node
  • HBASE-15856 - Addendum Fix UnknownHostException import in MetaTableLocator
  • HBASE-15856 - Do not cache unresolved addresses for connections
  • HBASE-16294 - hbck reporting "No HDFS region dir found" for replicas
  • HBASE-16699 - Overflows in AverageIntervalRateLimiter's refill() and getWaitInterval()
  • HBASE-16767 - Mob compaction needs to clean up files in /hbase/mobdir/.tmp and /hbase/mobdir/.tmp/.bulkload when running into IO exceptions
  • HIVE-9570 - Investigate test failure on union_view.q
  • HIVE-10965 - Direct SQL for stats fails in 0-column case
  • HIVE-12083 - HIVE-10965 introduces thrift error if partNames or colNames are empty
  • HIVE-12475 - Parquet schema evolution within array<struct<>> does not work
  • HIVE-12785 - View with union type and UDF to the struct is broken
  • HIVE-13058 - Add session and operation_log directory deletion messages
  • HIVE-13198 - Authorization issues with cascading views
  • HIVE-13237 - Select parquet struct field with upper case throws NPE
  • HIVE-13620 - Merge llap branch work to master
  • HIVE-13625 - Hive Prepared Statement when executed with escape characters in parameter fails
  • HIVE-13645 - Beeline needs null-guard around hiveVars and hiveConfVars read
  • HIVE-14296 - Session count is not decremented when HS2 clients do not shutdown cleanly
  • HIVE-14383 - SparkClientImpl should pass principal and keytab to spark-submit instead of calling kinit explicitly
  • HIVE-14715 - Hive throws NumberFormatException with query with Null value
  • HIVE-14743 - ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs
  • HIVE-14784 - Operation logs are disabled automatically if the parent directory does not exist.
  • HIVE-14805 - Subquery inside a view will have the object in the subquery as the direct input
  • HUE-4064 - Format creation and update date on the table details popover
  • HUE-4138 - Last modified time of a saved query is not in the correct timezone
  • HUE-4141 - Graph breaks for external workflows when there is more than one kill node
  • HUE-4804 - Download function of HTML widget breaks the display
  • HUE-4809 - Add truststore parameters only if SSL is turned on
  • HUE-4809 - Only add truststore paths when they actually exist
  • HUE-4810 - Fix tests by setting data to valid JSON type
  • HUE-4871 - An unprivileged user can enumerate users
  • HUE-4891 - An unprivileged user can list document items
  • HUE-4916 - Truncate last name to 30 chars on ldap import
  • HUE-4968 - Remove access to /oozie/import_workflow when v2 is enabled
  • HUE-4994 - Consider default path for decision nodes in dashboard graph
  • HUE-5041 - Hue export large file to HDFS does not work on non-default database
  • IMPALA-1619 - Support 64-bit allocations
  • IMPALA-3687 - Prefer Avro field name during schema reconciliation
  • IMPALA-3751 - Fix clang build errors and warnings
  • IMPALA-4135 - Thrift threaded server times-out connections during high load
  • IMPALA-4170 - Fix identifier quoting in COMPUTE INCREMENTAL STATS
  • IMPALA-4180 - Synchronize accesses to RuntimeState::reader_contexts_
  • IMPALA-4196 - Cross compile bit-byte-functions
  • IMPALA-4237 - Fix materialization of 4-byte decimals in data source scan node
  • OOZIE-1814 - Oozie should mask any passwords in logs and REST interfaces
  • SOLR-9310 - PeerSync fails on a node restart due to IndexFingerPrint mismatch
  • SPARK-12009 - Avoid reallocating YARN container when driver wants to stop all Executors
  • SPARK-12392 - Optimize a location order of broadcast blocks by considering preferred local hosts
  • SPARK-12941 - Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype
  • SPARK-13328 - Poor read performance for broadcast variables with dynamic resource allocation
  • SPARK-16625 - General data types to be mapped to Oracle
  • SPARK-16711 - YarnShuffleService doesn't re-init properly on YARN rolling upgrade
  • SPARK-17171 - DAG will list all partitions in the graph
  • SPARK-17433 - YarnShuffleService doesn't handle moving credentials levelDb
  • SPARK-17611 - Make shuffle service test really test authentication
  • SPARK-17644 - Do not add failedStages when abortStage for fetch failure
  • SPARK-17696 - Partial backport of to branch-1.6.
  • SQOOP-3021 - ClassWriter fails if a column name contains a backslash character

Issues Fixed in CDH 5.8.2

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.8.2:

  • FLUME-1899 - Make SpoolDir work with subdirectories
  • FLUME-2652 - Documented transaction handling semantics incorrect in developer guide.
  • FLUME-2901 - Document Kerberos setup for Kafka channel
  • FLUME-2910 - AsyncHBaseSink: Failure callbacks should log the exception that caused them
  • FLUME-2913 - Don't strip SLF4J from imported classpaths
  • FLUME-2918 - Speed up TaildirSource on directories with many files
  • FLUME-2922 - Sync SequenceFile.Writer before calling hflush
  • FLUME-2923 - Bump asynchbase version to 1.7.0
  • FLUME-2934 - Document new cachePatternMatching option for TaildirSource
  • FLUME-2935 - Bump java target version to 1.7
  • FLUME-2948 - docs: Fix parameters on Replicating Channel Selector example
  • FLUME-2954 - Make raw data appearing in log messages explicit
  • FLUME-2963 - FlumeUserGuide: Fix error in Kafka Source properties table
  • FLUME-2972 - Handle offset migration in the new Kafka Channel
  • FLUME-2975 - docs: Fix NetcatSource example
  • FLUME-2982 - Add localhost escape sequence to HDFS sink
  • FLUME-2983 - Handle offset migration in the new Kafka Source
  • HADOOP-8436 - NPE In getLocalPathForWrite ( path, conf ) when the required context item is not configured
  • HADOOP-8437 - getLocalPathForWrite should throw IOException for invalid paths
  • HADOOP-8934 - Shell command ls should include sort options
  • HADOOP-10048 - LocalDirAllocator should avoid holding locks while accessing the filesystem
  • HADOOP-10971 - Add -C flag to make `hadoop fs -ls` print filenames only
  • HDFS-10512 - VolumeScanner can terminate due to NPE in DataNode.reportBadBlocks.
  • HADOOP-11361 - Fix a race condition in MetricsSourceAdapter.updateJmxCache.
  • HADOOP-11469 - KMS should skip default.key.acl and whitelist.key.acl when loading key acl.
  • HADOOP-11901 - BytesWritable fails to support 2G chunks due to integer overflow
  • HADOOP-12252 - LocalDirAllocator should not throw NPE with empty string configuration
  • HADOOP-12609 - Fix intermittent failure of TestDecayRpcScheduler.
  • HADOOP-12659 - Incorrect usage of config parameters in token manager of KMS
  • HADOOP-12963 - Allow using path style addressing for accessing the s3 endpoint.
  • HADOOP-13079 - Add -q option to Ls to print ? instead of non-printable characters
  • HADOOP-13132 - Handle ClassCastException on AuthenticationException in LoadBalancingKMSClientProvider
  • HADOOP-13155 - Implement TokenRenewer to renew and cancel delegation tokens in KMS
  • HADOOP-13251 - Authenticate with Kerberos credentials when renewing KMS delegation token
  • HADOOP-13255 - KMSClientProvider should check and renew tgt when doing delegation token operations.
  • HADOOP-13263 - Reload cached groups in background after expiry.
  • HADOOP-13270 - BZip2CompressionInputStream finds the same compression marker twice in corner case, causing duplicate data blocks
  • HADOOP-13381 - KMS clients should use KMS Delegation Tokens from current UGI
  • HADOOP-13437 - KMS should reload whitelist and default key ACLs when hot-reloading
  • HADOOP-13457 - Remove hardcoded absolute path for shell executable.
  • HADOOP-13487 - Hadoop KMS should load old delegation tokens from Zookeeper on startup
  • HDFS-4210 - Throw helpful exception when DNS entry for JournalNode cannot be resolved
  • HDFS-6434 - Default permission for creating file should be 644 for WebHdfs/HttpFS
  • HDFS-7597 - DelegationTokenIdentifier should cache the TokenIdentifier to UGI mapping
  • HDFS-8581 - ContentSummary on / skips further counts on yielding lock
  • HDFS-8829 - Make SO_RCVBUF and SO_SNDBUF size configurable for DataTransferProtocol sockets and allow configuring auto-tuning
  • HDFS-8897 - Balancer should handle fs.defaultFS trailing slash in HA
  • HDFS-9085 - Show renewer information in DelegationTokenIdentifier#toString
  • HDFS-9137 - DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded.
  • HDFS-9141 - Thread leak in Datanode#refreshVolumes.
  • HDFS-9259 - Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario.
  • HDFS-9276 - Failed to Update HDFS Delegation Token for long running application in HA mode
  • HDFS-9365 - Balancer does not work with the HDFS-6376 HA setup.
  • HDFS-9461 - DiskBalancer: Add Report Command
  • HDFS-9466 - TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is unreliable
  • HDFS-9700 - DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for DataTransferProtocol
  • HDFS-9732 - Improve DelegationTokenIdentifier.toString() for better logging
  • HDFS-9805 - Add server-side configuration for enabling TCP_NODELAY for DataTransferProtocol and default it to true
  • HDFS-9906 - Remove spammy log spew when a datanode is restarted.
  • HDFS-9939 - Increase DecompressorStream skip buffer size
  • HDFS-9958 - BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages
  • HDFS-10270 - TestJMXGet:testNameNode() fails
  • HDFS-10381 - DataStreamer DataNode exclusion log message should be warning.
  • HDFS-10403 - DiskBalancer: Add cancel command
  • HDFS-10457 - DataNode should not auto-format block pool directory if VERSION is missing.
  • HDFS-10481 - HTTPFS server should correctly impersonate as end user to open file
  • HDFS-10500 - Diskbalancer: Print out information when a plan is not generated
  • HDFS-10501 - DiskBalancer: Use the default datanode port if port is not provided
  • HDFS-10516 - Fix bug when warming up EDEK cache of more than one encryption zone
  • HDFS-10517 - DiskBalancer: Support help command
  • HDFS-10525 - Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
  • HDFS-10541 - Diskbalancer: When no actions in plan, error message says "Plan was generated more than 24 hours ago"
  • HDFS-10544 - Balancer doesn't work with IPFailoverProxyProvider.
  • HDFS-10552 - DiskBalancer "-query" results in NPE if no plan for the node
  • HDFS-10559 - DiskBalancer: Use SHA1 for Plan ID
  • HDFS-10567 - Improve plan command help message
  • HDFS-10588 - False alarm in datanode log - ERROR - Disk Balancer is not enabled
  • HDFS-10598 - DiskBalancer does not execute multi-steps plan
  • HDFS-10600 - PlanCommand#getThresholdPercentage should not use throughput value.
  • HDFS-10643 - Namenode should use loginUser(hdfs) to generateEncryptedKey
  • HDFS-10681 - DiskBalancer: query command should report Plan file path apart from PlanID.
  • HDFS-10822 - Log DataNodes in the write pipeline
  • MAPREDUCE-4784 - TestRecovery occasionally fails
  • MAPREDUCE-6359 - In RM HA setup, Cluster tab links populated with AM hostname instead of RM
  • MAPREDUCE-6442 - Stack trace is missing when error occurs in client protocol provider's constructor
  • MAPREDUCE-6473 - Revert "Revert "Job submission can take a long time during Cluster initialization""
  • MAPREDUCE-6473 - Revert "Job submission can take a long time during Cluster initialization"
  • MAPREDUCE-6473 - Job submission can take a long time during Cluster initialization
  • MAPREDUCE-6670 - TestJobListCache#testEviction sometimes fails on Windows with timeout
  • MAPREDUCE-6680 - JHS UserLogDir scan algorithm could sometimes skip directories with updates in CloudFS (Azure FileSystem, S3, etc.)
  • MAPREDUCE-6738 - TestJobListCache.testAddExisting failed intermittently in slow VM testbed
  • MAPREDUCE-6761 - Regression when handling providers - invalid configuration ServiceConfiguration causes Cluster initialization failure
  • YARN-2605 - [RM HA] Rest api endpoints doing redirect incorrectly.
  • YARN-2977 - Fixed intermittent TestNMClient failure.
  • YARN-4411 - RMAppAttemptImpl#createApplicationAttemptReport throws IllegalArgumentException
  • YARN-4459 - container-executor should only kill process groups
  • YARN-4866 - FairScheduler: AMs can consume all vcores leading to a livelock when using FAIR policy.
  • YARN-4878 - Expose scheduling policy and max running apps over JMX for Yarn queues.
  • YARN-4989 - TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently
  • YARN-5048 - DelegationTokenRenewer#skipTokenRenewal may throw NPE
  • YARN-5077 - Fix FSLeafQueue#getFairShare() for queues with zero fairshare.
  • YARN-5107 - TestContainerMetrics fails.
  • YARN-5272 - Handle queue names consistently in FairScheduler.
  • YARN-5608 - TestAMRMClient.setup() fails with ArrayOutOfBoundsException
  • HBASE-14644 - Region in transition metric is broken -- addendum
  • HBASE-14644 - Region in transition metric is broken
  • HBASE-14818 - user_permission does not list namespace permissions
  • HBASE-14963 - Remove use of Guava Stopwatch from HBase client code
  • HBASE-15465 - userPermission returned by getUserPermission() for the selected namespace does not have namespace set
  • HBASE-15496 - Throw RowTooBigException only for user scan/get
  • HBASE-15621 - Suppress HBase SnapshotHFile cleaner error messages when a snapshot is going on
  • HBASE-15683 - Min latency in latency histograms are emitted as Long.MAX_VALUE
  • HBASE-15698 - Increment TimeRange not serialized to server
  • HBASE-15746 - Remove extra RegionCoprocessor preClose() in RSRpcServices#closeRegion
  • HBASE-15808 - Reduce potential bulk load intermediate space usage and waste
  • HBASE-15872 - Split TestWALProcedureStore
  • HBASE-15873 - ACL for snapshot restore / clone is not enforced
  • HBASE-15925 - provide default values for hadoop compat module related properties that match default hadoop profile.
  • HBASE-16034 - Fix ProcedureTestingUtility#LoadCounter.setMaxProcId()
  • HBASE-16056 - Procedure v2 - fix master crash for FileNotFound
  • HBASE-16093 - Fix splits failed before creating daughter regions leave meta inconsistent
  • HBASE-16135 - PeerClusterZnode under rs of removed peer may never be deleted
  • HBASE-16194 - Should count in MSLAB chunk allocation into heap size change when adding duplicate cells
  • HBASE-16195 - Should not add chunk into chunkQueue if not using chunk pool in HeapMemStoreLAB
  • HBASE-16207 - can't restore snapshot without "Admin" permission
  • HBASE-16227 - [Shell] Column value formatter not working in scans
  • HBASE-16284 - Unauthorized client can shutdown the cluster
  • HBASE-16288 - HFile intermediate block level indexes might recurse forever creating multi TB files.
  • HBASE-16319 - Fix TestCacheOnWrite after HBASE-16288.
  • HBASE-16317 - revert all ESAPI changes
  • HBASE-16318 - fail build while rendering velocity template if dependency license isn't in whitelist.
  • HBASE-16318 - consistently use the correct name for 'Apache License, Version 2.0'
  • HBASE-16321 - ensure no findbugs-jsr305
  • HBASE-16340 - exclude Xerces implementation jars from coming in transitively.
  • HBASE-16360 - TableMapReduceUtil addHBaseDependencyJars has the wrong class name for PrefixTreeCodec
  • HIVE-7443 - Fix HiveConnection to communicate with Kerberized Hive JDBC server and alternative JDKs
  • HIVE-10007 - Support qualified table name in analyze table compute statistics for columns
  • HIVE-10728 - Deprecate unix_timestamp(void) and make it deterministic; also includes the unit tests from HIVE-10932 (udf_nondeterministic unit test failure)
  • HIVE-11243 - Changing log level in Utilities.getBaseWork
  • HIVE-11432 - Hive macro give same result for different arguments
  • HIVE-11487 - Add getNumPartitionsByFilter api in metastore api
  • HIVE-11747 - Unnecessary error log is shown when executing a "INSERT OVERWRITE LOCAL DIRECTORY" cmd in the embedded mode
  • HIVE-11827 - STORED AS AVRO fails SELECT COUNT(*) when empty
  • HIVE-11901 - StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
  • HIVE-11980 - Follow up on HIVE-11696, exception is thrown from CTAS from the table with table-level serde is Parquet while partition-level serde is JSON
  • HIVE-12277 - Hive macro results on macro_duplicate.q different after adding ORDER BY
  • HIVE-12556 - Ctrl-C in beeline doesn't kill Tez query on HS2
  • HIVE-12635 - Hive should return the latest hbase cell timestamp as the row timestamp value
  • HIVE-13043 - Reload function has no impact to function registry
  • HIVE-13090 - Hive metastore crashes on NPE with ZooKeeperTokenStore
  • HIVE-13372 - Hive Macro overwritten when multiple macros are used in one column
  • HIVE-13462 - HiveResultSetMetaData.getPrecision() fails for NULL columns
  • HIVE-13590 - Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
  • HIVE-13704 - Call DistCp.run() instead of DistCp.execute()
  • HIVE-13736 - View's input/output formats are TEXT by default.
  • HIVE-13749 - Memory leak in Hive Metastore
  • HIVE-13884 - Disallow queries in HMS fetching more than a configured number of partitions
  • HIVE-13932 - Hive SMB Map Join with small set of LIMIT failed with NPE
  • HIVE-13953 - Issues in HiveLockObject equals method
  • HIVE-13991 - Union All on view fail with no valid permission on underneath table
  • HIVE-14006 - Hive query with UNION ALL fails with ArrayIndexOutOfBoundsException.
  • HIVE-14015 - SMB MapJoin failed for Hive on Spark when kerberized
  • HIVE-14055 - directSql - getting the number of partitions is broken
  • HIVE-14098 - Logging task properties, and environment variables might contain passwords
  • HIVE-14118 - Make the alter partition exception more meaningful
  • HIVE-14187 - JDOPersistenceManager objects remain cached if MetaStoreClient#close is not called
  • HIVE-14209 - Add some logging info for session and operation management
  • HIVE-14436 - Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine
  • HIVE-14457 - Partitions in encryption zone are still trashed though an exception is returned
  • HIVE-14519 - Multi insert query bug
  • HIVE-14538 - beeline throws exceptions with parsing hive config when using !sh statement
  • HIVE-14697 - Can not access kerberized HS2 Web UI
  • HUE-2689 - Sub-workflow submitted from coordinator gets parent workflow graph
  • HUE-2971 - Some links of a Fork can point to deleted nodes
  • HUE-3842 - HTTP 500 while emptying Hue 3.9 trash directory
  • HUE-3908 - [useradmin] Ignore (objectclass=*) filter when searching for LDAP users
  • HUE-3988 - Support schemaless collections
  • HUE-3999 - list_oozie_workflow page shouldn't break in case of bad JSON from Oozie
  • HUE-4005 - Remove oozie.coord.application.path from properties when rerunning workflow
  • HUE-4006 - Create new deployment directory when coordinator or bundle is copied
  • HUE-4007 - Fix deployement_dir for the bundle in oozie example fixtures
  • HUE-4019 - Always fetch the logs on check status
  • HUE-4019 - Do not blank error on query with good syntax but invalid query
  • HUE-4021 - [libsolr] Allow customization of the Solr path in ZooKeeper
  • HUE-4023 - [useradmin] update AuthenticationForm to allow activated users to login
  • HUE-4078 - Drag & Drop hive queries shows queries from the trash
  • HUE-4087 - Unable to kill jobs with Resource Manager HA enabled
  • HUE-4092 - Can't type any / in the HDFS ACLs path input
  • HUE-4119 - Change list jobs call to POST
  • HUE-4129 - Long running query getting terminated when leaving the editor
  • HUE-4134 - [liboozie] Avoid logging truststore credentials
  • HUE-4145 - Older queries after upgrade do not provide direct save
  • HUE-4146 - Older saved queries default to the 'default' DB
  • HUE-4148 - Improve import testing of beeswax queries to notebook format
  • HUE-4153 - Report last seen progress when running impala query
  • HUE-4164 - The ApiHelper should treat any negative status in the response as an error
  • HUE-4177 - Horizontal scroll in FF (Chrome fine) with touch pad is extremely slow
  • HUE-4201 - Add warning about max limit of cells before truncation in the export / download query result
  • HUE-4202 - Enable offset param for fetching jobbrowser logs
  • HUE-4215 - Reset API_CACHE on logout
  • HUE-4224 - 'Did you know' on home page is gone
  • HUE-4227 - Fix unittest for MR API Cache
  • HUE-4238 - Ignore history docs in find_jobs_with_no_doc during sync documents
  • HUE-4252 - Handle 307 redirect from YARN upon standby failover
  • HUE-4253 - Prompt for variables just once per variable name
  • HUE-4258 - Close and pool Spark History Server connections
  • HUE-4265 - Bring back the show preview in the assist
  • HUE-4300 - Avoid double file listing call on folder search
  • HUE-4321 - Batch submit of SQL should USE the correct DB
  • HUE-4333 - Properly reset API_CACHE on failover
  • HUE-4346 - Query History disappeared after upgrade to 3.10
  • HUE-4353 - Typing in the search bar always redirect to the end of the input
  • HUE-4362 - List more oozie workflow parameters on the workflow dashboard page
  • HUE-4364 - Handle files with carriage return in create table from a file
  • HUE-4365 - No information surfaced when LOAD data from Create table from file fails
  • HUE-4375 - Horizontal scrollbar can be hidden under the first fixed column
  • HUE-4383 - Trashed queries are showing up in the list of saved queries
  • HUE-4406 - Fails to start if Hive/Impala Not Installed
  • HUE-4409 - Main right scrollbar does not scroll when on the very right of the screen
  • HUE-4411 - Enable scrolling past the end of the editor
  • HUE-4412 - Errors should scroll to the line AND the column too
  • HUE-4477 - Select All is not filtering out the non visible roles from the selection
  • HUE-4493 - Fix sync-workflow action when Workflow includes sub-workflow
  • HUE-4515 - Remove oozie.bundle.application.path from properties when rerunning workflow
  • HUE-4533 - Disable password reveal on IE
  • HUE-4537 - Fix database_logging in hue config so it logs debug database messages
  • HUE-4541 - fixing Hue job browser - Kerberos mutual authentication error in Hue
  • HUE-4564 - Log stderr on failure to coerce password from script
  • HUE-4616 - Only select the snippet DB when executing the first statement
  • HUE-4635 - Fix duration on jobs page for running jobs
  • HUE-4662 - fixing Hue - Wildcard Certificates not supported
  • HUE-4700 - Protect against setting XSS in old editor
  • HUE-4738 - Use Concurrency and Throttle values set in coordinator settings
  • HUE-4739 - fixed Jobbrowser tests which were failing after resource manager pool change
  • HUE-4766 - Replace illegal characters on CSV downloads
  • HUE-4781 - Fix export to hdfs to use download_cell_limit from beeswax.conf
  • HUE-4801 - When importing oozie documents and remapping UUIDs, data should be updated accordingly
  • HUE-4808 - Don't show the edit link for sub workflows when submitted outside Hue
  • IMPALA-1346, IMPALA-1590, IMPALA-2344 - Fix sorter buffer management when spilling
  • IMPALA-3159 - impala-shell does not accept wildcard or SAN certificates
  • IMPALA-3344 - Simplify sorter and document/enforce invariants.
  • IMPALA-3441, IMPALA-3659 - Check for malformed Avro data
  • IMPALA-3499 - Split catalog update.
  • IMPALA-3628 - Fix cancellation from shell when security is enabled
  • IMPALA-3633 - cancel fragment if coordinator is gone
  • IMPALA-3646 - Handle corrupt RLE literal or repeat counts of 0.
  • IMPALA-3670 - fix sorter buffer mgmt bugs
  • IMPALA-3678 - Fix migration of predicates into union operands with an order by + limit.
  • IMPALA-3680 - Cleanup the scan range state after failed hdfs cache reads
  • IMPALA-3711 - Remove unnecessary privilege checks in getDbsMetadata().
  • IMPALA-3732 - handle string length overflow in avro files
  • IMPALA-3745 - parquet invalid data handling
  • IMPALA-3754 - fix TestParquet.test_corrupt_rle_counts flakiness
  • IMPALA-3772 - Fix admission control stress test.
  • IMPALA-3776 - fix 'describe formatted' for Avro tables
  • IMPALA-3820 - Handle linkage errors while loading Java UDFs in Catalog
  • IMPALA-3861 - Replace BetweenPredicates with their equivalent CompoundPredicate.
  • IMPALA-3915 - Register privilege and audit requests when analyzing resolved table refs.
  • IMPALA-3930 - Fix shuffle insert hint with constant partition exprs.
  • IMPALA-3940 - Fix getting column stats through views.
  • IMPALA-3965 - TSSLSocketWithWildcardSAN.py not exported as part of impala-shell build lib
  • IMPALA-4020 - Handle external conflicting changes to HMS gracefully
  • IMPALA-4049 - fix empty batch handling NLJ build side
  • OOZIE-2068 - Configuration as part of sharelib
  • OOZIE-2314 - Unable to kill old instance child job by workflow or coord rerun by Launcher
  • OOZIE-2329 - Make handling yarn restarts configurable
  • OOZIE-2345 - Parallel job submission for forked actions
  • OOZIE-2347 - Amendment: Remove unnecessary new Configuration()/new jobConf() calls from oozie
  • OOZIE-2347 - Amendments patch to remove unnecessary new Configuration()/new jobConf() calls from oozie
  • OOZIE-2347 - Remove unnecessary new Configuration()/new jobConf() calls from oozie
  • OOZIE-2436 - Fork/join workflow fails with oozie.action.yarn.tag must not be null
  • OOZIE-2504 - Create a log4j.properties under HADOOP_CONF_DIR in Shell Action
  • OOZIE-2533 - Patch-1550 - workaround for
  • OOZIE-2555 - Oozie SSL enable setup does not return port for admin -servers
  • OOZIE-2567 - HCat connection is not closed while getting hcat cred
  • OOZIE-2589 - CompletedActionXCommand is hardcoded to wrong priority
  • OOZIE-2649 - Can't override sub-workflow configuration property if defined in parent workflow XML
  • OOZIE-2656 - OozieShareLibCLI uses op system username instead of Kerberos to upload jars
  • PIG-3807 - Pig creates wrong schema after dereferencing nested tuple fields with sorts
  • SENTRY-1201 - Sentry ignores database prefix for MSCK statement
  • SENTRY-1311 - Improve usability of URI privileges by supporting mixed use of URIs with and without scheme
  • SENTRY-1320 - Queries of the form TRUNCATE TABLE db_name.table_name; no longer fail. The precondition checks allow two child nodes
  • SENTRY-1345 - Revert "ACLS on table folder disappear after insert for unpartitioned tables"
  • SENTRY-1345 - ACLS on table folder disappear after insert for unpartitioned tables
  • SENTRY-1346 - Add a test case to the HDFS ACL e2e suite to test that a db.tbl without partitions can take more than a certain number of groups
  • SOLR-6295 - Fix child filter query creation to never match parent docs in SolrExampleTests
  • SOLR-7280 - Missing test resources
  • SOLR-7280 - Backport: Load cores in sorted order and tweak coreLoadThread counts to improve cluster stability on restarts
  • SOLR-7866 - Harden code to prevent an unhandled NPE when trying to determine the max value of the version field.
  • SOLR-9091 - ZkController#publishAndWaitForDownStates logic is inefficient
  • SOLR-9236 - AutoAddReplicas will append an extra /tlog to the update log location on replica failover.
  • SPARK-8428 - Fix integer overflows in TimSort
  • SPARK-12339 - Added a null check that was removed in
  • SPARK-13242 - codegen fallback in case-when if there are many branches
  • SPARK-14391 - Fix launcher communication test, take 2.
  • SPARK-14963 - Fix typo in YarnShuffleService recovery file name
  • SPARK-14963 - Using recoveryPath if NM recovery is enabled
  • SPARK-15165 - Introduce place holder for comments in generated code
  • SPARK-16106 - TaskSchedulerImpl should properly track executors added to existing hosts
  • SPARK-16505 - Optionally propagate error during shuffle service startup.
  • SQOOP-2561 - Special Character removal from Column name as avro data results in duplicate column and fails the import
  • SQOOP-2846 - Sqoop Export with update-key failing for avro data file
  • SQOOP-2906 - Optimization of AvroUtil.toAvroIdentifier
  • SQOOP-2920 - sqoop performance deteriorates significantly on wide datasets; sqoop 100% on cpu
  • SQOOP-2971 - OraOop does not close connections properly
  • SQOOP-2995 - Backward incompatibility introduced by Custom Tool options.
  • SQOOP-2999 - Sqoop ClassNotFoundException (org.apache.commons.lang3.StringUtils) is thrown when executing Oracle direct import map task

Issues Fixed in CDH 5.8.0

Apache Flume

Flume fully compatible with Kafka 2.x

In release CDH 5.8.0, Flume is fully compatible with Kafka 2.x, including support for security features.

Apache HBase

Premature EOF detected in a WAL During Replication

Bug: None

When a write-ahead log (WAL) is parsed during replication, an InvalidProtobufException can occur while reading the source RegionServer WAL if EOF (end-of-file) is incorrectly detected before the actual end of the file. HBase stops reading the WAL at the detected EOF and does not parse any bytes that occur after it, causing data loss.

To work around this problem, Cloudera has patched HBase. HBase in CDH 5.8.0 and higher detects whether unparsed bytes exist after the EOF, and if so, the WAL is reset and re-read from the beginning, to attempt a clean read-through.

In testing, a single reset has been sufficient to work around observed data loss. However, the above change will retry a given WAL file indefinitely. On each attempt, a log message such as this will be emitted at the WARN level:
Processing end of WAL file '{}'. At position {}, which is too far away from
reported file length {}. Restarting WAL reading
Additional log details are emitted at the TRACE level about file offsets seen while handling recoverable errors.
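The reset-and-re-read behavior described above can be sketched roughly as follows. This is a simplified Python illustration, not the HBase implementation: the names read_entries and replicate_wal, the newline-terminated entry format, and the slack threshold are all hypothetical. The max_attempts bound exists only to keep the sketch finite; the patch described above retries indefinitely.

```python
import io
import logging

log = logging.getLogger("wal_sketch")


def read_entries(stream):
    """Hypothetical parser: yield newline-terminated entries, stop at a partial one."""
    while True:
        line = stream.readline()
        if not line.endswith(b"\n"):  # empty or partial entry is treated as EOF
            return
        yield line.rstrip(b"\n")


def replicate_wal(open_wal, reported_length, slack=0, max_attempts=5):
    """Re-read the WAL from the start whenever parsing stops short of its length.

    open_wal is a hypothetical callable that returns a fresh binary stream.
    """
    for attempt in range(1, max_attempts + 1):
        entries = list(read_entries(open_wal()))
        position = sum(len(entry) + 1 for entry in entries)  # +1 for each newline
        if reported_length - position <= slack:
            return entries, attempt
        log.warning(
            "Processing end of WAL file. At position %d, which is too far away "
            "from reported file length %d. Restarting WAL reading",
            position, reported_length)
    raise IOError("WAL still incomplete after %d attempts" % max_attempts)


# Usage: the first open sees a truncated view of the file, the second sees all of it.
full = b"a\nbb\nccc\n"
views = [io.BytesIO(full[:4]), io.BytesIO(full)]
entries, attempts = replicate_wal(lambda: views.pop(0), len(full))
```

In this sketch a single reset is enough to recover every entry, which mirrors what was observed in testing.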

Batch Get after Batch Put Does Not Fetch All Cells

Bug: HBASE-15811

A batch Get after a batch Put could fail to fetch cells that were written by the Put, resulting in a "read-your-writes" failure. This bug was exacerbated by high load on the client.

Read Replica Failure For PUT Operation During Region Transition

Bug: None

When the patch for HBASE-10794 was applied in CDH 5.4.4, a new bug was introduced, where, if the primary RegionServer becomes unavailable (for any reason, even a graceful shutdown), while a client is performing PUTs on that region, subsequent PUTs will fail.

Latency Metrics Inaccurate for MultiGet Operations

Bug: HBASE-15673

Latency values are written after each row is processed. However, if MultiGet is enabled, some rows are not counted in the metrics. This causes the metrics for the 50th, 75th, and 90th percentiles to be reported as 0.

Inconsistent Behavior Among DeleteColumnFamilyProcedure, CreateTableProcedure, and ModifyTableProcedure

Bug: HBASE-15456

If there is only one family in the table, DeleteColumnFamilyProcedure will fail. When hbase.table.sanity.checks is set to false, the HMaster logs a warning, but CreateTableProcedure and ModifyTableProcedure will now fail, where before they logged a warning, but succeeded. This makes the behavior of all three methods consistent.

Failed hbase-spark Bulk Loads Leave Files Behind

Bug: HBASE-15271

When using the bulk load helper provided by the hbase-spark module, output files are now written into temporary files and only made available when the executor has successfully completed. Previously, failed executors would leave files behind, and these files would be picked up by subsequent bulk load commands, and spurious copies of some cells were written.
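The write-to-temporary-then-publish pattern described above can be illustrated with a minimal sketch. It uses the local filesystem and Python standard-library calls only; the hbase-spark module itself works against HDFS, and the function name write_then_publish and the ".tmp-" prefix are hypothetical.

```python
import os
import tempfile


def write_then_publish(final_path, write_fn):
    """Write to a temporary file and rename it into place only on success."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_fn(tmp)
        # Publishing is a rename, so readers never see a half-written file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # A failed writer leaves nothing behind for later loads to pick up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A writer that raises partway through leaves the target directory unchanged, so a subsequent load sees only output from writers that completed.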

Apache Hive

HIVE-13217: Replication for HoS MapJoin small file needs to respect dfs.replication.max

HIVE-13039: BETWEEN predicate is not functioning correctly with predicate PUSHDOWN on Parquet table

HIVE-13065: Hive throws NullPointerException (NPE) when writing map type data to an HBase-backed table

HIVE-13160: HS2 unable to load UDFs on startup when HMS is not ready

HIVE-13243: Hive DROP TABLE on encryption zone fails for external tables

HIVE-13302: Direct SQL: CAST to DATE doesn't work on Oracle

HIVE-13115: MetaStore Direct SQL getPartitions() call fails when the columns schemas for a partition are NULL

HIVE-10303: HIVE-9471 broke forward compatibility of ORC files

HIVE-12706: Incorrect output from from_utc_timestamp() / to_utc_timestamp when local timezone has DST

HIVE-10685: ALTER TABLE concatenate operator will cause duplicate data

HIVE-13500: Launching big queries fails with OutOfMemoryException

HIVE-13527: Using deprecated APIs in HBase client causes ZooKeeper connection leaks.

HIVE-12517: Beeline's use of failed connection(s) causes failures and leaks.

HIVE-13632: Hive failing on INSERT empty array into parquet table

HIVE-13285: Orc concatenation may drop old files from moving to final path

HIVE-13836: DbNotifications giving an error = Invalid state. Transaction has already started

HIVE-9499: hive.limit.query.max.table.partition makes queries fail on non-partitioned tables

HIVE-13462: HiveResultSetMetaData.getPrecision() fails for NULL columns

HIVE-11408: HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

HIVE-12481: Occasionally "Request is a replay" will be thrown from HS2

HIVE-10698: Query on view results fails with "table not found error" if view is created with subquery alias (CTE)

HIVE-12941: Unexpected result when using MIN() on struct with NULL in first field

HIVE-13200: Aggregation functions returning empty rows on partitioned columns

HIVE-11054: Read error : Partition Varchar column cannot be cast to string

HIVE-13401: Kerberized HS2 with LDAP auth enabled fails kerberos/delegation token authentication

HIVE-13217: Some queries with UNION all fail when CBO is off

HIVE-11369: MapJoins in HiveServer2 fail when jmxremote is used

HIVE-13261: Can not compute column stats for partition when schema evolves

With Sentry enabled, only Hive admin users have access to YARN job logs

As a prerequisite of enabling Sentry, Hive impersonation is turned off, which means all YARN jobs are submitted to the Hive job queue, and are run as the hive user. This is an issue because the YARN History Server now has to block users from accessing logs for their own jobs, since their own usernames are not associated with the jobs. As a result, end users cannot access any job logs unless they can get sudo access to the cluster as the hdfs, hive or other admin users.

In CDH 5.8 (and higher), Hive overrides the default configuration, mapred.job.queuename, and places incoming jobs into the connected user's job queue, even though the submitting user remains hive. Hive obtains the relevant queue/username information for each job by using YARN's fair-scheduler.xml file.

Hue

Cannot query the customers table in Hue

Bug: HUE-3040

To query the customers table, users must re-create the Parquet data for compatibility.

Cloudera Distribution of Apache Kafka

CDH 5.7 is not compatible with Cloudera Distribution of Apache Kafka 1.x

Cloudera Distribution of Apache Kafka 1.x is compatible with CDH 5.4+.

Apache Oozie

PySpark does not work from the Oozie Spark Action

Bug: OOZIE-2482

The Spark Action would typically fail with a message like, "key not found: SPARK_HOME," but other error messages were possible. After the fix, the Spark Action has the necessary changes to successfully run PySpark jobs. See Oozie Spark Action Extension for more details and an example. Cloudera makes the PySpark dependencies available.

Apache Sentry

Security

Sentry does not check privileges on the URI used for the CREATE INDEX LOCATION '/path' command

Bug: SENTRY-1231

The CREATE INDEX LOCATION '/path' command would succeed even if a user did not have the required URI privileges for the /path.

Upgraded libthrift to version 0.9.3 due to a security vulnerability

For details on the security vulnerability in the Apache Thrift client libraries, see THRIFT-3231.

Hive Binding

INSERT OVERWRITE DIRECTORY command does not work correctly

Bug: SENTRY-922

The INSERT OVERWRITE DIRECTORY command would write table data into an HDFS directory (hdfs://path/), even if privileges are granted only for the local directory (file://path/).
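The distinction matters because a URI privilege covers a location only when the scheme matches as well as the path. A minimal sketch of such a check, assuming a simple prefix rule, is shown below. It is not Sentry's implementation; the function name uri_covers is hypothetical, and URIs without a scheme are not handled.

```python
from urllib.parse import urlsplit


def uri_covers(granted_uri, requested_uri):
    """Return True if requested_uri lies under granted_uri on the same filesystem."""
    granted, requested = urlsplit(granted_uri), urlsplit(requested_uri)
    # hdfs://path/ and file://path/ are different locations even if the paths match.
    if (granted.scheme, granted.netloc) != (requested.scheme, requested.netloc):
        return False
    granted_path = granted.path.rstrip("/") + "/"
    requested_path = requested.path.rstrip("/") + "/"
    return requested_path.startswith(granted_path)
```

With this rule, a privilege on file:///tmp/out covers file:///tmp/out/part-0 but not hdfs:///tmp/out.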

INSERT INTO no longer requires URI privilege on partition locations

Bug: SENTRY-1095

The INSERT INTO Hive command adds location information to the partition description. Usually if location information is included, you must ensure that the user has privileges on the corresponding URI. However, in this case, since the partition locations are under the table directory and can be easily generated, these requirements have been relaxed.

Change default value of sentry.hive.server

Bug: SENTRY-1112

The default value for sentry.hive.server was changed from server1 to an empty string.

Sentry Service

Sentry's Oracle upgrade scripts fail with ORA-00955

Bug: SENTRY-1066

Sentry upgrade scripts for Oracle would fail with error, ORA-00955, because during the upgrade, the script inadvertently creates an index with the same name as the constraint being dropped. The script will now run DROP INDEX before it adds the constraint again and completes the schema upgrade successfully.

grantServerPrivilege() and revokeServerPrivilege() should treat '*' and 'ALL' as synonyms

Bug: SENTRY-1252

The grantServerPrivilege() and revokeServerPrivilege() methods should treat * and ALL as synonyms when an action is not explicitly specified. Previously, if grantServerPrivilege() was called without an action, and followed up with a revokeServerPrivilege() invocation with an action such as ALL, the server-level privilege would not be revoked. This fix only applies to privileges that are granted after upgrading to CDH 5.8.
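One way to picture the fix is to normalize the action before it is stored or compared, so that a missing action, * and ALL all map to the same value. The sketch below is illustrative only and does not reflect Sentry's internal data model; normalize_action and ServerPrivileges are hypothetical names.

```python
def normalize_action(action=None):
    """Map a missing action, '*', and 'ALL' (any case) to one canonical value."""
    if action is None:
        return "ALL"
    cleaned = action.strip().upper()
    return "ALL" if cleaned in ("", "*", "ALL") else cleaned


class ServerPrivileges:
    """Hypothetical in-memory store keyed by (role, server, action)."""

    def __init__(self):
        self._granted = set()

    def grant(self, role, server, action=None):
        self._granted.add((role, server, normalize_action(action)))

    def revoke(self, role, server, action=None):
        self._granted.discard((role, server, normalize_action(action)))

    def has(self, role, server, action=None):
        return (role, server, normalize_action(action)) in self._granted


# Usage: grant without an action, then revoke with ALL.
store = ServerPrivileges()
store.grant("analyst", "server1")
granted_as_star = store.has("analyst", "server1", "*")
store.revoke("analyst", "server1", "ALL")
still_granted = store.has("analyst", "server1")
```

Because both calls resolve to the same key, the revoke removes the privilege that was granted without an explicit action.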

Sentry Debugging

Error in Hive Metastore Plugin (renameAuthzObject) log messages

Bug: SENTRY-1169

The renameAuthzObject plugin prints log messages with old path names in place of new path names.

Apache ZooKeeper

Upgrade Netty Due to Security Vulnerabilities

Bug: ZOOKEEPER-2450

Netty was upgraded from version 3.2.2 to 3.10.5 to resolve security vulnerabilities.

Fix Privacy Violation in Login.java

Bug: ZOOKEEPER-2405

In Login.java, getTGT() was logging confidential information in DEBUG mode. After the fix, only principals are logged.