Fixed Issues in CDH 6.3.0

CDH 6.3.0 fixes the following issues:

Kudu Masters unable to join back after a restart

In a multi-master Kudu environment, if a master is restarted or goes offline for a few minutes, it can occasionally have trouble rejoining the cluster on startup. For example, if this happens in a cluster with three Kudu masters, and one of the other two masters is stopped or dies during this time, then the overall Kudu cluster is down because a majority of the masters is not running.
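
The majority arithmetic behind this scenario can be sketched as follows. This is a minimal illustration and not Kudu code; the function name has_majority is hypothetical.

```python
def has_majority(total_masters: int, running_masters: int) -> bool:
    """Return True if strictly more than half of the masters are running."""
    return running_masters > total_masters // 2


# Three masters, one failed to rejoin after its restart:
# 2 of 3 still form a majority.
print(has_majority(3, 2))

# A second master stops or dies: 1 of 3 is not a majority,
# so the cluster as a whole is unavailable.
print(has_majority(3, 1))
```

With three masters, the cluster tolerates the loss of exactly one master, which is why a master that cannot rejoin leaves no margin for a second failure.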

This issue is resolved by the KUDU-2748 upstream JIRA.

Products affected: Apache Kudu

Affected versions:
  • CDH 5.14.0, 5.14.2, 5.14.4
  • CDH 5.15.0, 5.15.1, 5.15.2
  • CDH 5.16.1, 5.16.2
  • CDH 6.0.0, 6.0.1
  • CDH 6.1.0, 6.1.1
  • CDH 6.2.0, 6.2.1
Fixed version:
  • CDH 6.3.0

For the latest update on this issue see the corresponding Knowledge article: TSB 2020-442: Kudu Masters unable to join back after a restart

Spark’s stage retry logic could result in duplicate data

Apache Spark’s retry logic may allow tasks from both a failed output stage attempt and a successful retry attempt to commit output for the same partition.

Products affected: CDS Powered By Apache Spark

Affected versions:
  • CDS 2.1.0 release 1 and release 2
  • CDS 2.2.0 release 1 and release 2
  • CDS 2.3.0 release 2
Fixed versions:
  • CDH 6.2.0, 6.3.0
  • CDS 2.1.0 release 3
  • CDS 2.2.0 release 3
  • CDS 2.3.0 release 3
For the latest update on this issue see the corresponding Knowledge article: TSB 2019-337-1: Spark’s stage retry logic could result in duplicate data

Spark’s stage retry logic could result in missing data

Apache Spark’s retry logic may allow a task from a failed stage attempt to clean up data from its corresponding task in a successful stage retry attempt.

Products affected: CDS Powered By Apache Spark

Affected versions:
  • CDS 2.2.0 release 1, release 2
  • CDS 2.3.0 release 1, release 2
Fixed versions:
  • CDH 6.2.0, 6.3.0
  • CDS 2.2.0 release 3
  • CDS 2.3.0 release 3
For the latest update on this issue see the corresponding Knowledge article: TSB 2019-337-2: Spark’s stage retry logic could result in missing data

Shuffle+Repartition on a DataFrame could lead to incorrect answers

When a repartition follows a shuffle, the assignment of rows to partitions is nondeterministic. If Spark has to recompute a partition, for example, due to an executor failure, the retry can consume a different set of input rows than the original computation. As a result, some rows can be dropped, and others can be duplicated.
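
The effect can be illustrated with a small simulation. This is a simplified model and not Spark code: it assumes rows are assigned to partitions round-robin by arrival position, and the helper name round_robin and the sample rows are hypothetical.

```python
def round_robin(rows, num_partitions):
    """Assign rows to partitions by arrival position."""
    partitions = [[] for _ in range(num_partitions)]
    for position, row in enumerate(rows):
        partitions[position % num_partitions].append(row)
    return partitions


# The same four rows arrive in a different order on each attempt,
# because the order in which shuffle output is read is not fixed.
first_attempt = round_robin(["a", "b", "c", "d"], 2)
retry_attempt = round_robin(["b", "a", "c", "d"], 2)

# Partition 0 survives from the first attempt.
# Partition 1 is lost (for example, executor failure) and recomputed.
final_rows = first_attempt[0] + retry_attempt[1]

# "b" is missing and "a" appears twice.
print(sorted(final_rows))
```

Because the surviving partition and the recomputed partition were built from different input orders, the combined result no longer contains each input row exactly once.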

Products affected: CDS Powered By Apache Spark

Affected versions:
  • CDH 6.0.0, 6.0.1, 6.1.0, 6.1.1
  • CDS 2.1.0 release 1, release 2
  • CDS 2.2.0 release 1, release 2
Fixed versions:
  • CDH 6.2.0, 6.3.0
  • CDS 2.1.0 release 3
  • CDS 2.2.0 release 3
  • CDS 2.3.0 release 3
For the latest update on this issue see the corresponding Knowledge article: TSB 2019-337-3: Shuffle+Repartition on a DataFrame could lead to incorrect answers

Shuffle+Repartition on an RDD could lead to incorrect answers

When a repartition follows a shuffle, the assignment of records to partitions is nondeterministic. If Spark has to recompute a partition, for example, due to an executor failure, the retry can consume a different set of input records than the original computation. As a result, some records can be dropped, and others can be duplicated.

Products affected: CDS Powered By Apache Spark

Affected versions:
  • CDH 6.0.0, 6.0.1, 6.1.0, 6.1.1
  • CDS 2.1.0 release 1, release 2, release 3
  • CDS 2.2.0 release 1, release 2, release 3
  • CDS 2.3.0 release 1, release 2, release 3
Fixed versions:
  • CDH 6.2.0, 6.3.0
  • CDS 2.1.0 release 4
  • CDS 2.2.0 release 4
  • CDS 2.3.0 release 4
For the latest update on this issue see the corresponding Knowledge article: TSB 2019-337-4: Shuffle+Repartition on an RDD could lead to incorrect answers

Kafka Broker Java configuration options in Cloudera Manager 6.2.0 are not applied to the broker JVM process

Cloudera Manager allows the configuration of JVM options for Kafka brokers via the Additional Broker Java Options (broker_java_opts) service parameter. In Cloudera Manager 6.2.0, when managing CDH 6.2.0 clusters, ‘broker_java_opts’ is ignored when starting the Kafka broker process, so the broker runs with the default JVM configuration options. This can lead to the following problems (depending on other environment variables):

  • Kafka broker process does not use the recommended garbage collector settings leading to poor performance and increased resource (heap memory) utilization.
  • Kafka broker process allows remote connection to JMX interface making the process vulnerable to remote code execution on the broker nodes.

Products affected: Apache Kafka

Affected versions:
  • CDH 6.2.0
  • Cloudera Manager 6.2.0
Fixed versions:
  • CDH 6.2.1, 6.3.0

For the latest update on this issue see the corresponding Knowledge article: TSB 2019-377: Kafka Broker Java configuration options in Cloudera Manager 6.2.0 are not applied to the broker JVM process

XSS Cloudera Manager

Malicious Impala queries can result in Cross Site Scripting (XSS) when viewed in Cloudera Manager.

Products affected: Apache Impala

Releases affected:
  • Cloudera Manager 5.13.x, 5.14.x, 5.15.1, 5.15.2, 5.16.1
  • Cloudera Manager 6.0.0, 6.0.1, 6.1.0

Users affected: All Cloudera Manager Users

Date/time of detection: November 2018

Severity (Low/Medium/High): High

Impact: When a malicious user generates a piece of JavaScript in the impala-shell and then goes to the Queries tab of the Impala service in Cloudera Manager, that piece of JavaScript code gets evaluated, resulting in an XSS.

CVE: CVE-2019-14449

Immediate action required: There is no workaround. Upgrade to the latest available maintenance release.

Addressed in release/refresh/patch:
  • Cloudera Manager 5.16.2
  • Cloudera Manager 6.0.2, 6.1.1, 6.2.0, 6.3.0

Oozie database upgrade fails when PostgreSQL version 9.6 or higher is used

The Oozie database upgrade fails when PostgreSQL version 9.6 or higher is used, because of a system table change between PostgreSQL 9.5 and 9.6. The failure occurs only if Oozie uses a JDBC driver earlier than 9.4.1209.

Workaround:
  1. After the parcels of the new version are distributed, replace the PostgreSQL JDBC driver with a newer one (version 9.4.1209 or higher) in the new parcel, at the following locations:
    • /opt/cloudera/parcels/${newparcel.version}/lib/oozie/lib/
    • /opt/cloudera/parcels/${newparcel.version}/lib/oozie/libtools/
  2. Perform the upgrade.
If your cluster is installed from packages, you must change the drivers at the following locations:
  • /usr/lib/oozie/libtools/
  • /usr/lib/oozie/lib/

You can download the driver from the PostgreSQL JDBC driver homepage.
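
Before replacing a driver, it can help to check whether the jar already in place is older than 9.4.1209. The following is a minimal sketch of such a check. The helper names are hypothetical, and it assumes the jar file name carries the driver version (for example postgresql-9.4.1208.jar), which may not hold for every installation.

```python
import re

# Minimum driver version named in the workaround above.
MINIMUM_DRIVER_VERSION = (9, 4, 1209)


def driver_version(jar_name):
    """Extract a numeric version tuple from a jar name (assumed naming scheme)."""
    match = re.search(r"postgresql-(\d+(?:[.-]\d+)*)", jar_name)
    if match is None:
        raise ValueError(f"no version found in {jar_name!r}")
    return tuple(int(part) for part in re.split(r"[.-]", match.group(1)))


def needs_replacement(jar_name):
    """Return True if the driver is older than the minimum version."""
    return driver_version(jar_name) < MINIMUM_DRIVER_VERSION


# Example jar names are illustrative only.
for name in ["postgresql-9.4.1208.jar", "postgresql-9.4.1209.jar"]:
    print(name, needs_replacement(name))
```

Comparing versions as integer tuples avoids the pitfalls of string comparison, where a component such as 1209 would be ordered character by character.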

Affected Versions: CDH 6.0.0 and higher

Fixed Version: CDH 6.2.1 and higher

Cloudera Issue: CDH-75951

Error when executing Java classes from a CDH cluster running on Ubuntu 18

Using the hadoop command-line interface for executing Java classes that are not in the default package results in error messages similar to the following:
#hadoop org.apache.hadoop.conf.Configuration
/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.914039/bin/../lib/hadoop/libexec//hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.HADOOP.CONF.CONFIGURATION_USER: bad substitution
/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.x.p0.914039/bin/../lib/hadoop/libexec//hadoop-functions.sh: line 2331: HADOOP_ORG.APACHE.HADOOP.CONF.CONFIGURATION_USER: bad substitution
/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.x.p0.914039/bin/../lib/hadoop/libexec//hadoop-functions.sh: line 2426: HADOOP_ORG.APACHE.HADOOP.CONF.CONFIGURATION_OPTS: bad substitution

This issue occurs only in CDH 6.2 clusters running on Ubuntu 18 and the error messages can be safely ignored.

Workaround: Run the java command directly using hadoop classpath to get the classpath. For example, instead of hadoop org.apache.hadoop.conf.Configuration, you can run java -cp `hadoop classpath` org.apache.hadoop.conf.Configuration.

Affected Versions: CDH 6.2.0

Fixed Versions: CDH 6.2.1

Apache Issue: HADOOP-16167

Connections with Expired Delegation Tokens Remain Active

Connections authenticated with a delegation token stay alive even after the token expires. The connection terminates only if the client disconnects. Once the client is disconnected, it cannot reconnect with the expired token.

Workaround: N/A

Affected Versions: CDH 6.2.x

Fixed Versions: CDH 6.3.0 and higher

Apache Issue: KAFKA-7352

Cloudera Issue: N/A

Upstream Issues Fixed

The following upstream issues are fixed in CDH 6.3.0:

Apache Accumulo

There are no notable fixed issues in this release.

Apache Avro

The following issue is fixed in CDH 6.3.0:

Apache Crunch

There are no notable fixed issues in this release.

Apache Flume

There are no notable fixed issues in this release.

Apache Hadoop

The following issues are fixed in CDH 6.3.0:

  • HADOOP-10848 - Cleanup calling of sun.security.krb5.Config.
  • HADOOP-12760 - sun.misc.Cleaner has moved to a new location in OpenJDK 9
  • HADOOP-14445 - Addendum: Use DelegationTokenIssuer to create KMS delegation tokens that can authenticate to all KMS instances.
  • HADOOP-15775 - [JDK9] Add missing javax.activation-api dependency.
  • HADOOP-15783 - [JDK10] TestSFTPFileSystem.testGetModifyTime fails.
  • HADOOP-15861 - Move DelegationTokenIssuer to the right path.
  • HADOOP-15873 - Add JavaBeans Activation Framework API to LICENSE.txt.
  • HADOOP-15997 - KMS client always authenticates itself using the credentials from login user, rather than current user.
  • HADOOP-16011 - OsSecureRandom very slow compared to other SecureRandom implementations.
  • HADOOP-16016 - TestSSLFactory#testServerWeakCiphers fails on Java 1.8.0_191 or later.
  • HADOOP-16109 - Parquet reading S3AFileSystem causes an EOF exception.
  • HADOOP-16199 - KMSLoadBlanceClientProvider does not select token correctly.
  • HADOOP-16289 - Allow extra jsvc startup option in hadoop_start_secure_daemon in hadoop-functions.sh.

HDFS

The following issues are fixed in CDH 6.3.0:

  • HDFS-3246 - pRead equivalent for direct read path.
  • HDFS-7663 - Erasure Coding: Append on striped file.
  • HDFS-10477 - Stopping the decommission of a rack of DataNodes causes the NameNode failover to standby.
  • HDFS-12781 - After stopping a DataNode, the DataNode tab in the NameNode UI displays a warning message.
  • HDFS-12818 - Support a multiple storage configuration in DataNodeCluster / SimulatedFSDataset.
  • HDFS-13231 - Extend visualization for decommissioning and maintenance mode under the DataNode tab in the NameNode UI.
  • HDFS-13677 - Dynamic refresh of disk configuration results in overwriting the VolumeMap.
  • HDFS-14046 - In-Maintenance icon is missing on the DataNode information page.
  • HDFS-14101 - Random failure of testListCorruptFilesCorruptedBlock.
  • HDFS-14111 - hdfsOpenFile on HDFS causes unnecessary IO from file offset 0.
  • HDFS-14132 - Add BlockLocation.isStriped() to determine if block is replicated or striped.
  • HDFS-14242 - OIV WebImageViewer: NPE when param op is not specified.
  • HDFS-14285 - libhdfs hdfsRead copies entire array even if it is only partially filled.
  • HDFS-14314 - fullBlockReportLeaseId should be reset after registering to the NameNode.
  • HDFS-14333 - Datanode fails to start if any disk has errors during NameNode registration.
  • HDFS-14348 - Fix JNI exception handling issues in libhdfs.
  • HDFS-14359 - Inherited ACL permissions masked when parent directory does not exist.
  • HDFS-14389 - getAclStatus returns incorrect permissions and owner when an iNodeAttributeProvider is configured.

MapReduce 2

The following issue is fixed in CDH 6.3.0:

  • MAPREDUCE-7190 - Add SleepJob additional parameter to make parallel runs distinguishable

YARN

The following issues are fixed in CDH 6.3.0:

  • YARN-9118 - Handle exceptions with parsing user defined GPU devices in GpuDiscoverer
  • YARN-9552 - FairScheduler: NODE_UPDATE can cause NoSuchElementException

Apache HBase

The following issues are fixed in CDH 6.3.0:

  • HBASE-18484 - VerifyRep by snapshot does not work when Yarn/SourceHBase/PeerHBase located in three different HDFS clusters
  • HBASE-19008 - Add missing equals or hashCode method(s) to stock Filter implementations
  • HBASE-20586 - add support for clusters on different realms
  • HBASE-20662 - Increasing space quota on a violated table does not remove SpaceViolationPolicy.DISABLE enforcement
  • HBASE-20851 - Change rubocop config for max line length of 100
  • HBASE-21201 - Support to run VerifyReplication MR tool without peerid
  • HBASE-21225 - Having RPC & Space quota on a table/Namespace doesn't allow space quota to be removed using 'NONE'
  • HBASE-21371 - Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
  • HBASE-21402 - parent "HBASE-21325 Force to terminate regionserver when abort hang in somewhere"
  • HBASE-21475 - Put mutation (having TTL set) added via co-processor is retrieved even after TTL expires
  • HBASE-21535 - Zombie Master detector is not working
  • HBASE-21634 - Print error message when user uses unacceptable values for LIMIT while setting quotas.
  • HBASE-21636 - Enhance the shell scan command to support missing scanner specifications like ReadType, IsolationLevel etc.
  • HBASE-21644 - Modify table procedure runs infinitely for a table having region replication > 1
  • HBASE-21658 - Should get the meta replica number from zk instead of config at client side
  • HBASE-21684 - Throw DNRIOE when connection or rpc client is closed
  • HBASE-21688 - Address WAL filesystem issues
  • HBASE-21699 - Fixed create table failed when using SPLITS_FILE => 'splits.txt'
  • HBASE-21715 - set timeout instead of throwing Exception when calling ProcedureFuture.get in client side.
  • HBASE-21736 - Remove the server from online servers before scheduling SCP for it in hbck
  • HBASE-21749 - RS UI may throw NPE and make rs-status page inaccessible with multiwal and replication
  • HBASE-21754 - ReportRegionStateTransitionRequest should be executed in priority executor
  • HBASE-21764 - Size of in-memory compaction thread pool should be configurable
  • HBASE-21775 - The BufferedMutator doesn't ever refresh region location cache
  • HBASE-21781 - list_deadservers elapsed time is incorrect
  • HBASE-21795 - Client application may get stuck (time bound) if a table modify op is called immediately after split op
  • HBASE-21800 - RegionServer aborted due to NPE from MetaTableMetrics coprocessor
  • HBASE-21815 - Make isTrackingMetrics and getMetrics of ScannerContext public
  • HBASE-21816 - Print source cluster replication config directory
  • HBASE-21828 - Make sure we do not return CompletionException when locating region
  • HBASE-21829 - Use FutureUtils.addListener instead of calling whenComplete directly
  • HBASE-21832 - parent "HBASE-21595 Print thread's information and stack traces when RS is aborting forcibly" to branch-2.0/2.1
  • HBASE-21843 - RegionGroupingProvider breaks the meta wal file name pattern which may cause data loss for meta region
  • HBASE-21857 - Do not need to check clusterKey if replicationEndpoint is provided when adding a peer
  • HBASE-21867 - Support multi-threads in HFileArchiver
  • HBASE-21871 - Added support to specify a peer table name in VerifyReplication tool
  • HBASE-21884 - avoid autoboxing in ugi ref counting for secure bulk load
  • HBASE-21890 - Use execute instead of submit to submit a task in RemoteProcedureDispatcher
  • HBASE-21899 - Fix missing variables for slf4j Logger
  • HBASE-21900 - Infinite loop in AsyncMetaRegionLocator if we can not get the location for meta
  • HBASE-21906 - the CallQueueTooBigException related changes in HBASE-21875 to branch-2.1/branch-2.0
  • HBASE-21910 - The nonce implementation is wrong for AsyncTable
  • HBASE-21926 - Profiler servlet
  • HBASE-21927 - Always fail the locate request when error occur
  • HBASE-21930 - Deal with ScannerResetException when opening region scanner
  • HBASE-21932 - Use Runtime.getRuntime().halt to terminate regionserver when abort timeout
  • HBASE-21934 - RemoteProcedureDispatcher should track the ongoing dispatched calls
  • HBASE-21960 - RESTServletContainer not configured for REST Jetty server
  • HBASE-21961 - Infinite loop in AsyncNonMetaRegionLocator if there is only one region and we tried to locate before a non empty row
  • HBASE-21976 - Deal with RetryImmediatelyException for batching request
  • HBASE-21978 - Should close AsyncRegistry if we fail to get cluster id when creating AsyncConnection
  • HBASE-21983 - Should track the scan metrics in AsyncScanSingleRegionRpcRetryingCaller if scan metrics is enabled
  • HBASE-21991 - Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements
  • HBASE-22032 - KeyValue validation should check for null byte array
  • HBASE-22042 - Missing @Override annotation for RawAsyncTableImpl.scan
  • HBASE-22045 - Mutable range histogram reports incorrect outliers
  • HBASE-22047 - LeaseException in Scan should be retried
  • HBASE-22054 - Space Quota: Compaction is not working for super user in case of NO_WRITES_COMPACTIONS
  • HBASE-22070 - Checking restoreDir in RestoreSnapshotHelper
  • HBASE-22072 - High read/write intensive regions may cause long crash
  • HBASE-22073 - /rits.jsp throws an exception if no procedure
  • HBASE-22086 - Space Quota issue: Deleting snapshot doesn't update the usage of table
  • HBASE-22094 - Throw TableNotFoundException if table not exists in AsyncAdmin.compact
  • HBASE-22097 - Modify the description of split command in shell
  • HBASE-22098 - HBASE-18667 "Disable error-prone for hbase-protocol-shaded" to branch-2
  • HBASE-22099 - HBASE-21895 "Error prone upgrade" to branch-2
  • HBASE-22100 - False positive for error prone warnings in pre commit job
  • HBASE-22101 - AsyncAdmin.isTableAvailable should not throw TableNotFoundException
  • HBASE-22123 - REST gateway reports Insufficient permissions exceptions as 404 Not Found
  • HBASE-22128 - Move namespace region then master crashed make deadlock
  • HBASE-22135 - AsyncAdmin will not refresh master address
  • HBASE-22144 - Correct MultiRowRangeFilter to work with reverse scans
  • HBASE-22177 - Do not recreate IOException in RawAsyncHBaseAdmin.adminCall
  • HBASE-22179 - Fix RawAsyncHBaseAdmin.getCompactionState
  • HBASE-22185 - RAMQueueEntry#writeToCache should freeBlock if any exception encountered instead of the IOException catch block
  • HBASE-22189 - Removed remaining usage of StoreFile.getModificationTimeStamp
  • HBASE-22190 - SnapshotFileCache may fail to load the correct snapshot file list when there is an on-going snapshot operation
  • HBASE-22200 - WALSplitter.hasRecoveredEdits should use same FS instance from WAL region dir
  • HBASE-22225 - Profiler tab on Master/RS UI not working w/o comprehensive message
  • HBASE-22230 - REST Server drops connection on long scan
  • HBASE-22235 - OperationStatus.{SUCCESS|FAILURE|NOT_RUN} are not visible to 3rd party coprocessors
  • HBASE-22236 - AsyncNonMetaRegionLocator should not cache HRegionLocation with null location
  • HBASE-22249 - Rest Server throws NoClassDefFoundError with Java 11
  • HBASE-22274 - Cell size limit check on append considers cell's previous size
  • HBASE-22278 - RawAsyncHBaseAdmin should not use cached region location
  • HBASE-22282 - Should deal with error in the callback of RawAsyncHBaseAdmin.splitRegion methods
  • HBASE-22291 - Fix recovery of recovered.edits files under root dir
  • HBASE-22292 - PreemptiveFastFailInterceptor clean repeatedFailuresMap issue
  • HBASE-22324 - loss a mass of data when the sequenceId of cells greater than Integer.Max
  • HBASE-22325 - AsyncRpcRetryingCaller will not schedule retry if we hit a NotServingRegionException but there is no TableName provided
  • HBASE-22354 - master never sets abortRequested, and thus abort timeout doesn't work for it
  • HBASE-22375 - Promote AccessChecker to LimitedPrivate
  • HBASE-22378 - HBase Canary fails with TableNotFoundException when table deleted during Canary run
  • HBASE-22581 - user with "CREATE" permission can grant, but not revoke permissions on created table

Apache Hive

The following issues are fixed in CDH 6.3.0:

  • HIVE-13278 - Avoid FileNotFoundException when map/reduce.xml is not available
  • HIVE-14229 - The jars in hive.aux.jar.paths are not added to session classpath
  • HIVE-15397 - Metadata-only queries may return incorrect results with empty tables
  • HIVE-21363 - Ldap auth issue: group filter match should be case insensitive
  • HIVE-21484 - Metastore API getVersion() should return real version
  • HIVE-21526 - JSONDropDatabaseMessage needs to have the full database object

Hue

The following issues are fixed in CDH 6.3.0:

  • HUE-7712 - [spark] Livy-batch not available in HUE 4.1.
  • HUE-8727 - [frontend] Chrome browser autofills the user name in the top search and in the left assist filter.
  • HUE-8745 - [editor] Support AWS Athena using JDBC Driver.
  • HUE-8747 - [editor] Download query result as a task.
  • HUE-8782 - [hbase] Support Python 3 in Thrift bindings.
  • HUE-8805 - [core] Add basic Query Analytics reporting.
  • HUE-8813 - [hbase] HBase examples are not installed on secure cluster
  • HUE-8814 - [backend] Allow OIDC username attribute to be customizable
  • HUE-8816 - [notebook] Support parsing columns with varchar type.
  • HUE-8817 - [core] Improve get_ordered_interpreters performance
  • HUE-8826 - [frontend] Can't close log block on services page.
  • HUE-8827 - [docs] Update presto website links.
  • HUE-8828 - [editor] Fix notebook user's searching not displaying.
  • HUE-8830 - [search] Fix js exception from right assist in the dashboard.
  • HUE-8831 - [search] Support all SQL dialects in the dashboard autocomplete.
  • HUE-8832 - [spark] Support SparkSql in Livy.
  • HUE-8833 - [editor] Error - hidden popup menu in the presentation section.
  • HUE-8834 - [docker] Simplify the Hue server container.
  • HUE-8836 - [core] request.get_host() is broken when HTTP_X_FORWARDED_HOST contains multiple hosts.
  • HUE-8840 - [catalog] Fix import to non-Hive tables.
  • HUE-8841 - [metadata] Add read-only mode for SQL catalog metadata.
  • HUE-8860 - [beeswax] Truncate column size to 5000 if too large.
  • HUE-8864 - [search] Loading a dashboard fails to show the proper layout.
  • HUE-8867 - [metastore] Expanding columns of a table in left assist fails.
  • HUE-8869 - [frontend] Improve the editor icon.
  • HUE-8870 - [frontend] Charting sometimes throws an 'UncaughtReferenceError.'
  • HUE-8871 - [frontend] Search with "tag" facet should work with Navigator.
  • HUE-8872 - [editor] Result column count is off by one when no filter is present.
  • HUE-8873 - [jobbrowser] Auto refresh deselects your selection for rerun workflows and schedulers if a job is running.
  • HUE-8874 - [security] Privilege checker cannot be cached.
  • HUE-8875 - [indexer] '/hue/indexer/indexes' is not found.
  • HUE-8876 - [core] Fix the redirect for is_embeddable when 401 is returned.
  • HUE-8878 - [oozie] Fix Hive Document Action variable with pre-filled value.
  • HUE-8880 - [oozie] Fix KeyError for execute coordinator.
  • HUE-8881 - [search] Solr examples cannot be loaded.
  • HUE-8883 - [docs] Update the requirements and headers, and troubleshoot for MacOS.
  • HUE-8884 - [editor] When executing multiple statements quickly, errors are shown to the user.
  • HUE-8885 - [frontend] Downgrade knockout to 3.4.2.
  • HUE-8886 - [importer] Changing the "Has Header" checkbox should refresh the importer preview.

Apache Impala

The following issues are fixed in CDH 6.3.0:

  • IMPALA-8322 - Confined the impact of slowly completing I/O requests to the issuing query.
  • IMPALA-8444 - Fixed a performance regression when building privilege name in an environment secured by Sentry using a large number of privileges per role.
  • IMPALA-7800 - Impala now times out new connections after it reaches the maximum number of concurrent client connections. The limit is specified by the --fe_service_threads startup flag. The default value is 64, which allows 64 queries to run simultaneously. Previously, connection attempts that could not be serviced hung indefinitely.
  • IMPALA-8283 - Fixed the issue where the order of Kudu PRIMARY KEYs can be silently ignored when a Kudu-based table was copied with a changed primary key definition.
  • IMPALA-8177 - Fixed log DDL failures in coordinator logs.

Apache Kafka

The following issues are fixed in CDH 6.3.0:

  • KAFKA-4217 - Add KStream.flatTransform
  • KAFKA-4453 - Added code to separate controller connections and requests from the data plane
  • KAFKA-4850 - Enable bloomfilters
  • KAFKA-5117 - Stop resolving externalized configs in Connect REST API
  • KAFKA-5692 - Change PreferredReplicaLeaderElectionCommand to use Admin...
  • KAFKA-5994 - Log ClusterAuthorizationException for all ClusterAction requests
  • KAFKA-6627 - Prevent config default values overriding ones specified through --producer-property on command line.
  • KAFKA-6789 - Handle retriable group errors in AdminClient API
  • KAFKA-6833 - Producer should await metadata for unknown partitions
  • KAFKA-7024 - Rocksdb state directory should be created before opening the DB
  • KAFKA-7027 - Add an overload build method in scala
  • KAFKA-7051 - Improve the efficiency of ReplicaManager
  • KAFKA-7253 - The returned connector type is always null when creating connector
  • KAFKA-7352 - KIP-368: Allow SASL Connections to Periodically Re-Authenticate
  • KAFKA-7391 - Introduce close(Duration) to Producer and AdminClient instead of close(long, TimeUnit)
  • KAFKA-7433 - Introduce broker options in TopicCommand to use AdminClient
  • KAFKA-7503 - MINOR: Start Connect REST server in standalone mode to match distributed mode
  • KAFKA-7601 - Clear leader epoch cache on downgraded format in append
  • KAFKA-7609 - Add Protocol Generator for Kafka
  • KAFKA-7633 - Allow Kafka Connect to access internal topics without cluster ACLs
  • KAFKA-7641 - Introduce "group.max.size" config to limit group sizes
  • KAFKA-7652 - Part I; Fix SessionStore's findSession(single-key)
  • KAFKA-7652 - Part III; Put to underlying before Flush
  • KAFKA-7672 - The local state not fully restored after KafkaStream rebalanced, resulting in data loss
  • KAFKA-7692 - Fix ProducerStateManager SequenceNumber overflow
  • KAFKA-7693 - Fix SequenceNumber overflow in producer
  • KAFKA-7719 - Improve fairness in SocketServer processors (KIP-402)
  • KAFKA-7738 - Track leader epochs in client Metadata
  • KAFKA-7741 - Streams exclude javax dependency
  • KAFKA-7755 - Look up client host name since DNS entry may have changed
  • KAFKA-7758 - Reuse KGroupedStream/KGroupedTable with named repartition topics
  • KAFKA-7781 - Add validation check for retention.ms topic property.
  • KAFKA-7786 - Ignore OffsetsForLeaderEpoch response if epoch changed while request in flight
  • KAFKA-7789 - Fix by increasing the key size for the RSA keys generated for
  • KAFKA-7790 - Fix Bugs in Trogdor Task Expiration
  • KAFKA-7792 - Add simple /agent/uptime and /coordinator/uptime health check endpoints
  • KAFKA-7793 - Improve the Trogdor command line.
  • KAFKA-7798 - Expose embedded clientIds
  • KAFKA-7808 - AdminClient#describeTopics should not throw InvalidTopic if topic name is not found
  • KAFKA-7824 - Require member.id for initial join group request [KIP-394]
  • KAFKA-7837 - Ensure offline partitions are picked up as soon as possible when shrinking ISR
  • KAFKA-7838 - Log leader and follower end offsets when shrinking ISR
  • KAFKA-7844 - Use regular subproject for generator to fix *All targets
  • KAFKA-7855 - Kafka Streams Maven Archetype quickstart fails to compile out of the box
  • KAFKA-7859 - Use automatic RPC generation in LeaveGroups
  • KAFKA-7866 - Ensure no duplicate offsets after txn index append failure
  • KAFKA-7873 - Always seek to beginning in KafkaBasedLog
  • KAFKA-7890 - Invalidate ClusterConnectionState cache for a broker if the hostname of the broker changes.
  • KAFKA-7895 - KTable suppress operator emitting more than one record for the same key per window
  • KAFKA-7897 - Disable leader epoch cache when older message formats are used
  • KAFKA-7902 - Replace original loginContext if SASL/OAUTHBEARER refresh login fails
  • KAFKA-7909 - Ensure timely rebalance completion after pending members rejoin or fail
  • KAFKA-7915 - Don't return sensitive authentication errors to clients
  • KAFKA-7916 - Unify store wrapping code for clarity
  • KAFKA-7920 - Do not permit zstd produce requests until IBP is updated to 2.1
  • KAFKA-7935 - UNSUPPORTED_COMPRESSION_TYPE if ReplicaManager.getLogConfig returns None
  • KAFKA-7945 - Calc refresh time correctly when token created in the past
  • KAFKA-7974 - Avoid zombie AdminClient when node host isn't resolvable
  • KAFKA-7979 - Clean up threads and increase timeout in PartitionTest
  • KAFKA-8002 - Log dir reassignment stalls if future replica has different segment base offset
  • KAFKA-8011 - Fix for race condition causing concurrent modification exception
  • KAFKA-8012 - Ensure partitionStates have not been removed before truncating.
  • KAFKA-8014 - Extend Connect integration tests to add and remove workers dynamically
  • KAFKA-8040 - Streams handle initTransactions timeout
  • KAFKA-8058 - Fix ConnectClusterStateImpl.connectors() method
  • KAFKA-8061 - Handle concurrent ProducerId reset and call to Sender thread shutdown
  • KAFKA-8062 - Do not remove StateListener when shutting down stream thread
  • KAFKA-8065 - restore original input record timestamp in forward()
  • KAFKA-8066 - Always close the sensors in Selector.close()
  • KAFKA-8069 - Fix early expiration of offsets due to invalid loading of expire timestamp
  • KAFKA-8121 - Shutdown ZK client expiry handler earlier during close
  • KAFKA-8134 - `linger.ms` must be a long
  • KAFKA-8142 - Fix NPE for nulls in Headers
  • KAFKA-8150 - Fix bugs in handling null arrays in generated RPC code
  • KAFKA-8157 - fix the incorrect usage of segment.index.bytes (2.2)
  • KAFKA-8190 - Don't update keystore modification time during validation
  • KAFKA-8204 - fix Streams store flush order
  • KAFKA-8229 - Reset WorkerSinkTask offset commit interval after task commit
  • KAFKA-8240 - Fix NPE in Source.equals()
  • KAFKA-8241 - Handle configs without truststore for broker keystore update
  • KAFKA-8248 - Ensure time updated before sending transactional request
  • KAFKA-8254 - Pass Changelog as Topic in Suppress Serdes
  • KAFKA-8277 - Fix NPEs in several methods of ConnectHeaders
  • KAFKA-8289 - Fix Session Expiration and Suppression (#6654)
  • KAFKA-8290 - Close producer for zombie task
  • KAFKA-8298 - Fix possible concurrent modification exception
  • KAFKA-8304 - Fix registration of Connect REST extensions
  • KAFKA-8306 - Initialize log end offset accurately when start offset is non-zero
  • KAFKA-8320 - fix retriable exception package for source connectors
  • KAFKA-8323 - Close RocksDBStore's BloomFilter
  • KAFKA-8335 - Clean empty batches when sequence numbers are reused
  • KAFKA-8347 - Choose next record to process by timestamp
  • KAFKA-8348 - Fix KafkaStreams JavaDocs
  • KAFKA-8351 - Cleaner should handle transactions spanning multiple segments
  • KAFKA-8363 - Fix parsing bug for config providers

Apache Kite

The following issue is fixed in CDH 6.3.0:

  • KITE-1185 - Make root temp directory path configurable in HiveAbstractDatasetRepository

Apache Kudu

The following issues are fixed in CDH 6.3.0:

  • KUDU-1868 - The Java client no longer fails when the scans take a very long time to return a single block of rows, such as highly selective scans over a large amount of data.
  • The SERVICE_UNAVAILABLE errors that caused the Java client to do unnecessary master lookups are handled gracefully.
  • The Kudu scan tokens now work correctly when the target table is renamed between the time when the scan token is created and when it is rehydrated into a scanner.
  • Kudu’s “NTP synchronization wait” behavior at startup now works as expected when Kudu is run in a containerized environment.
  • KUDU-2807 - The system no longer crashes when a flush or a compaction overlaps with another compaction.
  • KUDU-2748 - Fixed a rare race at startup where the leader master would fruitlessly try to tablet copy to a healthy follower master, causing the cluster to operate as if it had two masters until the master leadership changed.
  • KUDU-2706 - Kudu no longer crashes in libkrb5 when negotiating multiple TLS connections concurrently.
  • KUDU-2721 - Kudu no longer crashes at startup on machines with disabled CPUs.

Apache Oozie

The following issues are fixed in CDH 6.3.0:

  • OOZIE-3312 - Add support for HSTS.
  • OOZIE-3365 - Workflow and coordinator action status remains RUNNING after rerun.
  • OOZIE-3409 - Oozie Server: Memory leak in EL evaluation.
  • OOZIE-3463 - Migrate from com.google.common.base.Charsets to java.nio.charset.StandardCharsets.
  • OOZIE-3466 - Migrate from com.google.common.io.Closeables to org.apache.commons.io.IOUtils.
  • OOZIE-3467 - Migrate from com.google.common.base.Stopwatch.
  • OOZIE-3478 - Oozie needs execute permission on the submitting user's home directory.

Apache Parquet

The following issues are fixed in CDH 6.3.0:

  • PARQUET-1143 - Update to Parquet format 2.4.0. Contains Zstandard codec support.
  • PARQUET-1585 - Update old external links in the code base

Apache Pig

The following issue is fixed in CDH 6.3.0:

Cloudera Search

There are no notable fixed issues in this release.

Apache Sentry

The following issues are fixed in CDH 6.3.0:

  • SENTRY-2440 - Add a new thrift API for checking if a user is in admin group
  • SENTRY-2471 - Table rename should sync Sentry privilege even without location information
  • SENTRY-2511 - Debug level logging on HMSPaths significantly affects performance
  • SENTRY-2522 - Add a new thrift API for getting all privileges a user has for a given set of authorizable
  • SENTRY-2523 - Fix response of list_sentry_privileges_by_authorizable_and_user API

Apache Spark

The following issues are fixed in CDH 6.3.0:

  • SPARK-13704 - [CORE][YARN] Reduce rack resolution time
  • SPARK-24421 - [BUILD][CORE] Accessing sun.misc.Cleaner in JDK11
  • SPARK-24421 - [CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
  • SPARK-25429 - [SQL] Use Set instead of Array to improve lookup performance
  • SPARK-25946 - [BUILD] Upgrade ASM to 7.x to support JDK11
  • SPARK-25984 - [CORE][SQL][STREAMING] Remove deprecated .newInstance(), primitive box class constructor calls
  • SPARK-26003 - Improve SQLAppStatusListener.aggregateMetrics performance
  • SPARK-26089 - [CORE] Handle corruption in large shuffle blocks
  • SPARK-26188 - [SQL] FileIndex: don't infer data types of partition columns if user specifies schema
  • SPARK-26349 - [PYSPARK] Forbid insecure py4j gateways
  • SPARK-26430 - [BUILD][TEST-MAVEN] Upgrade Surefire plugin to 3.0.0-M2
  • SPARK-26507 - [CORE] Fix core tests for Java 11
  • SPARK-26536 - [BUILD][TEST] Upgrade Mockito to 2.23.4
  • SPARK-26708 - [SQL][BRANCH-2.4] Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan
  • SPARK-26839 - [SQL] Work around classloader changes in Java 9 for Hive isolation
  • SPARK-26963 - [MLLIB] SizeEstimator can't make some JDK fields accessible in Java 9+
  • SPARK-26966 - [ML] Update to JPMML 1.4.8
  • SPARK-26986 - [ML][FOLLOWUP] Add JAXB reference impl to build for Java 9+
  • SPARK-26986 - [ML] Add JAXB reference impl to build for Java 9+
  • SPARK-26990 - [SQL][BACKPORT-2.4] FileIndex: use user specified field names if possible
  • SPARK-27094 - [YARN] Work around RackResolver swallowing thread interrupt.
  • SPARK-27112 - [CORE] Create a resource ordering between threads to resolve the deadlocks encountered ...
  • SPARK-27121 - [REPL] Resolve Scala compiler failure for Java 9+ in REPL
  • SPARK-27122 - [CORE] Jetty classes must not be returned via getters in org.apache.spark.ui.WebUI
  • SPARK-27178 - [K8S] add nss to the spark/k8s Dockerfile
  • SPARK-27260 - [SS] Upgrade to Kafka 2.2.0
  • SPARK-27704 - Accept zstd and lz4 as Parquet compression algorithms. This adds these codecs back; originally they were not supported by CDH Parquet, and Spark support had been adjusted accordingly.
  • SPARK-27794 - [R][DOCS] Use https URL for CRAN repo

Apache Sqoop

There are no notable fixed issues in this release.

Apache ZooKeeper

The following issues are fixed in CDH 6.3.0:

  • ZOOKEEPER-271 - Better command line parsing in ZookeeperMain.
  • ZOOKEEPER-442 - need a way to remove watches that are no longer of interest
  • ZOOKEEPER-1220 - ./zkCli.sh 'create' command is throwing ArrayIndexOutOfBoundsException
  • ZOOKEEPER-1392 - Request READ or ADMIN permission for getAcl()
  • ZOOKEEPER-1673 - ZooKeeper doesn't support CIDR notation in ACL expressions with the ip scheme
  • ZOOKEEPER-1748 - TCP keepalive for leader election connections
  • ZOOKEEPER-1830 - Support command line shell for removing watches
  • ZOOKEEPER-1831 - Document remove watches details to the guide
  • ZOOKEEPER-1887 - C implementation of removeWatches
  • ZOOKEEPER-1909 - removeWatches doesn't return NOWATCHER when there is
  • ZOOKEEPER-1910 - RemoveWatches wrongly removes the watcher if multiple watches
  • ZOOKEEPER-1919 - Update the C implementation of removeWatches to have it match ZOOKEEPER-1910
  • ZOOKEEPER-2062 - RemoveWatchesTest takes forever to run
  • ZOOKEEPER-2141 - ACL cache in DataTree never removes entries
  • ZOOKEEPER-2184 - Zookeeper Client should re-resolve hosts when connection attempts fail
  • ZOOKEEPER-2237 - Port async multi to 3.4 branch
  • ZOOKEEPER-2611 - zoo_remove_watchers - can remove the wrong watch
  • ZOOKEEPER-3263 - JAVA9/11 Warnings: Illegal reflective access in zookeeper's kerberosUtil