Fixed Issues in CDH 6.2.0

Hue allows unsigned SAML assertions

If Hue receives an unsigned assertion, it continues to process it as valid. This means it is possible for an end-user to forge or remove the signature and manipulate a SAML assertion to gain access without a successful authentication.

Products affected: Hue, CDH

Releases affected:
  • CDH 5.15.x and earlier
  • CDH 5.16.0, 5.16.1
  • CDH 6.0.x
  • CDH 6.1.x

Users affected: All users who are using SAML with Hue.

CVE: CVE-2019-14775

Date/time of detection: January 2019

Detected by: Joel Snape

Severity (Low/Medium/High): High

Impact:

This is a significant security risk as it allows anyone to fake their access validity and therefore access Hue, even if they should not have access. In more detail: if Hue receives an unsigned assertion, it continues to process it as valid. This means it is possible for an end-user to forge or remove the signature and manipulate a SAML assertion to gain access without a successful authentication.
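
As a rough illustration of the missing check, the following Python sketch rejects any SAML assertion that has no XML signature element. It is not Hue's actual fix: the function name and the sample documents are hypothetical, and it only tests for the presence of a ds:Signature child, which is far weaker than cryptographically verifying the signature with a hardened SAML library.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"

def assertion_is_signed(xml_text):
    """Return True only if every SAML Assertion has a direct ds:Signature child.

    Presence check only (hypothetical helper); real code must also verify
    the signature against the identity provider's certificate.
    """
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself when the root is the Assertion.
    assertions = list(root.iter(f"{{{SAML_NS}}}Assertion"))
    if not assertions:
        return False
    return all(a.find(f"{{{DSIG_NS}}}Signature") is not None for a in assertions)

# Made-up sample documents for illustration.
unsigned = f'<saml:Assertion xmlns:saml="{SAML_NS}"><saml:Subject/></saml:Assertion>'
signed = (
    f'<saml:Assertion xmlns:saml="{SAML_NS}" xmlns:ds="{DSIG_NS}">'
    "<ds:Signature/><saml:Subject/></saml:Assertion>"
)

print(assertion_is_signed(unsigned))
print(assertion_is_signed(signed))
```

The sketch is deliberately strict: it treats an assertion with no signature of its own as unsigned, even if an enclosing element carries one.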

Immediate action required:
  • Upgrade (recommended): Upgrade to a version of CDH containing the fix.
  • Workaround: None
Addressed in release/refresh/patch:
  • CDH 5.16.2
  • CDH 6.2.0

Hue external users granted super user privileges in C6

When using either the LdapBackend or the SAML2Backend authentication backend in Hue, users that are created when they log in for the first time are granted superuser privileges in CDH 6. This does not apply to users that are created through the User Admin application in Hue.

Products affected: Hue

Releases affected: CDH 6.0.0, CDH 6.0.1, CDH 6.1.0

Users affected: All users

Date/time of detection: December 12, 2018

Severity (Low/Medium/High): Medium

Impact:

The superuser privilege is granted to any user that logs in to Hue when LDAP or SAML authentication is used. For example, if you have the create_users_on_login property set to true in the Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini, and you are using LDAP or SAML authentication, a user that logs in to Hue for the first time is created with superuser privileges and can perform the following actions:

When the SAML2Backend is used, Hue accounts that have superuser privileges can:
  • Create/Delete users and groups
  • Assign users to groups
  • Alter group permissions
However, when the SAML2Backend is used, users can only log in to Hue using SAML authentication.
When the LdapBackend is used, Hue accounts that have superuser privileges can:
  • Synchronize Hue users with your LDAP server
  • Create local users and groups (these local users can log in to Hue only if the mode of multi-backend authentication is set up as LdapBackend and AllowFirstUserDjangoBackend)
  • Assign users to groups
  • Alter group permissions
This impact does not apply to the following other scenarios:
  • When users are synced with your LDAP server manually by using the User Admin page in Hue.
  • When you are using other authentication methods. For example:
    • AllowFirstUserDjangoBackend
    • Spnego
    • PAM
    • Oauth
When the LdapBackend and AllowFirstUserDjangoBackend are used, administrators should note:
  • Local users, including users created by unexpected superusers, can log in through AllowFirstUserDjangoBackend.
  • Local users in Hue that are created as hive, hdfs, or solr have privileges to access protected data and alter permissions in the Security app.
  • Removing the AllowFirstUserDjangoBackend authentication backend prevents local users from logging in to Hue, but it requires the administrator to have Cloudera Manager access.

CVE: CVE-2019-7319

Immediate action required: Upgrade and follow the instructions below.

Addressed in release/refresh/patch: CDH 6.1.1 and CDH 6.2.0

After upgrading to 6.1.1 or later, you must run the following update statement in the Hue database:
UPDATE useradmin_userprofile SET `creation_method` = 'EXTERNAL' WHERE `creation_method` = 'CreationMethod.EXTERNAL';

After executing the UPDATE statement, new Hue users are no longer automatically created as superusers.

To list the superusers, run the following SQL query:

SELECT username FROM auth_user WHERE is_superuser = 1;
Superuser privileges that users obtained because of this issue must be revoked manually by using the following steps:
  1. Log in to the Hue UI as an administrator.
  2. In the upper right corner of the page, click the user drop-down list and select Manage User.
  3. In the User Admin page, make sure that the Users tab is selected and click the name of the user in the list that you want to edit.
  4. In the Hue Users - Edit user page, click Step 3: Advanced.
  5. Clear the checkbox for Superuser status.
  6. At the bottom of the page, click Update user to save the change.
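
The effect of the UPDATE statement and the superuser query can be tried safely on a throwaway database before touching the real one. The Python sketch below uses an in-memory SQLite database as a stand-in for the Hue database. The table layouts are reduced to the columns involved, the sample users are invented, and the is_superuser column name follows Django's default auth_user schema, which is an assumption about your Hue database that you should verify.

```python
import sqlite3

# In-memory stand-in for the Hue database (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auth_user (
        id INTEGER PRIMARY KEY, username TEXT, is_superuser INTEGER
    );
    CREATE TABLE useradmin_userprofile (
        user_id INTEGER, creation_method TEXT
    );
    -- Invented sample rows.
    INSERT INTO auth_user VALUES (1, 'admin', 1), (2, 'alice', 1), (3, 'bob', 0);
    INSERT INTO useradmin_userprofile VALUES
        (1, 'HUE'), (2, 'CreationMethod.EXTERNAL'), (3, 'EXTERNAL');
    """
)

# Normalize the creation_method value, as in the UPDATE statement above.
cur = conn.execute(
    "UPDATE useradmin_userprofile SET creation_method = 'EXTERNAL' "
    "WHERE creation_method = 'CreationMethod.EXTERNAL'"
)
updated_rows = cur.rowcount

# List the accounts that currently hold superuser status for manual review.
superusers = [
    row[0]
    for row in conn.execute(
        "SELECT username FROM auth_user WHERE is_superuser = 1 ORDER BY username"
    )
]

print(updated_rows)
print(superusers)
```

Note that the UPDATE does not revoke any privileges by itself. The accounts returned by the query still have to be reviewed and edited through the User Admin steps.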

For the latest update on this issue see the corresponding Knowledge article:

TSB 2019-360: Hue external users granted super user privileges in C6

Spark’s stage retry logic could result in duplicate data

Apache Spark’s retry logic may allow tasks from both a failed output stage attempt and a successful retry attempt to commit output for the same partition.

Products affected: CDS Powered By Apache Spark

Affected versions:
  • CDS 2.1.0 release 1 and release 2
  • CDS 2.2.0 release 1 and release 2
  • CDS 2.3.0 release 2
Fixed versions:
  • CDH 6.2.0, 6.3.0
  • CDS 2.1.0 release 3
  • CDS 2.2.0 release 3
  • CDS 2.3.0 release 3
For the latest update on this issue see the corresponding Knowledge article: TSB 2019-337-1: Spark’s stage retry logic could result in duplicate data

Spark’s stage retry logic could result in missing data

Apache Spark’s retry logic may allow a task from a failed stage attempt to clean up data from its corresponding task in a successful stage retry attempt.

Products affected: CDS Powered By Apache Spark

Affected versions:
  • CDS 2.2.0 release 1, release 2
  • CDS 2.3.0 release 1, release 2
Fixed versions:
  • CDH 6.2.0, 6.3.0
  • CDS 2.2.0 release 3
  • CDS 2.3.0 release 3
For the latest update on this issue see the corresponding Knowledge article: TSB 2019-337-2: Spark’s stage retry logic could result in missing data

Shuffle+Repartition on a DataFrame could lead to incorrect answers

When a repartition follows a shuffle, the assignment of rows to partitions is nondeterministic. If Spark has to recompute a partition, for example, due to an executor failure, the retry can consume a different set of input rows than the original computation. As a result, some rows can be dropped, and others can be duplicated.
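
The mechanism can be mimicked outside Spark with a few lines of Python. This is a simplified simulation, not Spark code: it assumes a round-robin assignment of rows to two partitions, and it assumes that the retry sees the same rows in a different arrival order. The function name and the data are invented for illustration.

```python
from collections import Counter

def round_robin(rows, num_partitions):
    """Assign rows to partitions by arrival position (hypothetical helper)."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts

all_rows = [0, 1, 2, 3, 4, 5]

# First attempt: shuffle output arrives in one order.
first_attempt = round_robin([0, 1, 2, 3, 4, 5], 2)
# Retry: same rows, but the first two arrive swapped.
retry_attempt = round_robin([1, 0, 2, 3, 4, 5], 2)

# Partition 0 survives from the first attempt; only partition 1 is recomputed.
final_rows = first_attempt[0] + retry_attempt[1]

counts = Counter(final_rows)
duplicated = sorted(r for r, c in counts.items() if c > 1)
dropped = sorted(set(all_rows) - set(final_rows))

print(duplicated)
print(dropped)
```

In this run, row 0 appears twice and row 1 is missing, even though every attempt processed all six rows.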

Products affected: CDS Powered By Apache Spark

Affected versions:
  • CDH 6.0.0, 6.0.1, 6.1.0, 6.1.1
  • CDS 2.1.0 release 1, release 2
  • CDS 2.2.0 release 1, release 2
Fixed versions:
  • CDH 6.2.0, 6.3.0
  • CDS 2.1.0 release 3
  • CDS 2.2.0 release 3
  • CDS 2.3.0 release 3
For the latest update on this issue see the corresponding Knowledge article: TSB 2019-337-3: Shuffle+Repartition on a DataFrame could lead to incorrect answers

Shuffle+Repartition on an RDD could lead to incorrect answers

When a repartition follows a shuffle, the assignment of records to partitions is nondeterministic. If Spark has to recompute a partition, for example, due to an executor failure, the retry can consume a different set of input records than the original computation. As a result, some records can be dropped, and others can be duplicated.

Products affected: CDS Powered By Apache Spark

Affected versions:
  • CDH 6.0.0, 6.0.1, 6.1.0, 6.1.1
  • CDS 2.1.0 release 1, release 2, release 3
  • CDS 2.2.0 release 1, release 2, release 3
  • CDS 2.3.0 release 1, release 2, release 3
Fixed versions:
  • CDH 6.2.0, 6.3.0
  • CDS 2.1.0 release 4
  • CDS 2.2.0 release 4
  • CDS 2.3.0 release 4
For the latest update on this issue see the corresponding Knowledge article: TSB 2019-337-4: Shuffle+Repartition on an RDD could lead to incorrect answers

Inconsistent rows returned from queries in Kudu

Due to KUDU-2463, upon restarting Kudu, inconsistent rows may be returned from tables that have not recently been written to, resulting in any of the following:

  • multiple rows for the same key being returned
  • deleted data being returned
  • inconsistent results consistently being returned for the same query

If this happens, resolve the conflicts by writing to the affected Kudu partitions in one of the following ways:

  • re-deleting the data that is known to be deleted
  • upserting the most up-to-date version of the affected rows

Products affected: Apache Kudu

Affected versions:
  • CDH 5.12.2, 5.13.3, 5.14.4, 5.15.1, 5.16.1
  • CDH 6.0.1, 6.1.0, 6.1.1
Fixed versions:
  • CDH 5.16.2
  • CDH 6.2.0

For the latest update on this issue see the corresponding Knowledge article: TSB 2019-353: Inconsistent rows returned from queries in Kudu

Timestamp type-casted to varchar in a binary predicate can produce incorrect result

In an Impala query, a timestamp can be cast to a varchar of smaller length to convert the timestamp value to a date string. However, if such a cast is used in a binary comparison against a string literal, the query can produce incorrect results because of a bug in the expression rewriting code. For example:
> select * from (select cast('2018-12-11 09:59:37' as timestamp) as ts) tbl where cast(ts as varchar(10)) = '2018-12-11';
The output has 0 rows, although the predicate should match the row.
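
To make the expected result concrete, the following Python sketch reproduces the intended semantics of the cast outside Impala: truncating the string form of the timestamp to 10 characters yields the date string, so the comparison should succeed. It is only a stand-in for the intended behavior and does not exercise Impala or its expression rewriting.

```python
from datetime import datetime

# Same literal as in the Impala example.
ts = datetime.strptime("2018-12-11 09:59:37", "%Y-%m-%d %H:%M:%S")

# Equivalent of cast(ts as varchar(10)): keep the first 10 characters.
as_varchar_10 = str(ts)[:10]

# The predicate from the query; a correct engine evaluates this to true.
matches = as_varchar_10 == "2018-12-11"

print(as_varchar_10, matches)
```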
Affected versions:
  • CDH 5.15.0, 5.15.1, 5.15.2, 5.16.0, 5.16.1
  • CDH 6.0.0, 6.0.1, 6.1.0, 6.1.1
Fixed versions:
  • CDH 5.16.2
  • CDH 6.2.0

For the latest update on this issue see the corresponding Knowledge article: TSB 2019-358: Timestamp type-casted to varchar in a binary predicate can produce incorrect result

XSS Cloudera Manager

Malicious Impala queries can result in Cross Site Scripting (XSS) when viewed in Cloudera Manager.

Products affected: Apache Impala

Releases affected:
  • Cloudera Manager 5.13.x, 5.14.x, 5.15.1, 5.15.2, 5.16.1
  • Cloudera Manager 6.0.0, 6.0.1, 6.1.0

Users affected: All Cloudera Manager Users

Date/time of detection: November 2018

Severity (Low/Medium/High): High

Impact: When a malicious user generates a piece of JavaScript in the impala-shell and then goes to the Queries tab of the Impala service in Cloudera Manager, that piece of JavaScript code gets evaluated, resulting in an XSS.
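
As a general illustration of why unescaped query text is dangerous in a web page, the Python sketch below escapes a query string before it would be placed into HTML. This is not the Cloudera Manager fix. It only shows generic output escaping with Python's standard html module, and the query text is a made-up example.

```python
import html

# Made-up query text containing a script tag.
query_text = "select '<script>alert(1)</script>'"

# Escape HTML metacharacters so a browser renders the text instead of running it.
escaped = html.escape(query_text)

print(escaped)
```

After escaping, the angle brackets and quotes are HTML entities, so a page that embeds the result displays the query as plain text.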

CVE: CVE-2019-14449

Immediate action required: There is no workaround. Upgrade to the latest available maintenance release.

Addressed in release/refresh/patch:
  • Cloudera Manager 5.16.2
  • Cloudera Manager 6.0.2, 6.1.1, 6.2.0, 6.3.0

CVE-2018-1296 Permissive Apache Hadoop HDFS listXAttr Authorization Exposes Extended Attribute Key/Value Pairs

HDFS exposes extended attribute key/value pairs during listXAttrs, verifying only path-level search access to the directory rather than path-level read permission to the referent.

Products affected: Apache HDFS

Releases affected:
  • CDH 5.4.0 - 5.15.1, 5.16.0
  • CDH 6.0.0, 6.0.1, 6.1.0

Users affected: Users who store sensitive data in extended attributes, such as users of HDFS encryption.

Date/time of detection: December 12, 2017

Detected by: Rushabh Shah, Yahoo! Inc., Hadoop committer

Severity (Low/Medium/High): Medium

Impact: HDFS exposes extended attribute key/value pairs during listXAttrs, verifying only path-level search access to the directory rather than path-level read permission to the referent. This affects features that store sensitive data in extended attributes.

CVE: CVE-2018-1296

Immediate action required:
  • Upgrade: Update to a version of CDH containing the fix.
  • Workaround: If a file contains sensitive data in extended attributes, users and admins need to change the permission to prevent others from listing the directory that contains the file.
Addressed in release/refresh/patch:
  • CDH 5.15.2, 5.16.1
  • CDH 6.1.1, 6.2.0

Kafka JMX Tool Cannot Connect to JMX

The Kafka JMX tool cannot connect to the JMX agent of the Kafka Broker or MirrorMaker if the specified address of the JMX remote connector is bound to 127.0.0.1.

Workaround:
  1. In Cloudera Manager go to Kafka > Instances and select the affected broker.
  2. Find the Additional Broker Java Options and Additional MirrorMaker Java Options properties and add the following Java option to the configuration:
    -Djava.rmi.server.hostname=127.0.0.1
  3. Restart the affected brokers.

Affected Versions: CDH 6.0.0 and higher

Fixed Versions: CDH 6.2.0

Cloudera Issue: OPSAPS-48695

Kafka Broker Fails to Start Due to Slow Sentry and HMS startup

This issue is encountered on cluster startup and is caused by misalignment between Kafka, Sentry, and HMS. The slow startup of HMS slows down Sentry startup which consequently makes the Kafka connection to Sentry time out. Ultimately, the Kafka broker will be unable to start.

Workaround: Manually increase the number of remote procedure call retries between Sentry and Kafka through the Sentry Client Advanced Configuration Snippet (Safety Valve) for sentry-site.xml property.

  1. Go to Sentry > Configuration and find the Sentry Client Advanced Configuration Snippet (Safety Valve) for sentry-site.xml property.
  2. Click on the add button.
  3. Enter the following data:
    • Name: sentry.service.client.rpc.retry-total
    • Value: 20
  4. Enter a Reason for change, and then click Save Changes to commit the changes.
  5. Return to the Home page by clicking the Cloudera Manager logo.
  6. Click the restart stale services icon next to the Sentry service to invoke the cluster restart wizard.
  7. Click Restart Stale Services.
  8. Click Restart Now.
  9. Click Finish.
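
If the safety valve is edited as XML rather than through name and value fields, the entry from step 3 corresponds to a Hadoop-style property element. The Python sketch below only renders that element from the name and value given above. Whether your Cloudera Manager version offers an XML view for this snippet is an assumption to verify.

```python
import xml.etree.ElementTree as ET

# Name and value from step 3 of the workaround.
name = "sentry.service.client.rpc.retry-total"
value = "20"

# Build the <property> element in the usual *-site.xml layout.
prop = ET.Element("property")
ET.SubElement(prop, "name").text = name
ET.SubElement(prop, "value").text = value

snippet = ET.tostring(prop, encoding="unicode")
print(snippet)
```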

Affected Versions: CDH 6.1.0 and higher

Fixed Versions: CDH 6.2.0

Cloudera Issue: CDH-74713

Hadoop LdapGroupsMapping does not support LDAPS for self-signed LDAP server

Hadoop LdapGroupsMapping does not work with LDAP over SSL (LDAPS) if the LDAP server certificate is self-signed. This use case is currently not supported even if Hadoop User Group Mapping LDAP TLS/SSL Enabled, Hadoop User Group Mapping LDAP TLS/SSL Truststore, and Hadoop User Group Mapping LDAP TLS/SSL Truststore Password are filled properly.

Affected Versions: CDH 5.x and 6.0.x versions

Fixed Versions: CDH 6.1.0

Apache Issue: HADOOP-12862

Cloudera Issue: CDH-37926

Upstream Issues Fixed

Apache Accumulo

There are no notable fixed issues in this release.

Apache Avro

There are no notable fixed issues in this release.

Apache Crunch

There are no notable fixed issues in this release.

Apache Flume

The following issues are fixed in CDH 6.2.0:

  • FLUME-2050 - Upgrade to Log4j 2.10.0
  • FLUME-2071 - Flume Context doesn't support float or double configuration values.
  • FLUME-2464 - Remove hadoop and hbase profiles.
  • FLUME-2653 - Allow hdfs sink inUseSuffix to be empty
  • FLUME-2698 - Upgrade Jetty Version
  • FLUME-2786, FLUME-3056, FLUME-3117 - Application enters a deadlock when stopped while handleConfigurationEvent
  • FLUME-2799 - Kafka Source - Add message offset to headers
  • FLUME-2894 - Flume components should stop in the correct order
  • FLUME-2976 - Exception when JMS source tries to connect to a Weblogic server without authentication
  • FLUME-2988 - Kafka Sink metrics missing eventDrainAttemptCount
  • FLUME-2989 - Added 2 KafkaChannel metrics
  • FLUME-3046 - Kafka Sink and Source Configuration Improvements
  • FLUME-3087 - Change log level from WARN to INFO
  • FLUME-3101 - Add maxBatchCount config property to Taildir Source.
  • FLUME-3115 - Update netty library
  • FLUME-3133 - Add client IP / hostname headers to Syslog sources.
  • FLUME-3142 - Adding HBase2 sink
  • FLUME-3158 - Upgrade surefire version and config
  • FLUME-3183 - Maven: generate SHA-512 checksum during deploy
  • FLUME-3186 - Make asyncHbaseClient config parameters available from Flume config
  • FLUME-3194 - Upgrade derby to the latest version
  • FLUME-3201 - Fix SyslogUtil to handle RFC3164 format in December correctly
  • FLUME-3223 - Flume HDFS Sink should retry close prior recover lease
  • FLUME-3228 - Incorrect parameter name in timestamp interceptor docs
  • FLUME-3243 - hdfs.callTimeout default increased and deprecated
  • FLUME-3246 - Validate flume configuration to prevent larger source batchsize than
  • FLUME-3253 - Update jackson-databind dependency to the latest version
  • FLUME-3270 - Close JMS resources in JMSMessageConsumer constructor in
  • FLUME-3281 - Update to Kafka 2.0
  • FLUME-3282 - Use slf4j in every component
  • FLUME-3294 - Fix polling logic in TaildirSource
  • FLUME-3296 - Revert log4j 2 upgrade on 1.9 branch
  • FLUME-3298 - Make hadoop-common optional in hadoop-credential-store-config-filter
  • FLUME-3299 - Fix log4j scopes in pom files
  • FLUME-3302 - Fix issues discovered during the release
  • FLUME-3314 - Fixed NPE in Kafka source/channel during offset migration

Apache Hadoop

The following issues are fixed in CDH 6.2.0:

  • HADOOP-9567 - Provide auto-renewal for keytab based logins.
  • HADOOP-11100 - Support to configure ftpClient.setControlKeepAliveTimeout.
  • HADOOP-14314 - The OpenSolaris taxonomy link is dead in InterfaceClassification.md.
  • HADOOP-14970 - MiniHadoopClusterManager does not respect the lack of format option.
  • HADOOP-15214 - Make Hadoop compatible with Guava 21.0.
  • HADOOP-15813 - Enable a more reliable SSL connection reuse.
  • HADOOP-15823 - ABFS: Stop requiring client ID and tenant ID for MSI.
  • HADOOP-15832 - Upgrade BouncyCastle to 1.60.
  • HADOOP-15860 - ABFS: Throw IllegalArgumentException when a directory or a file name ends with a period.

HDFS

The following issues are fixed in CDH 6.2.0:

  • HDFS-12498 - JournalNodeSyncer is not started in a federated HA cluster.
  • HDFS-12579 - JournalNodeSyncer should use fromUrl field of EditLogManifestResponse to construct the servlet Url.
  • HDFS-12716 - The 'dfs.datanode.failed.volumes.tolerated' property to support minimum number of volumes that should be available.
  • HDFS-12886 - Ignore minReplication for block recovery.
  • HDFS-12946 - Add a tool to check the rack configuration against EC policies.
  • HDFS-13023 - JournalNodeSyncer does not work on a secure cluster.
  • HDFS-13626 - Fix incorrect username when the setOwner operation is denied.
  • HDFS-13744 - OIV tool should better handle control characters present in file or directory names.
  • HDFS-13761 - Add toString method to the AclFeature class.
  • HDFS-13818 - Extend OIV to detect FSImage corruption.
  • HDFS-13996 - Make HttpFS ACLs RegEx-configurable.
  • HDFS-14008 - NameNode should log the snapshotdiff report.
  • HDFS-14015 - Improve error handling in hdfsThreadDestructor in the native thread local storage.
  • HDFS-14027 - DFSStripedOutputStream should implement both the hsync methods.
  • HDFS-14028 - The HDFS OIV temporary directory deletes a folder.
  • HDFS-14053 - Provide ability for NameNode to re-replicate based on topology changes.
  • HDFS-14061 - Check if the cluster topology supports the EC policy before setting, enabling, or adding it.
  • HDFS-14125 - Use a parameterized log format in ECTopologyVerifier.
  • HDFS-14140 - JournalNodeSyncer authentication is failing in a secure cluster.
  • HDFS-14188 - Make the hdfs ec -verifyClusterSetup command accept an EC policy as a parameter.
  • HDFS-14231 - DataXceiver#run() should not log exceptions caused by InvalidTokenException as an error.

MapReduce 2

The following issues are fixed in CDH 6.2.0:

  • MAPREDUCE-4669 - MRAM web UI does not work with HTTPS.
  • MAPREDUCE-7125 - JobResourceUploader creates LocalFileSystem when it's not necessary.

YARN

The following issues are fixed in CDH 6.2.0:

  • YARN-7396 - NPE when accessing container logs due to null dirsHandler
  • YARN-8582 - Document YARN support for HTTPS in AM Web server.
  • YARN-8865 - RMStateStore contains large number of expired RMDelegationToken
  • YARN-8899 - Fixed minicluster dependency on yarn-server-web-proxy.
  • YARN-8908 - Fix errors in yarn-default.xml related to GPU/FPGA.
  • YARN-9087 - Improve logging for initialization of Resource plugins.
  • YARN-9095 - Removed Unused field from Resource: NUM_MANDATORY_RESOURCES
  • YARN-9213 - RM Web UI v1 does not show custom resource allocations for containers page
  • YARN-9318 - Resources#multiplyAndRoundUp does not consider Resource Types
  • YARN-9322 - Store metrics for custom resource types into FSQueueMetrics and query them in FairSchedulerQueueInfo
  • YARN-9323 - FSLeafQueue#computeMaxAMResource does not override zero values for custom resources

Apache HBase

The following issues are fixed in CDH 6.2.0:

  • HBASE-17356 - Add replica get support
  • HBASE-18735 - Provide an option to kill a MiniHBaseCluster without waiting on shutdown
  • HBASE-19695 - Handle disabled table for async client
  • HBASE-19722 - Meta query statistics metrics source
  • HBASE-20220 - [RSGroup] Check if table exists in the cluster before moving it to the specified regionserver group
  • HBASE-20604 - ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled
  • HBASE-20917 - MetaTableMetrics#stop references uninitialized requestsMap for non-meta region
  • HBASE-21178 - [BC break] : Get and Scan operation with a custom converter_class not working
  • HBASE-21215 - Figure how to invoke hbck2; make it easy to find
  • HBASE-21247 - Custom Meta WAL Provider doesn't default to custom WAL Provider whose configuration value is outside the enums in Providers
  • HBASE-21281 - Upgrade bouncycastle to latest
  • HBASE-21282 - Upgrade to latest jetty 9.2 and 9.3 versions
  • HBASE-21297 - ModifyTableProcedure can throw TNDE instead of IOE in case of REGION_REPLICATION change
  • HBASE-21300 - Fix the wrong reference file path when restoring snapshots for tables with MOB columns
  • HBASE-21314 - The implementation of BitSetNode is not efficient
  • HBASE-21321 - HBASE-21278 to branch-2.1 and branch-2.0
  • HBASE-21322 - Add a scheduleServerCrashProcedure() API to HbckService
  • HBASE-21336 - Simplify the implementation of WALProcedureMap
  • HBASE-21338 - Warn if balancer is an ill-fit for cluster size
  • HBASE-21342 - FileSystem in use may get closed by other bulk load call in secure bulkLoad
  • HBASE-21345 - [hbck2] Allow version check to proceed even though master is 'initializing'.
  • HBASE-21349 - Do not run CatalogJanitor or Normalizer when cluster is shutting down
  • HBASE-21354 - Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
  • HBASE-21355 - HStore's storeSize is calculated repeatedly which causing the confusing region split
  • HBASE-21356 - bulkLoadHFile API should ensure that rs has the source hfile's write permission
  • HBASE-21363 - Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
  • HBASE-21364 - Procedure holds the lock should put to front of the queue after restart
  • HBASE-21371 - Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
  • HBASE-21372 - Set hbase.assignment.maximum.attempts to Long.MAX
  • HBASE-21375 - Revisit the lock and queue implementation in MasterProcedureScheduler
  • HBASE-21377 - Add debug log for procedure stack id related operations
  • HBASE-21384 - Procedure with holdlock=false should not be restored lock when restarts
  • HBASE-21385 - HTable.delete request use rpc call directly instead of AsyncProcess
  • HBASE-21387 - Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files
  • HBASE-21388 - No need to instantiate MemStoreLAB for master which not carry table
  • HBASE-21391 - RefreshPeerProcedure should also wait master initialized before executing
  • HBASE-21395 - Abort split/merge procedure if there is a table procedure of the same table going on
  • HBASE-21401 - Sanity check when constructing the KeyValue
  • HBASE-21407 - Resolve NPE in backup Master UI
  • HBASE-21410 - A helper page that help find all problematic regions and procedures
  • HBASE-21413 - Empty meta log doesn't get split when restart whole cluster
  • HBASE-21421 - Do not kill RS if reportOnlineRegions fails
  • HBASE-21423 - Procedures for meta table/region should be able to execute in separate workers
  • HBASE-21437 - Bypassed procedure throw IllegalArgumentException when its state is WAITING_TIMEOUT
  • HBASE-21439 - RegionLoads aren't being used in RegionLoad cost functions
  • HBASE-21440 - Assign procedure on the crashed server is not properly interrupted
  • HBASE-21445 - CopyTable by bulkload will write hfile into yarn's HDFS
  • HBASE-21468 - separate workers for meta table is not working
  • HBASE-21473 - RowIndexSeekerV1 may return cell with extra two \x00\x00 bytes which has no tags
  • HBASE-21480 - Taking snapshot when RS crashes prevent we bring the regions online
  • HBASE-21485 - Add more debug logs for remote procedure execution
  • HBASE-21490 - WALProcedure may remove proc wal files still with active procedures
  • HBASE-21492 - CellCodec Written To WAL Before It's Verified
  • HBASE-21498 - Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a new BlockCache
  • HBASE-21511 - Remove in progress snapshot check in SnapshotFileCache#getUnreferencedFiles
  • HBASE-21524 - Fix logging in ConnectionImplementation.isTableAvailable()
  • HBASE-21545 - NEW_VERSION_BEHAVIOR breaks Get/Scan with specified columns
  • HBASE-21551 - Memory leak when use scan with STREAM at server side
  • HBASE-21554 - Show replication endpoint classname for replication peer on master web UI
  • HBASE-21567 - Allow overriding configs starting up the shell
  • HBASE-21568 - Use CacheConfig.DISABLED where we don't expect to have blockcache running
  • HBASE-21570 - Add write buffer periodic flush support for AsyncBufferedMutator
  • HBASE-21580 - Support getting Hbck instance from AsyncConnection
  • HBASE-21582 - If call HBaseAdmin#snapshotAsync but forget call isSnapshotFinished, then SnapshotHFileCleaner will skip to run every time
  • HBASE-21590 - Optimize trySkipToNextColumn in StoreScanner a bit.
  • HBASE-21592 - quota.addGetResult(r) throw NPE
  • HBASE-21610 - numOpenConnections metric is set to -1 when zero server channel exist
  • HBASE-21620 - Problem in scan query when using more than one column prefix filter in some cases
  • HBASE-21629 - draining_servers.rb is broken
  • HBASE-21630 - [shell] Define ENDKEY == STOPROW
  • HBASE-21631 - list_quotas should print human readable values for LIMIT
  • HBASE-21639 - maxHeapUsage value not read properly from config during EntryBuffers initialization
  • HBASE-21645 - Perform sanity check and disallow table creation/modification with region replication < 1
  • HBASE-21662 - Add append_peer_exclude_namespaces and remove_peer_exclude_namespaces shell commands
  • HBASE-21663 - Add replica scan support
  • HBASE-21682 - Support getting from specific replica
  • HBASE-21694 - Add append_peer_exclude_tableCFs and remove_peer_exclude_tableCFs shell commands
  • HBASE-21704 - The implementation of DistributedHBaseCluster.getServerHoldingRegion is incorrect
  • HBASE-21705 - Should treat meta table specially for some methods in AsyncAdmin
  • HBASE-21712 - Make submit-patch.py python3 compatible
  • HBASE-21732 - Should call toUpperCase before using Enum.valueOf in some methods for ColumnFamilyDescriptor
  • HBASE-21738 - Remove all the CLSM#size operation in our memstore because it's an quite time consuming.
  • HBASE-21746 - Fix two concern cases in RegionMover
  • HBASE-21843 - RegionGroupingProvider breaks the meta wal file name pattern which may cause data loss for meta region
  • HBASE-21862 - IPCUtil.wrapException should keep the original exception types for all the connection exceptions
  • HBASE-21915 - Make FileLinkInputStream implement CanUnbuffer
  • HBASE-21960 - Ensure RESTServletContainer used by RESTServer

Apache Hive

The following issues are fixed in CDH 6.2.0:

Code Changes Might Be Required

The following fixes might require code changes for the CDH 6.2.0 release of Apache Hive:

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HIVE-15884 - Optimize not between for vectorization
  • HIVE-16839 - Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently
  • HIVE-18238 - Driver execution may not have configuration changing side-effects
  • HIVE-18652 - Expose remoteBytesReadToDisk via HoS
  • HIVE-19564 - Vectorization: Fix NULL / Wrong Results issues in Arithmetic
  • HIVE-20306 - Implement projection spec for fetching only requested fields from partitions
  • HIVE-20307 - Add support for filterspec to the getPartitions with projection API
  • HIVE-20330 - HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs
  • HIVE-20331 - Query with union all, lateral view and Join fails with "cannot find parent in the child operator"
  • HIVE-20484 - Disable Block Cache By Default With HBase SerDe
  • HIVE-20535 - Add new configuration to set the size of the global compile lock
  • HIVE-20661 - Dynamic partitions loading calls add partition for every partition 1-by-1
  • HIVE-20722 - Switch HS2 CompileLock to use fair locks
  • HIVE-20737 - Local SparkContext is shared between user sessions and should be closed only when there is no active
  • HIVE-20776 - HMS filterHooks on server-side in addition to client-side
  • HIVE-20796 - jdbc URL can contain sensitive information that should not be logged
  • HIVE-20818 - Views created with a WHERE subquery will regard views referenced in the subquery as direct input
  • HIVE-20843 - Properly detect RELY constraint in primary keys and foreign keys
  • HIVE-20914 - MRScratchDir permission denied when "hive.server2.enable.doAs", "hive.exec.submitviachild" are set to "true" and impersonated/proxy user is used
  • HIVE-20924 - Property 'hive.driver.parallel.compilation.global.limit' should be immutable at runtime
  • HIVE-20992 - Split the config "hive.metastore.dbaccess.ssl.properties" into more meaningful configs
  • HIVE-21015 - HCatLoader can't provide statistics for tables not in default DB
  • HIVE-21028 - get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException
  • HIVE-21030 - Add credential store env properties redaction in JobConf
  • HIVE-21035 - Race condition in SparkUtilities#getSparkSession
  • HIVE-21044 - Add SLF4J reporter to the metastore metrics systems
  • HIVE-21035 - Add HMS total api count stats and connection pool stats to metrics
  • HIVE-21077 - Database and Catalogs should have creation time
  • HIVE-21083 - Remove the requirement to specify the truststore location when TLS to the database is turned on
  • HIVE-21116 - HADOOP_CREDSTORE_PASSWORD is not populated under yarn.app.mapreduce.am.admin.user.env
  • HIVE-21320 - get_fields() and get_tables_by_type() are not protected by HMS server access control

Hue

The following issues are fixed in CDH 6.2.0:

  • HUE-7128 - [core] Apply config ENABLE_DOWNLOAD to search dashboard download
  • HUE-7258 - [jb] Add config check for Spark history server URL
  • HUE-7919 - oozie error 'NoneType' object has no attribute 'is_superuser'
  • HUE-8140 - [editor] Stabilize multi-statement execution
  • HUE-8330 - [core] Multi cluster support of namespaces and compute
  • HUE-8564 - Avro viewer for File Browser
  • HUE-8577 - [autocomplete] Update Hive and Impala autocompleter to the latest version
  • HUE-8584 - [useradmin] Exposing errors for Add Sync Ldap Group
  • HUE-8585 - [useradmin] Exposing errors for Add Sync Ldap Users
  • HUE-8587 - Enable queries in Job Browser to work with Smart Connection Pool
  • HUE-8598 - [autocomplete] Improve autocomplete for CREATE statements
  • HUE-8605 - [metadata] Only show the Table Privilege tab when Sentry is enabled
  • HUE-8610 - [tb] The sample call from the Table Browser fails for computes other than default
  • HUE-8616 - [cluster] getNamespaces for impala returns namespace with hive compute
  • HUE-8617 - [frontend] Support multi cluster in invalidate metadata
  • HUE-8638 - [importer] Add autocompletion to query editor in second step of importer
  • HUE-8641 - [frontend] Trigger a namespace refresh when the context catalog is cleared
  • HUE-8645 - [assist] Improve namespace listing after cluster creation
  • HUE-8648 - [importer] Sqoop-configured RDBMS fails
  • HUE-8649 - [frontend] Add a performance graph component
  • HUE-8651 - [editor] Add a dedicated execution analysis tab in the editor
  • HUE-8657 - [frontend] Improve create and configure cluster forms
  • HUE-8659 - [importer] Fix js exception with the field editor
  • HUE-8661 - [assist] Enable scrollbars in context popover view sql
  • HUE-8664 - [importer] Fixed Flume source import properties initialization
  • HUE-8665 - [editor] Add basic execution analysis for Impala
  • HUE-8666 - [autocomplete] Fix timing issue with "... ? from table" completion
  • HUE-8667 - [autocomplete] Fix issue where order by and group by suggestions aren't displayed properly
  • HUE-8668 - [editor] Add table names to syntax checker suggestions
  • HUE-8670 - [cluster] Adding auto resize option to the update cluster API
  • HUE-8679 - [jb] Support query interface in multi cluster node
  • HUE-8680 - [core] Fill in Impalad WEBUI username passwords automatically if needed
  • HUE-8681 - [assist] Include unopened topics in the language ref filter
  • HUE-8682 - [backend] Change PAM lib to python-pam-1.8.4
  • HUE-8685 - [importer] DB importer always shows DB already exists
  • HUE-8688 - Update Chinese language code to enable localization
  • HUE-8690 - Fix Hue allows unsigned SAML assertions
  • HUE-8691 - [useradmin] Add/sync group does not add users if the objectClass posixGroup already exists in the group LDAP entry
  • HUE-8692 - [useradmin] Group sync fails if all group members are not found
  • HUE-8693 - [useradmin] Security app only displays 100 users in the impersonate list
  • HUE-8694 - [frontend] Fix scroll in the database drop-down menu
  • HUE-8695 - [importer] Do not show the command but submit when clicking on submit button
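
HUE-8690 concerns assertions that arrive without a signature. As a rough illustration of that class of bug, the sketch below uses only the Python standard library to reject a SAML document in which any Assertion lacks a direct ds:Signature child. The function name assertion_is_signed is hypothetical and this is not Hue's code. A presence check of this kind is only a first gate: it does not verify the signature cryptographically, which a real deployment would delegate to its SAML library.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for SAML 2.0 assertions and XML Signature.
SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def assertion_is_signed(document_xml: str) -> bool:
    """Return True only if every Assertion has a direct ds:Signature child.

    Hypothetical helper: checks presence only, not signature validity.
    """
    root = ET.fromstring(document_xml)
    assertion_tag = f"{{{SAML_NS}}}Assertion"
    if root.tag == assertion_tag:
        assertions = [root]
    else:
        assertions = root.findall(f".//{assertion_tag}")
    if not assertions:
        # No assertion at all is treated as a failure, not as success.
        return False
    return all(
        a.find(f"{{{DS_NS}}}Signature") is not None for a in assertions
    )
```

The check fails closed: a document with no assertion, or with a mix of signed and unsigned assertions, is rejected.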

Apache Impala

The following issues are fixed in CDH 6.2.0:

  • IMPALA-341 - Remote profiles are no longer ignored by the coordinator for the queries with the LIMIT clause.
  • IMPALA-941 - Impala supports fully qualified table names that start with a number.
  • IMPALA-1048 - The query execution summary now includes the total time taken and memory consumed by the data sink at the root of each query fragment.
  • IMPALA-3323 - Fixed the issue where valid impala-shell options, such as --ldap_password_cmd, were unrecognized when the --config_file option was specified.
  • IMPALA-5397 - If a query has a dedicated coordinator, its end time is now set when the query releases its admission control resources. With no dedicated coordinator, the end time is set on un-registration.
  • IMPALA-5474 - Fixed an issue where adding a trivial subquery to a query with an error turns the error into a warning.
  • IMPALA-6521 - When set, experimental flags are now shown in /varz in web UI and log files.
  • IMPALA-6900 - INVALIDATE METADATA operation is no longer ignored when HMS is empty.
  • IMPALA-7446 - Impala enables buffer pool garbage collection when near process memory limit to prevent queries from spilling to disk earlier than necessary.
  • IMPALA-7659 - In COMPUTE STATS, Impala counts the number of NULL values in a table.
  • IMPALA-7857 - Logs more information about StateStore failure detection.
  • IMPALA-7928 - To increase the efficiency of the HDFS file handle cache, remote reads for a particular file are scheduled to a consistent set of executor nodes.
  • IMPALA-7929 - Fixed an issue where Impala queries on tables created via Hive and mapped to HBase failed with an internal exception because the qualifier of the HBase key column is null in the mapped table. Impala now relaxes the requirement and allows a NULL qualifier.
  • IMPALA-7960 - Impala now returns a correct result when comparing TIMESTAMP to a string literal in a binary predicate where the TIMESTAMP is cast to a VARCHAR of smaller length.
  • IMPALA-7961 - Fixed an issue where queries running with the SYNC_DDL query option can fail when the Catalog Server is under a heavy load with concurrent catalog operations of long-running DDLs.
  • IMPALA-8026 - Impala query profile now reports correct row counts for all nested loop join modes.
  • IMPALA-8061 - Impala correctly initializes S3_ACCESS_VALIDATED variable to zero when TARGET_FILESYSTEM=3.
  • IMPALA-8154 - Disabled the Kerberos auth_to_local setting to prevent connection issues between impalads.
  • IMPALA-8188 - Impala now correctly detects an NVME device name and handles it.
  • IMPALA-8245 - Added hostname to the timeout error message to enable the user to easily identify the host which has reached a bad connection state with the HDFS NameNode.
  • IMPALA-8254 - Fixed an issue where COMPUTE STATS failed if COMPRESSION_CODEC was set.

Apache Kafka

The following issues are fixed in CDH 6.2.0:

  • KAFKA-3514 - Stream timestamp computation needs some further thoughts.
  • KAFKA-4932 - Add support for UUID serialization and deserialization
  • KAFKA-5690 - Add support to list ACLs for a given principal
  • KAFKA-5975 - No response when deleting topics and delete.topic.enable=false
  • KAFKA-6082 - Fence zookeeper updates with controller epoch zkVersion
  • KAFKA-6123 - Give client MetricsReporter auto-generated client.id
  • KAFKA-6195 - Resolve DNS aliases in bootstrap.server (KIP-235)
  • KAFKA-6684 - Support casting Connect values with bytes schema to string
  • KAFKA-6753 - Updating the OfflinePartitions count only when necessary
  • KAFKA-6835 - Enable topic unclean leader election to be enabled without controller change
  • KAFKA-6863 - Kafka clients should try to use multiple DNS resolved IP
  • KAFKA-6914 - Set parent classloader of DelegatingClassLoader same as the worker's
  • KAFKA-6923 - Refactor Serializer/Deserializer for KIP-336
  • KAFKA-6926 - Simplified some logic to eliminate some suppressions of NPath complexity checks
  • KAFKA-6950 - Delay response to failed client authentication to prevent potential DoS issues (KIP-306)
  • KAFKA-6998 - Disable Caching when max.cache.bytes are zero.
  • KAFKA-7080 and KAFKA-7222 - Cleanup overlapping KIP changes
  • KAFKA-7096 - Clear buffered data for partitions that are explicitly unassigned by user
  • KAFKA-7117 - Support AdminClient API in AclCommand (KIP-332)
  • KAFKA-7134 - KafkaLog4jAppender exception handling with ignoreExceptions
  • KAFKA-7139 - Support option to exclude the internal topics in kafka-topics.sh
  • KAFKA-7196 - Remove heartbeat delayed operation for those removed consumers at the end of each rebalance
  • KAFKA-7211 - MM should handle TimeoutException in commitSync
  • KAFKA-7215 - Improve LogCleaner Error Handling
  • KAFKA-7223 - In-Memory Suppression Buffering
  • KAFKA-7240 - -total metrics in Streams are incorrect
  • KAFKA-7277 - Migrate Streams API to Duration instead of longMs times
  • KAFKA-7299 - Batch LeaderAndIsr requests for AutoLeaderRebalance
  • KAFKA-7311 - Reset next batch expiry time on each poll loop
  • KAFKA-7313 - StopReplicaRequest should attempt to remove future replica for the partition only if future replica exists
  • KAFKA-7324 - NPE due to lack of SASLExtensions in SASL/OAUTHBEARER
  • KAFKA-7326 - KStream.print() should flush on each line for PrintStream
  • KAFKA-7332 - Update CORRUPT_MESSAGE exception message description
  • KAFKA-7333 - Protocol changes for KIP-320
  • KAFKA-7338 - Specify AES128 default encryption type for Kerberos tests
  • KAFKA-7366 - Make topic configs segment.bytes and segment.ms to take effect immediately
  • KAFKA-7379 - [streams] send.buffer.bytes should be allowed to set -1 in KafkaStreams
  • KAFKA-7394 - OffsetsForLeaderEpoch supports topic describe access
  • KAFKA-7395 - Add fencing to replication protocol (KIP-320)
  • KAFKA-7396 - Materialized, Serialized, Joined, Consumed and Produced with implicit Serdes
  • KAFKA-7399 - KIP-366, Make FunctionConversions deprecated
  • KAFKA-7400 - Compacted topic segments that precede the log start offse...
  • KAFKA-7403 - Use default timestamp if no expire timestamp set in offset commit value
  • KAFKA-7406 - Name join group repartition topics
  • KAFKA-7409 - Validate message format version before creating topics or altering configs
  • KAFKA-7415 - Persist leader epoch and start offset on becoming a leader
  • KAFKA-7428 - ConnectionStressSpec: add "action", allow multiple clients
  • KAFKA-7429 - Enable key/truststore update with same filename/password
  • KAFKA-7437 - Persist leader epoch in offset commit metadata
  • KAFKA-7439 - Replace EasyMock and PowerMock with Mockito in clients module
  • KAFKA-7441 - Allow LogCleanerManager.resumeCleaning() to be used concurrently
  • KAFKA-7456 - Serde Inheritance in DSL
  • KAFKA-7462 - Make token optional for OAuthBearerLoginModule
  • KAFKA-7464 - catch exceptions in "leaderEndpoint.close()" when shutting down ReplicaFetcherThread
  • KAFKA-7467 - NoSuchElementException is raised because controlBatch is empty
  • KAFKA-7475 - capture remote address on connection authentication errors, and log it
  • KAFKA-7476 - Fix Date-based types in SchemaProjector
  • KAFKA-7477 - Improve Streams close timeout semantics
  • KAFKA-7481 - Add upgrade/downgrade notes for 2.1.x
  • KAFKA-7482 - LeaderAndIsrRequest should be sent to the shutting down broker
  • KAFKA-7483 - Allow streams to pass headers through Serializer.
  • KAFKA-7496 - Handle invalid filters gracefully in KafkaAdminClient#describeAcls
  • KAFKA-7498 - Remove references from `common.requests` to `clients`
  • KAFKA-7501 - Fix producer batch double deallocation when receiving message too large error on expired batch
  • KAFKA-7505 - Process incoming bytes on write error to report SSL failures
  • KAFKA-7519 - Clear pending transaction state when expiration fails
  • KAFKA-7532 - Clean-up controller log when shutting down brokers
  • KAFKA-7534 - Error in flush calling close may prevent underlying store from closing
  • KAFKA-7535 - KafkaConsumer doesn't report records-lag if isolation.level is read_committed
  • KAFKA-7560 - PushHttpMetricsReporter should not convert metric value to double
  • KAFKA-7742 - Fixed removing hmac entry for a token being removed from DelegationTokenCache

Apache Kudu

The following issues are fixed in CDH 6.2.0:

  • The Kudu Python client now detects and reports on conflicting/incorrect initialization of the OpenSSL library to avoid glitches and undefined behavior.

  • KUDU-1678 - Fixed a crash caused by a race condition between altering tablet schemas and deleting tablet replicas.
  • KUDU-2680 - Now the kudu fs update_dirs tool can correctly remove directories in the presence of tablet tombstones.
  • KUDU-2195 - Now you can use the --cmeta_force_fsync flag to fsync Kudu’s consensus metadata more aggressively. Setting this to true may decrease Kudu’s performance, but will improve its durability in the face of power failures and forced shutdowns.
  • KUDU-2684 - Fixed an issue that would cause an excessive amount of RPC traffic from Kudu masters if the tablet servers were configured with duplicated master addresses.
  • KUDU-2688 - Fixed an issue that would cause the kudu cluster rebalance tool to run indefinitely in the case of tables with a replication factor of 2.
  • KUDU-2690 - Fixed an issue that could lead to a failure to bootstrap tablet replicas that were a part of workloads with many alter table operations.
  • KUDU-2710 - Fixed an issue with the Java scanner’s keepAlive that could lead to a permanent hang in the scanner.
  • KUDU-2706 - Fixed an issue that would cause undefined behavior upon connecting to a secure cluster concurrently from multiple C++ clients.

Apache Oozie

The following issues are fixed in CDH 6.2.0:

  • OOZIE-1393 - Allow sending emails via TLS
  • OOZIE-2211 - Remove OozieCLI#validateCommandV41
  • OOZIE-2339 - [fluent-job] Minimum Viable Fluent Job API
  • OOZIE-2352 - Unportable shebang in shell scripts
  • OOZIE-2494 - Cron syntax not handling DST properly
  • OOZIE-2684 - Bad database schema error for WF_ACTIONS table
  • OOZIE-2718 - Improve -dryrun for bundles
  • OOZIE-2791 - ShareLib installation may fail on busy Hadoop clusters
  • OOZIE-2826 - Upgrade joda-time to 2.9.9
  • OOZIE-2829 - Improve sharelib upload to accept multiple source folders
  • OOZIE-2937 - Remove redundant groupId from the child POMs
  • OOZIE-2942 - [examples] Fix Findbugs warnings
  • OOZIE-2949 - Escape quotes whitespaces in Sqoop <command> field
  • OOZIE-3109 - [log-streaming] Escape HTML-specific characters
  • OOZIE-3134 - Potential inconsistency between the in-memory SLA map and the Oozie database
  • OOZIE-3155 - [ui] Job DAG is not refreshed when a job is finished
  • OOZIE-3156 - Retry SSH action check when cannot connect to remote host
  • OOZIE-3160 - PriorityDelayQueue put()/take() can cause significant CPU load due to busy waiting
  • OOZIE-3178 - /bin/mkdistro.sh -Papache-release fails due to javadoc errors
  • OOZIE-3185 - Upgrade org.apache.derby to 10.11.1.1
  • OOZIE-3193 - Applications are not killed when submitted via subworkflow
  • OOZIE-3208 - "It should never happen" error messages should be more specific to root cause
  • OOZIE-3209 - XML schema error when submitting pyspark example
  • OOZIE-3210 - [build] Revision information is empty
  • OOZIE-3219 - Cannot compile with hadoop 3.1.0
  • OOZIE-3224 - Upgrade Jetty to 9.3
  • OOZIE-3227 - Eliminate duplicate dependencies when using Hadoop 3 DistributedCache
  • OOZIE-3229 - [client] [ui] Improved SLA filtering options
  • OOZIE-3233 - Remove DST shift from the coordinator job's end time
  • OOZIE-3235 - Upgrade ActiveMQ to 5.15.3
  • OOZIE-3260 - [sla] Remove stale item above max retries on JPA related errors from in-memory SLA map
  • OOZIE-3278 - Oozie fails to start with Hadoop 2.6.0
  • OOZIE-3297 - Retry logic does not handle the exception from BulkJPAExecutor properly
  • OOZIE-3298 - [MapReduce action] External ID is not filled properly and failing MR job is treated as SUCCEEDED
  • OOZIE-3303 - Oozie UI does not work after Jetty 9.3 upgrade
  • OOZIE-3304 - Parsing sharelib timestamps is not threadsafe
  • OOZIE-3307 - [core] Limit heap usage of LauncherAM
  • OOZIE-3309 - Runtime error during /v2/sla filtering for bundle
  • OOZIE-3310 - SQL error during /v2/sla filtering
  • OOZIE-3330 - [spark-action] Remove double quotes inside plain option values
  • OOZIE-3331 - [spark-action] Inconsistency while parsing quoted Spark options
  • OOZIE-3334 - Don't use org.apache.hadoop.hbase.security.User in HDFSCredentials
  • OOZIE-3340 - [fluent-job] Create error handler ACTION only if needed
  • OOZIE-3348 - [Hive action] Remove dependency hive-contrib
  • OOZIE-3354 - [core] [SSH action] SSH action gets hung
  • OOZIE-3369 - [core] Upgrade guru.nidi:graphviz-java to 0.7.0
  • OOZIE-3370 - Property filtering is not consistent across job submission
  • OOZIE-3389 - Getting input dependency list on the UI throws NPE
  • OOZIE-3390 - [Shell action] STDERR contains a bogus error message
  • OOZIE-3400 - [core] Fix PurgeService sub-sub-workflow checking
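
Several of the fixes above (OOZIE-2949, OOZIE-3330, OOZIE-3331) deal with quoted option values that contain whitespace. The following Python sketch shows the general problem, not Oozie's Java implementation: splitting a command string on whitespace breaks a quoted argument into pieces, while a quote-aware tokenizer keeps it intact. The sample command and the JDBC URL are made up for illustration.

```python
import shlex

# Hypothetical Sqoop-style command with a quoted, space-containing argument.
command = (
    'import --connect jdbc:mysql://db/test '
    '--query "SELECT id FROM t WHERE $CONDITIONS"'
)

# Naive whitespace splitting tears the quoted query apart.
naive_args = command.split()

# Quote-aware splitting keeps the query as a single argument
# and strips the surrounding double quotes.
quoted_args = shlex.split(command)

print(len(naive_args), naive_args[4])
print(len(quoted_args), quoted_args[4])
```

With naive splitting the query is spread over six separate tokens, the first of which still carries the opening quote. The quote-aware split yields five arguments in total, with the whole query as the last one.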

Apache Parquet

The following issues are fixed in CDH 6.2.0:

  • PARQUET-196 - parquet-tools command for row count & size
  • PARQUET-852 - Slowly ramp up sizes of byte in ByteBasedBitPackingEncoder
  • PARQUET-969 - Decimal datatype support for parquet-tools output
  • PARQUET-1336 - PrimitiveComparator should implement Serializable
  • PARQUET-1407 - Avro: Fix binary values returned from dictionary encoding
  • PARQUET-1421 - InternalParquetRecordWriter logs debug messages at the INFO level
  • PARQUET-1440 - Parquet-tools: Parse int32 or int64 decimal values to big decimals with the proper scale
  • PARQUET-1472 - Parquet-tools: Parse int32 or int64 decimal values to big decimals with the proper scale
  • PARQUET-1475 - Fix lack of cause propagation in DirectCodecFactory.ParquetCompressionCodecException
  • PARQUET-1510 - Fix notEq for optional columns with null values
  • PARQUET-1527 - [parquet-tools] cat command throws java.lang.ClassCastException
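
The decimal fixes above concern Parquet DECIMAL values that are physically stored as int32 or int64. The stored integer is the unscaled value, and the logical value is that integer multiplied by 10 to the power of minus the scale. A minimal Python sketch of the conversion follows. The function name decimal_from_unscaled is hypothetical and this is not the parquet-tools code.

```python
from decimal import Decimal


def decimal_from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Interpret an integer as a decimal with `scale` fractional digits.

    Hypothetical helper: shifts the decimal exponent instead of dividing,
    so no binary floating-point rounding is involved.
    """
    return Decimal(unscaled).scaleb(-scale)


# An int32 holding 12345 with scale 2 represents 123.45.
print(decimal_from_unscaled(12345, 2))
# Negative values and leading fractional zeros are preserved.
print(decimal_from_unscaled(-5, 3))
```

Printing the raw integer instead would show 12345 for a value that is really 123.45, which is the kind of display problem these issues describe.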

Apache Pig

There are no notable fixed issues in this release.

Cloudera Search

The following issues are fixed in CDH 6.2.0:

  • SOLR-2834 - Handle CharacterFilters in Solr
  • SOLR-8207 - Collections with underscores in name no longer cause a crash in the Cloud->Nodes UI
  • SOLR-8207 - Add "Nodes" view to the Admin UI "Cloud" tab, listing nodes and key metrics
  • SOLR-8207 - Nodes view support for shard_1_1_1 format and replica1, replica_1 format. Show core state in label if not 'active'
  • SOLR-12570 - OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly
  • SOLR-12597 - Migrate API should fail requests that do not specify split.key parameter
  • SOLR-12649 - CloudSolrClient retries requests unnecessarily on exception from server
  • SOLR-12670 - RecoveryStrategy logs wrong wait time when retrying recovery
  • SOLR-12679 - MiniSolrCloudCluster.stopJettySolrRunner should remove jetty from the internal list
  • SOLR-12679 - MiniSolrCloudCluster.startJettySolrRunner method should not add a duplicate jetty instance to the list
  • SOLR-12770 - Make it possible to configure a host whitelist for distributed search
  • SOLR-12776 - Setting of TMP in solr.cmd causes invisibility of Solr to JDK tools

Apache Sentry

The following issues are fixed in CDH 6.2.0:

  • SENTRY-1797 - SentryKerberosContext should use periodic executor instead of managing periodic execution via run() method.
  • SENTRY-2329 - Integrate sentry with Hadoop 3.1.1
  • SENTRY-2372 - SentryStore should not implement grantOptionCheck
  • SENTRY-2428 - Skip null partitions or partitions with null sds entries
  • SENTRY-2437 - When granting privileges a single transaction per grant causes long delays
  • SENTRY-2441 - When MAuthzPathsMapping is deleted all associated MPaths should be deleted automatically.
  • SENTRY-2477 - When requesting for deltas check if nn seq num is 1 more than latest sequence num
  • SENTRY-2488 - Add privilege cache to sentry hive bindings in DefaultAccessValidator
  • SENTRY-2490 - When building a full perm update for each object we only build 1 privilege per role
  • SENTRY-2492 - Consecutive ALL grants get deleted when multiple roles have ALL grants on that object
  • SENTRY-2493 - Sentry store api's for path mapping should handle empty/null paths.
  • SENTRY-2497 - show grant role results should handle case where URI doesn't have a defined scheme.
  • SENTRY-2498 - Exception while deleting paths that don't exist
  • SENTRY-2500 - CREATE on server does not provide HMS server side read authorization for get_all_tables(database_name)
  • SENTRY-2502 - Sentry NN plug-in stops fetching updates from sentry server.
  • SENTRY-2503 - Failed to revoke the privilege from impala-shell if the privilege added from beeline cli on multi-clusters

Apache Spark

The following issues are fixed in CDH 6.2.0:

  • SPARK-22148 - [SPARK-15815][SCHEDULER] Acquire new executors to avoid hang because of blacklisting
  • SPARK-23257 - [K8S] Kerberos Support for Spark on K8S
  • SPARK-23781 - [CORE] Merge token renewer functionality into HadoopDelegationTokenManager.
  • SPARK-23831 - Revert "[SQL] Add org.apache.derby to IsolatedClientLoader"
  • SPARK-24434 - [K8S] pod template files
  • SPARK-24553 - [UI][FOLLOWUP][2.4 BACKPORT] Fix unnecessary UI redirect
  • SPARK-24920 - [CORE] Allow sharing Netty's memory pool allocators
  • SPARK-24958 - [CORE] Add memory from procfs to executor metrics.
  • SPARK-25003 - [PYSPARK] Use SessionExtensions in Pyspark
  • SPARK-25023 - Clarify Spark security documentation
  • SPARK-25118 - [CORE] Persist Driver Logs in Client mode to Hdfs
  • SPARK-25222 - [K8S] Improve container status logging
  • SPARK-25451 - [SPARK-26100][CORE] Aggregated metrics table doesn't show the right number of the total tasks
  • SPARK-25501 - [SS] Add kafka delegation token support.
  • SPARK-25515 - [K8S] Adds a config option to keep executor pods for debugging
  • SPARK-25560 - [SQL] Allow FunctionInjection in SparkExtensions
  • SPARK-25682 - [K8S] Package example jars in same target for dev and distro images.
  • SPARK-25689 - [CORE] Follow up: don't get delegation tokens when kerberos not available.
  • SPARK-25689 - [YARN] Make driver, not AM, manage delegation tokens.
  • SPARK-25730 - [K8S] Delete executor pods from kubernetes after figuring out why they died
  • SPARK-25745 - [K8S] Improve docker-image-tool.sh script
  • SPARK-25778 - WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails ...
  • SPARK-25786 - [CORE] If the ByteBuffer.hasArray is false, it will throw UnsupportedOperationException for Kryo
  • SPARK-25815 - [K8S] Support kerberos in client mode, keytab-based token renewal.
  • SPARK-25828 - [K8S] Bumping Kubernetes-Client version to 4.1.0
  • SPARK-25837 - [CORE] Fix potential slowdown in AppStatusListener when cleaning up stages
  • SPARK-25875 - [K8S] Merge code to set up driver command into a single step.
  • SPARK-25876 - [K8S] Simplify kubernetes configuration types.
  • SPARK-25877 - [K8S] Move all feature logic to feature classes.
  • SPARK-25905 - [CORE] When getting a remote block, avoid forcing a conversion to a ChunkedByteBuffer
  • SPARK-25922 - [K8S] Spark Driver/Executor "spark-app-selector" label mismatch
  • SPARK-25957 - [K8S] Make building alternate language binding docker images optional
  • SPARK-25960 - [K8S] Support subpath mounting with Kubernetes
  • SPARK-26002 - [SQL] Fix day of year calculation for Julian calendar days
  • SPARK-26011 - [SPARK-SUBMIT] Yarn mode pyspark app without python main resource does not honor "spark.jars.packages"
  • SPARK-26029 - [BUILD][2.4] Bump previousSparkVersion in MimaBuild.scala to be 2.3.0
  • SPARK-26094 - [CORE][STREAMING] createNonEcFile creates parent dirs.
  • SPARK-26109 - [WEBUI] Duration in the task summary metrics table and the task table are different
  • SPARK-26119 - [CORE][WEBUI] Task summary table should contain only successful tasks' metrics
  • SPARK-26186 - [SPARK-26184][CORE] Last updated time is not getting updated for the Inprogress application
  • SPARK-26194 - [K8S] Auto generate auth secret for k8s apps.
  • SPARK-26201 - Fix python broadcast with encryption
  • SPARK-26219 - [CORE][BRANCH-2.4] Executor summary should get updated for failure jobs in the history server UI
  • SPARK-26236 - [SS] Add kafka delegation token support documentation.
  • SPARK-26239 - File-based secret key loading for SASL.
  • SPARK-26256 - [K8S] Fix labels for pod deletion
  • SPARK-26267 - [SS] Retry when detecting incorrect offsets from Kafka
  • SPARK-26304 - [SS] Add default value to spark.kafka.sasl.kerberos.service.name parameter
  • SPARK-26307 - [SQL] Fix CTAS when INSERT a partitioned table using Hive serde
  • SPARK-26322 - [SS] Add spark.kafka.sasl.token.mechanism to ease delegation token configuration.
  • SPARK-26493 - [SQL] Allow multiple spark.sql.extensions
  • SPARK-26592 - [SS] Throw exception when kafka delegation token tried to obtain with proxy user
  • SPARK-26595 - [CORE] Allow credential renewal based on kerberos ticket cache.
  • SPARK-26694 - [CORE] Progress bar should be enabled by default for spark-shell
  • SPARK-26726 - Synchronize the amount of memory used by the broadcast variable to the UI display
  • SPARK-26745 - [SPARK-24959][SQL][BRANCH-2.4] Revert count optimization in JSON datasource by
  • SPARK-26753 - [CORE] Fixed custom log levels for spark-shell by using Filter instead of Threshold
  • SPARK-26873 - [SQL] Use a consistent timestamp to build Hadoop Job IDs.

Apache Sqoop

The following issues are fixed in CDH 6.2.0:

  • SQOOP-3237 - Mainframe FTP transfer option to insert custom FTP commands prior to transfer
  • SQOOP-3382 - Add parquet numeric support for Parquet in hdfs import
  • SQOOP-3396 - Add parquet numeric support for Parquet in Hive import

Apache ZooKeeper

There are no notable fixed issues in this release.