Fixed Issues in CDH 6.0.1

See below for issues fixed in CDH 6.0.1, grouped by component:

Flume
Hadoop
Apache HBase
HDFS
Hive
Hue
Impala
Kudu
Oozie
Search
Spark
CVE-2019-10099: Apache Spark local files left unencrypted
YARN

Apache Flume

The following issues are fixed in CDH 6.0.1:

FLUME-3237 - Handling RuntimeExceptions coming from the JMS provider in JMSSource

Apache Hadoop

The following issues are fixed in CDH 6.0.1:

HADOOP-13597 - Fixed an issue where the KMS consumed increasing amounts of memory and file descriptors.
HADOOP-15311 - You can now configure the acceptor/selector count for HttpServer2.
HADOOP-15473 - Fixed an UnrecoverableKeyException caused by JDK-8189997.
HADOOP-15593 - Fixed NPE in UGI spawnAutoRenewalThreadForUserCreds.
HADOOP-15609 - The KMS now retries when an SSLHandshakeException occurs.
HADOOP-15638 - Fixed an issue where the KMS accept queue size default changed from 500 to 128 in Hadoop 3.x.
HADOOP-15655 - Enhance KMS client retry behavior to retry on timeout.
HADOOP-15696 - Fixed a KMS performance regression caused by too many open file descriptors after the migration to Jetty.
HADOOP-15698 - Fixed an issue where the KMS log4j is not initialized properly at startup.
HADOOP-15708 - Fixed an issue where reading values from Configuration before adding deprecations makes it impossible to read the values with a deprecated key.

Apache HBase

The following issues are fixed in CDH 6.0.1:

HBASE-19572 - RegionMover should use the configured default port number and not the one from HConstants
HBASE-19722 - Meta query statistics metrics source
HBASE-19764 - Fix Checkstyle errors in hbase-endpoint
HBASE-20244 - NoSuchMethodException when retrieving private method decryptEncryptedDataEncryptionKey from DFSClient
HBASE-20401 - Make MAX_WAIT and waitIfNotFinished in CleanerContext configurable
HBASE-20403 - Fix race between prefetch task and non-pread HFile reads
HBASE-20474 - Show non-RPC tasks on master/regionserver Web UI by default
HBASE-20538 - Upgrade our hadoop versions to 2.7.7 and 3.0.3
HBASE-20565 - ColumnRangeFilter combined with ColumnPaginationFilter can produce incorrect result
HBASE-20614 - REST scan API with incorrect filter text file throws HTTP 503 Service Unavailable error
HBASE-20642 - Clients should re-use the same nonce across DDL operations
HBASE-20648 - HBASE-19364 "Truncate_preserve fails with table when replica region > 1" for master branch
HBASE-20649 - Validate HFiles do not have PREFIX_TREE DataBlockEncoding (including an addendum that adds a missing file)
HBASE-20681 - Explicitly include hamcrest in binary tarball
HBASE-20691 - Change the default WAL storage policy back to "NONE"
HBASE-20697 - Can't cache all region locations of the specified table by calling table.getRegionLocator().getAllRegionLocations()
HBASE-20705 - Having an RPC quota on a table no longer prevents a space quota from being recreated or removed
HBASE-20706 - Prevent MTP from trying to reopen non-OPEN regions
HBASE-20723 - Custom hbase.wal.dir results in data loss because we write recovered edits into a different place than where the recovering region server looks for them
HBASE-20745 - Log when master proc wal rolls
HBASE-20752 - Make sure the regions are truly reopened after ReopenTableRegionsProcedure
HBASE-20770 - WAL cleaner logs way too much; gets clogged when lots of work to do
HBASE-20772 - Controlled shutdown fills Master log with the disturbing message 'No matching procedure found for rit=OPEN, location=ZZZZ, table=YYYYY, region=XXXX transition to CLOSED'
HBASE-20777 - RpcConnection could still remain opened after we shutdown the NettyRpcServer
HBASE-20780 - ServerRpcConnection logging cleanup: amalgamate its logging lines into one new-style log line
HBASE-20781 - Save recalculating families in a WALEdit batch of Cells
HBASE-20794 - Add INFO level log to createTable operation
HBASE-20795 - Allow option in BBKVComparator.compare to do comparison without sequence id
HBASE-20806 - Split style journal for flushes and compactions
HBASE-20810 - Include the procedure id in the exception message in HBaseAdmin for better debugging
HBASE-20812 - Add defaults to Table Interface so implementors don't have to
HBASE-20813 - Remove RPC quotas when the associated table or namespace is dropped
HBASE-20817 - Infinite loop when executing ReopenTableRegionsProcedure
HBASE-20825 - Fix pre and post hooks of CloneSnapshot and RestoreSnapshot for Access checks
HBASE-20826 - Truncate really long RpcServer warnings unless TRACE is on
HBASE-20829 - Remove the addFront assertion in MasterProcedureScheduler.doAdd
HBASE-20833 - Modify pre-upgrade coprocessor validator to support table level coprocessors
HBASE-20839 - Fall back to FSHLog if AsyncFSWAL cannot be instantiated when the user does not specify AsyncFSWAL explicitly
HBASE-20853 - Polish "Add defaults to Table Interface so Implementors don't have to"
HBASE-20856 - PITA having to set WAL provider in two places
HBASE-20860 - Merged region's RIT state may not be cleaned after master restart
HBASE-20867 - RS may get killed while master restarts
HBASE-20869 - Endpoint-based Export use incorrect user to write to destination
HBASE-20875 - MemStoreLABImp::copyIntoCell uses 7% CPU when writing
HBASE-20878 - Data loss if merging regions while ServerCrashProcedure executing
HBASE-20882 - HBASE-20616 "TruncateTableProcedure is stuck in retry loop in TRUNCATE_TABLE_CREATE_FS_LAYOUT state" to branch-2.0
HBASE-20885 - Removed entry for RPC quota from hbase:quota when RPC quota is removed
HBASE-20887 - HBASE-20865 "CreateTableProcedure is stuck in retry loop in CREATE_TABLE_WRITE_FS_LAYOUT state"
HBASE-20903 - HBASE-20792 "info:servername and info:sn inconsistent for OPEN region" to branch-2.0
HBASE-20914 - Trim Master memory usage
HBASE-20921 - Possible NPE in ReopenTableRegionsProcedure
HBASE-20924 - "HBASE-20846 Restore procedure locks when master restarts"
HBASE-20935 - HStore.removeCompactedFiles should log in case it is unable to delete a file
HBASE-20939 - There will be a race when we call suspendIfNotReady and then throw ProcedureSuspendedException
HBASE-20940 - HStore.cansplit should not allow split to happen if it has references
HBASE-20941 - Created and implemented HbckService in master
HBASE-20942 - Fix ArrayIndexOutOfBoundsException for RpcServer TRACE logging
HBASE-20975 - Lock may not be taken or released while rolling back procedure
HBASE-20978 - [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure
HBASE-20981 - Rollback stateCount accounting thrown-off when exception out of rollbackState
HBASE-20989 - Minor, miscellaneous logging fixes
HBASE-21004 - to branch-2.0 HBASE-20708 "Remove the usage of RecoverMetaProcedure"
HBASE-21007 - Memory leak in HBase REST server
HBASE-21018 - RS crashed because AsyncFS was unable to update HDFS data encryption key
HBASE-21029 - Miscount of memstore's heap/offheap size if same cell was put
HBASE-21031 - Memory leak if replay edits failed during region opening
HBASE-21041 - Memstore's heap size will be decreased below zero after flush
HBASE-21047 - Object creation of StoreFileScanner through constructor and close may leave refCount at -1
HBASE-21050 - Exclusive lock may be held by a SUCCESS state procedure forever
HBASE-21062 - Correctly use the defaultProvider value on the Providers enum when constructing a WALProvider
HBASE-21072 - Block out HBCK1 in hbase2
HBASE-21078 - [amv2] CODE-BUG NPE in RTP doing Unassign
HBASE-21083 - Introduce a mechanism to bypass the execution of a stuck procedure
HBASE-21088 - HStoreFile should be closed in HStore#hasReferences
HBASE-21120 - MoveRegionProcedure makes no progress; goes to STUCK

Region Server occasionally fails when HDFS data transport encryption is enabled

In rare cases, an HBase RegionServer on a Hadoop Data Transfer Encryption enabled cluster (dfs.encrypt.data.transfer = true) may crash because it is not able to update the encryption key.

Workaround: Restart the RegionServer.

Affected Versions: CDH 6.0.0

Fixed Versions: CDH 6.0.1

Apache Issue: HBASE-21018

Cloudera Issue: CDH-71613

Prefetch sometimes doesn't work with encrypted file system

If HBase prefetch is enabled (hbase.rs.prefetchblocksonopen = true) on an encrypted HDFS cluster, HBase RegionServer may crash due to memory corruption.

Workaround: Disable HBase prefetch (hbase.rs.prefetchblocksonopen = false).
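If you edit hbase-site.xml directly, the workaround amounts to setting a single property. The following Python sketch is illustrative only: it assumes a Hadoop-style XML configuration file (a <configuration> root holding <property> elements with <name> and <value> children), and the helper name disable_prefetch is hypothetical. Note that xml.etree.ElementTree discards XML comments by default, so review the rewritten file before deploying it.

```python
import xml.etree.ElementTree as ET

PREFETCH_KEY = "hbase.rs.prefetchblocksonopen"


def disable_prefetch(config_xml: str) -> str:
    """Hypothetical helper: return config_xml with hbase.rs.prefetchblocksonopen = false.

    Updates the property if it exists; otherwise appends a new <property> element.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        name = prop.find("name")
        if name is not None and (name.text or "").strip() == PREFETCH_KEY:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = "false"
            break
    else:
        # The property was not found: add it explicitly.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PREFETCH_KEY
        ET.SubElement(prop, "value").text = "false"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    before = """<configuration>
  <property>
    <name>hbase.rs.prefetchblocksonopen</name>
    <value>true</value>
  </property>
</configuration>"""
    print(disable_prefetch(before))
```

The function either flips an existing value or appends the property, so running it twice leaves a single prefetch entry set to false.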

Affected Versions: CDH 6.0.0

Fixed Versions: CDH 6.0.1

Apache Issue: HBASE-20403

Cloudera Issue: CDH-68666

Apache HDFS

The following issues are fixed in CDH 6.0.1:

HDFS-5040 - Added audit logging for all DFS admin commands.
HDFS-10240 - Fixed an issue where a race between close and recoverLease could lead to missing blocks.
HDFS-10453 - Fixed an issue where the ReplicationMonitor thread could get stuck for a long time due to the race between replication and delete of the same file in a large cluster.
HDFS-13051 - Fixed a deadlock that could occur when a rollEditLog RPC call happens while editPendingQ is full.
HDFS-13178 - Added a force option to the DiskBalancer Execute command.
HDFS-13181 - Added a DiskBalancer configuration for valid plan hours.
HDFS-13281 - Fixed an issue where Namenode#createFile was not aware of /.reserved/raw/ paths.
HDFS-13314 - NameNode optionally exits if it detects FsImage corruption
HDFS-13322 - FUSE lib now recognizes the change of the Kerberos ticket cache path if it was changed with the KRB5CCNAME environment variable during the same user session.
HDFS-13339 - Fixed an issue where a volume reference could not be released, which could lead to a deadlock when DataXceiver performs a volume check.
HDFS-13721 - Fixed an NPE in DataNode due to an uninitialized DiskBalancer.
HDFS-13727 - The DiskBalancer now logs a full stack trace if it exits with an unhandled exception.
HDFS-13813 - Added a check to see if a child inode exists in the global FSDirectory dir when saving (serializing) INodeDirectorySection.

Apache Hive

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

HIVE-13696 - Modify fair-scheduler.xml and automatically update/validate jobs submitted to fair-scheduler
HIVE-15387 - NPE in HiveServer2 webUI Historical SQL Operations section
HIVE-16483 - Hive on Spark should populate split-related configurations to HiveConf
HIVE-17213 - Hive on Spark: file merging doesn't work for union all
HIVE-18977 - Listing partitions returns different results with JDO and direct SQL
HIVE-19048 - Initscript errors are ignored
HIVE-19133 - HiveServer2 WebUI phase-wise performance metrics not showing correctly
HIVE-19202 - CBO failed due to NullPointerException in HiveAggregate.isBucketedInput()
HIVE-19251 - ObjectStore.getNextNotification with LIMIT should use less memory
HIVE-19259 - CREATE VIEW that uses UNION ALL fails with 'Table not found'
HIVE-19265 - Potential NPE returned instead of actual exception in Hive#copyFiles
HIVE-19424 - NPE In MetaDataFormatters
HIVE-19668 - HiveServer2 service hanging due to over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken objects and duplicate strings
HIVE-19752 - PerfLogger integration for critical Hive-on-S3 paths
HIVE-19870 - HCatalog dynamic partition query can fail if the table path is managed by Sentry
HIVE-19891 - Inserting into external tables with custom partition directories may cause data loss
HIVE-20183 - Inserting from bucketed table can cause data loss if the source table contains an empty bucket
HIVE-20226 - HMS getNextNotification throws exception when requested maxEvents exceeds table's max_rows
HIVE-20345 - Drop database may hang if the tables get deleted from a different call

Hive Jobs Are Submitted to a Single Queue When Sentry is Deployed

Hive jobs are not submitted into the correct YARN queue when Hive is using Sentry because Hive does not use the YARN API to resolve the user or group of the job's original submitter. This causes the job to be placed in a queue using the placement rules based on the Hive user. The HiveServer2 fair scheduler queue mapping used for "non-impersonation" mode does not handle the primary-secondary queue mappings correctly.

Workaround: If you are a Hive and Sentry user, do not upgrade to CDH 6.0.0. This issue will be fixed as soon as possible. If you must use Hive and Sentry in CDH 6.0.0, see YARN Dynamic Resource Pools Do Not Work with Hive When Sentry Is Enabled for additional workarounds.

Affected Version: CDH 6.0.0

Fixed Versions: CDH 6.0.1, CDH 6.1.0 and later

Cloudera Issue: CDH-51596

Hue

The following issue is fixed in CDH 6.0.1:

Queries from ImpalaDaemonApi failing when Impala is configured with webserver_htpassword

When webserver_htpassword_username and webserver_htpassword_password are used to authenticate the Impala web UIs, the Impala Queries page in the Hue Job Browser returns a 404 error, even with Kerberos authentication.

Workaround: None

Fixed Versions: CDH 6.0.1, CDH 6.1.0

Cloudera Issue: CDH-71138

Apache Impala

The following issues are fixed in CDH 6.0.1:

IMPALA-4908 - NULL floats with different value fields now compare equal.
IMPALA-7014 - Disabled stacktrace symbolisation by default.
IMPALA-7078 - Improved memory consumption of Avro scans of wide tables.
IMPALA-7145 - Fixed a memory leak in OpenSSL when spill-to-disk encryption is enabled.
IMPALA-7225 - REFRESH PARTITION on a single partition no longer resets its row count to -1.
IMPALA-7330 - After LOAD DATA, Impala now only refreshes the affected partition.
IMPALA-7360 - Fixed an issue where the Avro scanner sometimes skipped blocks when the skip marker was on the HDFS block boundary.
IMPALA-7559 - Disabled Parquet stat filtering for UTC-normalized timestamp columns.

Apache Kudu

The following issues are fixed in CDH 6.0.1:

KUDU-2312 - Fixed a crash that could occur on some systems when a query had more than 16 predicates.
KUDU-2509 - Fixed a use-after-free in case of a WAL replay error.

Apache Oozie

The following issues are fixed in CDH 6.0.1:

OOZIE-3193 - Applications are not killed when submitted via subworkflow
OOZIE-3330 and OOZIE-3331 - Fixed Spark options parsing

Cloudera Search

The following issues are fixed in CDH 6.0.1:

SOLR-11590 - Synchronize ZK connect/disconnect handling so that they are processed in linear order
SOLR-12343 - JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
SOLR-12450 - Don't allow referral to external resources in various config files
SOLR-12516 - JSON range facets can incorrectly refine subfacets for buckets

CDH Upgrade fails to delete Solr data from HDFS

The CDH upgrade process fails to delete Solr data from HDFS and the recreated collections fail to be initialized due to the existing indexes.

Workaround: Perform the following steps after you run the CDH Upgrade wizard and before you finalize the HDFS upgrade:

Log in to the Cloudera Manager Admin Console.
Go to the Solr service page.
Stop the Solr service and dependent services. Click Actions > Stop.
Click Actions > Reinitialize Solr State for Upgrade.
Click Actions > Bootstrap Solr Configuration.
Start the Solr and dependent services. Click Actions > Start.
Click Actions > Bootstrap Solr Collections.

Affected Versions: CDH 6.0.0

Fixed Versions: Cloudera Manager 6.0.1

Cloudera Issue: OPSAPS-47502

Solr Service reports stale configurations even after restart

Solr reports stale configurations, and the Solr Server role fails to start with the following error: Role failed to start due to error: The archive already contains creds.localjceks. The issue occurs if your deployment includes Solr and HDFS is configured to use LDAP Group Mapping.

Workaround: If you have a CDH 5 cluster and use LDAP Group Mapping, do not upgrade to CDH 6.0.0. If you have a CDH 6.0.0 cluster, disable LDAP Group Mapping.

Affected Versions: Cloudera Manager 6.0.0 and CDH 6.0.0

Fixed Versions: Cloudera Manager 6.0.1

Cloudera Issue: OPSAPS-47321

Cloudera Search configuration migration script fails to detect incompatible SecureAdminHandlers request handler

The SecureAdminHandlers request handler is incompatible with Apache Solr 7, which is used in CDH 6. The Cloudera Search configuration migration script fails to detect this incompatibility.

Workaround: During the pre-upgrade configuration migration, remove the SecureAdminHandlers request handler from the solrconfig.xml file of every configuration set that uses it.
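Removing the handler can be scripted. The following Python sketch assumes the handler is declared as a <requestHandler> element whose class attribute ends in SecureAdminHandlers; the function name strip_secure_admin_handlers and the class name in the sample are illustrative assumptions, so inspect your own solrconfig.xml first. ElementTree drops XML comments when it rewrites the file, so review the output before uploading the configuration set.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def strip_secure_admin_handlers(solrconfig_xml: str) -> Tuple[str, int]:
    """Hypothetical helper: remove <requestHandler> elements whose class ends
    with "SecureAdminHandlers". Returns the rewritten XML and the removal count."""
    root = ET.fromstring(solrconfig_xml)
    # Collect (parent, child) pairs first so the tree is not modified mid-iteration.
    targets = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == "requestHandler"
        and child.get("class", "").endswith("SecureAdminHandlers")
    ]
    for parent, child in targets:
        parent.remove(child)
    return ET.tostring(root, encoding="unicode"), len(targets)


if __name__ == "__main__":
    # Illustrative solrconfig.xml fragment; the admin handler class name is assumed.
    sample = """<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.SecureAdminHandlers"/>
</config>"""
    cleaned, count = strip_secure_admin_handlers(sample)
    print(count)
    print(cleaned)
```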

Affected Versions: CDH 6.0.0

Fixed Versions: CDH 6.0.1

Cloudera Issue: CDH-72239

Apache Spark

The following issues are fixed in CDH 6.0.1:

SPARK-21525 - [STREAMING] Check error code from supervisor RPC.
SPARK-23679 - [YARN] Setting RM_HA_URLS for AmIpFilter to avoid redirect failure in YARN mode
SPARK-25253 - [PYSPARK] Refactor local connection & auth code

CVE-2019-10099: Apache Spark local files left unencrypted

Certain operations in Spark leave local files unencrypted on disk, even when local file encryption is enabled with “spark.io.encryption.enabled”.

This includes cached blocks that are fetched to disk (controlled by spark.maxRemoteBlockSizeFetchToMem) in the following cases:

In SparkR when parallelize is used
In PySpark when broadcast and parallelize are used
In PySpark when Python UDFs are used

Products affected:

CDH
CDS Powered by Apache Spark

Affected versions:

CDH 5.15.1 and earlier
CDH 6.0.0
CDS 2.1.0 release 1 and release 2
CDS 2.2.0 release 1 and release 2
CDS 2.3.0 release 3

Users affected: All users who run Spark on CDH and CDS in a multi-user environment.

Date/time of detection: July 2018

Severity (Low/Medium/High): 6.3 Medium (CVSS AV:L/AC:H/PR:N/UI:R/S:U/C:H/I:H/A:N)

Impact: Unencrypted data accessible.

CVE: CVE-2019-10099

Immediate action required: Upgrade to a version of CDH containing the fix.

Workaround: Do not use PySpark and the fetch-to-disk options.

Fixed versions:

CDH 5.15.2
CDH 5.16.0
CDH 6.0.1
CDS 2.1.0 release 3
CDS 2.2.0 release 3
CDS 2.3.0 release 4
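To make the CDH version ranges above concrete, here is a hedged Python sketch that classifies a CDH release string against the affected and fixed CDH versions listed in this bulletin. It ignores the CDS releases, assumes plain three-part numeric versions such as "5.15.1", and the function name is hypothetical. Releases this bulletin does not mention are reported as "not listed" rather than guessed.

```python
def parse_version(version: str) -> tuple:
    """Turn "5.15.1" into (5, 15, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Values taken from the affected/fixed CDH lists in this bulletin.
AFFECTED_UP_TO = (5, 15, 1)      # "CDH 5.15.1 and earlier"
AFFECTED_EXACT = {(6, 0, 0)}     # "CDH 6.0.0"
FIXED = {(5, 15, 2), (5, 16, 0), (6, 0, 1)}


def cve_2019_10099_status(cdh_version: str) -> str:
    """Hypothetical helper: return "affected", "fixed", or "not listed"."""
    v = parse_version(cdh_version)
    if v <= AFFECTED_UP_TO or v in AFFECTED_EXACT:
        return "affected"
    if v in FIXED:
        return "fixed"
    return "not listed"


if __name__ == "__main__":
    for release in ["5.14.4", "5.15.2", "6.0.0", "6.0.1"]:
        print(release, cve_2019_10099_status(release))
```

Comparing integer tuples rather than raw strings matters here: as strings, "5.9.0" would sort after "5.15.1".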

For the latest update on this issue, see the corresponding Knowledge article: TSB 20210-336: Apache Spark local files left unencrypted

Apache YARN

The following issues are fixed in CDH 6.0.1:

YARN-6966 - NodeManager metrics may return wrong negative values after the NodeManager restarts.
YARN-7542 - Fix issue that causes some Running Opportunistic Containers to be recovered as PAUSED.
YARN-8436 - FSParentQueue: Comparison method violates its general contract.
YARN-8518 - test-container-executor test_is_empty() is broken
YARN-8605 - TestDominantResourceFairnessPolicy.testModWhileSorting is flaky.
