Fixed Issues in CDH 6.0.1

See below for issues fixed in CDH 6.0.1, grouped by component:

Apache Flume

The following issues are fixed in CDH 6.0.1:

  • FLUME-3237 - Handling RuntimeExceptions coming from the JMS provider in JMSSource

Apache Hadoop

The following issues are fixed in CDH 6.0.1:

  • HADOOP-13597 - Fixed an issue where the KMS consumed memory and an increasing number of file descriptors.
  • HADOOP-15311 - You can now configure the acceptor/selector count for HttpServer2.
  • HADOOP-15473 - Fixed an UnrecoverableKeyException caused by JDK-8189997.
  • HADOOP-15593 - Fixed an NPE in UGI spawnAutoRenewalThreadForUserCreds.
  • HADOOP-15609 - The KMS now retries when an SSLHandshakeException occurs.
  • HADOOP-15638 - Fixed an issue where the KMS Accept Queue Size default changed from 500 to 128 in Hadoop 3.x.
  • HADOOP-15655 - Enhanced KMS client retry behavior to retry on timeout.
  • HADOOP-15696 - Fixed an issue where the KMS experienced a performance regression due to too many open file descriptors after the Jetty migration.
  • HADOOP-15698 - Fixed an issue where the KMS log4j was not initialized properly at startup.
  • HADOOP-15708 - Fixed an issue where reading values from Configuration before adding deprecations makes it impossible to read the values with a deprecated key.

Apache HBase

The following issues are fixed in CDH 6.0.1:

  • HBASE-19572 - RegionMover should use the configured default port number and not the one from HConstants
  • HBASE-19722 - Meta query statistics metrics source
  • HBASE-19764 - Fix Checkstyle errors in hbase-endpoint
  • HBASE-20244 - NoSuchMethodException when retrieving private method decryptEncryptedDataEncryptionKey from DFSClient
  • HBASE-20401 - Make MAX_WAIT and waitIfNotFinished in CleanerContext configurable
  • HBASE-20403 - Fix race between prefetch task and non-pread HFile reads
  • HBASE-20474 - Show non-RPC tasks on master/regionserver Web UI by default
  • HBASE-20538 - Upgrade our hadoop versions to 2.7.7 and 3.0.3
  • HBASE-20565 - ColumnRangeFilter combined with ColumnPaginationFilter can produce incorrect results
  • HBASE-20614 - REST scan API with incorrect filter text file throws HTTP 503 Service Unavailable error
  • HBASE-20642 - Clients should re-use the same nonce across DDL operations
  • HBASE-20648 - HBASE-19364 "Truncate_preserve fails with table when replica region > 1" for master branch
  • HBASE-20649 - Validate HFiles do not have PREFIX_TREE DataBlockEncoding
  • HBASE-20681 - Explicitly include hamcrest in binary tarball
  • HBASE-20691 - Change the default WAL storage policy back to "NONE"
  • HBASE-20697 - Can't cache all region locations of the specified table by calling table.getRegionLocator().getAllRegionLocations()
  • HBASE-20705 - Having an RPC quota on a table no longer prevents a space quota from being recreated or removed
  • HBASE-20706 - Prevent MTP from trying to reopen non-OPEN regions
  • HBASE-20723 - Custom hbase.wal.dir results in data loss because we write recovered edits into a different place than where the recovering region server looks for them
  • HBASE-20745 - Log when master proc wal rolls
  • HBASE-20752 - Make sure the regions are truly reopened after ReopenTableRegionsProcedure
  • HBASE-20770 - WAL cleaner logs way too much; gets clogged when lots of work to do
  • HBASE-20772 - Controlled shutdown fills Master log with the disturbing message 'No matching procedure found for rit=OPEN, location=ZZZZ, table=YYYYY, region=XXXX transition to CLOSED'
  • HBASE-20777 - RpcConnection could still remain opened after we shutdown the NettyRpcServer
  • HBASE-20780 - ServerRpcConnection logging cleanup: removed one of the logging lines in ServerRpcConnection by amalgamating them all into one new-style log line
  • HBASE-20781 - Save recalculating families in a WALEdit batch of Cells
  • HBASE-20794 - Add INFO level log to createTable operation
  • HBASE-20795 - Allow option in BBKVComparator.compare to do comparison without sequence id
  • HBASE-20806 - Split style journal for flushes and compactions
  • HBASE-20810 - Include the procedure id in the exception message in HBaseAdmin for better debugging
  • HBASE-20812 - Add defaults to Table Interface so implementors don't have to
  • HBASE-20813 - RPC quotas are now removed when the associated table or namespace is dropped
  • HBASE-20817 - Infinite loop when executing ReopenTableRegionsProcedure
  • HBASE-20825 - Fix pre and post hooks of CloneSnapshot and RestoreSnapshot for Access checks
  • HBASE-20826 - Truncate really long RpcServer warnings unless TRACE is on
  • HBASE-20829 - Remove the addFront assertion in MasterProcedureScheduler.doAdd
  • HBASE-20833 - Modify pre-upgrade coprocessor validator to support table level coprocessors
  • HBASE-20839 - Fall back to FSHLog if we cannot instantiate AsyncFSWAL when the user does not specify AsyncFSWAL explicitly
  • HBASE-20853 - Polish "Add defaults to Table Interface so Implementors don't have to"
  • HBASE-20856 - PITA having to set WAL provider in two places
  • HBASE-20860 - Merged region's RIT state may not be cleaned after master restart
  • HBASE-20867 - RS may get killed while master restarts
  • HBASE-20869 - Endpoint-based Export uses incorrect user to write to destination
  • HBASE-20875 - MemStoreLABImp::copyIntoCell uses 7% CPU when writing
  • HBASE-20878 - Data loss if merging regions while ServerCrashProcedure executing
  • HBASE-20882 - HBASE-20616 "TruncateTableProcedure is stuck in retry loop in TRUNCATE_TABLE_CREATE_FS_LAYOUT state" to branch-2.0
  • HBASE-20885 - Removed entry for RPC quota from hbase:quota when RPC quota is removed
  • HBASE-20887 - HBASE-20865 "CreateTableProcedure is stuck in retry loop in CREATE_TABLE_WRITE_FS_LAYOUT state"
  • HBASE-20903 - HBASE-20792 "info:servername and info:sn inconsistent for OPEN region" to branch-2.0
  • HBASE-20914 - Trim Master memory usage
  • HBASE-20921 - Possible NPE in ReopenTableRegionsProcedure
  • HBASE-20924 - "HBASE-20846 Restore procedure locks when master restarts"
  • HBASE-20935 - HStore.removeCompactedFiles should log in case it is unable to delete a file
  • HBASE-20939 - There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException
  • HBASE-20940 - HStore.cansplit should not allow split to happen if it has references
  • HBASE-20941 - Created and implemented HbckService in master
  • HBASE-20942 - Fix ArrayIndexOutOfBoundsException for RpcServer TRACE logging
  • HBASE-20975 - Lock may not be taken or released while rolling back procedure
  • HBASE-20978 - [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure
  • HBASE-20981 - Rollback stateCount accounting thrown off when exception out of rollbackState
  • HBASE-20989 - Minor, miscellaneous logging fixes
  • HBASE-21004 - to branch-2.0 HBASE-20708 "Remove the usage of RecoverMetaProcedure"
  • HBASE-21007 - Memory leak in HBase REST server
  • HBASE-21018 - RS crashed because AsyncFS was unable to update HDFS data encryption key
  • HBASE-21029 - Miscount of memstore's heap/offheap size if same cell was put
  • HBASE-21031 - Memory leak if replay edits failed during region opening
  • HBASE-21041 - Memstore's heap size will be decreased below zero after flush
  • HBASE-21047 - Object creation of StoreFileScanner through constructor and close may leave refCount at -1
  • HBASE-21050 - Exclusive lock may be held by a SUCCESS state procedure forever
  • HBASE-21062 - Correctly use the defaultProvider value on the Providers enum when constructing a WALProvider
  • HBASE-21072 - Block out HBCK1 in hbase2
  • HBASE-21078 - [amv2] CODE-BUG NPE in RTP doing Unassign
  • HBASE-21083 - Introduce a mechanism to bypass the execution of a stuck procedure
  • HBASE-21088 - HStoreFile should be closed in HStore#hasReferences
  • HBASE-21120 - MoveRegionProcedure makes no progress; goes to STUCK

Region Server occasionally fails when HDFS data transport encryption is enabled

In rare cases, an HBase RegionServer on a cluster with Hadoop data transfer encryption enabled (dfs.encrypt.data.transfer = true) may crash because it is unable to update the encryption key.

Workaround: Restart the RegionServer.

Affected Versions: CDH 6.0.0

Fixed Versions: CDH 6.0.1

Apache Issue: HBASE-21018

Cloudera Issue: CDH-71613

Prefetch sometimes doesn't work with an encrypted file system

If HBase prefetch is enabled (hbase.rs.prefetchblocksonopen = true) on an encrypted HDFS cluster, HBase RegionServer may crash due to memory corruption.

Workaround: Disable HBase prefetch (hbase.rs.prefetchblocksonopen = false).

Affected Versions: CDH 6.0.0

Fixed Versions: CDH 6.0.1

Apache Issue: HBASE-20403

Cloudera Issue: CDH-68666
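Where hbase-site.xml is edited directly rather than through Cloudera Manager, the prefetch workaround amounts to setting one property in a Hadoop-style XML configuration file. The following Python sketch assumes that file layout (a <configuration> root containing <property> elements with <name> and <value> children); the function names get_property and set_property are hypothetical helpers, not part of any HBase tooling.

```python
import xml.etree.ElementTree as ET

# Property named in the prefetch workaround.
PREFETCH_KEY = "hbase.rs.prefetchblocksonopen"


def get_property(config_xml: str, name: str):
    """Return the value of property `name`, or None if it is not set."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def set_property(config_xml: str, name: str, value: str) -> str:
    """Return the configuration with `name` set to `value`, adding it if absent."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    site = (
        "<configuration><property>"
        f"<name>{PREFETCH_KEY}</name><value>true</value>"
        "</property></configuration>"
    )
    patched = set_property(site, PREFETCH_KEY, "false")
    print(get_property(patched, PREFETCH_KEY))  # prints: false
```

The sketch only rewrites the XML text; distributing the file to the RegionServers is left to your usual deployment procedure.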

Apache HDFS

The following issues are fixed in CDH 6.0.1:

  • HDFS-5040 - Added audit logging for all DFS admin commands.
  • HDFS-10240 - Fixed an issue where a race between close and recoverLease led to missing blocks.
  • HDFS-10453 - Fixed an issue where the ReplicationMonitor thread could get stuck for a long time due to the race between replication and delete of the same file in a large cluster.
  • HDFS-13051 - Fixed an issue where a deadlock occurred when a rollEditLog RPC call happened while editPendingQ was full.
  • HDFS-13178 - Add a force option to DiskBalancer Execute command
  • HDFS-13181 - Add a configuration to DiskBalancer for valid plan hours
  • HDFS-13281 - Fixed an issue where NameNode#createFile was not aware of /.reserved/raw/ paths.
  • HDFS-13314 - NameNode optionally exits if it detects FsImage corruption
  • HDFS-13322 - FUSE lib now recognizes the change of the Kerberos ticket cache path if it was changed with the KRB5CCNAME environment variable during the same user session.
  • HDFS-13339 - Fixed an issue where a volume reference could not be released, possibly leading to a deadlock, when DataXceiver performed a volume check.
  • HDFS-13721 - Fixed an NPE in DataNode due to an uninitialized DiskBalancer.
  • HDFS-13727 - The DiskBalancer now logs a full stack trace if it exits with an unhandled exception.
  • HDFS-13813 - Added a check to see if a child inode exists in the global FSDirectory dir when saving (serializing) INodeDirectorySection.

Apache Hive

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HIVE-13696 - Modify fair-scheduler.xml and automatically update/validate jobs submitted to fair-scheduler
  • HIVE-15387 - NPE in HiveServer2 webUI Historical SQL Operations section
  • HIVE-16483 - Hive on Spark should populate split-related configurations to HiveConf
  • HIVE-17213 - Hive on Spark: file merging doesn't work for union all
  • HIVE-18977 - Listing partitions returns different results with JDO and direct SQL
  • HIVE-19048 - Initscript errors are ignored
  • HIVE-19133 - HiveServer2 WebUI phase-wise performance metrics not showing correctly
  • HIVE-19202 - CBO failed due to NullPointerException in HiveAggregate.isBucketedInput()
  • HIVE-19251 - ObjectStore.getNextNotification with LIMIT should use less memory
  • HIVE-19259 - CREATE VIEW that uses UNION ALL fails with 'Table not found'
  • HIVE-19265 - Potential NPE returned instead of actual exception in Hive#copyFiles
  • HIVE-19424 - NPE In MetaDataFormatters
  • HIVE-19668 - HiveServer2 service hanging due to over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and duplicate strings
  • HIVE-19752 - PerfLogger integration for critical Hive-on-S3 paths
  • HIVE-19870 - HCatalog dynamic partition query can fail if the table path is managed by Sentry
  • HIVE-19891 - Inserting into external tables with custom partition directories may cause data loss
  • HIVE-20183 - Inserting from bucketed table can cause data loss if the source table contains an empty bucket
  • HIVE-20226 - HMS getNextNotification throws exception when request maxEvents exceeds the table's max_rows
  • HIVE-20345 - Drop database may hang if the tables get deleted from a different call

Hive Jobs Are Submitted to a Single Queue When Sentry is Deployed

Hive jobs are not submitted into the correct YARN queue when Hive is using Sentry because Hive does not use the YARN API to resolve the user or group of the job's original submitter. This causes the job to be placed in a queue using the placement rules based on the Hive user. The HiveServer2 fair scheduler queue mapping used for "non-impersonation" mode does not handle the primary-secondary queue mappings correctly.

Workaround: If you are a Hive and Sentry user, do not upgrade to CDH 6.0.0; upgrade to CDH 6.0.1 or later instead. If you must use Hive and Sentry in CDH 6.0.0, see "YARN Dynamic Resource Pools Do Not Work with Hive When Sentry Is Enabled" for additional workarounds.

Affected Version: CDH 6.0.0

Fixed Versions: CDH 6.0.1, CDH 6.1.0 and later

Cloudera Issue: CDH-51596

Hue

The following issue is fixed in CDH 6.0.1:

Queries from ImpalaDaemonApi failing when Impala is configured with webserver_htpassword

When webserver_htpassword_username and webserver_htpassword_password are used to authenticate the Impala web UIs, the Impala Queries page in the Hue Job Browser returns a 404 error, even with Kerberos authentication.

Workaround: None

Fixed Versions: CDH 6.0.1, CDH 6.1.0

Cloudera Issue: CDH-71138

Apache Impala

The following issues are fixed in CDH 6.0.1:

  • IMPALA-4908 - NULL floats with different value fields now compare equal.
  • IMPALA-7014 - Disabled stacktrace symbolisation by default.
  • IMPALA-7078 - Improved memory consumption of Avro scans of wide tables.
  • IMPALA-7145 - Fixed a memory leak in OpenSSL when spill-to-disk encryption is enabled.
  • IMPALA-7225 - REFRESH PARTITION on a single partition no longer resets its row count to -1.
  • IMPALA-7330 - After LOAD DATA, Impala now only refreshes the affected partition.
  • IMPALA-7360 - Fixed an issue where the Avro scanner sometimes skipped blocks when the skip marker fell on an HDFS block boundary.
  • IMPALA-7559 - Disabled Parquet stat filtering for UTC-normalized timestamp columns.

Apache Kudu

The following issues are fixed in CDH 6.0.1:

  • KUDU-2312 - Fixed a crash that could occur on some systems when a query had more than 16 predicates.
  • KUDU-2509 - Fixed use-after-free in case of a WAL replay error.

Apache Oozie

The following issues are fixed in CDH 6.0.1:

  • OOZIE-3193 - Applications are not killed when submitted via subworkflow
  • OOZIE-3330 and OOZIE-3331 - Spark options parsing bugfix

Cloudera Search

The following issues are fixed in CDH 6.0.1:

  • SOLR-11590 - Synchronize ZK connect/disconnect handling so that they are processed in linear order
  • SOLR-12343 - JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
  • SOLR-12450 - Don't allow referral to external resources in various config files
  • SOLR-12516 - JSON range facets can incorrectly refine subfacets for buckets

CDH Upgrade fails to delete Solr data from HDFS

The CDH upgrade process fails to delete Solr data from HDFS, and the recreated collections then fail to initialize because of the existing indexes.

Workaround: Perform the following steps after you run the CDH Upgrade wizard and before you finalize the HDFS upgrade:
  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the Solr service page.
  3. Stop the Solr service and dependent services. Click Actions > Stop.
  4. Click Actions > Reinitialize Solr State for Upgrade.
  5. Click Actions > Bootstrap Solr Configuration.
  6. Start the Solr and dependent services. Click Actions > Start.
  7. Click Actions > Bootstrap Solr Collections.

Affected Versions: CDH 6.0.0

Fixed Versions: Cloudera Manager 6.0.1

Cloudera Issue: OPSAPS-47502

Solr Service reports stale configurations even after restart

Solr reports stale configurations, and the Solr Server role fails to start with the following error: Role failed to start due to error: The archive already contains creds.localjceks. The issue occurs if your deployment includes Solr and HDFS is configured to use LDAP Group Mapping.

Workaround: If you have a CDH 5 cluster and use LDAP Group Mapping, do not upgrade to CDH 6.0.0. If you have a CDH 6.0.0 cluster, disable LDAP Group Mapping.

Affected Versions: Cloudera Manager 6.0.0 and CDH 6.0.0

Fixed Versions: Cloudera Manager 6.0.1

Cloudera Issue: OPSAPS-47321

Cloudera Search configuration migration script fails to detect incompatible SecureAdminHandlers request handler

The SecureAdminHandlers request handler is incompatible with Apache Solr 7, which is used in CDH 6. The Cloudera Search configuration migration script fails to detect this incompatibility.

Workaround: Remove SecureAdminHandlers request handlers from the solrconfig.xml files of any configuration set that uses them during the pre-upgrade configuration migration.

Affected Versions: CDH 6.0.0

Fixed Versions: CDH 6.0.1

Cloudera Issue: CDH-72239
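The manual edit in the SecureAdminHandlers workaround can be scripted when many configuration sets are involved. The Python sketch below assumes the handler is declared as a <requestHandler> element whose class attribute ends in SecureAdminHandlers; the fully qualified class name used in the sample (org.apache.solr.handler.admin.SecureAdminHandlers) and the function name remove_secure_admin_handlers are assumptions for illustration. Review the output before uploading the configuration set.

```python
import xml.etree.ElementTree as ET


def remove_secure_admin_handlers(solrconfig_xml: str):
    """Drop <requestHandler> elements whose class ends in SecureAdminHandlers.

    Returns (new_xml, removed_count). Comments inside the root element are
    kept; the XML declaration, if any, is not reproduced in the output.
    """
    # Keep comments so the rewritten solrconfig.xml stays readable (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(solrconfig_xml, parser=parser)

    # Collect matches first; mutating the tree while iterating over it is unsafe.
    matches = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == "requestHandler"
        and child.get("class", "").endswith("SecureAdminHandlers")
    ]
    for parent, child in matches:
        parent.remove(child)
    return ET.tostring(root, encoding="unicode"), len(matches)


if __name__ == "__main__":
    # Hypothetical solrconfig.xml fragment; the handler class name is assumed.
    sample = (
        "<config>"
        '<requestHandler name="/admin/" '
        'class="org.apache.solr.handler.admin.SecureAdminHandlers"/>'
        '<requestHandler name="/select" class="solr.SearchHandler"/>'
        "</config>"
    )
    cleaned, removed = remove_secure_admin_handlers(sample)
    print(removed)  # prints: 1
```

Matching on the class suffix rather than on the handler name keeps the sketch independent of whichever path (for example /admin/) the handler was registered under.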

Apache Spark

The following issues are fixed in CDH 6.0.1:

  • SPARK-21525 - [STREAMING] Check error code from supervisor RPC.
  • SPARK-23679 - [YARN] Setting RM_HA_URLS for AmIpFilter to avoid redirect failure in YARN mode
  • SPARK-25253 - [PYSPARK] Refactor local connection & auth code

Apache YARN

The following issues are fixed in CDH 6.0.1:

  • YARN-6966 - NodeManager metrics may return wrong negative values when the NodeManager restarts.
  • YARN-7542 - Fixed an issue that caused some running opportunistic containers to be recovered as PAUSED.
  • YARN-8436 - FSParentQueue: Comparison method violates its general contract.
  • YARN-8518 - test-container-executor test_is_empty() is broken
  • YARN-8605 - TestDominantResourceFairnessPolicy.testModWhileSorting is flaky.