Issues Fixed in CDH 5.15.x

The following topics describe issues fixed in CDH 5.15.x, from newest to oldest release. You can also review What's New in CDH 5.15.x or Known Issues in CDH 5.

Issues Fixed in CDH 5.15.2
Issues Fixed in CDH 5.15.1
Issues Fixed in CDH 5.15.0

Issues Fixed in CDH 5.15.2

CVE-2019-10099: Apache Spark local files left unencrypted

Certain operations in Spark leave local files unencrypted on disk, even when local file encryption is enabled with “spark.io.encryption.enabled”.

This includes cached blocks that are fetched to disk (controlled by spark.maxRemoteBlockSizeFetchToMem) in the following cases:

In SparkR when parallelize is used
In PySpark when broadcast and parallelize are used
In PySpark when Python UDFs are used

Products affected:

CDH
CDS Powered by Apache Spark

Affected versions:

CDH 5.15.1 and earlier
CDH 6.0.0
CDS 2.1.0 release 1 and release 2
CDS 2.2.0 release 1 and release 2
CDS 2.3.0 release 3

Users affected: All users who run Spark on CDH and CDS in a multi-user environment.

Date/time of detection: July 2018

Severity (Low/Medium/High): 6.3 Medium (CVSS AV:L/AC:H/PR:N/UI:R/S:U/C:H/I:H/A:N)

Impact: Unencrypted data accessible.

CVE: CVE-2019-10099

Immediate action required: Upgrade to a version of CDH containing the fix.

Workaround: Do not use PySpark or the fetch-to-disk options.

Fixed versions:

CDH 5.15.2
CDH 5.16.0
CDH 6.0.1
CDS 2.1.0 release 3
CDS 2.2.0 release 3
CDS 2.3.0 release 4
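
The CDH version lists above can be turned into a quick check. The following is a minimal sketch in Python, assuming version strings of the form major.minor.patch. The function names are hypothetical, and CDS releases are not covered.

```python
def parse_version(version: str) -> tuple:
    """Turn '5.15.1' into (5, 15, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def cdh_affected_by_cve_2019_10099(version: str) -> bool:
    """Return True if a CDH version is in the affected list for this issue.

    Affected per the list above: CDH 5.15.1 and earlier, and CDH 6.0.0.
    """
    v = parse_version(version)
    return v <= (5, 15, 1) or v == (6, 0, 0)


for candidate in ("5.14.4", "5.15.1", "5.15.2", "6.0.0", "6.0.1"):
    status = "affected" if cdh_affected_by_cve_2019_10099(candidate) else "not affected"
    print(f"CDH {candidate}: {status}")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "5.9.0" as newer than "5.15.0".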

For the latest update on this issue see the corresponding Knowledge article: TSB 20210-336: Apache Spark local files left unencrypted

CVE-2018-1296 Permissive Apache Hadoop HDFS listXAttr Authorization Exposes Extended Attribute Key/Value Pairs

HDFS exposes extended attribute key/value pairs during listXAttrs, verifying only path-level search access to the directory rather than path-level read permission to the referent.

Products affected: Apache HDFS

Releases affected:

CDH 5.4.0 - 5.15.1, 5.16.0
CDH 6.0.0, 6.0.1, 6.1.0

Users affected: Users who store sensitive data in extended attributes, such as users of HDFS encryption.

Date/time of detection: December 12, 2017

Detected by: Rushabh Shah, Yahoo! Inc., Hadoop committer

Severity (Low/Medium/High): Medium

Impact: HDFS exposes extended attribute key/value pairs during listXAttrs, verifying only path-level search access to the directory rather than path-level read permission to the referent. This affects features that store sensitive data in extended attributes.

CVE: CVE-2018-1296

Immediate action required:

Upgrade: Update to a version of CDH containing the fix.
Workaround: If a file contains sensitive data in extended attributes, users and admins need to change the permission to prevent others from listing the directory that contains the file.
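
As a sketch of the workaround, the Python snippet below builds (but does not run) an hdfs dfs -chmod command that removes all access for "other" users on the directory that holds the file. The helper name and the example path are hypothetical, and whether group access should also be restricted depends on your deployment.

```python
import posixpath


def build_restrict_listing_command(file_path: str) -> list:
    """Build an `hdfs dfs -chmod` command for the parent directory of file_path.

    The symbolic mode 'o-rwx' removes read, write, and execute (search)
    permission for users who are neither the owner nor in the owning group.
    """
    parent_dir = posixpath.dirname(file_path)
    return ["hdfs", "dfs", "-chmod", "o-rwx", parent_dir]


# Hypothetical path, for illustration only.
command = build_restrict_listing_command("/data/secure/part-00000")
print(" ".join(command))
```

Review the resulting command before running it, because removing search permission on a directory also blocks legitimate access by users outside the owner and group.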

Addressed in release/refresh/patch:

CDH 5.15.2, 5.16.1
CDH 6.1.1, 6.2.0

Missing authorization in Apache Impala may allow data injection

A malicious user who is authenticated with Kerberos may have unauthorized access to internal services used by Impala to transfer intermediate data during query execution. If details of a running query (e.g. query ID, query plan) are available, a user can craft some RPC requests with custom software to inject data into a running query or end query execution prematurely, leading to wrong results of the query.

Cloudera Issue: CDH-72373 / TSB-338

CVE: CVE-2018-11785

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.15.2:

Apache Hadoop

HADOOP-15473 - Enhanced serialFilter in KeyProvider to avoid UnrecoverableKeyException caused by JDK-8189997.
HADOOP-15655 - Enhanced KMS client retry behavior.
HDFS-10240 - Fixed an issue where a race between close/recoverLease leads to missing blocks.
HDFS-12299 - Fixed a race between update pipeline and DataNode Re-Registration.
HDFS-13322 - Fixed an issue with fuse dfs that caused the UID to persist when switching between ticket caches.
HDFS-13486 - Backport HDFS-11817 to fix an issue where a faulty node can cause a lease leak and NPE on accessing data.
HDFS-13813 - Enhanced NameNode behavior to exit if dangling child inode is detected when saving FsImage.
MAPREDUCE-7053 - Fixed an issue where timed out tasks can fail to produce thread dump.
YARN-6966 - Fixed an issue where NodeManager metrics may return wrong negative values when the NodeManager restarts.
YARN-6967 - Fixed an issue where an application attempt's diagnostic message size was not properly limited.
YARN-8436 - Fixed an issue with FSParentQueue where the comparison method violates its general contract.

Apache HBase

HBASE-19730 - HBASE-14497 Reverse Scan threw StackOverflow caused by readPt checking
HBASE-19924 - hbase rpc throttling does not work for multi() with request count rater.
HBASE-20493 - Port HBASE-19994 (Create a new class for RPC throttling exception, make it retryable) to branch-1
HBASE-20723 - Custom hbase.wal.dir results in data loss because we write recovered edits into a different place than where the recovering region server looks for them.
HBASE-20997 - rebuildUserRegions() does not build ReplicaMapping during master switchover
HBASE-21275 - Disable TRACE HTTP method for thrift http server

Apache Hive

HIVE-12981 - ThriftCLIService uses incompatible getShortName() implementation
HIVE-13394 - Analyze table fails in tez on empty partitions
HIVE-14236 - Partial backport ofto fix union-stats related errors
HIVE-14560 - Support exchange partition between s3 and hdfs tables
HIVE-14690 - Query fail when hive.exec.parallel=true, with conflicting session dir
HIVE-16483 - HoS should populate split related configurations to HiveConf
HIVE-17213 - HoS: file merging doesn't work for union all
HIVE-19259 - Create view on tables having union all fail with 'Table not found'
HIVE-20183 - Inserting from bucketed table can cause data loss, if the source table contains empty bucket
HIVE-20345 - Drop database may hang if the tables get deleted from a different call
HIVE-20678 - HiveHBaseTableOutputFormat should implement HiveOutputFormat to ensure compatibility
HIVE-20695 - HoS Query fails with hive.exec.parallel=true

Hue

HUE-8128 - [backend] Force debug logging in server logs does not get all debug
HUE-8398 - [editor] Fix broken result table after multiple queries in embedded mode
HUE-8399 - [editor] Various improvements for embedded mode
HUE-8399 - [editor] Fix various embedded mode issues in old version
HUE-8399 - [editor] Limit functions assist to Impala in embedded mode
HUE-8399 - [editor] Hide the right assistant in embedded mode
HUE-8451 - [notebook] Many "codec can't decode byte" errors on pig execution if browser language=jp
HUE-8458 - [frontend] Fix issue with async loading of js resources in the dashboard
HUE-8458 - [frontend] Evaluate the js resources while others are being fetched
HUE-8458 - [frontend] Load new scripts using $.get and eval instead of appending <script> tags
HUE-8464 - [core] Fix SAML encryption missing key file passphrase
HUE-8467 - [jobbrowser] Support impala digest auth for queries
HUE-8468 - [frontend] Append a style tag to head instead of modifying stylesheets for dynamic styles in embedded mode
HUE-8475 - [report] Protect against pivot conflicting with nested facets
HUE-8487 - [useradmin] Fix Add Sync LDAP user fails when using DN with special character
HUE-8505 - [core] Close impala session on logout
HUE-8558 - [jb] Add tracking URL to Spark Jobs and remove url and killUrl
HUE-8564 - [useradmin] Fix last activity update for notebook/api/check_status
HUE-8564 - [useradmin] Fix last activity update for jobbrowser/api/jobs requests

Apache Impala

IMPALA-6907 - Impala now correctly closes all stale connections to removed Impala cluster members.
IMPALA-7225 - Fixed an issue where the REFRESH...PARTITION statement caused statistics for the refreshed partition to be automatically reset to -1 (unknown). With the fix, statistics change only if an explicit COMPUTE STATS statement is issued for an object.
IMPALA-7272 - Fixed a crash caused by a memory management problem when the query execution requires finding strings inside a range defined by the lesser-than and greater-than comparisons.
IMPALA-7360 - Fixed an issue where Impala could incorrectly skip data if a record separator in a sequence-based file (Avro, RC or sequence file) straddled an HDFS block boundary.
IMPALA-7537 - Fixed a security issue where REVOKE ALL ON SERVER did not have a permanent effect if the ALL permission was granted using the WITH GRANT option. Running INVALIDATE METADATA no longer causes the permission to reappear.
IMPALA-7585 - Fixed an issue in KRPC, which can cause slow or hung queries on non-secured clusters.
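
To illustrate why a record separator that straddles a block boundary (IMPALA-7360) can be missed, here is a toy sketch in Python. It is not Impala's scanner code: the <SYNC> marker, the block size, and the function names are all made up for illustration.

```python
SYNC = b"<SYNC>"  # hypothetical record separator


def count_markers_naive(data: bytes, block_size: int) -> int:
    """Search each block in isolation; a marker split across two blocks is missed."""
    total = 0
    for start in range(0, len(data), block_size):
        total += data[start:start + block_size].count(SYNC)
    return total


def count_markers_with_lookahead(data: bytes, block_size: int) -> int:
    """Let each scan read a few bytes past its block, counting markers that START in the block."""
    total = 0
    overlap = len(SYNC) - 1
    for start in range(0, len(data), block_size):
        end = min(start + block_size, len(data))
        window = data[start:end + overlap]
        pos = window.find(SYNC)
        while pos != -1 and pos < end - start:
            total += 1
            pos = window.find(SYNC, pos + 1)
    return total


data = b"aaaa" + SYNC + b"bbbb" + SYNC + b"cc"
print(count_markers_naive(data, 7), count_markers_with_lookahead(data, 7))
```

With a block size of 7, the first marker spans the boundary between the first and second blocks, so the isolated search finds only one of the two markers while the lookahead version finds both.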

Apache Kudu

KUDU-2463 - Fixed an issue in which incorrect results would be returned in scans following a server restart.
KUDU-2509 - Fixed an issue that might result in a crash of a tablet server in case of a WAL replay error while bootstrapping a tablet.
KUDU-2580 - Fixed authentication token reacquisition in the C++ client.

Apache Oozie

OOZIE-2457 - Oozie log parsing regex consume more than 90% cpu
OOZIE-3354 - [core] [SSH action] SSH action gets hung
OOZIE-3370 - Property filtering is not consistent across job submission

Apache Sentry

SENTRY-1944 - Optimize DelegateSentryStore.getGroupsByRoles() and update SentryGenericPolicyProcessor to retrieve roles to group mapping in a single transaction
SENTRY-2194 - Upgrade Sentry hadoop-version dependency to 2.7.5
SENTRY-2214 - Sentry should not allow URI grants to EMPTY or NULL locations
SENTRY-2309 - Catch NPE thrown when fetching Partitions with no corresponding SDS entry
SENTRY-2332 - Load hadoop default configuration when starting sentry service
SENTRY-2403 - Incorrect naming in RollingFileWithoutDeleteAppender
SENTRY-2406 - Make sure inputHierarchy and outputHierarchy have unique values
SENTRY-2428 - Skip null partitions or partitions with null sds entries

Apache Spark

SPARK-25253 - [PYSPARK] Refactor local connection & auth code
SPARK-25318 - Add exception handling when wrapping the input stream during the fetch or stage retry in response to a corrupted block

Apache ZooKeeper

ZOOKEEPER-706 - Large numbers of watches can cause session re-establishment to fail
ZOOKEEPER-1382 - Zookeeper server holds onto dead/expired session ids in the watch data structures

Issues Fixed in CDH 5.15.1

CDH 5.15.1 fixes the following issues.

Hadoop YARN Privilege Escalation CVE-2016-6811
Apache HBase potential privilege escalation for user of HBase “Thrift 1” API Server over HTTP CVE-2018-8025
Apache Hive CREATE TABLE EXTERNAL commands run very slowly on Amazon S3
Cloudera Search restore operation puts shard replicas on same host
Zip Slip Vulnerability CVE-2018-8009
hdfs snapshotDiff /.reserved/raw/... fails on snapshottable directories
Upstream Issues Fixed

Hadoop YARN Privilege Escalation CVE-2016-6811

Fixed a vulnerability in Hadoop YARN that allows a user who can escalate to the yarn user to possibly run arbitrary commands as the root user. CVE-2016-6811

Apache HBase potential privilege escalation for user of HBase “Thrift 1” API Server over HTTP CVE-2018-8025

CVE-2018-8025 describes an issue in Apache HBase that affects the optional "Thrift 1" API server when running over HTTP. There is a race condition that could lead to authenticated sessions being incorrectly applied to users; for example, one authenticated user could be considered a different user, or an unauthenticated user could be treated as an authenticated user.

Products affected: HBase Thrift Server

Releases affected:

CDH 5.4.x - 5.12.x
CDH 5.13.0, 5.13.1, 5.13.2, 5.13.3
CDH 5.14.0, 5.14.2, 5.14.3
CDH 5.15.0

Fixed versions: CDH 5.14.4, 5.15.1

Users affected: Users with the HBase Thrift 1 service role installed and configured to work in “thrift over HTTP” mode. For example, those using Hue with HBase impersonation enabled.

Severity: High

Impact: Potential privilege escalation.

CVE: CVE-2018-8025

Immediate action required: Upgrade to a CDH version with the fix, or disable the HBase Thrift-over-HTTP service. Disabling the HBase Thrift-over-HTTP service renders Hue impersonation inoperable, and all HBase access through Hue is then performed as the “hue” user instead of the authenticated user.

Knowledge article: For the latest update on this issue see the corresponding Knowledge article - TSB: 2018-315: Potential privilege escalation for user of HBase “Thrift 1” API Server over HTTP

Apache Hive CREATE TABLE EXTERNAL commands run very slowly on Amazon S3

When you run a CREATE EXTERNAL TABLE command on Amazon S3 storage, the command runs extremely slowly and might not complete. This usually occurs for tables that have pre-existing, nested data. There is no known workaround.

Affected Versions: CDH 5.15.0

Cloudera Bug: CDH-68833

Workaround: None

Fixed versions: CDH 5.15.1 and later

Cloudera Search restore operation puts shard replicas on same host

Restoring an Apache Solr collection sometimes places all shard replicas on the same host.

Cloudera Issue: CDH-68828

Zip Slip Vulnerability CVE-2018-8009

“Zip Slip” is a widespread arbitrary file overwrite critical vulnerability, which typically results in remote command execution. It was discovered and responsibly disclosed by the Snyk Security team ahead of a public disclosure on June 5, 2018, and affects thousands of projects.

Cloudera has analyzed our use of zip-related software, and has determined that only Apache Hadoop is vulnerable to this class of vulnerability in CDH 5. This has been fixed in upstream Hadoop as CVE-2018-8009.
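
For readers unfamiliar with this class of bug, the sketch below shows the general idea behind a Zip Slip check in Python: before extracting an archive entry, resolve its destination and confirm that it stays inside the extraction directory. This is a generic illustration, not the Hadoop fix, and the function name and paths are hypothetical.

```python
import os


def is_safe_entry(dest_dir: str, entry_name: str) -> bool:
    """Return True if extracting entry_name would stay inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    # Joining and resolving collapses any '..' components in the entry name.
    target = os.path.realpath(os.path.join(dest_root, entry_name))
    return os.path.commonpath([dest_root, target]) == dest_root


# A well-behaved entry and a traversal attempt (illustrative names).
print(is_safe_entry("/tmp/extract", "docs/readme.txt"))
print(is_safe_entry("/tmp/extract", "../../etc/passwd"))
```

An extractor that skips or rejects entries failing such a check cannot be tricked into overwriting files outside the target directory through crafted entry names.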

hdfs snapshotDiff /.reserved/raw/... fails on snapshottable directories

Fixed an issue where the hdfs snapshotDiff command fails when the command is supplied with the raw path.

Cloudera Issue: CDH-66029

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.15.1:

Apache Flume

FLUME-2786, FLUME-3056, FLUME-3117 - Application enters a deadlock when stopped while handleConfigurationEvent
FLUME-2894 - Flume components should stop in the correct order
FLUME-2976 - Exception when JMS source tries to connect to a Weblogic server without authentication
FLUME-3222 - Fix for NoSuchFileException thrown when files are being deleted

Apache Hadoop

HADOOP-13024 - Distcp with -delete feature on raw data not implemented
HADOOP-13972 - ADLS to support per-store configuration
HADOOP-15186 - Allow Azure Data Lake SDK dependency version to be set on the command line
HADOOP-15317 - Improve NetworkTopology chooseRandom's loop
HADOOP-15342 - Updating ADLS connector to use SDK version 2.2.7
HADOOP-15356 - Make the HTTP timeout configurable in ADLS connector
HADOOP-15434 - Upgrade to ADLS SDK that exposes current timeout
HADOOP-15466 - Correct units in adl.http.timeout property
HDFS-9229 - Expose size of NameNode directory as a metric
HDFS-11751 - DFSZKFailoverController daemon exits with wrong status code
HDFS-11993 - Add log info when connect to datanode socket address failed
HDFS-12683 - DFSZKFailOverController re-order logic for logging Exception
HDFS-12710 - HTTPFS HTTP max header size env variable is not respected in branch-2
HDFS-12981 - Running renameSnapshot on a non-existent snapshot to itself should throw error
HDFS-13281 - Namenode#createFile should be /.reserved/raw/ aware
HDFS-13314 - NameNode should optionally exit if it detects FsImage corruption
MAPREDUCE-7094 - LocalDistributedCacheManager leaves classloaders open, which leaks File Descriptors
YARN-4227 - Ignore expired containers from removed nodes in FairScheduler
YARN-4325 - NodeManager log handlers fail to send finished/failed events in some cases
YARN-4677 - RMNodeResourceUpdateEvent update from scheduler can lead to a race condition
YARN-5121 - Fix container-executor portability issues

Apache HBase

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

HBASE-20229 - ConnectionImplementation.locateRegions() returns duplicated entries when region replication is on
HBASE-20293 - get_splits returns duplicate split points when region replication is on
HBASE-20664 - Reduce the broad scope of outToken in ThriftHttpServlet

Apache Hive

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

HIVE-9915 - Allow specifying file format for managed tables
HIVE-15580 - Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark
HIVE-15682 - Eliminate per-row based dummy iterator creation
HIVE-15683 - Make configurable how Hive on Spark shuffles data for GROUP BY
HIVE-16080 - Add parquet to possible values for hive.default.fileformat and hive.default.fileformat.managed
HIVE-16659 - Query plan should reflect hive.spark.use.groupby.shuffle
HIVE-19041 - Thrift deserialization of Partition objects should intern fields
HIVE-19231 - Beeline generates garbled output when using UnsupportedTerminal
HIVE-19424 - NPE In MetaDataFormatters
HIVE-19668 - Over 30% of the heap is wasted by duplicate org.antlr.runtime.CommonTokens and duplicate strings
HIVE-19700 - Workaround for JLine issue with UnsupportedTerminal
HIVE-19870 - HCatalog dynamic partition query can fail if the table path is managed by Sentry

Hue

HUE-6697 - [jb] Prevent reset of job page tabs when job is running
HUE-7946 - [jb] Link to subworkflow on workflow dashboard page 404s in Hue 4
HUE-7989 - [useradmin] Provide better UI message and message in logs when ldap server down
HUE-8053 - Fix unit test
HUE-8053 - [useradmin] LDAP authentication with sync_groups_on_login=true fails with KeyError exception
HUE-8055 - [desktop] Support multiple LDAP servers in LDAP Test command
HUE-8063 - [fb] Better error message when HTTPFS is not working
HUE-8137 - [core] Refresh translations files for 4.2
HUE-8159 - [oozie] Unable to create Workflow/Schedule using Java document Action
HUE-8170 - [useradmin] Fix LDAP sync (ldap_access.py) certificate validation logic
HUE-8173 - [core] Add a warning log when a user switches to the old Hue 3 UI
HUE-8173 - [core] Add a log line when a user loads a page not via the load balancer even if we have one
HUE-8191 - [jb] UnicodeEncodeError occurs in Job Browser when browser language is not English
HUE-8202 - [jb] Fix mutual authentication failed with Isilon
HUE-8207 - [indexer] Issue previewing input file with unicode data
HUE-8217 - [dashboard] Fix HTML resultset widget templating
HUE-8232 - [oozie] Fix 500 error if coordinator associated workflow has been deleted
HUE-8236 - [core] Correct Hue config value use integer instead of math expression
HUE-8253 - [editor] Support downloading Query results with query names(file names) other than ISO-8859-1 charset
HUE-8253 - [editor] Fix the broken unit test in 5.15 due to conflict after backporting
HUE-8274 - [s3] Moving a folder using drag and drop deletes the folder itself
HUE-8280 - [fb] Move action button does not prevent move to itself
HUE-8280 - [fb] Fix the missed conflict issue after backporting
HUE-8282 - [fb] Check if EC2 instance before check IAM metadata
HUE-8305 - [useradmin] Optimize performance on checking Hue permissions if user is in many groups
HUE-8310 - Broken template for multiple custom apps
HUE-8313 - [core] Remove hardcoding to ImpersonationBackend when using embedded mode
HUE-8314 - [core] Fix SAML encryption missing config
HUE-8320 - [core] After copying a workflow using Hue 4 button, saving the copied workflow fails
HUE-8344 - [hbase] Hbase old version of data can not display in Hue
HUE-8350 - [solr] indexer app permission is not being acknowledged in HUE
HUE-8370 - [pig] Imported old version Pig script missing properties fields
HUE-8392 - [oozie] Cannot add more actions using drag & drop from actions bar in the Oozie editor after adding around 3 actions
HUE-8404 - [useradmin] Fix multibackend invalid password removes drop down to select Local
HUE-8409 - [core] When idle session timeout is enabled it causes issues with Spnego
HUE-8440 - [jb] Link for Spark logs in Properties tab of Job Browser is incorrect
HUE-8455 - [pig] Oozie editor fails with 'hadoopProperties' for pig script saved in Hue 3

Apache Impala

IMPALA-6687 - Fixed INSERT with mixed case in partition column names.
IMPALA-6822 - Added a query option to control shuffling by distinct expressions.
IMPALA-6847 - Added a workaround for high memory estimates for admission control.
IMPALA-6908 - IsConnResetTException() should include ECONNRESET.
IMPALA-6934 - Corrected the wrong results with EXISTS subqueries that contain ORDER BY, LIMIT, and OFFSET.
IMPALA-7014 - Disabled the stacktrace symbolisation by default.
IMPALA-7078 - Reduced the queue size based on num_scanner_threads.
IMPALA-7078 - Improved memory consumption of wide Avro scans.
IMPALA-7288 - Fixed the Codegen crash in FinalizeModule.
IMPALA-7298 - Impala no longer passes IP address as hostname in Kerberos principal.

Apache Kudu

KUDU-2367 - Fixed an issue where a permanently failed tablet replica was not properly identified, which could cause the tablet not to re-replicate in very small clusters.
KUDU-2377 - Fixed an issue that caused Kudu servers to fail to start when RLIMIT_NPROC=-1.
KUDU-2378 - Fixed unaligned loads of int128 from rows.
KUDU-2379 - Fixed an issue that caused secure Spark jobs to fail.
KUDU-2416 - Fixed PartialRow.setMin.
KUDU-2443 - Fixed replica movement and replacement for RF=1.
KUDU-2447 - Fixed the tablet server crash with the error, "NONE predicate can not be pushed into key".
KUDU-2478 - Restored Python 2.6 compatibility.
Added the ability to adjust scan timeouts in Spark.
Increased the timeout to begin tablet copies, which improves Kudu's re-replication time when the cluster is busy.
Fixed a NullPointerException thrown when calling ColumnSchema#toString on non-decimal types.
Greatly improved the performance of many types of queries on tables from which many rows have been deleted.
Fixed an issue that caused partition pruning to be too conservative for queries from the Java client that use Decimal predicates.

Apache Oozie

OOZIE-2491 - Oozie ACL cannot specify a group; it does not work
OOZIE-3134 - Potential inconsistency between the in-memory SLA map and the Oozie database
OOZIE-3260 - [sla] Remove stale item above max retries on JPA related errors from in-memory SLA map

Apache Parquet

PARQUET-1246 - Ignore float/double statistics in case of NaN

Apache Sentry

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

SENTRY-1209 - Sentry does not block Hive's cross-schema table renames
SENTRY-2020 - Fix testConsumeCycleWithInsufficientPrivileges test failure in kafka e2e tests
SENTRY-2144 - Table Rename Cross Database should update permission correctly
SENTRY-2165 - NotificationProcesser process notification methods have logs wrongly flagged as ERROR
SENTRY-2183 - Increase default sentry-hdfs rpc timeout to 20 mins
SENTRY-2184 - Performance Issue: MPath is queried for each MAuthzPathsMapping in full snapshot
SENTRY-2226 - Support Hive operation ALTER TABLE EXCHANGE.
SENTRY-2269 - Make SentryStore pluggable
SENTRY-2299 - NPE In Sentry HDFS Sync Plugin
SENTRY-2310 - Sentry is not able to fetch a full update subsequently when there is an HMS restart during the snapshot process

Apache Solr

SOLR-12290 - Do not close any servlet streams and improve our servlet stream closing prevention code for users and devs.
SOLR-12293 - Updates need to use their own connection pool to maintain connection reuse and prevent spurious recoveries.

Apache Spark

SPARK-12504 - [SQL] Masking credentials in the sql plan explain output for JDBC data sources
SPARK-12652 - [PYSPARK] Upgrade Py4J to 0.9.1
SPARK-13709 - [SQL] Initialize deserializer with both table and partition properties when reading partitioned tables
SPARK-13807 - De-duplicate `Python*Helper` instantiation code in PySpark streaming
SPARK-13848 - [SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading issue
SPARK-15061 - [PYSPARK] Upgrade to Py4J 0.10.1
SPARK-16781 - [PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
SPARK-17960 - [PYSPARK][UPGRADE TO PY4J 0.10.4]
SPARK-19822 - [TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string
SPARK-20862 - [MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel
SPARK-21278 - [PYSPARK] Upgrade to Py4J 0.10.6
SPARK-22429 - [STREAMING] Streaming checkpointing code does not retry after failure
SPARK-23852 - [SQL] Add test that fails if PARQUET-1217 is not fixed

Apache Sqoop

SQOOP-2567 - SQOOP import for Oracle fails with invalid precision/scale for decimal
SQOOP-3082 - Sqoop import fails after TCP connection reset if split by datetime column

Apache ZooKeeper

ZOOKEEPER-2375 - Prevent multiple initialization of login object in each ZooKeeperSaslClient instance

Issues Fixed in CDH 5.15.0

CDH 5.15.0 fixes the following issues.

Impala/Sentry security roles mismatch after Catalog Server restart
Apache Hive vulnerabilities CVE-2018-1282 and CVE-2018-1284
Apache Oozie Server vulnerability CVE-2017-15712
Apache Hadoop MapReduce Job History Server (JHS) vulnerability CVE-2017-15713
Upstream Issues Fixed

Impala/Sentry security roles mismatch after Catalog Server restart

This issue occurs when Impala’s Catalog Server is restarted without also restarting all the Impala Daemons.

Impala uses generated numeric identifiers for roles. These identifiers are regenerated during catalogd restarts, and the same role can get a different identifier, possibly used by a different role before restart. An impalad’s metadata cache can contain old id + role pairs, and when it is updated with privileges with new role ids from the catalog, the privilege will be added to the wrong role; the one that previously had the same role id.
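
The mismatch described above can be reproduced with a small simulation. This Python sketch is illustrative only: the role names, the privilege string, and the ID assignment scheme are assumptions, not Impala internals.

```python
def assign_role_ids(role_names):
    """Assign numeric IDs in iteration order, starting at 1 (illustrative scheme)."""
    return {role_id: name for role_id, name in enumerate(role_names, start=1)}


# Before the catalogd restart: the impalad caches this id -> role mapping.
impalad_cache = assign_role_ids(["analyst", "admin"])

# After the restart the IDs are regenerated, here in a different order.
catalog_after_restart = assign_role_ids(["admin", "analyst"])

# The catalog sends a privilege update keyed by the NEW role id of 'admin'.
update = {"role_id": 1, "privilege": "ALL ON SERVER"}
intended_role = catalog_after_restart[update["role_id"]]

# The impalad, which was not restarted, resolves the same id against its OLD mapping.
applied_role = impalad_cache[update["role_id"]]

print(f"intended for: {intended_role}, applied to: {applied_role}")
```

In this run the privilege intended for the admin role lands on the analyst role, which is the kind of mix-up that restarting all Impala Daemons together with the Catalog Server avoids.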

Products affected: Apache Impala

Releases affected:

CDH 5.14.4 and all prior releases

Users affected: Impala users with authorization enabled.

Date/time of detection: 5th October, 2018

Severity (Low/Medium/High): 3.8 "Low"; CVSS:3.0/AV:N/AC:H/PR:H/UI:R/S:U/C:L/I:L/A:L

Impact: Users may get privileges of unrelated users.

CVE: CVE-2019-16381

Immediate action required: Update to a version of CDH containing the fix.

Addressed in release/refresh/patch: CDH 5.15.0

Apache Hive vulnerabilities CVE-2018-1282 and CVE-2018-1284

This security bulletin, TSB 2018-299, covers two vulnerabilities that have been addressed in CDH 5.15.0 and later:

CVE-2018-1282: JDBC driver is susceptible to SQL injection attack if the input parameters are not properly cleaned
CVE-2018-1284: Hive UDF series UDFXPathXXXX allows users to pass carefully crafted XML to access arbitrary files

For further details, see the original Known Issue for Hive.

Apache Oozie Server vulnerability CVE-2017-15712

A vulnerability in the Oozie Server allows a cluster user to read private files owned by the user running the Oozie Server process.

Products affected: Oozie

Releases affected: All releases prior to CDH 5.12.0, CDH 5.12.0, CDH 5.12.1, CDH 5.12.2, CDH 5.13.0, CDH 5.13.1, CDH 5.14.0

Users affected: Users running the Oozie Server

Date/time of detection: November 13, 2017

Detected by: Daryn Sharp and Jason Lowe of Oath (formerly Yahoo! Inc)

Severity (Low/Medium/High): High

Impact: The vulnerability allows a cluster user to read private files owned by the user running the Oozie Server process. The malicious user can construct a workflow XML file containing XML directives and configuration that reference sensitive files on the Oozie server host.

CVE: CVE-2017-15712

Immediate action required: Upgrade to a release where the issue is fixed.

Addressed in release/refresh/patch: CDH 5.13.2 and higher, 5.14.2 and higher, 5.15.0 and higher

Apache Hadoop MapReduce Job History Server (JHS) vulnerability CVE-2017-15713

A vulnerability in Hadoop’s Job History Server allows a cluster user to expose private files owned by the user running the MapReduce Job History Server (JHS) process. See http://seclists.org/oss-sec/2018/q1/79 for reference.

Products affected: Apache Hadoop MapReduce

Releases affected: All releases prior to CDH 5.12.0, CDH 5.12.0, CDH 5.12.1, CDH 5.12.2, CDH 5.13.0, CDH 5.13.1, CDH 5.14.0

Users affected: Users running the MapReduce Job History Server (JHS) daemon

Date/time of detection: November 8, 2017

Detected by: Man Yue Mo of lgtm.com

Severity (Low/Medium/High): High

Impact: The vulnerability allows a cluster user to expose private files owned by the user running the MapReduce Job History Server (JHS) process. The malicious user can construct a configuration file containing XML directives that reference sensitive files on the MapReduce Job History Server (JHS) host.

CVE: CVE-2017-15713

Immediate action required: Upgrade to a release where the issue is fixed.

Addressed in release/refresh/patch: CDH 5.13.2 and higher, 5.14.2 and higher, 5.15.0 and higher

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.15.0:

Apache Avro

AVRO-2109 - Reset buffers in case of IOException

Apache Flume

FLUME-2442 - Need an alternative to providing clear text passwords in flume config
FLUME-2917 - Provide netcat UDP source as alternative to TCP

Apache Hadoop

HADOOP-12886 - Exclude weak ciphers in SSLFactory through ssl-server.xml
HADOOP-14966 - Handle JDK-8071638 for hadoop-common
HADOOP-15085 - Output streams closed with IOUtils suppressing write errors
HADOOP-15113 - NPE in S3A getFileStatus: null instrumentation on using closed instance.
HADOOP-15149 - CryptoOutputStream should implement StreamCapabilities.
HADOOP-15161 - Streaming and common statistics missing from S3A metrics
HADOOP-15185 - Update ADLS connector to use the current version of ADLS SDK.
HADOOP-15206 - BZip2 drops and duplicates records when input split size is small
HDFS-1172 - Blocks in newly completed files are considered under-replicated too quickly
HDFS-7764 - DirectoryScanner shouldn't abort the scan if one directory had an error
HDFS-8693 - refreshNamenodes does not support adding a new standby to a running DN
HDFS-9023 - When NN is not able to identify DN for replication, reason behind it can be logged.
HDFS-10453 - ReplicationMonitor thread could be stuck for a long time due to the race between replication and deletion of the same file in a large cluster.
HDFS-10690 - Optimize insertion/removal of replica in ShortCircuitCache
HDFS-11187 - Optimize disk access for last partial chunk checksum of Finalized replica
HDFS-11494 - Log message when DN is not selected for block replication
HDFS-11576 - Block recovery will fail indefinitely if recovery time > heartbeat interval
HDFS-11847 - Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning.
HDFS-11848 - Enhance dfsadmin listOpenFiles command to list files under a given path
HDFS-12318 - Fix IOException condition for openInfo in DFSInputStream
HDFS-12323 - NameNode terminates after full GC thinking QJM unresponsive if full GC is much longer than timeout
HDFS-12832 - INode.getFullPathName may throw ArrayIndexOutOfBoundsException, leading to NameNode exit.
HDFS-12881 - Output streams closed with IOUtils suppressing write errors
HDFS-12910 - Secure Datanode Starter should log the port when it fails to bind
HDFS-13112 - Token expiration edits may cause log corruption or deadlock
HDFS-13170 - Port webhdfs unmaskedpermission parameter to HTTPFS
MAPREDUCE-6711 - JobImpl fails to handle preemption events on state COMMITTING

Apache HBase

Code Changes Might Be Required

The following fixed issues might require changes to your HBase code or your configuration:

HBASE-15128 - Disable region splits and merges switch in master.
This might require code changes if org.apache.hadoop.hbase.client.Admin was subclassed.
HBASE-15866 - Split hbase.rpc.timeout into *.read.timeout and *.write.timeout
This might require code changes if you want to specify different values for RPC read and write timeouts.
HBASE-16008 - A robust way to deal with early termination of HBCK.
This might require code changes if org.apache.hadoop.hbase.client.Admin was subclassed.
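As a rough illustration of the HBASE-15866 change, the following Python sketch renders an hbase-site.xml fragment that sets separate read and write RPC timeouts. The property names hbase.rpc.read.timeout and hbase.rpc.write.timeout are inferred from the *.read.timeout and *.write.timeout pattern in the issue title; the helper name and the millisecond values are hypothetical, so check them against the documentation for your release before use.

```python
import xml.etree.ElementTree as ET


def render_rpc_timeouts(read_ms, write_ms):
    """Build a <configuration> fragment with separate read and write RPC timeouts.

    Property names are assumed from the HBASE-15866 title; values are examples only.
    """
    root = ET.Element("configuration")
    settings = [
        ("hbase.rpc.read.timeout", read_ms),
        ("hbase.rpc.write.timeout", write_ms),
    ]
    for name, value in settings:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: a shorter timeout for reads than for writes.
fragment = render_rpc_timeouts(read_ms=30000, write_ms=60000)
print(fragment)
```

If you do not need different values for reads and writes, no change should be necessary.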

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

HBASE-14252 - RegionServers fail to start when setting hbase.ipc.server.callqueue.scan.ratio to 0
HBASE-19163 - "Maximum lock count exceeded" from region server's batch processing
HBASE-19440 - Not able to enable balancer with RSGroups once disabled
HBASE-19886 - Display maintenance mode in shell, web UI

Apache Hive

Code Changes Might Be Required

The following fixed issues might require changes to your HiveQL code or your configuration:

HIVE-16324 - Truncate table should not work when EXTERNAL property of table is set to true (lower case). This fix changes how the EXTERNAL table property is interpreted by Hive. After this fix, if your current deployment sets the EXTERNAL table property to true (lower case), the external table will no longer be truncated. Read the Jira for further details.
HIVE-18879 - Disallow embedded elements in the UDFXPathUtil class. This fix might change how your XML parser works if you use embedded elements with the xpath UDFs. This applies if your queries use embedded elements with any of the following UDFs: xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, or xpath_string.
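To make the HIVE-16324 behavior change concrete, here is a minimal Python sketch of a check on the EXTERNAL table property. It only models the behavior described above, assuming the comparison is case-insensitive after the fix; it is not Hive's implementation, and the function names and sample property dictionaries are hypothetical.

```python
def is_external(table_properties):
    """Return True when EXTERNAL is set to "true" in any letter case.

    Assumption: after the fix the value is compared case-insensitively.
    """
    value = table_properties.get("EXTERNAL", "")
    return value.strip().lower() == "true"


def can_truncate(table_properties):
    """A table flagged as external is not truncated."""
    return not is_external(table_properties)


# Hypothetical table properties for three tables.
lower_case_external = {"EXTERNAL": "true"}
upper_case_external = {"EXTERNAL": "TRUE"}
managed = {}

results = {
    "lower_case_external": can_truncate(lower_case_external),
    "upper_case_external": can_truncate(upper_case_external),
    "managed": can_truncate(managed),
}
print(results)
```

A similar check over your own table properties can help identify tables whose TRUNCATE TABLE statements will start to be rejected after the upgrade.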

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

HIVE-8472 - Add SET LOCATION option to the ALTER DATABASE command
HIVE-10495 - Hive index creation code throws a NullPointerException if index table is null
HIVE-14786 - Option added to Beeline to display binary column data as a string or a byte array (--convertBinaryArrayToString=[true | false])
HIVE-14792 - Optimization added to minimize AvroSerde reads of the remote schema-file
HIVE-15329 - Fix for when a NullPointerException occurs during table creation
HIVE-15543 - Fix to prevent Hive fetching Spark memory/cores to decide parallelism when Spark dynamic allocation is enabled
HIVE-16601 - Display Session Id and Query Name / Id in Spark UI
HIVE-16663 - Added string caching for rows to Beeline
HIVE-16890 - Remove the superfluous wrapper from HiveVarcharWritable
HIVE-18228 - Azure credential properties added to the HiveConf hidden list
HIVE-18788 - Cleans up inputs in JDBC PreparedStatement (HivePreparedStatement) to fix SQL injection vulnerabilities

Hue

HUE-7860 - [core] Integrate non IO blocking Python Webserver
HUE-7913 - [autocomplete] Add variable locations to the autocomplete parser
HUE-7913 - [autocomplete] Report quoted variable locations with possible column references
HUE-7915 - [assist] Increase the frontend cache TTL to 10 days
HUE-7942 - [editor] FIX variables with incorrect placeholder.
HUE-7943 - [editor] Variable list are not being refreshed
HUE-8043 - [editor] Cancel a query using ctrl-enter shortcut

Apache Impala

IMPALA-4315 - Allow USE and SHOW TABLES if there is at least one table in a database where the user has table or column privileges.
IMPALA-4323 - The SET ROW FORMAT clause was added to the ALTER TABLE statement for the TEXT or SEQUENCE file formats.
IMPALA-4886 - Table metrics are available in the catalog web UI.
IMPALA-5654 - Disallows explicitly setting the Kudu table name property for managed Kudu tables in CREATE TABLE and ALTER TABLE statements, for example CREATE TABLE t (i INT) STORED AS KUDU TBLPROPERTIES('kudu.table_name'='some_name').
IMPALA-6549 - The file handle cache, controlled by the max_cached_file_handles flag, is enabled by default.

Apache Kudu

KUDU-1613 - Fixed a scenario where the on-disk data of a tablet server was completely erased and a new tablet server was started on the same host. This issue could prevent tablet replicas previously hosted on the server from being evicted and re-replicated. Tablets now immediately evict replicas that respond with a different server UUID than expected.
KUDU-1927 - Fixed a rare race condition when connecting to masters during their startup which might cause a client to get a response without a CA certificate and/or authentication token. This would cause the client to fail to authenticate with other servers in the cluster. The leader master now always sends a CA certificate and an authentication token (when applicable) to a Kudu client with a successful ConnectToMaster response.
KUDU-2262 - The Kudu Java client will now retry a connection if no master is discovered as a leader and the user has a valid authentication token. This avoids failure in recoverable cases when masters are in the process of the very first leader election after starting up.
KUDU-2264 - The Java client will now automatically attempt to re-acquire Kerberos credentials from the ticket cache when the prior credentials are about to expire. This allows client instances to persist longer than the expiration time of a single Kerberos ticket so long as some other process renews the credentials in the ticket cache. Documentation on interacting with Kerberos authentication has been added to the Javadoc for the AsyncKuduClient class.
KUDU-2265 - Follower masters are now able to verify authentication tokens even if they have never been a leader. Prior to this fix, if a follower master had never been a leader, clients would be unable to authenticate to that master, resulting in spurious error messages being logged.
KUDU-2295 - Fixed a tablet server crash when a tablet replica is deleted during a scan.
KUDU-2312 - The evaluation order of predicates in scans with multiple predicates has been made deterministic. Due to a bug, this was not necessarily the case previously. Predicates are applied in most to least selective order, with ties broken by column index. The evaluation order may change in the future, particularly when better column statistics are made available internally.
KUDU-2331 - Previously, the kudu tablet change_config move_replica tool required all tablet servers in the cluster to be available when performing a move. This restriction has been relaxed: only the tablet server that will receive a replica of the tablet being moved and the hosts of the tablet’s existing replicas need to be available for the move to occur.
KUDU-2343 - Fixed a bug in the Java client which prevented the client from locating the new leader master after a leader failover in the case that the previous leader either remained online or restarted quickly. This bug resulted in the client timing out operations with errors indicating that there was no leader master.
KUDU-2259 - The Unix process username of the client is now included inside the exported security credentials, so that the effective username of clients who import credentials and subsequently use unauthenticated (SASL PLAIN) connections matches the client who exported the security credentials. For example, this is useful to let the Spark executors know which username to use if the Spark driver has no authentication token. This change only affects clusters with encryption disabled using --rpc-encryption=disabled.
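The deterministic ordering described for KUDU-2312 (most to least selective, with ties broken by column index) can be sketched in Python as a sort with a two-part key. This is only an illustration of the ordering rule, not Kudu's implementation; the Predicate type, its field names, and the selectivity numbers are hypothetical.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    column_name: str
    column_index: int
    # Hypothetical estimate of the fraction of rows that pass the predicate;
    # a lower value means a more selective predicate.
    selectivity: float


def evaluation_order(predicates):
    """Order predicates from most to least selective, breaking ties by column index."""
    return sorted(predicates, key=lambda p: (p.selectivity, p.column_index))


predicates = [
    Predicate("region", column_index=3, selectivity=0.50),
    Predicate("status", column_index=1, selectivity=0.10),
    Predicate("kind", column_index=0, selectivity=0.50),
    Predicate("user_id", column_index=2, selectivity=0.01),
]

ordered_names = [p.column_name for p in evaluation_order(predicates)]
print(ordered_names)
```

Because the sort key is fully determined by the selectivity and the column index, the same set of predicates always yields the same order regardless of the order in which they were supplied.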

Apache Oozie

OOZIE-3173 - Coordinator job with frequency using cron syntax creates only one action in catchup mode
OOZIE-3183 - Better logging for SshActionExecutor and extended HA capability when calling to remote host

Apache Spark

SPARK-12297 - Convert Impala-written timestamps from UTC to local TZ
SPARK-22188 - [CORE] Adding security headers for preventing XSS, MitM and MIME sniffing
SPARK-23660 - Fix exception in yarn cluster mode when application ended fast

Apache Sqoop

SQOOP-3153 - Sqoop export with --as-<spec_file_format> will now display a verbose error message as these options are only valid for imports

