Issues Fixed in CDH 5.15.x

The following topics describe issues fixed in CDH 5.15.x, from newest to oldest release. You can also review What's New in CDH 5.15.x or Known Issues in CDH 5.

Issues Fixed in CDH 5.15.2

CVE-2019-10099: Apache Spark local files left unencrypted

Certain operations in Spark leave local files unencrypted on disk, even when local file encryption is enabled with “spark.io.encryption.enabled”.

This includes cached blocks that are fetched to disk (controlled by spark.maxRemoteBlockSizeFetchToMem) in the following cases:

  • In SparkR when parallelize is used
  • In PySpark when broadcast and parallelize are used
  • In PySpark when Python UDFs are used
Products affected:
  • CDH
  • CDS Powered by Apache Spark
Affected versions:
  • CDH 5.15.1 and earlier
  • CDH 6.0.0
  • CDS 2.1.0 release 1 and release 2
  • CDS 2.2.0 release 1 and release 2
  • CDS 2.3.0 release 3

Users affected: All users who run Spark on CDH and CDS in a multi-user environment.

Date/time of detection: July 2018

Severity (Low/Medium/High): 6.3 Medium (CVSS AV:L/AC:H/PR:N/UI:R/S:U/C:H/I:H/A:N)

Impact: Unencrypted data accessible.

CVE: CVE-2019-10099

Immediate action required: Upgrade to a version of CDH containing the fix.

Workaround: Do not use PySpark or the fetch-to-disk options.

Fixed versions:
  • CDH 5.15.2
  • CDH 5.16.0
  • CDH 6.0.1
  • CDS 2.1.0 release 3
  • CDS 2.2.0 release 3
  • CDS 2.3.0 release 4
For the latest update on this issue see the corresponding Knowledge article: TSB 20210-336: Apache Spark local files left unencrypted
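
As a rough aid for reviewing a deployment against the Spark advisory above, the following minimal sketch inspects a dictionary of Spark properties and reports whether it relies on the settings named in the advisory. The function name review_spark_conf and the wording of the notes are hypothetical; only the property names spark.io.encryption.enabled and spark.maxRemoteBlockSizeFetchToMem come from the advisory, and the sketch does not detect whether a given release contains the fix.

```python
def review_spark_conf(conf):
    """Return advisory notes for a dict of Spark property names to string values.

    Hypothetical helper: it only looks at the two properties named in the
    advisory and makes no claim about which releases are patched.
    """
    notes = []
    # spark.io.encryption.enabled is treated as "false" when it is not set.
    raw = conf.get("spark.io.encryption.enabled", "false")
    encryption_on = raw.strip().lower() == "true"
    if not encryption_on:
        # Local file encryption is not requested, so there is nothing to flag.
        return notes
    notes.append(
        "spark.io.encryption.enabled is true: on affected releases some "
        "local files can still be written unencrypted"
    )
    if "spark.maxRemoteBlockSizeFetchToMem" in conf:
        notes.append(
            "spark.maxRemoteBlockSizeFetchToMem is set explicitly: blocks "
            "above this size are fetched to disk"
        )
    return notes


if __name__ == "__main__":
    example = {
        "spark.io.encryption.enabled": "true",
        "spark.maxRemoteBlockSizeFetchToMem": "200m",
    }
    for note in review_spark_conf(example):
        print(note)
```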

CVE-2018-1296 Permissive Apache Hadoop HDFS listXAttr Authorization Exposes Extended Attribute Key/Value Pairs

HDFS exposes extended attribute key/value pairs during listXAttrs, verifying only path-level search access to the directory rather than path-level read permission to the referent.

Products affected: Apache HDFS

Releases affected:
  • CDH 5.4.0 - 5.15.1, 5.16.0
  • CDH 6.0.0, 6.0.1, 6.1.0

Users affected: Users who store sensitive data in extended attributes, such as users of HDFS encryption.

Date/time of detection: December 12, 2017

Detected by: Rushabh Shah, Yahoo! Inc., Hadoop committer

Severity (Low/Medium/High): Medium

Impact: HDFS exposes extended attribute key/value pairs during listXAttrs, verifying only path-level search access to the directory rather than path-level read permission to the referent. This affects features that store sensitive data in extended attributes.

CVE: CVE-2018-1296

Immediate action required:
  • Upgrade: Update to a version of CDH containing the fix.
  • Workaround: If a file contains sensitive data in extended attributes, users and admins need to change the permission to prevent others from listing the directory that contains the file.
Addressed in release/refresh/patch:
  • CDH 5.15.2, 5.16.1
  • CDH 6.1.1, 6.2.0
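
The difference between the two permission checks described in the listXAttrs advisory above can be illustrated with a small model. This is a simplified sketch, not HDFS code: the Node class, the helper names, and the reduction of permissions to owner and other bits (no groups, no ACLs) are all assumptions made for illustration.

```python
from dataclasses import dataclass

READ, SEARCH = 4, 1  # POSIX-style permission bits (execute on a directory = search)


@dataclass
class Node:
    owner: str
    mode: int  # for example 0o711; only the owner and other bits are modeled


def _allowed(node, user, bit):
    # Pick the owner bits for the owner, the "other" bits for everyone else.
    shift = 6 if user == node.owner else 0
    return bool((node.mode >> shift) & bit)


def can_list_xattrs_permissive(parent, target, user):
    # Behavior described in the advisory: only search access on the directory.
    return _allowed(parent, user, SEARCH)


def can_list_xattrs_strict(parent, target, user):
    # Stricter check: also require read permission on the file itself.
    return _allowed(parent, user, SEARCH) and _allowed(target, user, READ)


if __name__ == "__main__":
    directory = Node(owner="alice", mode=0o711)
    secret = Node(owner="alice", mode=0o600)
    print(can_list_xattrs_permissive(directory, secret, "bob"))  # search bit is enough
    print(can_list_xattrs_strict(directory, secret, "bob"))      # read on the file is missing
```

In this model, the workaround corresponds to tightening the directory mode (for example to 0o700) so that other users fail the directory check as well.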

Missing authorization in Apache Impala may allow data injection

A malicious user who is authenticated with Kerberos may have unauthorized access to internal services used by Impala to transfer intermediate data during query execution. If details of a running query (e.g. query ID, query plan) are available, a user can craft some RPC requests with custom software to inject data into a running query or end query execution prematurely, leading to wrong results of the query.

Cloudera Issue: CDH-72373 / TSB-338

CVE: CVE-2018-11785

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.15.2:

Apache Hadoop

  • HADOOP-15473 - Enhanced serialFilter in KeyProvider to avoid UnrecoverableKeyException caused by JDK-8189997.
  • HADOOP-15655 - Enhanced KMS client retry behavior.
  • HDFS-10240 - Fixed an issue where a race between close/recoverLease leads to missing blocks.
  • HDFS-12299 - Fixed a race between update pipeline and DataNode Re-Registration.
  • HDFS-13322 - Fixed an issue with fuse dfs that caused the UID to persist when switching between ticket caches.
  • HDFS-13486 - Backport HDFS-11817 to fix an issue where a faulty node can cause a lease leak and NPE on accessing data.
  • HDFS-13813 - Enhanced NameNode behavior to exit if dangling child inode is detected when saving FsImage.
  • MAPREDUCE-7053 - Fixed an issue where timed-out tasks can fail to produce a thread dump.
  • YARN-6966 - Fixed an issue where NodeManager metrics may return wrong negative values when the NodeManager restarts.
  • YARN-6967 - Fixed an issue where an application attempt's diagnostic message size was not properly limited.
  • YARN-8436 - Fixed an issue with FSParentQueue where the comparison method violates its general contract.

Apache HBase

  • HBASE-19730 - HBASE-14497 Reverse Scan threw StackOverflow caused by readPt checking
  • HBASE-19924 - hbase rpc throttling does not work for multi() with request count rater.
  • HBASE-20493 - Port HBASE-19994 (Create a new class for RPC throttling exception, make it retryable) to branch-1
  • HBASE-20723 - Custom hbase.wal.dir results in data loss because we write recovered edits into a different place than where the recovering region server looks for them.
  • HBASE-20997 - rebuildUserRegions() does not build ReplicaMapping during master switchover
  • HBASE-21275 - Disable TRACE HTTP method for thrift http server

Apache Hive

  • HIVE-12981 - ThriftCLIService uses incompatible getShortName() implementation
  • HIVE-13394 - Analyze table fails in tez on empty partitions
  • HIVE-14236 - Partial backport to fix union-stats related errors
  • HIVE-14560 - Support exchange partition between s3 and hdfs tables
  • HIVE-14690 - Query fail when hive.exec.parallel=true, with conflicting session dir
  • HIVE-16483 - HoS should populate split related configurations to HiveConf
  • HIVE-17213 - HoS: file merging doesn't work for union all
  • HIVE-19259 - Create view on tables having union all fail with 'Table not found'
  • HIVE-20183 - Inserting from bucketed table can cause data loss, if the source table contains empty bucket
  • HIVE-20345 - Drop database may hang if the tables get deleted from a different call
  • HIVE-20678 - HiveHBaseTableOutputFormat should implement HiveOutputFormat to ensure compatibility
  • HIVE-20695 - HoS Query fails with hive.exec.parallel=true

Hue

  • HUE-8128 - [backend] Force debug logging in server logs does not get all debug
  • HUE-8398 - [editor] Fix broken result table after multiple queries in embedded mode
  • HUE-8399 - [editor] Various improvements for embedded mode
  • HUE-8399 - [editor] Fix various embedded mode issues in old version
  • HUE-8399 - [editor] Limit functions assist to Impala in embedded mode
  • HUE-8399 - [editor] Hide the right assistant in embedded mode
  • HUE-8451 - [notebook] Many "codec can't decode byte" errors on pig execution if browser language=jp
  • HUE-8458 - [frontend] Fix issue with async loading of js resources in the dashboard
  • HUE-8458 - [frontend] Evaluate the js resources while others are being fetched
  • HUE-8458 - [frontend] Load new scripts using $.get and eval instead of appending <script> tags
  • HUE-8464 - [core] Fix SAML encryption missing key file passphrase
  • HUE-8467 - [jobbrowser] Support impala digest auth for queries
  • HUE-8468 - [frontend] Append a style tag to head instead of modifying stylesheets for dynamic styles in embedded mode
  • HUE-8475 - [report] Protect against pivot conflicting with nested facets
  • HUE-8487 - [useradmin] Fix Add Sync LDAP user fails when using DN with special character
  • HUE-8505 - [core] Close impala session on logout
  • HUE-8558 - [jb] Add tracking URL to Spark Jobs and remove url and killUrl
  • HUE-8564 - [useradmin] Fix last activity update for notebook/api/check_status
  • HUE-8564 - [useradmin] Fix last activity update for jobbrowser/api/jobs requests

Apache Impala

  • IMPALA-6907 - Impala now correctly closes all stale connections to removed Impala cluster members.
  • IMPALA-7225 - Fixed an issue where the REFRESH...PARTITION statement caused statistics for the refreshed partition to be automatically reset to -1 (unknown). With the fix, statistics are changed only if an explicit COMPUTE STATS statement is issued for an object.
  • IMPALA-7272 - Fixed a crash caused by a memory management problem when the query execution requires finding strings inside a range defined by the less-than and greater-than comparisons.
  • IMPALA-7360 - Fixed an issue where Impala could incorrectly skip data if a record separator in a sequence-based file (Avro, RC or sequence file) straddled an HDFS block boundary.
  • IMPALA-7537 - Fixed a security issue where REVOKE ALL ON SERVER did not have a permanent effect if the ALL permission was granted using the WITH GRANT option. Running INVALIDATE METADATA no longer causes the permission to reappear.
  • IMPALA-7585 - Fixed an issue in KRPC, which can cause slow or hung queries on non-secured clusters.

Apache Kudu

  • KUDU-2463 - Fixed an issue in which incorrect results would be returned in scans following a server restart.
  • KUDU-2509 - Fixed an issue that might result in a crash of a tablet server in case of a WAL replay error while bootstrapping a tablet.
  • KUDU-2580 - Fixed authentication token reacquisition in the C++ client.

Apache Oozie

  • OOZIE-2457 - Oozie log parsing regex consume more than 90% cpu
  • OOZIE-3354 - [core] [SSH action] SSH action gets hung
  • OOZIE-3370 - Property filtering is not consistent across job submission

Apache Sentry

  • SENTRY-1944 - Optimize DelegateSentryStore.getGroupsByRoles() and update SentryGenericPolicyProcessor to retrieve roles to group mapping in a single transaction
  • SENTRY-2194 - Upgrade Sentry hadoop-version dependency to 2.7.5
  • SENTRY-2214 - Sentry should not allow URI grants to EMPTY or NULL locations
  • SENTRY-2309 - Catch NPE thrown when fetching Partitions with no corresponding SDS entry
  • SENTRY-2332 - Load hadoop default configuration when starting sentry service
  • SENTRY-2403 - Incorrect naming in RollingFileWithoutDeleteAppender
  • SENTRY-2406 - Make sure inputHierarchy and outputHierarchy have unique values
  • SENTRY-2428 - Skip null partitions or partitions with null sds entries

Apache Spark

  • SPARK-25253 - [PYSPARK] Refactor local connection & auth code
  • SPARK-25318 - Add exception handling when wrapping the input stream during the fetch or stage retry in response to a corrupted block

Apache ZooKeeper

  • ZOOKEEPER-706 - Large numbers of watches can cause session re-establishment to fail
  • ZOOKEEPER-1382 - Zookeeper server holds onto dead/expired session ids in the watch data structures

Issues Fixed in CDH 5.15.1

Hadoop YARN Privilege Escalation CVE-2016-6811

Fixed a vulnerability in Hadoop YARN that might allow a user who can escalate to the yarn user to run arbitrary commands as the root user (CVE-2016-6811).

Apache HBase potential privilege escalation for user of HBase “Thrift 1” API Server over HTTP CVE-2018-8025

CVE-2018-8025 describes an issue in Apache HBase that affects the optional "Thrift 1" API server when running over HTTP. There is a race condition that could lead to authenticated sessions being incorrectly applied to users; for example, one authenticated user could be considered a different user, or an unauthenticated user could be treated as an authenticated user.

Products affected: HBase Thrift Server

Releases affected:
  • CDH 5.4.x - 5.12.x
  • CDH 5.13.0, 5.13.1, 5.13.2, 5.13.3
  • CDH 5.14.0, 5.14.2, 5.14.3
  • CDH 5.15.0

Fixed versions: CDH 5.14.4, 5.15.1

Users affected: Users with the HBase Thrift 1 service role installed and configured to work in “thrift over HTTP” mode. For example, those using Hue with HBase impersonation enabled.

Severity: High

Impact: Potential privilege escalation.

CVE: CVE-2018-8025

Immediate action required: Upgrade to a CDH version with the fix, or disable the HBase Thrift-over-HTTP service. Disabling the HBase Thrift-over-HTTP service renders Hue impersonation inoperable, and all HBase access via Hue is performed using the “hue” user instead of the authenticated user.

Knowledge article: For the latest update on this issue, see the corresponding Knowledge article: TSB 2018-315: Potential privilege escalation for user of HBase “Thrift 1” API Server over HTTP

Apache Hive CREATE EXTERNAL TABLE commands run very slowly on Amazon S3

When you run a CREATE EXTERNAL TABLE command on Amazon S3 storage, the command runs extremely slowly and might not complete. This usually occurs for tables that have pre-existing, nested data. There is no known workaround.

Affected Versions: CDH 5.15.0

Cloudera Bug: CDH-68833

Workaround: None

Fixed versions: CDH 5.15.1 and later

Cloudera Search restore operation puts shard replicas on same host

Restoring an Apache Solr collection sometimes places all shard replicas on the same host.

Cloudera Issue: CDH-68828

Zip Slip Vulnerability CVE-2018-8009

“Zip Slip” is a widespread arbitrary file overwrite critical vulnerability, which typically results in remote command execution. It was discovered and responsibly disclosed by the Snyk Security team ahead of a public disclosure on June 5, 2018, and affects thousands of projects.

Cloudera has analyzed our use of zip-related software, and has determined that only Apache Hadoop is vulnerable to this class of vulnerability in CDH 5. This has been fixed in upstream Hadoop as CVE-2018-8009.
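
To illustrate the class of bug behind Zip Slip, the following minimal sketch shows the usual defensive check: resolve each archive entry against the extraction directory and reject any entry that would land outside it. This is a generic illustration and not the Hadoop fix for CVE-2018-8009; the function name is_within_directory and the example paths are hypothetical.

```python
import os


def is_within_directory(dest_dir, entry_name):
    """Return True if extracting entry_name under dest_dir stays inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    # Joining and then resolving collapses any ".." components; an absolute
    # entry name replaces dest_root entirely and is therefore rejected below.
    target = os.path.realpath(os.path.join(dest_root, entry_name))
    return os.path.commonpath([dest_root, target]) == dest_root


if __name__ == "__main__":
    for name in ("docs/readme.txt", "../../etc/passwd", "/etc/passwd"):
        verdict = "ok" if is_within_directory("/tmp/extract", name) else "rejected"
        print(name, verdict)
```

An extraction routine would call such a check for every entry before writing any bytes, and abort on the first entry that fails.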

hdfs snapshotDiff /.reserved/raw/... fails on snapshottable directories

Fixed an issue where the hdfs snapshotDiff command fails when the command is supplied with the raw path.

Cloudera Issue: CDH-66029

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.15.1:

Apache Flume

  • FLUME-2786, FLUME-3056, FLUME-3117 - Application enters a deadlock when stopped while handleConfigurationEvent
  • FLUME-2894 - Flume components should stop in the correct order
  • FLUME-2976 - Exception when JMS source tries to connect to a Weblogic server without authentication
  • FLUME-3222 - Fix for NoSuchFileException thrown when files are being deleted

Apache Hadoop

  • HADOOP-13024 - Distcp with -delete feature on raw data not implemented
  • HADOOP-13972 - ADLS to support per-store configuration
  • HADOOP-15186 - Allow Azure Data Lake SDK dependency version to be set on the command line
  • HADOOP-15317 - Improve NetworkTopology chooseRandom's loop
  • HADOOP-15342 - Updating ADLS connector to use SDK version 2.2.7
  • HADOOP-15356 - Make the HTTP timeout configurable in ADLS connector
  • HADOOP-15434 - Upgrade to ADLS SDK that exposes current timeout
  • HADOOP-15466 - Correct units in adl.http.timeout property
  • HDFS-9229 - Expose size of NameNode directory as a metric
  • HDFS-11751 - DFSZKFailoverController daemon exits with wrong status code
  • HDFS-11993 - Add log info when connect to datanode socket address failed
  • HDFS-12683 - DFSZKFailOverController re-order logic for logging Exception
  • HDFS-12710 - HTTPFS HTTP max header size env variable is not respected in branch-2
  • HDFS-12981 - Running renameSnapshot on a non-existent snapshot to itself should throw error
  • HDFS-13281 - Namenode#createFile should be /.reserved/raw/ aware
  • HDFS-13314 - NameNode should optionally exit if it detects FsImage corruption
  • MAPREDUCE-7094 - LocalDistributedCacheManager leaves classloaders open, which leaks File Descriptors
  • YARN-4227 - Ignore expired containers from removed nodes in FairScheduler
  • YARN-4325 - NodeManager log handlers fail to send finished/failed events in some cases
  • YARN-4677 - RMNodeResourceUpdateEvent update from scheduler can lead to a race condition
  • YARN-5121 - Fix container-executor portability issues

Apache HBase

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HBASE-20229 - ConnectionImplementation.locateRegions() returns duplicated entries when region replication is on
  • HBASE-20293 - get_splits returns duplicate split points when region replication is on
  • HBASE-20664 - Reduce the broad scope of outToken in ThriftHttpServlet

Apache Hive

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HIVE-9915 - Allow specifying file format for managed tables
  • HIVE-15580 - Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark
  • HIVE-15682 - Eliminate per-row based dummy iterator creation
  • HIVE-15683 - Make configurable how Hive on Spark shuffles data for GROUP BY
  • HIVE-16080 - Add parquet to possible values for hive.default.fileformat and hive.default.fileformat.managed
  • HIVE-16659 - Query plan should reflect hive.spark.use.groupby.shuffle
  • HIVE-19041 - Thrift deserialization of Partition objects should intern fields
  • HIVE-19231 - Beeline generates garbled output when using UnsupportedTerminal
  • HIVE-19424 - NPE In MetaDataFormatters
  • HIVE-19668 - Over 30% of the heap is wasted by duplicate org.antlr.runtime.CommonTokens and duplicate strings
  • HIVE-19700 - Workaround for JLine issue with UnsupportedTerminal
  • HIVE-19870 - HCatalog dynamic partition query can fail if the table path is managed by Sentry

Hue

  • HUE-6697 - [jb] Prevent reset of job page tabs when job is running
  • HUE-7946 - [jb] Link to subworkflow on workflow dashboard page 404s in Hue 4
  • HUE-7989 - [useradmin] Provide better UI message and message in logs when ldap server down
  • HUE-8053 - Fix unit test
  • HUE-8053 - [useradmin] LDAP authentication with sync_groups_on_login=true fails with KeyError exception
  • HUE-8055 - [desktop] Support multiple LDAP servers in LDAP Test command
  • HUE-8063 - [fb] Better error message when HTTPFS is not working
  • HUE-8137 - [core] Refresh translations files for 4.2
  • HUE-8159 - [oozie] Unable to create Workflow/Schedule using Java document Action
  • HUE-8170 - [useradmin] Fix LDAP sync (ldap_access.py) certificate validation logic
  • HUE-8173 - [core] Add a warning log when a user switched to the old Hue 3 UI
  • HUE-8173 - [core] Add a log line when a user loads a page not via the load balancer even if we have one
  • HUE-8191 - [jb] UnicodeEncodeError occurs in Job Browser when browser language is not English
  • HUE-8202 - [jb] Fix mutual authentication failed with Isilon
  • HUE-8207 - [indexer] Issue previewing input file with unicode data
  • HUE-8217 - [dashboard] Fix HTML resultset widget templating
  • HUE-8232 - [oozie] Fix 500 error if coordinator associated workflow has been deleted
  • HUE-8236 - [core] Correct Hue config value use integer instead of math expression
  • HUE-8253 - [editor] Support downloading Query results with query names(file names) other than ISO-8859-1 charset
  • HUE-8253 - [editor] Fix the broken unit test in 5.15 due to conflict after backporting
  • HUE-8274 - [s3] Moving a folder using drag and drop deletes the folder itself
  • HUE-8280 - [fb] Move action button does not prevent move to itself
  • HUE-8280 - [fb] Fix the missed conflict issue after backporting
  • HUE-8282 - [fb] Check if EC2 instance before check IAM metadata
  • HUE-8305 - [useradmin] Optimize performance on checking Hue permissions if user is in many groups
  • HUE-8310 - Broken template for multiple custom apps
  • HUE-8313 - [core] Remove hardcoding to ImpersonationBackend when using embedded mode
  • HUE-8314 - [core] Fix SAML encryption missing config
  • HUE-8320 - [core] After copying a workflow using Hue 4 button, saving the copied workflow fails
  • HUE-8344 - [hbase] Hbase old version of data can not display in Hue
  • HUE-8350 - [solr] indexer app permission is not being acknowledged in HUE
  • HUE-8370 - [pig] Imported old version Pig script missing properties fields
  • HUE-8392 - [oozie] Cannot add more actions using drag & drop from actions bar in the Oozie editor after adding around 3 actions
  • HUE-8404 - [useradmin] Fix multibackend invalid password removes drop down to select Local
  • HUE-8409 - [core] When idle session timeout is enabled it causes issues with Spnego
  • HUE-8440 - [jb] Link for Spark logs in Properties tab of Job Browser is incorrect
  • HUE-8455 - [pig] Oozie editor fails with 'hadoopProperties' for pig script saved in Hue 3

Apache Impala

  • IMPALA-6687 - Fixed INSERT with mixed case in partition column names.
  • IMPALA-6822 - Added a query option to control shuffling by distinct expressions.
  • IMPALA-6847 - Added a workaround for high memory estimates in admission control.
  • IMPALA-6908 - IsConnResetTException() should include ECONNRESET.
  • IMPALA-6934 - Corrected the wrong results with EXISTS subqueries that contain ORDER BY, LIMIT, and OFFSET.
  • IMPALA-7014 - Disabled the stacktrace symbolisation by default.
  • IMPALA-7078 - Reduced the queue size based on num_scanner_threads.
  • IMPALA-7078 - Improved memory consumption of wide Avro scans.
  • IMPALA-7288 - Fixed the Codegen crash in FinalizeModule.
  • IMPALA-7298 - Impala no longer passes IP address as hostname in Kerberos principal.

Apache Kudu

  • KUDU-2367 - Fixed an issue where a permanently failed tablet replica was not properly identified, which could cause the tablet not to re-replicate in very small clusters.
  • KUDU-2377 - Fixed an issue that caused Kudu servers to fail to start when RLIMIT_NPROC=-1.
  • KUDU-2378 - Fixed unaligned loads of int128 from rows.
  • KUDU-2379 - Fixed an issue that caused secure Spark jobs to fail.
  • KUDU-2416 - Fixed PartialRow.setMin.
  • KUDU-2443 - Fixed replica movement and replacement for RF=1.
  • KUDU-2447 - Fixed the tablet server crash with the error, "NONE predicate can not be pushed into key".
  • KUDU-2478 - Restored Python 2.6 compatibility.
  • Added the ability to adjust scan timeouts in Spark.
  • Increased the timeout to begin tablet copies, which improves Kudu's re-replication time when the cluster is busy.
  • Fixed a NullPointerException thrown when calling ColumnSchema#toString on non-decimal types.
  • Greatly improved the performance of many types of queries on tables from which many rows have been deleted.
  • Fixed an issue that caused partition pruning to be too conservative for queries from the Java client that use Decimal predicates.

Apache Oozie

  • OOZIE-2491 - oozie acl cannot specify group, it does not work
  • OOZIE-3134 - Potential inconsistency between the in-memory SLA map and the Oozie database
  • OOZIE-3260 - [sla] Remove stale item above max retries on JPA related errors from in-memory SLA map

Apache Parquet

  • PARQUET-1246 - Ignore float/double statistics in case of NaN

Apache Sentry

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • SENTRY-1209 - Sentry does not block Hive's cross-schema table renames
  • SENTRY-2020 - Fix testConsumeCycleWithInsufficientPrivileges test failure in kafka e2e tests
  • SENTRY-2144 - Table Rename Cross Database should update permission correctly
  • SENTRY-2165 - NotificationProcesser process notification methods have logs wrongly flagged as ERROR
  • SENTRY-2183 - Increase default sentry-hdfs rpc timeout to 20 mins
  • SENTRY-2184 - Performance Issue: MPath is queried for each MAuthzPathsMapping in full snapshot
  • SENTRY-2226 - Support Hive operation ALTER TABLE EXCHANGE.
  • SENTRY-2269 - Make SentryStore pluggable
  • SENTRY-2299 - NPE In Sentry HDFS Sync Plugin
  • SENTRY-2310 - Sentry is not able to fetch full update subsequently, when there is HMS restart in the snapshot process

Apache Solr

  • SOLR-12290 - Do not close any servlet streams and improve our servlet stream closing prevention code for users and devs.
  • SOLR-12293 - Updates need to use their own connection pool to maintain connection reuse and prevent spurious recoveries.

Apache Spark

  • SPARK-12504 - [SQL] Masking credentials in the sql plan explain output for JDBC data sources
  • SPARK-12652 - [PYSPARK] Upgrade Py4J to 0.9.1
  • SPARK-13709 - [SQL] Initialize deserializer with both table and partition properties when reading partitioned tables
  • SPARK-13807 - De-duplicate `Python*Helper` instantiation code in PySpark streaming
  • SPARK-13848 - [SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading issue
  • SPARK-15061 - [PYSPARK] Upgrade to Py4J 0.10.1
  • SPARK-16781 - [PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
  • SPARK-17960 - [PYSPARK][UPGRADE TO PY4J 0.10.4]
  • SPARK-19822 - [TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string
  • SPARK-20862 - [MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel
  • SPARK-21278 - [PYSPARK] Upgrade to Py4J 0.10.6
  • SPARK-22429 - [STREAMING] Streaming checkpointing code does not retry after failure
  • SPARK-23852 - [SQL] Add test that fails if PARQUET-1217 is not fixed

Apache Sqoop

  • SQOOP-2567 - SQOOP import for Oracle fails with invalid precision/scale for decimal
  • SQOOP-3082 - Sqoop import fails after TCP connection reset if split by datetime column

Apache ZooKeeper

  • ZOOKEEPER-2375 - Prevent multiple initialization of login object in each ZooKeeperSaslClient instance

Issues Fixed in CDH 5.15.0

Impala/Sentry security roles mismatch after Catalog Server restart

This issue occurs when Impala’s Catalog Server is restarted without also restarting all the Impala Daemons.

Impala uses generated numeric identifiers for roles. These identifiers are regenerated during catalogd restarts, and the same role can get a different identifier, possibly used by a different role before restart. An impalad’s metadata cache can contain old id + role pairs, and when it is updated with privileges with new role ids from the catalog, the privilege will be added to the wrong role; the one that previously had the same role id.
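
The mismatch described above can be reproduced with a toy model. This is a simplified sketch, not Impala code: the role names, the sequential ID assignment, and the function names are all hypothetical, chosen only to show how a stale ID-to-role cache misroutes a privilege.

```python
def assign_ids(role_names):
    """Assign numeric IDs in iteration order, standing in for catalogd's generator."""
    return {name: role_id for role_id, name in enumerate(role_names, start=1)}


def simulate_restart():
    # Before the restart, the catalog hands out IDs in one order.
    before = assign_ids(["analyst", "admin"])   # analyst=1, admin=2
    # After the restart, the same roles receive different IDs.
    after = assign_ids(["admin", "analyst"])    # admin=1, analyst=2

    # An impalad that was not restarted still maps old IDs to role names.
    stale_cache = {role_id: name for name, role_id in before.items()}

    # The catalog sends a privilege for "admin" tagged with its NEW ID.
    grants = {}
    role_seen_by_impalad = stale_cache[after["admin"]]
    grants.setdefault(role_seen_by_impalad, []).append("ALL ON SERVER")
    return grants


if __name__ == "__main__":
    # The privilege meant for "admin" ends up attached to "analyst".
    print(simulate_restart())
```

Restarting the Impala Daemons together with the Catalog Server clears the stale cache in this model, because the ID-to-role map is then rebuilt from the new assignments.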

Products affected: Apache Impala

Releases affected:
  • CDH 5.14.4 and all prior releases

Users affected: Impala users with authorization enabled.

Date/time of detection: 5th October, 2018

Severity (Low/Medium/High): 3.8 "Low"; CVSS:3.0/AV:N/AC:H/PR:H/UI:R/S:U/C:L/I:L/A:L

Impact: Users may get privileges of unrelated users.

CVE: CVE-2019-16381

Immediate action required: Update to a version of CDH containing the fix.

Addressed in release/refresh/patch: CDH 5.15.0

Apache Hive vulnerabilities CVE-2018-1282 and CVE-2018-1284

This security bulletin, TSB 2018-299, covers two vulnerabilities that have been addressed in CDH 5.15.0 and later:
  • CVE-2018-1282: JDBC driver is susceptible to SQL injection attack if the input parameters are not properly cleaned
  • CVE-2018-1284: Hive UDF series UDFXPathXXXX allows users to pass carefully crafted XML to access arbitrary files

For further details, see the original Known Issue for Hive.

Apache Oozie Server vulnerability CVE-2017-15712

A vulnerability in the Oozie Server allows a cluster user to read private files owned by the user running the Oozie Server process.

Products affected: Oozie

Releases affected: All releases prior to CDH 5.12.0, CDH 5.12.0, CDH 5.12.1, CDH 5.12.2, CDH 5.13.0, CDH 5.13.1, CDH 5.14.0

Users affected: Users running the Oozie Server

Date/time of detection: November 13, 2017

Detected by: Daryn Sharp and Jason Lowe of Oath (formerly Yahoo! Inc)

Severity (Low/Medium/High): High

Impact: The vulnerability allows a cluster user to read private files owned by the user running the Oozie Server process. The malicious user can construct a workflow XML file containing XML directives and configuration that reference sensitive files on the Oozie server host.

CVE: CVE-2017-15712

Immediate action required: Upgrade to a release where the issue is fixed.

Addressed in release/refresh/patch: CDH 5.13.2 and higher, 5.14.2 and higher, 5.15.0 and higher

Apache Hadoop MapReduce Job History Server (JHS) vulnerability CVE-2017-15713

A vulnerability in Hadoop’s Job History Server allows a cluster user to expose private files owned by the user running the MapReduce Job History Server (JHS) process. See http://seclists.org/oss-sec/2018/q1/79 for reference.

Products affected: Apache Hadoop MapReduce

Releases affected: All releases prior to CDH 5.12.0, CDH 5.12.0, CDH 5.12.1, CDH 5.12.2, CDH 5.13.0, CDH 5.13.1, CDH 5.14.0

Users affected: Users running the MapReduce Job History Server (JHS) daemon

Date/time of detection: November 8, 2017

Detected by: Man Yue Mo of lgtm.com

Severity (Low/Medium/High): High

Impact: The vulnerability allows a cluster user to expose private files owned by the user running the MapReduce Job History Server (JHS) process. The malicious user can construct a configuration file containing XML directives that reference sensitive files on the MapReduce Job History Server (JHS) host.

CVE: CVE-2017-15713

Immediate action required: Upgrade to a release where the issue is fixed.

Addressed in release/refresh/patch: CDH 5.13.2 and higher, 5.14.2 and higher, 5.15.0 and higher

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.15.0:

Apache Avro

  • AVRO-2109 - Reset buffers in case of IOException

Apache Flume

  • FLUME-2442 - Need an alternative to providing clear text passwords in flume config
  • FLUME-2917 - Provide netcat UDP source as alternative to TCP

Apache Hadoop

  • HADOOP-12886 - Exclude weak ciphers in SSLFactory through ssl-server.xml
  • HADOOP-14966 - Handle JDK-8071638 for hadoop-common
  • HADOOP-15085 - Output streams closed with IOUtils suppressing write errors
  • HADOOP-15113 - NPE in S3A getFileStatus: null instrumentation on using closed instance.
  • HADOOP-15149 - CryptoOutputStream should implement StreamCapabilities.
  • HADOOP-15161 - Streaming and common statistics missing from S3A metrics
  • HADOOP-15185 - Update ADLS connector to use the current version of ADLS SDK.
  • HADOOP-15206 - BZip2 drops and duplicates records when input split size is small
  • HDFS-1172 - Blocks in newly completed files are considered under-replicated too quickly
  • HDFS-7764 - DirectoryScanner shouldn't abort the scan if one directory had an error
  • HDFS-8693 - refreshNamenodes does not support adding a new standby to a running DN
  • HDFS-9023 - When NN is not able to identify DN for replication, reason behind it can be logged.
  • HDFS-10453 - ReplicationMonitor thread could get stuck for a long time due to the race between replication and deletion of the same file in a large cluster.
  • HDFS-10690 - Optimize insertion/removal of replica in ShortCircuitCache
  • HDFS-11187 - Optimize disk access for last partial chunk checksum of Finalized replica
  • HDFS-11494 - Log message when DN is not selected for block replication
  • HDFS-11576 - Block recovery will fail indefinitely if recovery time > heartbeat interval
  • HDFS-11847 - Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning.
  • HDFS-11848 - Enhance dfsadmin listOpenFiles command to list files under a given path
  • HDFS-12318 - Fix IOException condition for openInfo in DFSInputStream
  • HDFS-12323 - NameNode terminates after full GC thinking QJM unresponsive if full GC is much longer than timeout
  • HDFS-12832 - INode.getFullPathName may throw ArrayIndexOutOfBoundsException, leading to NameNode exit.
  • HDFS-12881 - Output streams closed with IOUtils suppressing write errors
  • HDFS-12910 - Secure Datanode Starter should log the port when it fails to bind
  • HDFS-13112 - Token expiration edits may cause log corruption or deadlock
  • HDFS-13170 - Port webhdfs unmaskedpermission parameter to HTTPFS
  • MAPREDUCE-6711 - JobImpl fails to handle preemption events on state COMMITTING

Apache HBase

Code Changes Might Be Required

The following fixed issues might require changes to your HBase code or your configuration:

  • HBASE-15128 - Disable region splits and merges switch in master.

    This might require code changes if org.apache.hadoop.hbase.client.Admin was subclassed.

  • HBASE-15866 - Split hbase.rpc.timeout into *.read.timeout and *.write.timeout

    This might require code changes if you want to specify different values for RPC read and write timeouts.

  • HBASE-16008 - A robust way to deal with early termination of HBCK.

    This might require code changes if org.apache.hadoop.hbase.client.Admin was subclassed.
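
For HBASE-15866, the following is a minimal sketch of how the separate read and write timeouts could be written as hbase-site.xml property entries. It assumes the wildcard names in the issue title expand to hbase.rpc.read.timeout and hbase.rpc.write.timeout; the helper names and the millisecond values are hypothetical, so check them against your own deployment before use.

```python
import xml.etree.ElementTree as ET


def timeout_properties(read_ms, write_ms):
    """Build <property> elements for separate RPC read and write timeouts.

    The property names are an assumed expansion of *.read.timeout and
    *.write.timeout from the HBASE-15866 title.
    """
    settings = {
        "hbase.rpc.read.timeout": read_ms,
        "hbase.rpc.write.timeout": write_ms,
    }
    elements = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
        elements.append(prop)
    return elements


def render(elements):
    """Serialize the property elements, one per line."""
    return "\n".join(ET.tostring(e, encoding="unicode") for e in elements)


if __name__ == "__main__":
    # Hypothetical values in milliseconds, chosen only for illustration.
    print(render(timeout_properties(read_ms=30000, write_ms=60000)))
```

The rendered entries would be placed inside the configuration element of hbase-site.xml.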

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HBASE-14252 - RegionServers fail to start when setting hbase.ipc.server.callqueue.scan.ratio to 0
  • HBASE-19163 - "Maximum lock count exceeded" from region server's batch processing
  • HBASE-19440 - Not able to enable balancer with RSGroups once disabled
  • HBASE-19886 - Display maintenance mode in shell, web UI

Apache Hive

Code Changes Might Be Required

The following fixed issues might require changes to your HiveQL code or your configuration:

  • HIVE-16324 - Truncate table should not work when EXTERNAL property of table is set to true (lower case). This fix changes how the EXTERNAL table property is interpreted by Hive. After this fix, if your current deployment sets the EXTERNAL table property to true (lower case), the external table will no longer be truncated. Read the Jira for further details.
  • HIVE-18879 - Disallow embedded elements in the UDFXPathUtil class. This fix might change how your XML parser works if your queries use embedded elements with any of the xpath UDFs: xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, or xpath_string.
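
For HIVE-16324, one way to find tables whose behavior changes is to audit table properties for the lower-case value. The sketch below is an illustration only: the function name and the input mapping (table name to a dictionary of table properties) are hypothetical, and gathering the properties from your metastore is left out.

```python
def tables_affected_by_external_fix(tables):
    """Return names of tables whose EXTERNAL property is lower-case 'true'.

    `tables` maps a table name to a dict of its table properties.
    Per the HIVE-16324 note, these tables are no longer truncated
    after the fix.
    """
    return sorted(
        name
        for name, props in tables.items()
        if props.get("EXTERNAL") == "true"
    )


if __name__ == "__main__":
    # Hypothetical table properties, for illustration only.
    sample = {
        "logs_raw": {"EXTERNAL": "true"},
        "logs_archive": {"EXTERNAL": "TRUE"},
        "staging": {},
    }
    print(tables_affected_by_external_fix(sample))
```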

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HIVE-8472 - Add SET LOCATION option to the ALTER DATABASE command
  • HIVE-10495 - Hive index creation code throws a NullPointerException if index table is null
  • HIVE-14786 - Option added to Beeline to display binary column data as a string or a byte array (--convertBinaryArrayToString=[true | false])
  • HIVE-14792 - Optimization added to minimize AvroSerde reads of the remote schema-file
  • HIVE-15329 - Fix for when a NullPointerException occurs during table creation
  • HIVE-15543 - Fix to prevent Hive fetching Spark memory/cores to decide parallelism when Spark dynamic allocation is enabled
  • HIVE-16601 - Display Session Id and Query Name / Id in Spark UI
  • HIVE-16663 - Added string caching for rows to Beeline
  • HIVE-16890 - Remove the superfluous wrapper from HiveVarcharWritable
  • HIVE-18228 - Azure credential properties added to the HiveConf hidden list
  • HIVE-18788 - Cleans up inputs in JDBC PreparedStatement (HivePreparedStatement) to fix SQL injection vulnerabilities

Hue

  • HUE-7860 - [core] Integrate non IO blocking Python Webserver
  • HUE-7913 - [autocomplete] Add variable locations to the autocomplete parser
  • HUE-7913 - [autocomplete] Report quoted variable locations with possible column references
  • HUE-7915 - [assist] Increase the frontend cache TTL to 10 days
  • HUE-7942 - [editor] FIX variables with incorrect placeholder.
  • HUE-7943 - [editor] Variable list are not being refreshed
  • HUE-8043 - [editor] Cancel a query using ctrl-enter shortcut

Apache Impala

  • IMPALA-4315 - Allow USE and SHOW TABLES if there is at least one table in a database where the user has table or column privileges.
  • IMPALA-4323 - The SET ROW FORMAT clause was added to the ALTER TABLE statement for the TEXT or SEQUENCE file formats.
  • IMPALA-4886 - Table metrics are available in the catalog web UI.
  • IMPALA-5654 - Disallows explicitly setting the Kudu table name property for managed Kudu tables in CREATE TABLE and ALTER TABLE statements, for example, CREATE TABLE t (i INT) STORED AS KUDU TBLPROPERTIES('kudu.table_name'='some_name').
  • IMPALA-6549 - The file handle cache, controlled by the max_cached_file_handles flag, is enabled by default.

Apache Kudu

  • KUDU-1613 - Fixed a scenario where the on-disk data of a tablet server was completely erased and a new tablet server was started on the same host. This issue could prevent tablet replicas previously hosted on the server from being evicted and re-replicated. Tablets now immediately evict replicas that respond with a different server UUID than expected.
  • KUDU-1927 - Fixed a rare race condition when connecting to masters during their startup which might cause a client to get a response without a CA certificate and/or authentication token. This would cause the client to fail to authenticate with other servers in the cluster. The leader master now always sends a CA certificate and an authentication token (when applicable) to a Kudu client with a successful ConnectToMaster response.
  • KUDU-2262 - The Kudu Java client now retries a connection if no master is discovered as a leader and the user has a valid authentication token. This avoids failure in recoverable cases when masters are in the process of the very first leader election after starting up.
  • KUDU-2264 - The Java client will now automatically attempt to re-acquire Kerberos credentials from the ticket cache when the prior credentials are about to expire. This allows client instances to persist longer than the expiration time of a single Kerberos ticket so long as some other process renews the credentials in the ticket cache. Documentation on interacting with Kerberos authentication has been added to the Javadoc for the AsyncKuduClient class.
  • KUDU-2265 - Follower masters are now able to verify authentication tokens even if they have never been a leader. Prior to this fix, if a follower master had never been a leader, clients would be unable to authenticate to that master, resulting in spurious error messages being logged.
  • KUDU-2295 - Fixed a tablet server crash when a tablet replica is deleted during a scan.
  • KUDU-2312 - The evaluation order of predicates in scans with multiple predicates has been made deterministic. Due to a bug, this was not necessarily the case previously. Predicates are applied in most to least selective order, with ties broken by column index. The evaluation order may change in the future, particularly when better column statistics are made available internally.
  • KUDU-2331 - Previously, the kudu tablet change_config move_replica tool required all tablet servers in the cluster to be available when performing a move. This restriction has been relaxed: only the tablet server that will receive a replica of the tablet being moved and the hosts of the tablet’s existing replicas need to be available for the move to occur.
  • KUDU-2343 - Fixed a bug in the Java client which prevented the client from locating the new leader master after a leader failover in the case that the previous leader either remained online or restarted quickly. This bug resulted in the client timing out operations with errors indicating that there was no leader master.
  • KUDU-2259 - The Unix process username of the client is now included inside the exported security credentials, so that the effective username of clients who import credentials and subsequently use unauthenticated (SASL PLAIN) connections matches the client who exported the security credentials. For example, this is useful to let the Spark executors know which username to use if the Spark driver has no authentication token. This change only affects clusters with encryption disabled using --rpc-encryption=disabled.
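
The deterministic ordering described for KUDU-2312 (most to least selective, ties broken by column index) can be sketched as a two-part sort key. This is not Kudu's implementation: the Predicate type and the pass_fraction estimate are hypothetical stand-ins, where a smaller pass_fraction means a more selective predicate.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    column_index: int
    # Hypothetical estimate of the fraction of rows that pass the
    # predicate; a smaller value means a more selective predicate.
    pass_fraction: float


def evaluation_order(predicates):
    """Order predicates from most to least selective.

    Ties on selectivity are broken by column index, so the result does
    not depend on the order in which the predicates were supplied.
    """
    return sorted(predicates, key=lambda p: (p.pass_fraction, p.column_index))


if __name__ == "__main__":
    preds = [Predicate(3, 0.5), Predicate(1, 0.1), Predicate(0, 0.5)]
    for p in evaluation_order(preds):
        print(p)
```

Because the sort key is fully determined by the two fields, any permutation of the same predicates yields the same evaluation order.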

Apache Oozie

  • OOZIE-3173 - Coordinator job with frequency using cron syntax creates only one action in catchup mode
  • OOZIE-3183 - Better logging for SshActionExecutor and extended HA capability when calling to remote host

Apache Spark

  • SPARK-12297 - Convert Impala-written timestamps from UTC to local TZ
  • SPARK-22188 - [CORE] Adding security headers for preventing XSS, MitM and MIME sniffing
  • SPARK-23660 - Fix exception in yarn cluster mode when application ended fast

Apache Sqoop

  • SQOOP-3153 - Sqoop export with --as-<spec_file_format> will now display a verbose error message, because these options are only valid for imports