Issues Fixed in CDH 5.15.x

The following topics describe issues fixed in CDH 5.15.x, from newest to oldest release. You can also review What's New in CDH 5.15.x or Known Issues in CDH 5.

Issues Fixed in CDH 5.15.1

Hadoop YARN Privilege Escalation CVE-2016-6811

Fixed a vulnerability in Hadoop YARN (CVE-2016-6811) that allowed a user who can escalate to the yarn user to potentially run arbitrary commands as the root user.

Apache HBase potential privilege escalation for user of HBase “Thrift 1” API Server over HTTP CVE-2018-8025

CVE-2018-8025 describes an issue in Apache HBase that affects the optional "Thrift 1" API server when running over HTTP. A race condition could lead to authenticated sessions being incorrectly applied to users; for example, one authenticated user could be treated as a different user, or an unauthenticated user could be treated as an authenticated user.

Products affected: HBase Thrift Server

Releases affected:
  • CDH 5.4.x - 5.12.x
  • CDH 5.13.0, 5.13.1, 5.13.2, 5.13.3
  • CDH 5.14.0, 5.14.2, 5.14.3
  • CDH 5.15.0

Fixed versions: CDH 5.14.4, 5.15.1

Users affected: Users with the HBase Thrift 1 service role installed and configured to work in “thrift over HTTP” mode. For example, those using Hue with HBase impersonation enabled.

Severity: High

Impact: Potential privilege escalation.

CVE: CVE-2018-8025

Immediate action required: Upgrade to a CDH version that contains the fix, or disable the HBase Thrift-over-HTTP service. Disabling the HBase Thrift-over-HTTP service renders Hue impersonation inoperable, and all HBase access through Hue is then performed as the “hue” user instead of the authenticated user.

Knowledge article: For the latest update on this issue, see the corresponding Knowledge article: TSB 2018-315: Potential privilege escalation for user of HBase “Thrift 1” API Server over HTTP

Apache Hive CREATE EXTERNAL TABLE commands run very slowly on Amazon S3

When you run a CREATE EXTERNAL TABLE command against Amazon S3 storage, the command runs extremely slowly and might not complete. This usually occurs for tables that have pre-existing, nested data.
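For illustration only, a command of the affected form might look like the following sketch; the bucket name, path, and column definitions are placeholders, not values from a specific case:

    -- Hypothetical external table over pre-existing, nested data on S3
    -- (bucket, path, and schema are placeholders).
    CREATE EXTERNAL TABLE sales_archive (
      id BIGINT,
      item STRING,
      price DECIMAL(10,2)
    )
    PARTITIONED BY (sale_year INT, sale_month INT)
    STORED AS PARQUET
    LOCATION 's3a://example-bucket/warehouse/sales_archive/';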

Affected Versions: CDH 5.15.0

Cloudera Bug: CDH-68833

Workaround: None

Fixed versions: CDH 5.15.1 and later

Cloudera Search restore operation puts shard replicas on same host

Restoring an Apache Solr collection sometimes places all shard replicas on the same host.

Cloudera Issue: CDH-68828

Zip Slip Vulnerability CVE-2018-8009

“Zip Slip” is a widespread, critical arbitrary-file-overwrite vulnerability that typically results in remote command execution. It was discovered and responsibly disclosed by the Snyk Security team ahead of a public disclosure on June 5, 2018, and affects thousands of projects.

Cloudera has analyzed its use of zip-related software and determined that, in CDH 5, only Apache Hadoop is vulnerable to this class of vulnerability. This has been fixed in upstream Hadoop as CVE-2018-8009.

hdfs snapshotDiff /.reserved/raw/... fails on snapshottable directories

Fixed an issue where the hdfs snapshotDiff command failed when it was supplied with a raw (/.reserved/raw) path.

Cloudera Issue: CDH-66029

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.15.1:

Apache Flume

  • FLUME-2786, FLUME-3056, FLUME-3117 - Application enters a deadlock when stopped while handleConfigurationEvent is in progress
  • FLUME-2894 - Flume components should stop in the correct order
  • FLUME-2976 - Exception when JMS source tries to connect to a Weblogic server without authentication
  • FLUME-3222 - Fix for NoSuchFileException thrown when files are being deleted

Apache Hadoop

  • HADOOP-13024 - Distcp with -delete feature on raw data not implemented
  • HADOOP-13972 - ADLS to support per-store configuration
  • HADOOP-15186 - Allow Azure Data Lake SDK dependency version to be set on the command line
  • HADOOP-15317 - Improve NetworkTopology chooseRandom's loop
  • HADOOP-15342 - Updating ADLS connector to use SDK version 2.2.7
  • HADOOP-15356 - Make the HTTP timeout configurable in ADLS connector
  • HADOOP-15434 - Upgrade to ADLS SDK that exposes current timeout
  • HADOOP-15466 - Correct units in adl.http.timeout property
  • HDFS-9229 - Expose size of NameNode directory as a metric
  • HDFS-11751 - DFSZKFailoverController daemon exits with wrong status code
  • HDFS-11993 - Add log info when connect to datanode socket address failed
  • HDFS-12683 - DFSZKFailOverController re-order logic for logging Exception
  • HDFS-12710 - HTTPFS HTTP max header size env variable is not respected in branch-2
  • HDFS-12981 - Running renameSnapshot on a non-existent snapshot to itself should throw error
  • HDFS-13281 - Namenode#createFile should be /.reserved/raw/ aware
  • HDFS-13314 - NameNode should optionally exit if it detects FsImage corruption
  • MAPREDUCE-7094 - LocalDistributedCacheManager leaves classloaders open, which leaks File Descriptors
  • YARN-4227 - Ignore expired containers from removed nodes in FairScheduler
  • YARN-4325 - NodeManager log handlers fail to send finished/failed events in some cases
  • YARN-4677 - RMNodeResourceUpdateEvent update from scheduler can lead to a race condition
  • YARN-5121 - Fix container-executor portability issues

Apache HBase

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HBASE-20229 - ConnectionImplementation.locateRegions() returns duplicated entries when region replication is on
  • HBASE-20293 - get_splits returns duplicate split points when region replication is on
  • HBASE-20664 - Reduce the broad scope of outToken in ThriftHttpServlet

Apache Hive

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HIVE-9915 - Allow specifying file format for managed tables
  • HIVE-15580 - Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark
  • HIVE-15682 - Eliminate per-row based dummy iterator creation
  • HIVE-15683 - Make configurable how Hive on Spark shuffles data for GROUP BY
  • HIVE-16080 - Add parquet to possible values for hive.default.fileformat and hive.default.fileformat.managed (see the example after this list)
  • HIVE-16659 - Query plan should reflect hive.spark.use.groupby.shuffle
  • HIVE-19041 - Thrift deserialization of Partition objects should intern fields
  • HIVE-19231 - Beeline generates garbled output when using UnsupportedTerminal
  • HIVE-19424 - NPE In MetaDataFormatters
  • HIVE-19668 - Over 30% of the heap is wasted by duplicate org.antlr.runtime.CommonTokens and duplicate strings
  • HIVE-19700 - Workaround for JLine issue with UnsupportedTerminal
  • HIVE-19870 - HCatalog dynamic partition query can fail if the table path is managed by Sentry
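A rough sketch of how the new property value from HIVE-16080 might be used; the session settings and table name below are illustrative assumptions:

    -- Hypothetical session-level settings; per HIVE-16080, 'parquet' is now an
    -- accepted value for both properties.
    SET hive.default.fileformat=parquet;
    SET hive.default.fileformat.managed=parquet;
    -- A table created without an explicit STORED AS clause then defaults to Parquet.
    CREATE TABLE events_default_format (id BIGINT, payload STRING);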

Hue

  • HUE-6697 - [jb] Prevent reset of job page tabs when job is running
  • HUE-7946 - [jb] Link to subworkflow on workflow dashboard page 404s in Hue 4
  • HUE-7989 - [useradmin] Provide better UI message and message in logs when ldap server down
  • HUE-8053 - Fix unit test
  • HUE-8053 - [useradmin] LDAP authentication with sync_groups_on_login=true fails with KeyError exception
  • HUE-8055 - [desktop] Support multiple LDAP servers in LDAP Test command
  • HUE-8063 - [fb] Better error message when HTTPFS is not working
  • HUE-8137 - [core] Refresh translations files for 4.2
  • HUE-8159 - [oozie] Unable to create Workflow/Schedule using Java document Action
  • HUE-8170 - [useradmin] Fix LDAP sync (ldap_access.py) certificate validation logic
  • HUE-8173 - [core] Add a warning log when a user switches to the old Hue 3 UI
  • HUE-8173 - [core] Add a log line when a user loads a page not via the load balancer even if we have one
  • HUE-8191 - [jb] UnicodeEncodeError occurs in Job Browser when browser language is not English
  • HUE-8202 - [jb] Fix mutual authentication failed with Isilon
  • HUE-8207 - [indexer] Issue previewing input file with unicode data
  • HUE-8217 - [dashboard] Fix HTML resultset widget templating
  • HUE-8232 - [oozie] Fix 500 error if coordinator associated workflow has been deleted
  • HUE-8236 - [core] Correct Hue config value use integer instead of math expression
  • HUE-8253 - [editor] Support downloading Query results with query names(file names) other than ISO-8859-1 charset
  • HUE-8253 - [editor] Fix the broken unit test in 5.15 due to conflict after backporting
  • HUE-8274 - [s3] Moving a folder using drag and drop deletes the folder itself
  • HUE-8280 - [fb] Move action button does not prevent move to itself
  • HUE-8280 - [fb] Fix the missed conflict issue after backporting
  • HUE-8282 - [fb] Check if EC2 instance before check IAM metadata
  • HUE-8305 - [useradmin] Optimize performance on checking Hue permissions if user is in many groups
  • HUE-8310 - Broken template for multiple custom apps
  • HUE-8313 - [core] Remove hardcoding to ImpersonationBackend when using embedded mode
  • HUE-8314 - [core] Fix SAML encryption missing config
  • HUE-8320 - [core] After copying a workflow using Hue 4 button, saving the copied workflow fails
  • HUE-8344 - [hbase] Hbase old version of data can not display in Hue
  • HUE-8350 - [solr] indexer app permission is not being acknowledged in HUE
  • HUE-8370 - [pig] Imported old version Pig script missing properties fields
  • HUE-8392 - [oozie] Cannot add more actions using drag & drop from actions bar in the Oozie editor after adding around 3 actions
  • HUE-8404 - [useradmin] Fix multibackend invalid password removes drop down to select Local
  • HUE-8409 - [core] When idle session timeout is enabled it causes issues with Spnego
  • HUE-8440 - [jb] Link for Spark logs in Properties tab of Job Browser is incorrect
  • HUE-8455 - [pig] Oozie editor fails with 'hadoopProperties' for pig script saved in Hue 3

Apache Impala

  • IMPALA-6687 - Fixed INSERT with mixed case in partition column names.
  • IMPALA-6822 - Added a query option to control shuffling by distinct expressions.
  • IMPALA-6847 - Added a workaround for high memory estimates for admission control.
  • IMPALA-6908 - IsConnResetTException() should include ECONNRESET.
  • IMPALA-6934 - Corrected the wrong results with EXISTS subqueries that contain ORDER BY, LIMIT, and OFFSET.
  • IMPALA-7014 - Disabled the stacktrace symbolisation by default.
  • IMPALA-7078 - Reduced the queue size based on num_scanner_threads.
  • IMPALA-7078 - Improved memory consumption of wide Avro scans.
  • IMPALA-7288 - Fixed the Codegen crash in FinalizeModule.
  • IMPALA-7298 - Impala no longer passes the IP address as the hostname in the Kerberos principal.

Apache Kudu

  • KUDU-2367 - Fixed an issue where a permanently failed tablet replica was not properly identified, which could cause the tablet not to re-replicate in very small clusters.
  • KUDU-2377 - Fixed an issue that caused Kudu servers to fail to start when RLIMIT_NPROC=-1.
  • KUDU-2378 - Fixed unaligned loads of int128 from rows.
  • KUDU-2379 - Fixed an issue that caused secure Spark jobs to fail.
  • KUDU-2416 - Fixed PartialRow.setMin.
  • KUDU-2443 - Fixed replica movement and replacement for RF=1.
  • KUDU-2447 - Fixed the tablet server crash with the error, "NONE predicate can not be pushed into key".
  • KUDU-2478 - Restored Python 2.6 compatibility.
  • Added the ability to adjust scan timeouts in Spark.
  • Increased the timeout to begin tablet copies, which improves Kudu's re-replication time when the cluster is busy.
  • Fixed a NullPointerException thrown when calling ColumnSchema#toString on non-decimal types.
  • Greatly improved the performance of many types of queries on tables from which many rows have been deleted.
  • Fixed an issue that caused partition pruning to be too conservative for queries from the Java client that use Decimal predicates.

Apache Oozie

  • OOZIE-2491 - oozie acl cannot specify group, it does not work
  • OOZIE-3134 - Potential inconsistency between the in-memory SLA map and the Oozie database
  • OOZIE-3260 - [sla] Remove stale item above max retries on JPA related errors from in-memory SLA map

Apache Parquet

  • PARQUET-1246 - Ignore float/double statistics in case of NaN

Apache Sentry

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • SENTRY-1209 - Sentry does not block Hive's cross-schema table renames
  • SENTRY-2020 - Fix testConsumeCycleWithInsufficientPrivileges test failure in kafka e2e tests
  • SENTRY-2144 - Table Rename Cross Database should update permission correctly
  • SENTRY-2165 - NotificationProcesser process notification methods have logs wrongly flagged as ERROR
  • SENTRY-2183 - Increase default sentry-hdfs rpc timeout to 20 mins
  • SENTRY-2184 - Performance Issue: MPath is queried for each MAuthzPathsMapping in full snapshot
  • SENTRY-2226 - Support Hive operation ALTER TABLE EXCHANGE.
  • SENTRY-2269 - Make SentryStore pluggable
  • SENTRY-2299 - NPE In Sentry HDFS Sync Plugin
  • SENTRY-2310 - Sentry is not able to fetch the full update subsequently when there is an HMS restart during the snapshot process

Apache Solr

  • SOLR-12290 - Do not close any servlet streams and improve our servlet stream closing prevention code for users and devs.
  • SOLR-12293 - Updates need to use their own connection pool to maintain connection reuse and prevent spurious recoveries.

Apache Spark

  • SPARK-12504 - [SQL] Masking credentials in the sql plan explain output for JDBC data sources
  • SPARK-12652 - [PYSPARK] Upgrade Py4J to 0.9.1
  • SPARK-13709 - [SQL] Initialize deserializer with both table and partition properties when reading partitioned tables
  • SPARK-13807 - De-duplicate `Python*Helper` instantiation code in PySpark streaming
  • SPARK-13848 - [SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading issue
  • SPARK-15061 - [PYSPARK] Upgrade to Py4J 0.10.1
  • SPARK-16781 - [PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
  • SPARK-17960 - [PYSPARK][UPGRADE TO PY4J 0.10.4]
  • SPARK-19822 - [TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string
  • SPARK-20862 - [MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel
  • SPARK-21278 - [PYSPARK] Upgrade to Py4J 0.10.6
  • SPARK-22429 - [STREAMING] Streaming checkpointing code does not retry after failure
  • SPARK-23852 - [SQL] Add test that fails if PARQUET-1217 is not fixed

Apache Sqoop

  • SQOOP-2567 - SQOOP import for Oracle fails with invalid precision/scale for decimal
  • SQOOP-3082 - Sqoop import fails after TCP connection reset if split by datetime column

Apache Zookeeper

  • ZOOKEEPER-2375 - Prevent multiple initialization of login object in each ZooKeeperSaslClient instance

Issues Fixed in CDH 5.15.0

Apache Hive vulnerabilities CVE-2018-1282 and CVE-2018-1284

This security bulletin, TSB 2018-299, covers two vulnerabilities that have been addressed in CDH 5.15.0 and later:
  • CVE-2018-1282: The JDBC driver is susceptible to SQL injection attacks if the input parameters are not properly sanitized
  • CVE-2018-1284: Hive UDF series UDFXPathXXXX allows users to pass carefully crafted XML to access arbitrary files

For further details, see the original Known Issue for Hive.

Apache Oozie Server vulnerability CVE-2017-15712

A vulnerability in the Oozie Server allows a cluster user to read private files owned by the user running the Oozie Server process.

Products affected: Oozie

Releases affected: All releases prior to CDH 5.12.0, CDH 5.12.0, CDH 5.12.1, CDH 5.12.2, CDH 5.13.0, CDH 5.13.1, CDH 5.14.0

Users affected: Users running the Oozie Server

Date/time of detection: November 13, 2017

Detected by: Daryn Sharp and Jason Lowe of Oath (formerly Yahoo! Inc)

Severity (Low/Medium/High): High

Impact: The vulnerability allows a cluster user to read private files owned by the user running the Oozie Server process. A malicious user can construct a workflow XML file containing XML directives and configuration that reference sensitive files on the Oozie Server host.

CVE: CVE-2017-15712

Immediate action required: Upgrade to a release where the issue is fixed.

Addressed in release/refresh/patch: CDH 5.13.2 and higher, 5.14.2 and higher, 5.15.0 and higher

Apache Hadoop MapReduce Job History Server (JHS) vulnerability CVE-2017-15713

A vulnerability in Hadoop’s Job History Server allows a cluster user to expose private files owned by the user running the MapReduce Job History Server (JHS) process. See http://seclists.org/oss-sec/2018/q1/79 for reference.

Products affected: Apache Hadoop MapReduce

Releases affected: All releases prior to CDH 5.12.0, CDH 5.12.0, CDH 5.12.1, CDH 5.12.2, CDH 5.13.0, CDH 5.13.1, CDH 5.14.0

Users affected: Users running the MapReduce Job History Server (JHS) daemon

Date/time of detection: November 8, 2017

Detected by: Man Yue Mo of lgtm.com

Severity (Low/Medium/High): High

Impact: The vulnerability allows a cluster user to expose private files owned by the user running the MapReduce Job History Server (JHS) process. A malicious user can construct a configuration file containing XML directives that reference sensitive files on the MapReduce Job History Server (JHS) host.

CVE: CVE-2017-15713

Immediate action required: Upgrade to a release where the issue is fixed.

Addressed in release/refresh/patch: CDH 5.13.2 and higher, 5.14.2 and higher, 5.15.0 and higher

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.15.0:

Apache Avro

  • AVRO-2109 - Reset buffers in case of IOException

Apache Flume

  • FLUME-2442 - Need an alternative to providing clear text passwords in flume config
  • FLUME-2917 - Provide netcat UDP source as alternative to TCP

Apache Hadoop

  • HADOOP-12886 - Exclude weak ciphers in SSLFactory through ssl-server.xml
  • HADOOP-14966 - Handle JDK-8071638 for hadoop-common
  • HADOOP-15085 - Output streams closed with IOUtils suppressing write errors
  • HADOOP-15113 - NPE in S3A getFileStatus: null instrumentation on using closed instance.
  • HADOOP-15149 - CryptoOutputStream should implement StreamCapabilities.
  • HADOOP-15161 - Streaming and common statistics missing from S3A metrics
  • HADOOP-15185 - Update ADLS connector to use the current version of ADLS SDK.
  • HADOOP-15206 - BZip2 drops and duplicates records when input split size is small
  • HDFS-1172 - Blocks in newly completed files are considered under-replicated too quickly
  • HDFS-7764 - DirectoryScanner shouldn't abort the scan if one directory had an error
  • HDFS-8693 - refreshNamenodes does not support adding a new standby to a running DN
  • HDFS-9023 - When NN is not able to identify DN for replication, reason behind it can be logged.
  • HDFS-10453 - ReplicationMonitor thread could get stuck for a long time due to a race between replication and deletion of the same file in a large cluster.
  • HDFS-10690 - Optimize insertion/removal of replica in ShortCircuitCache
  • HDFS-11187 - Optimize disk access for last partial chunk checksum of Finalized replica
  • HDFS-11494 - Log message when DN is not selected for block replication
  • HDFS-11576 - Block recovery will fail indefinitely if recovery time > heartbeat interval
  • HDFS-11847 - Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning.
  • HDFS-11848 - Enhance dfsadmin listOpenFiles command to list files under a given path
  • HDFS-12318 - Fix IOException condition for openInfo in DFSInputStream
  • HDFS-12323 - NameNode terminates after full GC thinking QJM unresponsive if full GC is much longer than timeout
  • HDFS-12832 - INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit.
  • HDFS-12881 - Output streams closed with IOUtils suppressing write errors
  • HDFS-12910 - Secure Datanode Starter should log the port when it fails to bind
  • HDFS-13112 - Token expiration edits may cause log corruption or deadlock
  • HDFS-13170 - Port webhdfs unmaskedpermission parameter to HTTPFS
  • MAPREDUCE-6711 - JobImpl fails to handle preemption events on state COMMITTING

Apache HBase

Code Changes Might Be Required

The following fixed issues might require changes to your HBase code or your configuration:

  • HBASE-15128 - Disable region splits and merges switch in master.

    This might require code changes if org.apache.hadoop.hbase.client.Admin was subclassed.

  • HBASE-15866 - Split hbase.rpc.timeout into *.read.timeout and *.write.timeout

    This might require code changes if you want to specify different values for RPC read and write timeouts.

  • HBASE-16008 - A robust way to deal with early termination of HBCK.

    This might require code changes if org.apache.hadoop.hbase.client.Admin was subclassed.

Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HBASE-14252 - RegionServers fail to start when setting hbase.ipc.server.callqueue.scan.ratio to 0
  • HBASE-19163 - "Maximum lock count exceeded" from region server's batch processing
  • HBASE-19440 - Not able to enable balancer with RSGroups once disabled
  • HBASE-19886 - Display maintenance mode in shell, web UI

Apache Hive

Code Changes Might Be Required

The following fixed issues might require changes to your HiveQL code or your configuration:

  • HIVE-16324 - Truncate table should not work when the EXTERNAL property of the table is set to true (lowercase). This fix changes how Hive interprets the EXTERNAL table property: after the fix, if your deployment sets the EXTERNAL table property to true (lowercase), the table is treated as external and can no longer be truncated. Read the Jira for further details, and see the sketch after this list.
  • HIVE-18879 - Disallow embedded elements in the UDFXPathUtil class. This fix might change how your XML is parsed if you pass XML containing embedded elements to the xpath UDFs, such as xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, or xpath_string.
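
A minimal sketch of the HIVE-16324 behavior change referenced above; the table name is a placeholder and the exact error message may differ:

    -- Hypothetical managed table marked external through the table property.
    CREATE TABLE web_logs (line STRING);
    ALTER TABLE web_logs SET TBLPROPERTIES ('EXTERNAL'='true');

    -- Per the note above: before the fix, a table whose EXTERNAL property is the
    -- lowercase string 'true' could still be truncated; after the fix it is
    -- treated as external and the following statement is rejected:
    TRUNCATE TABLE web_logs;
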
Code Changes Should Not Be Required

The following fixes should not require code changes, but they contain improvements that might enhance your deployment:

  • HIVE-8472 - Add SET LOCATION option to the ALTER DATABASE command (see the example after this list)
  • HIVE-10495 - Hive index creation code throws a NullPointerException if index table is null
  • HIVE-14786 - Option added to Beeline to display binary column data as a string or a byte array (--convertBinaryArrayToString=[true | false])
  • HIVE-14792 - Optimization added to minimize AvroSerde reads of the remote schema-file
  • HIVE-15329 - Fix for when a NullPointerException occurs during table creation
  • HIVE-15543 - Fix to prevent Hive fetching Spark memory/cores to decide parallelism when Spark dynamic allocation is enabled
  • HIVE-16601 - Display Session Id and Query Name / Id in Spark UI
  • HIVE-16663 - Added string caching for rows to Beeline
  • HIVE-16890 - Remove the superfluous wrapper from HiveVarcharWritable
  • HIVE-18228 - Azure credential properties added to the HiveConf hidden list
  • HIVE-18788 - Cleans up inputs in JDBC PreparedStatement (HivePreparedStatement) to fix SQL injection vulnerabilities
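An illustrative sketch of the ALTER DATABASE option from HIVE-8472; the database name and path are placeholders:

    -- Hypothetical example: change the database's default location. This sets the
    -- default parent directory for tables created afterwards; existing table
    -- directories are not moved.
    ALTER DATABASE analytics_db
      SET LOCATION 'hdfs://nameservice1/user/hive/warehouse/analytics_new.db';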

Hue

  • HUE-7860 - [core] Integrate non IO blocking Python Webserver
  • HUE-7913 - [autocomplete] Add variable locations to the autocomplete parser
  • HUE-7913 - [autocomplete] Report quoted variable locations with possible column references
  • HUE-7915 - [assist] Increase the frontend cache TTL to 10 days
  • HUE-7942 - [editor] FIX variables with incorrect placeholder.
  • HUE-7943 - [editor] Variable list are not being refreshed
  • HUE-8043 - [editor] Cancel a query using ctrl-enter shortcut

Apache Impala

  • IMPALA-4315 - Allow USE and SHOW TABLES if there is at least one table in a database where the user has table or column privileges.
  • IMPALA-4323 - The SET ROW FORMAT clause was added to the ALTER TABLE statement for the TEXT or SEQUENCE file formats (see the example after this list).
  • IMPALA-4886 - Table metrics are available in the catalog web UI.
  • IMPALA-5654 - Disallows explicitly setting the Kudu table name property for managed Kudu tables in CREATE TABLE and ALTER TABLE statements, for example, CREATE TABLE t (i INT) STORED AS KUDU TBLPROPERTIES('kudu.table_name'='some_name').
  • IMPALA-6549 - The file handle cache, controlled by the max_cached_file_handles flag, is enabled by default.
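A rough sketch of the ALTER TABLE ... SET ROW FORMAT clause from IMPALA-4323 noted above; the table name and delimiter are placeholders:

    -- Hypothetical example: change the field delimiter of an existing
    -- text-format table.
    ALTER TABLE csv_staging SET ROW FORMAT
      DELIMITED FIELDS TERMINATED BY '|';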

Apache Kudu

  • KUDU-1613 - Fixed a scenario where the on-disk data of a tablet server was completely erased and a new tablet server was started on the same host. This issue could prevent tablet replicas previously hosted on the server from being evicted and re-replicated. Tablets now immediately evict replicas that respond with a different server UUID than expected.
  • KUDU-1927 - Fixed a rare race condition when connecting to masters during their startup which might cause a client to get a response without a CA certificate and/or authentication token. This would cause the client to fail to authenticate with other servers in the cluster. The leader master now always sends a CA certificate and an authentication token (when applicable) to a Kudu client with a successful ConnectToMaster response.
  • KUDU-2262 - The Kudu Java client will now retry a connection if no master is discovered as a leader and the user has a valid authentication token. This avoids failures in recoverable cases when the masters are in the process of their very first leader election after starting up.
  • KUDU-2264 -The Java client will now automatically attempt to re-acquire Kerberos credentials from the ticket cache when the prior credentials are about to expire. This allows client instances to persist longer than the expiration time of a single Kerberos ticket so long as some other process renews the credentials in the ticket cache. Documentation on interacting with Kerberos authentication has been added to the Javadoc for the AsyncKuduClient class.
  • KUDU-2265 - Follower masters are now able to verify authentication tokens even if they have never been a leader. Prior to this fix, if a follower master had never been a leader, clients would be unable to authenticate to that master, resulting in spurious error messages being logged.
  • KUDU-2295 - Fixed a tablet server crash when a tablet replica is deleted during a scan.
  • KUDU-2312 - The evaluation order of predicates in scans with multiple predicates has been made deterministic. Due to a bug, this was not necessarily the case previously. Predicates are applied in most to least selective order, with ties broken by column index. The evaluation order may change in the future, particularly when better column statistics are made available internally.
  • KUDU-2331 - Previously, the kudu tablet change_config move_replica tool required all tablet servers in the cluster to be available when performing a move. This restriction has been relaxed: only the tablet server that will receive a replica of the tablet being moved and the hosts of the tablet’s existing replicas need to be available for the move to occur.
  • KUDU-2343 - Fixed a bug in the Java client which prevented the client from locating the new leader master after a leader failover in the case that the previous leader either remained online or restarted quickly. This bug resulted in the client timing out operations with errors indicating that there was no leader master.
  • KUDU-2259 - The Unix process username of the client is now included inside the exported security credentials, so that the effective username of clients who import credentials and subsequently use unauthenticated (SASL PLAIN) connections matches the client who exported the security credentials. For example, this is useful to let the Spark executors know which username to use if the Spark driver has no authentication token. This change only affects clusters with encryption disabled using --rpc-encryption=disabled.

Apache Oozie

  • OOZIE-3173 - Coordinator job with frequency using cron syntax creates only one action in catchup mode
  • OOZIE-3183 - Better logging for SshActionExecutor and extended HA capability when calling to remote host

Apache Spark

  • SPARK-12297 - Convert Impala-written timestamps from UTC to local TZ
  • SPARK-22188 - [CORE] Adding security headers for preventing XSS, MitM and MIME sniffing
  • SPARK-23660 - Fix exception in yarn cluster mode when application ended fast

Apache Sqoop

  • SQOOP-3153 - Sqoop export with --as-<spec_file_format> will now display a verbose error message as these options are only valid for imports