What's New in CDH 5.11.x
What's New in CDH 5.11.1
This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.11.1.
What's New in CDH 5.11.0
Apache Hadoop
- Supported Apache Tomcat TLS ciphers for HttpFS are configurable using the HTTPFS_SSL_CIPHERS environment variable.
- Supported Apache Tomcat TLS ciphers for the KMS are configurable using the KMS_SSL_CIPHERS environment variable.
- Amazon S3 Consistency with Metadata Caching (S3Guard)
Data written to Amazon S3 buckets is subject to the "eventual consistency" guarantee provided by Amazon Web Services (AWS), which means that data written to S3 may not be immediately available for queries and listing operations. This can cause failures in multi-step ETL workflows, where data from a previous step is not available to the next step. To mitigate these consistency issues, you can now configure metadata caching for data stored in Amazon S3 using S3Guard. S3Guard requires that you provision a DynamoDB database from Amazon Web Services and configure S3Guard using the Cloudera Manager Admin Console or command-line tools. See Configuring and Managing S3Guard.
- Amazon S3 Server-side Encryption with SSE-KMS
Clusters that use Amazon S3 storage can now use Amazon Server-Side Encryption with AWS KMS–Managed Keys (SSE-KMS) to encrypt data. This gives you two choices for data-at-rest encryption on Amazon S3: SSE-S3 and SSE-KMS. Use the Cloudera Manager Admin Console to configure the cluster to use this feature, as detailed in How to Configure Encryption for Amazon S3.
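The consistency problem that S3Guard addresses can be illustrated with a toy model. The sketch below is not S3Guard's code and uses no AWS APIs; the class names `EventuallyConsistentStore` and `MetadataCachedStore` are hypothetical, and an in-memory set stands in for the DynamoDB metadata table. It only shows why merging a consistent record of writes into listing results lets the next ETL step see data that the store's own listing has not caught up with yet.

```python
"""Toy model of an eventually consistent object store with a consistent
metadata cache in front of its listings. Not S3Guard's actual code."""


class EventuallyConsistentStore:
    """Stores objects immediately, but listings lag until sync() is called."""

    def __init__(self):
        self._objects = {}      # key -> data, readable by key right away
        self._listed = set()    # keys that list() currently reports

    def put(self, key, data):
        self._objects[key] = data   # the new key is not yet visible to list()

    def get(self, key):
        return self._objects[key]

    def list(self, prefix):
        return sorted(k for k in self._listed if k.startswith(prefix))

    def sync(self):
        """Simulate the listing index catching up with the writes."""
        self._listed = set(self._objects)


class MetadataCachedStore:
    """Records every write in a consistent metadata set and merges that
    set into listing results."""

    def __init__(self, store):
        self._store = store
        self._metadata = set()  # hypothetical stand-in for the metadata table

    def put(self, key, data):
        self._store.put(key, data)
        self._metadata.add(key)

    def list(self, prefix):
        cached = {k for k in self._metadata if k.startswith(prefix)}
        return sorted(cached | set(self._store.list(prefix)))


raw = EventuallyConsistentStore()
guarded = MetadataCachedStore(raw)
guarded.put("etl/step1/part-0000", b"rows")

stale_listing = raw.list("etl/step1/")        # step 2 would find no input here
guarded_listing = guarded.list("etl/step1/")  # the metadata set fills the gap
```

In this model, `stale_listing` is empty until `raw.sync()` runs, while `guarded_listing` already contains the new key. Real deployments are configured through Cloudera Manager or the command-line tools, as described in Configuring and Managing S3Guard.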
Apache HBase
- You can partition RegionServers with RegionServer Groups.
- Use MOB_COMPACT_PARTITION_POLICY options to reduce the number of MOB files stored in HDFS. You can choose from daily, weekly, and monthly options.
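To see why a coarser MOB compaction partition policy leaves fewer files in HDFS, consider the rough model below. It is only a sketch under simplifying assumptions: one MOB file is written per day, and every partition is eventually compacted into a single file. The helper names `partition_key` and `files_after_compaction` are hypothetical and are not part of HBase, whose actual compaction behavior is more involved.

```python
from datetime import date, timedelta


def partition_key(day, policy):
    """Return the partition that a MOB file written on `day` falls into."""
    if policy == "daily":
        return day.isoformat()
    if policy == "weekly":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if policy == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown policy: {policy}")


def files_after_compaction(days, policy):
    """Count files if each partition is compacted down to a single file."""
    return len({partition_key(d, policy) for d in days})


# Hypothetical workload: one MOB file per day for the 365 days of 2017.
days = [date(2017, 1, 1) + timedelta(days=i) for i in range(365)]
counts = {
    policy: files_after_compaction(days, policy)
    for policy in ("daily", "weekly", "monthly")
}
print(counts)
```

Under these assumptions the yearly file count drops from 365 with the daily policy to 53 with weekly and 12 with monthly. The weekly figure is 53 rather than 52 because January 1, 2017 falls in the last ISO week of 2016.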
Apache Hive
Hive on Amazon S3 performance optimizations for:
- HIVE-14204: Dynamic partitioning writes and the INSERT OVERWRITE statement
- HIVE-15546: Parallel input path listing
Support for Microsoft Azure Data Lake Store (ADLS) as a secondary filesystem for Hive on MapReduce2 (YARN). You can use Hive on MapReduce2 to read and write data stored on ADLS. Hive-on-Spark does not currently support ADLS data in CDH.
AWS cloud clusters can now share a single persistent instance of Amazon Relational Database Service (RDS) as the Hive metastore backend database, enabling persistent sharing of metadata beyond a cluster's life cycle.
Hue
Integrate Navigator with Hue: Phase 1, Metadata Discovery
- Search and tag partitions, databases, views, tables, and columns.
- Off by default. To turn it on, check both "Enable" fields.
- See How to Enable and Use Navigator in Hue.
Embed new create table wizard within Editor and Assist
- Safely import multiple formats such as Kudu, Parquet, JSON, and CSV.
- More easily create table partitions.
- Continued SQL improvements
- Visually more pleasant colors and text.
- No more hanging spinner in the Editor.
HUE-5742: Allow non-public PostgreSQL schemas.
HUE-5608: Add the ability to DESC a table without TABLE-level privilege.
Apache Impala (incubating)
Apache Oozie
- Supported TLS ciphers for Apache Tomcat are configurable using the OOZIE_HTTPS_CIPHERS environment variable.
Apache Spark
Blacklisting. This feature reduces the chance of application failure by not scheduling work on hosts that are experiencing intermittent disk failures. See this blog post for background information.
You can enable Kerberos authentication and TLS/SSL encryption for the Spark History Server through Cloudera Manager configuration settings, rather than including the password in clear text in an Advanced Configuration Snippet field. See these settings in the Cloudera Manager user interface:
- history_server_spnego_enabled - for Kerberos authentication
With authentication enabled, only Kerberos-authorized users can read data from the Spark History Server, and non-admin users can only see information about their own jobs.
With TLS/SSL enabled, you provide the location of the keystore and its password, similar to the security configuration for other components.
Navigator lineage. The former Spark lineage extractor, which was enabled through a safety valve, is superseded by a more robust lineage collection mechanism. See Apache Spark Known Issues for some limitations and restrictions with this feature.
Support for Azure Data Lake Store (ADLS) as a secondary filesystem. You can use Spark jobs to read and write data stored on ADLS. Hive-on-Spark and Spark with Kudu are not currently supported for ADLS data.
Cloudera Search
- Supported TLS ciphers for Apache Tomcat are configurable using the SOLR_CIPHERS_CONFIG environment variable.
Apache ZooKeeper
Server-Server Mutual Authentication
All ZooKeeper servers in an ensemble can now be configured to support quorum peer (server-server) mutual authentication, mitigating the risk of spoofing by a rogue server on an unsecured network. The feature uses Kerberos authentication through the SASL framework, so Kerberos is required.
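For orientation, the following sketch renders the kind of zoo.cfg entries that upstream Apache ZooKeeper uses to switch on quorum SASL authentication. The property names follow the upstream ZooKeeper documentation for this feature and should be checked against the version you run; in a Cloudera Manager deployment the feature is normally enabled through Cloudera Manager rather than by editing zoo.cfg directly. The helper `quorum_sasl_settings` and the principal `zookeeper/_HOST` are hypothetical examples, not values taken from CDH.

```python
def quorum_sasl_settings(service_principal):
    """Return zoo.cfg lines that require SASL between all quorum peers."""
    settings = {
        # Turn the feature on for this server.
        "quorum.auth.enableSasl": "true",
        # Require authentication when this server connects as a learner.
        "quorum.auth.learnerRequireSasl": "true",
        # Reject unauthenticated peers that connect to this server.
        "quorum.auth.serverRequireSasl": "true",
        # Kerberos principal used by the quorum peers (assumed example value).
        "quorum.auth.kerberos.servicePrincipal": service_principal,
    }
    return [f"{key}={value}" for key, value in settings.items()]


zoo_cfg_lines = quorum_sasl_settings("zookeeper/_HOST")
print("\n".join(zoo_cfg_lines))
```

Every server in the ensemble needs the same settings for the authentication to be mutual, which is why the sketch produces one shared block rather than per-server values.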