What's New in CDH 5.9.x

What's New in CDH 5.9.3

This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.9.3.

What's New in CDH 5.9.2

This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.9.2.

What's New in CDH 5.9.1

This is a maintenance release that fixes some important issues. For details, see Issues Fixed in CDH 5.9.1.

What's New in CDH 5.9.0

Apache Hadoop

  • CDH 5.9 supports using temporary credentials obtained from the AWS Security Token Service (STS) to authenticate to Amazon S3.
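A minimal Python sketch of what an STS-based S3A configuration can look like follows. It assumes the upstream Apache Hadoop S3A property names (fs.s3a.aws.credentials.provider, fs.s3a.access.key, fs.s3a.secret.key, fs.s3a.session.token) and the org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider class, so confirm them against the CDH 5.9 Amazon S3 documentation before relying on them. The helper names are hypothetical; the code only renders the properties in core-site.xml layout.

```python
import xml.etree.ElementTree as ET

# Assumption: upstream Hadoop S3A provider class for STS session credentials.
S3A_TEMP_CREDS_PROVIDER = "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"


def s3a_sts_properties(access_key: str, secret_key: str, session_token: str) -> dict:
    """Return S3A properties (assumed upstream names) that select STS temporary credentials."""
    return {
        "fs.s3a.aws.credentials.provider": S3A_TEMP_CREDS_PROVIDER,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.session.token": session_token,
    }


def to_hadoop_xml(props: dict) -> str:
    """Render a properties dict in Hadoop's <configuration><property> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values; real ones come from an STS call such as GetSessionToken.
    print(to_hadoop_xml(s3a_sts_properties("ACCESS_KEY", "SECRET_KEY", "SESSION_TOKEN")))
```

Because STS credentials expire, the session token has to be refreshed together with the access and secret keys it was issued with.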

Apache HBase

  • A new tool, org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues, dumps the existing replication peers, configurations, and queues when you use HBase replication. The tool accepts two flags:
    • --distributed - Polls each replication server for information about the replication queues being processed on that server. This flag is disabled by default, in which case the tool reads the replication queue and configuration information from ZooKeeper.
    • --hdfs - When used with --distributed, attempts to calculate the total size of the WAL files used by the replication queues. Because multiple peers can be configured, and their queues can reference the same WAL files, this value can be overestimated.

    For more information, see Class DumpReplicationQueues.

  • Metrics have been added that expose the amount of replayed work occurring in the HBase replication system. For more information on these metrics, see Replication Metrics in the Apache HBase Reference Guide.
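The overestimate reported by DumpReplicationQueues --hdfs can be illustrated with a small Python model. It assumes that each peer keeps its own queue and that queues for different peers can list the same WAL file. This is a sketch of the counting problem, not the tool's implementation, and the file names and sizes are made up.

```python
from typing import Dict, List


def naive_total(queues: Dict[str, List[str]], sizes: Dict[str, int]) -> int:
    """Sum WAL sizes queue by queue, so a WAL shared by N queues is counted N times."""
    return sum(sizes[wal] for wals in queues.values() for wal in wals)


def deduplicated_total(queues: Dict[str, List[str]], sizes: Dict[str, int]) -> int:
    """Count each distinct WAL file once, however many queues reference it."""
    unique = {wal for wals in queues.values() for wal in wals}
    return sum(sizes[wal] for wal in unique)


# Hypothetical example: two peers replicating the same region server's WALs.
sizes = {"wal.1": 64, "wal.2": 128}  # sizes in MB, invented for illustration
queues = {"peer1": ["wal.1", "wal.2"], "peer2": ["wal.1", "wal.2"]}

print(naive_total(queues, sizes))         # 384
print(deduplicated_total(queues, sizes))  # 192
```

With two peers, the per-queue sum is twice the space the two WAL files occupy on HDFS.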

Apache Hive

Hue

  • HUE-2915: Integrates Hue with Amazon S3. You can now access both S3 and HDFS in the File Browser, create tables from files in S3, and save query results in S3. See how to Enable S3 Cloud Storage.

  • HUE-4039: Improves SQL Autocompleter. The new Autocompleter deeply understands Hive and Impala SQL dialects and provides smart suggestions based on statement structure and cursor position. See how to manually Enable and Disable Autocompleter.

  • HUE-3877: Adds support for Amazon RDS. You can now deploy Hue against an Amazon RDS database instance with MySQL, PostgreSQL, and Oracle engines.

  • Hue is rebased on upstream Hue 3.11.

Apache Impala

  • Performance improvements:

    • [IMPALA-3206] Speedup for queries against DECIMAL columns in Avro tables. The code that parses DECIMAL values from Avro now uses native code generation.

    • [IMPALA-3674] Improved efficiency in LLVM code generation can reduce codegen time, especially for short queries.

    • [IMPALA-2979] Improvements to scheduling on worker nodes, enabled by the REPLICA_PREFERENCE query option. See REPLICA_PREFERENCE Query Option (CDH 5.9 or higher only) for details.

  • [IMPALA-1683] The REFRESH statement can be applied to a single partition, rather than the entire table. See REFRESH Statement and Refreshing a Single Partition for details.

  • Improvements to the Impala web user interface:

    • [IMPALA-2767] You can now force a session to expire by clicking a link in the web UI, on the /sessions tab.

    • [IMPALA-3715] The /memz tab includes more information about Impala memory usage.

    • [IMPALA-3716] The Details page for a query now includes a Memory tab.

  • [IMPALA-3499] Scalability improvements to the catalog server. Impala handles internal communication more efficiently for tables with large numbers of columns and partitions, where the size of the metadata exceeds 2 GiB.

  • [IMPALA-3677] You can send a SIGUSR1 signal to any Impala-related daemon to write a Breakpad minidump. For advanced troubleshooting, you can now produce a minidump without triggering a crash. See Breakpad Minidumps for Impala (CDH 5.8 or higher only) for details about the Breakpad minidump feature.

  • [IMPALA-3687] The schema reconciliation rules for Avro tables have changed slightly for CHAR and VARCHAR columns. Now, if the definition of such a column is changed in the Avro schema file, the column retains its CHAR or VARCHAR type as specified in the SQL definition, but the column name and comment from the Avro schema file take precedence. See Creating Avro Tables for details about column definitions in Avro tables.

  • [IMPALA-3575] Some network operations now have additional timeout and retry settings. The extra configuration helps prevent queries from failing because of transient network problems, avoids hangs when a sender or receiver fails in the middle of a network transmission, and makes cancellation requests more reliable despite network issues.
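The SIGUSR1 minidump technique from IMPALA-3677 can be sketched in Python as follows. This is a minimal sketch that assumes a POSIX host with pgrep installed and an Impala daemon from a release that handles SIGUSR1 as described above. A process that does not install a SIGUSR1 handler is terminated by the signal's default action. Both helper names are hypothetical.

```python
import os
import signal
import subprocess
from typing import List


def impala_daemon_pids(name: str = "impalad") -> List[int]:
    """Hypothetical helper: return PIDs of processes named exactly `name`.
    Assumes pgrep is installed; it prints nothing when no process matches."""
    result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    return [int(token) for token in result.stdout.split()]


def request_minidump(pid: int) -> None:
    """Send SIGUSR1 to one Impala daemon (impalad, catalogd, or statestored)
    so that it writes a Breakpad minidump without crashing."""
    if pid <= 0:
        # os.kill with 0 or a negative PID would signal a whole process group.
        raise ValueError(f"expected a single positive PID, got {pid}")
    os.kill(pid, signal.SIGUSR1)
```

For example, calling request_minidump on each PID returned by impala_daemon_pids("catalogd") asks every local catalogd process for a minidump. The PID check exists because os.kill treats 0 and negative numbers as process-group targets.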

Apache Oozie

  • Oozie adds a new database tool for migration and upgrade from Apache Derby (or any other supported database). For more information, see How to Use the New Apache Oozie Migration Tool.

Apache Sentry

  • Sentry supports data on Amazon RDS and can secure URIs with an RDS schema.
  • SENTRY-1233 - Logging improvements for SentryConfigToolSolr.
  • SENTRY-1119 - Allows data engines to obtain the ActionFactory directly from the configuration instead of relying on hardcoded, component-specific classes. This makes it easier for external data engines to integrate with Sentry.
  • SENTRY-1229 - Adds a basic configurable cache to SentryGenericProviderBackend.

Apache Spark

  • You can now set up AWS credentials for Spark with the Hadoop credential provider, to avoid exposing the AWS secret key in configuration files.
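The Spark credential setup can be sketched in Python. This sketch assumes three upstream Hadoop conventions: the hadoop credential create command, the S3A aliases fs.s3a.access.key and fs.s3a.secret.key, and the hadoop.security.credential.provider.path property, which is passed through Spark's spark.hadoop.* prefix. The JCEKS path and job file are hypothetical, so check the CDH Spark documentation for the exact procedure. The functions only build argument lists, which lets you inspect them before running anything.

```python
from typing import List

# Hypothetical location of the credential store; choose your own path.
PROVIDER_PATH = "jceks://hdfs/user/alice/aws.jceks"


def credential_create_cmd(alias: str, provider_path: str) -> List[str]:
    """Build `hadoop credential create` for one alias. Without a -value
    option the CLI prompts for the secret, so it stays off the command line."""
    return ["hadoop", "credential", "create", alias, "-provider", provider_path]


def spark_provider_conf(provider_path: str) -> List[str]:
    """spark-submit arguments pointing Spark's Hadoop configuration at the
    credential store (spark.hadoop.* keys are copied into the Hadoop conf)."""
    return ["--conf", f"spark.hadoop.hadoop.security.credential.provider.path={provider_path}"]


# Assumed upstream S3A alias names looked up in the credential store.
setup = [credential_create_cmd(alias, PROVIDER_PATH)
         for alias in ("fs.s3a.access.key", "fs.s3a.secret.key")]
submit = ["spark-submit", *spark_provider_conf(PROVIDER_PATH), "my_job.py"]  # my_job.py is hypothetical
```

Each list can be passed to subprocess.run(cmd, check=True). Because the credential commands prompt for the values, the AWS secret key never appears in a configuration file or in shell history.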

Apache Sqoop

  • The mainframe import module has been extended to support data sets on tape.

Cloudera Search

  • The Solr watchdog is now configured to use the fully qualified domain name (FQDN) of the host on which the Solr process is running, instead of 127.0.0.1. You can override this by setting the SOLR_HOSTNAME environment variable to the appropriate value before starting the Solr server.
  • Cloudera Search adds support for index snapshots. For more information on how to back up, migrate, or restore your indexed data, see Backing Up and Restoring Cloudera Search.
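To apply the SOLR_HOSTNAME override from a launcher script, set the variable in the environment of the process that starts Solr. In this Python sketch, solr_environment is a hypothetical helper, and the host name and start command are placeholders.

```python
import os
from typing import Dict, Mapping, Optional


def solr_environment(solr_hostname: Optional[str] = None,
                     base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy the base environment (os.environ by default)
    and, if a hostname is given, set SOLR_HOSTNAME so the watchdog uses it
    instead of the host's FQDN."""
    env = dict(os.environ if base is None else base)
    if solr_hostname:
        env["SOLR_HOSTNAME"] = solr_hostname
    return env
```

For example, subprocess.Popen(start_cmd, env=solr_environment("solr01.example.com")) starts the server with the override, where start_cmd is whatever command your deployment uses to start Solr. If you omit the argument, the environment is passed through unchanged and the watchdog keeps its FQDN default.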