What's New in Cloudera Documentation

What's New in Cloudera Documentation in May, 2019

This section describes new topics added and major changes made to Cloudera documentation in May, 2019:

Product What's New Link
Cloudera Navigator
Improved information on:
  • Navigator TLS configuration
  • Configuring Cloudera Navigator for LDAP

What's New in Cloudera Documentation in April, 2019

This section describes new topics added and major changes made to Cloudera documentation in April, 2019:

Product What's New Link
Cloudera Navigator

Added detailed information on how metadata is extracted from multiple sources and combined to generate lineage for data assets, specifically for Hive. This section helps you understand when to expect lineage to appear after data assets are created on the cluster.

Metadata Extraction Timing

What's New in Cloudera Documentation in March, 2019

This section describes new topics added and major changes made to Cloudera documentation in March, 2019:

Product What's New Link
Cloudera Data Science Workbench Published a new Security Overview for Cloudera Data Science Workbench. This topic goes over the basics of the CDSW security model, the wildcard DNS requirement, and how authentication, authorization, and wire encryption work in Cloudera Data Science Workbench. CDSW Security Overview

What's New in Cloudera Documentation in February, 2019

This section describes new topics added and major changes made to Cloudera documentation in February, 2019:

Product What's New Link
Apache Impala Added a new section on using a load-balancing proxy in a TLS-enabled cluster. Special Proxy Considerations for TLS/SSL-Enabled Clusters
Added a new section on enabling LDAP in Cloudera Manager. Enabling LDAP in Cloudera Manager
Updated docs to reflect decoupling of compute and storage in Hadoop clusters. Components of the Impala Server
Apache Kudu Added a recommendation to use nscd (name service caching daemon) for all name resolutions. Slow Name Resolution and nscd
Hue The Hue Guide has been completely reorganized, and additional content has been added. Look for further improvements in the areas of performance tuning and reference architectures to be released soon.

What's New in Cloudera Documentation in January, 2019

This section describes new topics added and major changes made to Cloudera documentation in January, 2019:

Product What's New Link
Apache Spark ML Learn how to configure Spark ML to use native math libraries that accelerate model training speed for algorithms like Alternating Least Squares (ALS). Using Native Math Libraries to Accelerate Spark Machine Learning Applications
Workload XM The new Workload View feature enables you to break down workloads by specific criteria to perform deep-dive analysis on the queries. For example, you can use the Workload View feature to determine which users are executing workloads that do not adhere to SLAs. You can also examine how queries that are sent to specific databases or that use specific pools perform against SLAs.

What's New in Cloudera Documentation in November, 2018

This section describes new topics added and major changes made to Cloudera documentation in November, 2018:

Product What's New Link
Apache HBase Added information about how to move the HBase Master role from one host to another. Moving HBase Master Role to Another Host
Cloudera Navigator Key Trustee KMS Added a new procedure that describes how to move a Key Trustee KMS proxy service role instance from an existing cluster host to another cluster host. This feature is for 5.16.1 and later only. Migrating a Key Trustee KMS Server Role Instance to a New Host
Workload Experience Manager (Workload XM) With the release of Cloudera Manager 5.16.1, Workload XM can redact logs and queries and supports proxy servers. You enable these features by configuring the Telemetry Publisher service in Cloudera Manager. In addition, you can now download the SQL commands that address the "Corrupt Table Statistics" and "Missing Table Statistics" query health checks, and numerous usability enhancements have been added. What's New from Workload XM
Added information about how to configure Workload XM to tunnel through a firewall in your environment. Configuring a Firewall for Workload XM
Added a detailed description of the diagnostic data collection performed by Workload XM. Workload XM Diagnostic Data Collection

What's New in Cloudera Documentation in October, 2018

This section describes new topics added and major changes made to Cloudera documentation in October, 2018:

Product What's New Link
Cloudera Data Science Workbench Released Cloudera Data Science Workbench 1.4.2.

This release fixes some critical bugs, including TSB-346: Risk of Data Loss on Cloudera Data Science Workbench Shutdown and Restart. Please read the TSB and Upgrade Notes carefully before you start upgrading or perform a shutdown/restart operation on any previous version of CDSW.

Apache HDFS Documented how to enable authorization for HDFS web UIs. Enabling Authorization for HDFS Web UIs
Apache Impala

Documented using a query option to set an execution time limit on queries.

Setting Time Limits on Long Running Queries

Documented that the scheduler-related query hints and options support Kudu tablets.

Apache Kudu Noted that it is better to let Kudu manage its own striping over multiple devices rather than delegating the striping to a RAID-0 array. Kudu Configuration
Cloudera Navigator

Added detailed steps for streaming Navigator audit events to a Kafka topic.

Publishing Audit Events to Kafka

Added tips on how to get the most from your Navigator Audit Server implementation, including some maintenance steps that will help you make sure you are collecting the right audit events.

Maintaining Navigator Audit Server
Apache Sentry Updated the Sentry privilege tables for Hive and Impala. They now include all possible privileges on each possible scope. Privilege Tables for Hive and Impala

What's New in Cloudera Documentation in September, 2018

This section describes new topics added and major changes made to Cloudera documentation in September, 2018:

Product What's New Link
Apache Impala
Documented missing query options:
  • PARQUET_READ_STATISTICS
  • PARQUET_DICTIONARY_FILTERING

Added a table of all Impala functions with links to each function as an alternative approach to having a page for each built-in function.

Impala Built-In Functions

Reformatted the built-in functions documentation for better readability.

Refactored the Impala Authorization documentation to focus on the Sentry privilege model and de-emphasize the policy file-based model.

Enabling Sentry Authorization for Impala
Apache Kudu

Added a best-practice section to the Kudu-Spark integration documentation on avoiding multiple Kudu clients per cluster.

Developing Applications With Apache Kudu

Added troubleshooting information on detecting ext2 and ext3 filesystems.

Troubleshooting Apache Kudu
Apache YARN Added a new topic that describes all aspects of creating and managing YARN ACLs. Managing YARN ACLs

What's New in Cloudera Documentation in August, 2018

This section describes new topics added and major changes made to Cloudera documentation in August, 2018:

Product What's New Link
Cloudera Data Science Workbench Added two new videos that demonstrate how to run experiments and deploy models with Cloudera Data Science Workbench.
Apache Sentry There is a new video on the Cloudera YouTube channel that shows how you can verify that your HDFS ACLs are synching with Sentry. The video also shows that URI privileges are not applied as ACLs in HDFS. How to verify that HDFS ACLs are synching with Sentry
CDH - YARN Updated the YARN tuning guide with new values. Tuning YARN
Cloudera Altus Description of Altus groups and their usage. Groups
Cloudera Navigator Added a new video that describes how to make sure your audit system is doing what you expect: Are you collecting the right events? Are you retaining them as long as you need them? Are you archiving them where they are retrievable? Navigator Audit Checkup Video [Youtube]
Workload Experience Manager (Workload XM) Cloudera's Workload XM launched this month and with it a new documentation set that explains how to use this tool to gain in-depth understanding of the workloads you send to clusters managed by Cloudera Manager. It provides information that can be used for troubleshooting failed jobs and for optimizing slow jobs that run on those clusters. Workload Experience Manager

What's New in Cloudera Documentation in July, 2018

This section describes new topics added and major changes made to Cloudera documentation in July, 2018:

Product What's New Link
Cloudera Data Science Workbench

Released Cloudera Data Science Workbench 1.4 with two new features: Experiments and Models.

1.4 Release Notes

Reorganized documentation to align with major product components: Projects, Jobs, Experiments, Models, Engines, and Site Administration.

Improved LDAP/SAML experience with support for group filters. LDAP and SAML

New consolidated section for Engines in Cloudera Data Science Workbench.

Engines Overview

New topic that describes how engines are used for experiments and models.

Engines for Experiments & Models

This section also includes a topic that lists all the pre-installed packages in CDSW's Python and R kernels.

Pre-Installed Python and R Packages
Provided more code samples that demonstrate how to access cluster data from CDSW. Data Access
Cloudera Navigator Added a new video that gives a light-hearted look at the Cloudera Navigator brand and helps identify the value of each of the Navigator components. Navigator Brand Video [Youtube]
Reference Architectures Cloudera Reference Architectures are now available in HTML format. Reference Architectures
Apache Sentry After upgrading to CDH 5.13.0 or later, some customers experience a period of time in which HDFS ACLs are not synchronized. Possible reasons for this problem are explained in the Release Notes, along with affected versions and fixes.
Cloudera Altus Added a description of the optional ec2:DeleteKeyPair permission in the AWS cross-account role that determines how Altus generates key pairs for clusters. Key Pair Permissions on EC2

What's New in Cloudera Documentation in June, 2018

This section describes new topics added and major changes made to Cloudera documentation in June, 2018:

Product What's New Link
Apache Hive - HiveServer2 High Availability Added new command-line instructions for configuring a proxy load balancer to support HiveServer2 high availability on unmanaged clusters (those not managed by Cloudera Manager) with or without Kerberos. Configuring HiveServer2 to Load Balance Behind a Proxy on Unmanaged Clusters
Apache Sentry Clarified the group name restrictions for the GRANT ROLE statement, including character restrictions, how to use backticks with those restrictions, and OS group name requirements. GRANT ROLE Statement
Clarified the description of what happens to synchronized ACLs during Sentry service failure. HDFS/Sentry Synchronized Permissions
Added instructions for how to override Sentry's Kerberos prerequisite for the Hive metastore in Cloudera Manager. Securing the Hive Metastore
Added new Amazon S3 information on creating a table in a bucket. Creating a Table in a Bucket
Added new information on the privileges the Sentry Admin needs in Hue. Hive SQL Syntax for Use with Sentry
Added the SHOW CREATE VIEW operation to the Hive and Impala privilege tables. Authorization Privilege Model for Hive and Impala
Added a new example explaining how a user may see data from a database that they do not have access to if that data is in a view. Authorization Privilege Model for Hive and Impala
CDK Powered by Apache Kafka Updated Kafka Requirements and Supported Versions with additional information about compatibility:
  • between client and broker versions
  • between Spark and embedded Kafka clients
Apache ZooKeeper Provided instructions for configuring the ZooKeeper server for Kerberos authentication using Cloudera Manager. Configuring ZooKeeper Server for Kerberos Authentication
Cloudera Manager Added a new procedure that describes how to migrate from the Cloudera Manager Embedded PostgreSQL database server to an external PostgreSQL database. Migrating from the Cloudera Manager Embedded PostgreSQL Database Server to an External PostgreSQL Database
Cloudera Navigator Added information on lineage in Navigator: what information is collected and how it is used to create lineage diagrams, what entities are captured in the diagrams, and how diagrams change through the lifecycle of data assets. Generating Lineage Diagrams
Cloudera Altus Added a description of a new Altus environment option for secure clusters. Enable Secure Clusters

What's New in Cloudera Documentation in May, 2018

This section describes new topics added and major changes made to Cloudera documentation in May, 2018:

Product What's New Link
Cloudera Upgrade Added a new interactive topic that walks you through the steps to upgrade Cloudera Manager. You can select your operating system, upgrade version, and database type, and a customized page displays the steps for your upgrade. Upgrading Cloudera Manager Using Packages
Added a new interactive topic that walks you through the steps to upgrade CDH using Cloudera Manager. You can select your Cloudera Manager version, CDH upgrade version, and other information, and a customized page displays the steps for your upgrade. Upgrading CDH
HDFS Transparent Encryption Extensively revised the KMS ACL topic, which now includes descriptions of all operations for each ACL class, as well as a diagram and explanation that guides readers through the process of how the KMS evaluates the various ACL classes. Configuring KMS Access Control Lists (ACLs)
Key Trustee KMS HA Added new documentation for a feature that detects when the GPG private keys have not been properly synchronized across all Key Trustee KMS HA hosts and warns users about the potential problem.
Cloudera Navigator HSM KMS Added a new topic to guide users through the steps to upgrade an HSM KMS. Upgrading Cloudera Navigator HSM KMS
HBase Added new content that describes how to configure and enable cell-level ACLs for HBase. Configure Cell-Level Access Control Lists
Hue Added new content that clarifies how to migrate the Hue database for MariaDB and MySQL. MariaDB / MySQL
Cloudera Altus Added information about defining custom tags for clusters.

Added information about Altus support for CDH 5.14.

Creating a Cluster for AWS

Creating a Cluster for Azure

Restructured Altus documentation to create one doc set for Altus on AWS and Altus on Azure. Altus documentation now includes an Administration Guide and a Data Engineering Guide. Overview of Cloudera Altus

Overview of Altus Data Engineering

What's New in Cloudera Documentation in April, 2018

This section describes new topics added and major changes made to Cloudera documentation in April, 2018:

Product What's New Link
Cloudera Altus Added a new topic that describes how to set up an Altus trial account. Getting Started with a Trial Account

What's New in Cloudera Documentation in March, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in March, 2018:

Product What's New Link
Cloudera Data Science Workbench Added a new video that demonstrates how to get started with a Cloudera Data Science Workbench built-in template project. CDSW Quickstart Demo [Youtube]
Added new Known Issues for Cloudera Manager and CDH integration. Known Issues
Added a new topic on migrating a CDSW deployment to another host. Migrating a CDSW Deployment
Revamped the Backup topic with detailed instructions. Creating a Backup
Added a new topic on how to uninstall Cloudera Data Science Workbench. Uninstalling CDSW
JDK Requirements Added new section on Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction requirements. Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction
Navigator Added a new example for Navigator Audit Server on how to use audit events to determine what caused a schema change to a table. Use audit reports to identify the user or process that may be causing unwanted changes. Who ran which operation against a table?
Cloudera Director Added a new topic on using custom DNS names and DNS servers with auto-TLS. Using Custom DNS with Auto-TLS in AWS

What's New in Cloudera Documentation in February, 2018

This section describes new topics added and major changes made to Cloudera documentation in February, 2018:

Product What's New Link
Flume The Apache Flume content has moved to a new Flume Guide. Information for configuring, using, and managing Flume is consolidated in the Flume Guide. Flume Guide
HBase The Apache HBase content has moved to a new HBase Guide. All the information for configuring, managing, and troubleshooting HBase is in one central location. HBase Guide
Key HSM Added a new section describing the file naming convention used for encryption zone keys. Key Naming Convention
HDFS (Encryption) Added a new section describing how to resolve an error that can occur when the KMS jute buffer size is insufficient to hold all the tokens. KMS server jute buffer exception
Sentry The Apache Sentry content has moved to a new Sentry Guide. The Sentry Guide contains information on configuring, using, and troubleshooting Sentry, as well as how-to guides. Sentry Guide
Cloudera Altus Added a new topic that describes how to use the Cloudera Altus SDK for Java. Using the Altus SDK for Java

What's New in Cloudera Documentation in January, 2018

This section describes new topics added and major changes made to Cloudera documentation in January, 2018:

Product What's New Link
Cloudera Data Science Workbench Released Cloudera Data Science Workbench 1.3.0.
Impala Added a tip about using the Kudu Java API, instead of the JDBC interface, for rapid insert operations. Configuring Impala to Work with JDBC
Added the DATE_TRUNC() function. Impala Date and Time Functions
Added a new upper limit for the BATCH_SIZE query option. BATCH_SIZE Query Option
Added information about a new kind of runtime filter, the "min-max" filter, which applies to join queries involving Kudu tables. Using Impala to Query Kudu Tables
Added new conditional operators: IS [NOT] TRUE, IS [NOT] FALSE, and IS [NOT] UNKNOWN. SQL Operators
Added information about changes to the output of the SET statement, dividing the options into multiple groups, and hiding some groups by default. New SET ALL syntax shows all the option groups. SET Statement
Added a new impala-shell option --query_option and configuration file section [impala.query_options]. Both features let you specify values for query options when starting impala-shell. impala-shell Configuration Options
Kafka Updated examples and removed deprecated properties for how to use Kafka with Flume. Using Kafka with Flume
Kafka Updated Kafka upgrade topic to include versions. Rolling Upgrade to Kafka 3.0.x
Key Trustee KMS Added a new procedure for migrating from a Key Trustee KMS (KT KMS) to a Hardware Security Module KMS (HSM KMS). Migrating from a Key Trustee KMS to an HSM KMS
Cloudera Manager

Added information on using Cloudera Manager to configure credentials for cluster access to Microsoft ADLS. This access enables running Hive and Impala queries on tables backed by data stored in ADLS and browsing ADLS data using Hue.

Configuring ADLS Access Using Cloudera Manager
Added information on performing minor maintenance on cluster hosts. Cloudera Manager now fully manages the host decommission and recommission process. You can specify whether or not to replicate under-replicated data blocks to other DataNodes to maintain the cluster's replication factor during a maintenance window. Tuning and Troubleshooting Host Decommissioning

Provided examples for how to use the API to manage BDR.

How To Automate BDR Replication with the Cloudera Manager API

Added a video walkthrough for how to add a cluster to Cloudera Manager.

How to Add a Cluster to Cloudera Manager [YouTube]

Cloudera Director

Added information on the Cloudera Director 2.7 configuration option that points to an organization’s LDAP server so that users’ common credentials can be used to log in to Cloudera Director. When LDAP support is enabled, Cloudera Director’s built-in user management is disabled.

Configuring Cloudera Director Server for LDAP and Active Directory

Added information that Cloudera Director can handle all aspects of Java installation on the instances that it allocates and configures for Cloudera Manager and CDH clusters, offering more flexibility while simplifying the process for users.

Deploying Java on Cluster Instances
Added a configuration option to Cloudera Director's AWS plugin to accommodate regions like GovCloud and China, where EC2 cannot tag instances upon creation. The documentation now includes the procedure for configuring the plugin to use this option. Configuring Tag-on-create for AWS GovCloud (US) and China (Beijing) Regions
Cloudera Navigator

Added information that metadata searches in Navigator now include the ability to group search results by common properties. Group by lets you use technical, managed, and custom metadata to quickly identify small files, active SQL users, table-creation trends, and other data aggregation trends revealed by metadata properties.

The documentation includes some examples of how grouping search results can help you understand trends in your data and find specific data assets.

Grouping Search Results Using Metadata

Updated Navigator role names to more clearly reflect the privileges they provide. One specific change is that the privilege for editing the name and description metadata for Navigator entities is now part of the Managed & Custom Metadata Editor role. Users with that role or the Full Administrator role can add and update entity names and descriptions in the Navigator console.

Cloudera Navigator User Roles

Added information that audit filtering now allows a "not like" operator.

Filtering Audit Events
Added information on handling sensitive data that links to the Cloudera Manager log redaction details. Sensitive Data
The documentation now includes the specific metadata removed during Navigator Metadata Server purge tasks. What Metadata is Purged?
Kudu

Added new features and updates to Kudu administration:

  • You can now add data directories to an existing master or tablet server.
  • Kudu tablet servers are resilient to disk failures that occur on a disk storing data blocks.
  • The description of the workflow to migrate to a multi-master cluster was improved with details and examples.
  • The description of the workflow to recover from a dead Kudu master was improved with details.
Kudu Administration
Specified how client applications connect to Kerberized Kudu servers. Client Authentication to Secure Kudu Clusters
Cloudera Altus Added a description of public keys for cluster creation. Creating a Cluster for AWS