What's New in Cloudera Documentation

What's New in Cloudera Documentation in September, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in September, 2018:

Product What's New Link
Apache Impala
  • Documented missing query options:
    • PARQUET_READ_STATISTICS
    • PARQUET_DICTIONARY_FILTERING
  • Added a table of all Impala functions with links to each function as an alternative to having a page for each built-in function.
  • Reformatted the built-in functions documentation for better readability.
  • Refactored the Impala Authorization documentation to focus on the Sentry privilege model and de-emphasize the policy file-based model.
Apache Kudu
  • Added a best practices section to the Kudu-Spark Integration topic on avoiding multiple Kudu clients per cluster.
  • Added troubleshooting information on detecting ext2 and ext3 filesystems.
Apache YARN Added a new topic that describes all aspects of creating and managing YARN ACLs. Managing YARN ACLs

What's New in Cloudera Documentation in August, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in August, 2018:

Product What's New Link
Cloudera Data Science Workbench Added two new videos that demonstrate how to run experiments and deploy models with Cloudera Data Science Workbench.
Apache Sentry There is a new video on the Cloudera YouTube channel that shows how you can verify that your HDFS ACLs are synching with Sentry. The video also shows that URI privileges are not applied as ACLs in HDFS. How to verify that HDFS ACLs are synching with Sentry
CDH - YARN Updated the YARN tuning guide with new values. Tuning YARN
Cloudera Altus Added a description of Altus groups and their usage. Groups
Cloudera Navigator Added a new video that describes how to make sure your audit system is doing what you expect: Are you collecting the right events? Are you retaining them as long as you need them? Are you archiving them where they are retrievable? Navigator Audit Checkup Video [Youtube]
Workload Experience Manager (Workload XM) Cloudera's Workload XM launched this month and with it a new documentation set that explains how to use this tool to gain an in-depth understanding of the workloads you send to clusters managed by Cloudera Manager. It provides information that can be used for troubleshooting failed jobs and for optimizing slow jobs that run on those clusters. Workload Experience Manager

What's New in Cloudera Documentation in July, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in July, 2018:

Product What's New Link
Cloudera Data Science Workbench
  • Released Cloudera Data Science Workbench 1.4 with new features: Experiments and Models.
  • Reorganized documentation to align with major product components: Projects, Jobs, Experiments, Models, Engines, and Site Administration.
Improved LDAP/SAML experience with support for group filters. LDAP and SAML
  • New consolidated section for Engines in Cloudera Data Science Workbench.
  • New topic that describes how engines are used for experiments and models.
  • This section also includes a topic that lists all the pre-installed packages in CDSW's Python and R kernels.
Added more code samples that demonstrate how to access cluster data from CDSW. Data Access
Cloudera Navigator Added a new video that gives a light-hearted look at the Cloudera Navigator brand and helps identify the value of each of the Navigator components. Navigator Brand Video [Youtube]
Reference Architectures Cloudera Reference Architectures are now available in HTML format. Reference Architectures
Apache Sentry After upgrading to CDH 5.13.0 or later, some customers experience a period of time in which HDFS ACLs are not synched. There are two possible reasons for this problem, which are explained in the Release Notes, along with affected versions and fixes.
Cloudera Altus Description of the optional ec2:DeleteKeyPair permission in the AWS cross-account role that determines how Altus generates key pairs for clusters. Key Pair Permissions on EC2

What's New in Cloudera Documentation in June, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in June, 2018:

Product What's New Link
Apache Hive - HiveServer2 High Availability Added new command-line instructions for configuring a proxy load balancer to support HiveServer2 high availability on unmanaged clusters (those not managed by Cloudera Manager) with or without Kerberos. Configuring HiveServer2 to Load Balance Behind a Proxy on Unmanaged Clusters
Apache Sentry The GRANT ROLE statement contains clarifications on group name restrictions, such as character restrictions, how to use backticks with those restrictions, and OS group name requirements. GRANT ROLE Statement
The description of what happens to synchronized ACLs during Sentry service failure has been clarified. HDFS/Sentry Synchronized Permissions
Instructions have been added for how to override Sentry's Kerberos prerequisite for the Hive metastore in Cloudera Manager. Securing the Hive Metastore
New Amazon S3 information has been added on creating a table in a bucket. Creating a Table in a Bucket
New information has been added on the privileges the Sentry Admin needs in Hue. Hive SQL Syntax for Use with Sentry
The SHOW CREATE VIEW operation was added to the Hive and Impala privilege tables. Authorization Privilege Model for Hive and Impala
A new example explains how a user may see data from a database that they do not have access to if that data is in a view. Authorization Privilege Model for Hive and Impala
CDK Powered by Apache Kafka In addition to compatibility information for Flume, the Kafka Requirements and Supported Versions topic now includes information about compatibility among client and broker versions and between Spark and embedded Kafka clients.

Kafka Client-Broker Compatibility across Kafka Versions

Kafka Client Versions used by Apache Spark in CDH

Apache ZooKeeper Instructions have been added for configuring the ZooKeeper server for Kerberos authentication using Cloudera Manager. Configuring ZooKeeper Server for Kerberos Authentication
Cloudera Manager Added a new procedure that describes how to migrate from the Cloudera Manager Embedded PostgreSQL database server to an external PostgreSQL database. Migrating from the Cloudera Manager Embedded PostgreSQL Database Server to an External PostgreSQL Database
Cloudera Navigator Lineage in Navigator: what information is collected and how it is used to create lineage diagrams, what entities are captured in the diagrams, and how diagrams change through the lifecycle of data assets. Generating Lineage Diagrams
Cloudera Altus Description of a new Altus environment option for secure clusters. Enable Secure Clusters

What's New in Cloudera Documentation in May, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in May, 2018:

Product What's New Link
Cloudera Upgrade Added a new interactive topic that walks you through the steps to upgrade Cloudera Manager. You can select your operating system, upgrade version, and database type, and a customized page displays the steps for your upgrade. Upgrading Cloudera Manager Using Packages
Added a new interactive topic that walks you through the steps to upgrade CDH using Cloudera Manager. You can select your Cloudera Manager version, CDH upgrade version, and other information, and a customized page displays the steps for your upgrade. Upgrading CDH
HDFS Transparent Encryption Extensively revised the KMS ACL topic, which now includes descriptions of all operations for each ACL class, as well as a diagram and explanation that guide readers through how the KMS evaluates the various ACL classes. Configuring KMS Access Control Lists (ACLs)
Key Trustee KMS HA Added new documentation for a feature that detects and warns users when the GPG private keys have not been properly synchronized across all Key Trustee KMS HA hosts.
Cloudera Navigator HSM KMS Added a new topic to guide users through the steps to upgrade an HSM KMS. Upgrading Cloudera Navigator HSM KMS
HBase Added new content that describes how to configure and enable cell-level ACLs for HBase. Configure Cell-Level Access Control Lists
Hue Added new content that clarifies how to migrate the Hue database for MariaDB and MySQL. MariaDB / MySQL
Cloudera Altus Added information about defining custom tags for clusters.

Added information about Altus support for CDH 5.14.

Creating a Cluster for AWS

Creating a Cluster for Azure

Restructured Altus documentation to create one doc set for Altus on AWS and Altus on Azure. Altus documentation now includes an Administration Guide and a Data Engineering Guide. Overview of Cloudera Altus

Overview of Altus Data Engineering

What's New in Cloudera Documentation in April, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in April, 2018:

Product What's New Link
Cloudera Altus The Altus documentation includes a new topic that describes how to set up an Altus trial account. Getting Started with a Trial Account

What's New in Cloudera Documentation in March, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in March, 2018:

Product What's New Link
Cloudera Data Science Workbench Added a new video that demonstrates how to get started with a Cloudera Data Science Workbench built-in template project. CDSW Quickstart Demo [Youtube]
Added new known issues for Cloudera Manager and CDH integration. Known Issues
Added a new topic on migrating a CDSW deployment to another host. Migrating a CDSW Deployment
Revamped the Backup topic with detailed instructions. Creating a Backup
Added a new topic on how to uninstall Cloudera Data Science Workbench. Uninstalling CDSW
JDK Requirements Added new section on Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction requirements. Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction
Navigator Navigator Audit Server documentation includes a new example of how to use audit events to determine what caused a schema change to a table. Use audit reports to identify the user or process that may be causing unwanted changes. Who ran which operation against a table?
Cloudera Director Added a new topic on using custom DNS names and DNS servers with auto-TLS. Using Custom DNS with Auto-TLS in AWS

What's New in Cloudera Documentation in February, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in February, 2018:

Product What's New Link
Flume The Apache Flume content has been moved to a new Flume Guide. Information for configuring, using, and managing Flume is consolidated in the Flume Guide. Flume Guide
HBase The Apache HBase content has been moved to a new HBase Guide. All the information for configuring, managing, and troubleshooting HBase is in one central location. HBase Guide
Key HSM There is a new section describing the file naming convention used for encryption zone keys. Key Naming Convention
HDFS (Encryption) There is a new section describing how to resolve an error that can occur when the KMS jute buffer size is insufficient to hold all the tokens. KMS server jute buffer exception
Sentry The Apache Sentry content has been moved to a new Sentry Guide. The Sentry Guide contains information on configuring, using, and troubleshooting Sentry, as well as how-to guides. Sentry Guide
Cloudera Altus The Altus documentation includes a new topic that describes how to use the Cloudera Altus SDK for Java. Using the Altus SDK for Java

What's New in Cloudera Documentation in January, 2018

This section describes new topics added and major changes made to the Cloudera documentation library in January, 2018:

Product What's New Link
Cloudera Data Science Workbench Released Cloudera Data Science Workbench 1.3.0.
Impala Added a tip about using the Kudu Java API, instead of the JDBC interface, for rapid insert operations. Configuring Impala to Work with JDBC
Added the DATE_TRUNC() function. Impala Date and Time Functions
The BATCH_SIZE query option now has an upper limit. BATCH_SIZE Query Option
A new kind of runtime filter, the "min-max" filter, applies to join queries involving Kudu tables. Using Impala to Query Kudu Tables
Added new conditional operators: IS [NOT] TRUE, IS [NOT] FALSE, and IS [NOT] UNKNOWN. SQL Operators
Added information about changes to the output of the SET statement, which now divides the options into multiple groups and hides some groups by default. The new SET ALL syntax shows all the option groups. SET Statement
Added a new impala-shell option --query_option and configuration file section [impala.query_options]. These features both allow specifying values for query options when starting impala-shell. impala-shell Configuration Options
Kafka Updated examples and removed deprecated properties for how to use Kafka with Flume. Using Kafka with Flume
Kafka Updated the Kafka upgrade topic to include versions. Rolling Upgrade to Kafka 3.0.x
Key Trustee KMS There is a new procedure for migrating from a Key Trustee KMS (KT KMS) to a Hardware Security Module KMS (HSM KMS). Migrating from a Key Trustee KMS to an HSM KMS
Cloudera Manager

ADLS Connectivity

You can now use Cloudera Manager to configure credentials for cluster access to Microsoft ADLS. This access is enabled for running Hive and Impala queries on tables backed by data stored in ADLS and for browsing ADLS data using Hue.

Configuring ADLS Access Using Cloudera Manager

Performing Host Maintenance

To enable performing minor maintenance on cluster hosts, Cloudera Manager now fully manages the host decommission and recommission process. You can specify whether or not to replicate under-replicated data blocks to other DataNodes to maintain the cluster's replication factor during a maintenance window.
Tuning and Troubleshooting Host Decommissioning

BDR

Added examples of how to use the API to manage BDR.

How To Automate BDR Replication with the Cloudera Manager API

Video

Added a video walkthrough of how to add a cluster to Cloudera Manager.

View the video on YouTube.

View the video within the documentation.

Cloudera Director

LDAP and Active Directory

Cloudera Director 2.7 can be configured to point to an organization’s LDAP server so that users’ common credentials can be used to log in to Cloudera Director. When LDAP support is enabled, Cloudera Director’s built-in user management is disabled.

Configuring Cloudera Director Server for LDAP and Active Directory

Director-managed Java Installation

Cloudera Director can now handle all aspects of Java installation on the instances that it allocates and configures for Cloudera Manager and CDH clusters, offering more flexibility while simplifying the process for users.

Deploying Java on Cluster Instances
A configuration option has been added to Cloudera Director's AWS plugin to accommodate regions like GovCloud and China, where EC2 cannot tag instances upon creation. The documentation now includes the procedure for configuring the plugin to use this option. Configuring Tag-on-create for AWS GovCloud (US) and China (Beijing) Regions
Cloudera Navigator

Group by for search results

Metadata searches in Navigator now include the ability to group search results by common properties. Group by lets you use technical, managed, and custom metadata to quickly identify small files, active SQL users, table-creation trends, and other data aggregation trends revealed by metadata properties.

The documentation includes some examples of how grouping search results can help you understand trends in your data and find specific data assets.

Grouping Search Results Using Metadata

The Navigator role names have been updated to more clearly reflect the privileges they provide. One specific change is that the privilege for editing the name and description metadata for Navigator entities is now part of the Managed & Custom Metadata Editor role. Users with that role or the Full Administrator role can add and update entity names and descriptions in the Navigator console.

Cloudera Navigator User Roles

Audit filtering now allows a "not like" operator.

Filtering Audit Events
Cloudera Manager provides options that let you filter content from the Navigator audit logs. The documentation now includes a page of information for handling sensitive data that links to the Cloudera Manager log redaction details. Sensitive Data
The documentation now includes the specific metadata removed during Navigator Metadata Server purge tasks. What Metadata is Purged?
Kudu

New features and updates to Kudu administration

  • You can now add data directories to an existing master or tablet server.
  • Kudu tablet servers are resilient to disk failures that occur on a disk storing data blocks.
  • The description of the workflow to migrate to a multi-master cluster was improved with details and examples.
  • The description of the workflow to recover from a dead Kudu master was improved with details.
Kudu Administration
Specified how client applications connect to Kerberized Kudu servers. Client Authentication to Secure Kudu Clusters
Cloudera Altus The Altus documentation includes a description of public keys for cluster creation. Creating a Cluster for AWS