What's New In CDH 5.3.x

What's New in CDH 5.3.0

Oracle JDK 8 Support

CDH 5.3 supports Oracle JDK 1.8. For important information and requirements, see CDH 5 and Cloudera Manager 5 Requirements and Supported Versions and Upgrading to Oracle JDK 1.8.

Apache Hadoop

HDFS

CDH 5.3 provides the following new capabilities:
  • HDFS Data At Rest Encryption - This feature is now ready for use in production environments.
  • S3A - S3A is a new Hadoop filesystem implementation backed by the Simple Storage Service (S3) from Amazon Web Services. It is similar to S3N, the existing implementation of this functionality; the key difference is that S3A relies on the officially supported AWS Java SDK to communicate with S3, while S3N uses the best-effort jets3t library. For a listing of the parameters, see HADOOP-10400.
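Both features above can be exercised from the command line. The following is a sketch only: it assumes a Hadoop KMS key provider is already configured for encryption, and the bucket name and credentials are placeholders (the S3A property names follow HADOOP-10400).

```shell
# HDFS data-at-rest encryption: create a key, then declare an encryption
# zone over an empty directory. Assumes a Hadoop KMS is configured.
hadoop key create mykey
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName mykey -path /secure

# S3A: access a bucket via the s3a:// scheme. The credential properties
# would normally live in core-site.xml; "mybucket" is a placeholder.
hadoop fs -Dfs.s3a.access.key=ACCESS_KEY \
          -Dfs.s3a.secret.key=SECRET_KEY \
          -ls s3a://mybucket/
```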

YARN

YARN now provides a way for long-running applications to get new delegation tokens.

See Configuring Spark on YARN for Long-Running Applications or Configuring YARN Security.
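As a sketch, a long-running Spark application might be launched with a keytab so that YARN can obtain new delegation tokens on its behalf. The keytab/principal settings shown are assumptions (their names vary across Spark releases); consult the guides linked above for the exact names in your release.

```shell
# Hypothetical launch of a long-running Spark Streaming job with a
# keytab, so new delegation tokens can be obtained as old ones expire.
# The spark.yarn.keytab / spark.yarn.principal settings are assumptions.
spark-submit --master yarn-cluster \
  --conf spark.yarn.keytab=/etc/security/keytabs/app.keytab \
  --conf spark.yarn.principal=app@EXAMPLE.COM \
  --class com.example.StreamingApp app.jar
```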

Apache Flume

CDH 5.3 provides a Kafka Channel (FLUME-2500).
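A minimal agent configuration using the new channel might look like the following. The broker and ZooKeeper addresses and the topic name are placeholders; the property names are those of the upstream Kafka channel from FLUME-2500.

```properties
# Sketch: route events through Kafka instead of a memory or file channel.
a1.channels = c1
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.brokerList = kafka-broker1:9092,kafka-broker2:9092
a1.channels.c1.zookeeperConnect = zk1:2181
a1.channels.c1.topic = flume-channel
```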

Apache HBase

CDH 5.3 provides checkAndMutate(RowMutations), complementing the existing atomic checkAndPut and checkAndDelete operations on individual rows (HBASE-11796).
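As a sketch, the new call applies a group of mutations to a single row atomically, guarded by a check on one cell. The table, family, and qualifier names below are illustrative, and table is assumed to be an already-open HTableInterface.

```java
// Sketch: atomically apply a Put and a Delete to one row, but only if
// column cf:a currently holds the value "expected". Names are illustrative.
byte[] row = Bytes.toBytes("row1");
RowMutations mutations = new RowMutations(row);

Put put = new Put(row);
put.add(Bytes.toBytes("cf"), Bytes.toBytes("b"), Bytes.toBytes("new-value"));
mutations.add(put);

Delete delete = new Delete(row);
delete.deleteColumns(Bytes.toBytes("cf"), Bytes.toBytes("old"));
mutations.add(delete);

// Returns true only if the check passed and both mutations were applied.
boolean applied = table.checkAndMutate(row, Bytes.toBytes("cf"),
    Bytes.toBytes("a"), CompareFilter.CompareOp.EQUAL,
    Bytes.toBytes("expected"), mutations);
```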

Apache Hive

  • Hive can use multiple HDFS encryption zones.
  • Hive-HBase integration contains many fixes and new features such as reading HBase snapshots.
  • Numerous fixes to Hive's Parquet support.
  • HiveServer2 can authenticate users against multiple LDAP domains.

Hue

New Features:

  • Hue has been rebased on Hue 3.7.
  • SAML authentication has been revamped.
  • CDH 5.3 simplifies the task of configuring Hue to store data in an Oracle database by bundling the Oracle Install Client.

Apache Oozie

  • You can now update the definition and properties of an already running Coordinator. See the documentation for more information.
  • A new poll command in the Oozie client polls a Workflow Job, Coordinator Job, Coordinator Action, or Bundle Job until it finishes. See the documentation for more information.
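The two additions above map to Oozie CLI invocations along these lines; the job IDs, Oozie URL, and interval/timeout values are placeholders.

```shell
# Update the definition and properties of a running coordinator.
oozie job -oozie http://oozie-host:11000/oozie \
  -config job.properties -update 0000001-140101000000000-oozie-C

# Poll a job until it finishes, checking periodically.
oozie job -oozie http://oozie-host:11000/oozie \
  -poll 0000002-140101000000000-oozie-W -interval 10 -timeout 60
```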

Apache Parquet

  • PARQUET-132: Add type parameter to AvroParquetInputFormat for Spark
  • PARQUET-107: Add option to disable summary metadata files
  • PARQUET-64: Add support for new type annotations (date, time, timestamp, etc.)
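For example, the option added by PARQUET-107 is exposed as a job-level configuration property; the name below matches the upstream change.

```properties
# Disable the _metadata summary files written alongside Parquet output
# (PARQUET-107). Set as a job configuration property.
parquet.enable.summary-metadata=false
```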

Cloudera Search

New Features:

  • Cloudera Search includes Kite 0.15.0 with morphlines-related backports of all fixes and features through Kite 0.17.1. Morphlines can now partially update documents as well as delete them; partial updates and deletes can be applied by unique ID or to all documents that match a query. For additional information, see the Kite documentation.
  • CrunchIndexerTool now sends a commit to Solr on job success.
  • Added support for deleting documents stored in Solr by unique id as well as by query.

Apache Sentry (incubating)

  • Sentry HDFS Plugin - Allows you to configure synchronization of Sentry privileges to HDFS ACLs for specific HDFS directories. This simplifies the process of sharing table data between Hive or Impala and other clients (such as MapReduce, Pig, Spark), by automatically updating the ACLs when a GRANT or REVOKE statement is executed. It also allows all roles and privileges to be managed in a central location (by Sentry).
  • Metrics - CDH 5.3 supports metrics for the Sentry service. Metrics can be reported through either JMX or the console; configure this by setting the property sentry.service.reporter to jmx or console. A Sentry web server, listening on port 51000 by default, can also expose the metrics in JSON format. Web reporting is disabled by default; enable it by setting sentry.service.web.enable to true, and configure the port with the sentry.service.web.port property.
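Taken together, the settings described above amount to a sentry-site.xml fragment like the following (the values are the defaults and examples from the text):

```xml
<!-- Report Sentry service metrics over JMX (or set to "console"). -->
<property>
  <name>sentry.service.reporter</name>
  <value>jmx</value>
</property>
<!-- Expose metrics as JSON from the Sentry web server (off by default). -->
<property>
  <name>sentry.service.web.enable</name>
  <value>true</value>
</property>
<!-- Port the Sentry web server listens on (51000 is the default). -->
<property>
  <name>sentry.service.web.port</name>
  <value>51000</value>
</property>
```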

Apache Spark

  • CDH Spark has been rebased on Apache Spark 1.2.0.
  • Spark Streaming can now save incoming data to a write-ahead log (WAL) on HDFS, preventing loss of received data if the driver fails.
  • The YARN back end now supports dynamic allocation of executors. See Job Scheduling for more information.
  • Native library paths (set via Spark configuration options) are correctly propagated to executors in YARN mode (SPARK-1719).
  • The Snappy codec should now work out-of-the-box on Linux distributions with older glibc versions such as CentOS 5.
  • Spark SQL now includes the Spark Thrift Server in CDH.

See Apache Spark Incompatible Changes and Limitations and Apache Spark Known Issues for additional important information.
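The streaming WAL and dynamic allocation features are both switched on through Spark configuration. The property names below are the upstream Spark 1.2 names, and the executor counts are placeholders.

```properties
# Log received streaming data to a write-ahead log on HDFS so it can be
# replayed after a driver failure.
spark.streaming.receiver.writeAheadLog.enable=true

# Let YARN grow and shrink the executor pool with load. Dynamic
# allocation requires the external shuffle service.
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=20
```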

Apache Sqoop

  • Sqoop 1:
    • The MySQL connector now fetches rows on a row-by-row basis.
    • The SQL Server connector now supports upsert (insert or update) operations (SQOOP-1403).
    • The Oracle direct connector now works with index-organized tables (SQOOP-1632). To use this capability, you must set the chunk method to PARTITION:
      -Doraoop.chunk.method=PARTITION
  • Sqoop 2:
    • FROM/TO re-factoring is now supported (SQOOP-1367).
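For the Oracle direct connector change above, the chunk-method setting is passed as a generic option on the import command. The connection string, credentials, and table name here are placeholders.

```shell
# Sketch: import an index-organized table via the Oracle direct
# connector, using PARTITION-based chunking (SQOOP-1632).
sqoop import -Doraoop.chunk.method=PARTITION \
  --direct \
  --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
  --username SCOTT --password-file /user/scott/.password \
  --table SCOTT.MY_IOT_TABLE \
  --target-dir /data/my_iot_table
```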

What's New in CDH 5.3.1

This is a maintenance release that fixes some important issues; for details, see Issues Fixed in CDH 5.3.1.

What's New in CDH 5.3.2

This is a maintenance release that fixes some important issues; for details, see Issues Fixed in CDH 5.3.2.

What's New in CDH 5.3.3

This is a maintenance release that fixes some important issues; for details, see Issues Fixed in CDH 5.3.3.

What's New in CDH 5.3.4

This is a maintenance release that fixes some important issues; for details, see Issues Fixed in CDH 5.3.4.

What's New in CDH 5.3.5

This is a maintenance release that fixes the following issue; for details of other fixes, see Issues Fixed in CDH 5.3.5.

Potential job failures during YARN rolling upgrades to CDH 5.3.4

Problem: A MapReduce security fix introduced a compatibility issue that results in job failures during YARN rolling upgrades from CDH 5.3.3 to CDH 5.3.4.

Release affected: CDH 5.3.4

Release containing the fix: CDH 5.3.5

Workarounds: You can use any one of the following workarounds for this issue:
  • Upgrade to CDH 5.3.5.
  • Restart any jobs that might have failed during the upgrade.
  • Explicitly set the version of MapReduce to be used, so that it is picked up on a per-job basis:
    1. Update the YARN property, MR Application Classpath (mapreduce.application.classpath), either in Cloudera Manager or in the mapred-site.xml file. Remove all existing values and add a new entry: <parcel-path>/lib/hadoop-mapreduce/*, where <parcel-path> is the absolute path to the parcel installation. For example, the default installation path for the CDH 5.3.3 parcel would be: /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop-mapreduce/*.
    2. Wait until jobs submitted with the above client configuration change have run to completion.
    3. Upgrade to CDH 5.3.4.
    4. Update the MR Application Classpath (mapreduce.application.classpath) property to point to the new CDH 5.3.4 parcel.

      Do not delete the old parcel until after all jobs submitted prior to the upgrade have finished running.
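Step 1 above corresponds to a mapred-site.xml entry such as the following, using the example CDH 5.3.3 parcel path from the text:

```xml
<property>
  <name>mapreduce.application.classpath</name>
  <value>/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop-mapreduce/*</value>
</property>
```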

What's New in CDH 5.3.6

This is a maintenance release that fixes several issues. For details, see Issues Fixed in CDH 5.3.6.

What's New in CDH 5.3.8

This is a maintenance release that fixes some important issues; for details, see Issues Fixed in CDH 5.3.8.

What's New in CDH 5.3.9

This is a maintenance release that fixes some important issues; for details, see Issues Fixed in CDH 5.3.9.

What's New in CDH 5.3.10

This is a maintenance release that fixes some important issues; for details, see Issues Fixed in CDH 5.3.10.