CDH 5.1.0

Cloudera’s 100% Open Source Hadoop Platform

CDH is Cloudera's open source software distribution and consists of Apache Hadoop and additional key open source projects to ensure you get the most out of Hadoop and your data.

It is the only Hadoop solution to offer unified querying options (including batch processing, interactive SQL, text search, and machine learning) and necessary enterprise security features (such as role-based access controls).

Please note: CDH requires manual installation from the command line.
For a faster, automated installation, download Cloudera Manager.

CDH 5.1.0 Packaging and Tarballs

  Note: This section contains packaging information for the current release only. To see packaging and tarball information for older releases, refer to CDH Packaging Information for Previous Releases.
To view the overall release notes for CDH 5.x.x, see CDH 5 Release Notes.

Component                  | Package Version                 | Tarball | Release Notes | Changes File
Apache Avro                | avro-1.7.5+cdh5.1.0+30          | Tarball | Release notes | Changes
Apache Crunch              | crunch-0.10.0+cdh5.1.0+14       | Tarball | Release notes | Changes
DataFu                     | pig-udf-datafu-1.1.0+cdh5.1.0+8 | Tarball | Release notes | Changes
Apache Flume               | flume-ng-1.5.0+cdh5.1.0+10      | Tarball | Release notes | Changes
Apache Hadoop              | hadoop-2.3.0+cdh5.1.0+795       | Tarball | Release notes | Changes
Apache HBase               | hbase-0.98.1+cdh5.1.0+64        | Tarball | Release notes | Changes
HBase-Solr                 | hbase-solr-1.5+cdh5.1.0+12      | Tarball | Release notes | Changes
Apache Hive                | hive-0.12.0+cdh5.1.0+369        | Tarball | Release notes | Changes
Hue                        | hue-3.6.0+cdh5.1.0+86           | Tarball | Release notes | Changes
Cloudera Impala            | impala-1.4+cdh5.1.0+0           | (none)  | Release notes | Changes
Kite SDK                   | kite-0.10.0+cdh5.1.0+120        | Tarball | Release notes | Changes
Llama                      | llama-1.0.0+cdh5.1.0+0          | Tarball | Release notes | Changes
Apache Mahout              | mahout-0.9+cdh5.1.0+11          | Tarball | Release notes | Changes
Apache Oozie               | oozie-4.0.0+cdh5.1.0+249        | Tarball | Release notes | Changes
Parquet                    | parquet-1.2.5+cdh5.1.0+130      | Tarball | Release notes | Changes
Parquet-format             | parquet-format-1.0.0+cdh5.1.0+6 | Tarball | Release notes | Changes
Apache Pig                 | pig-0.12.0+cdh5.1.0+33          | Tarball | Release notes | Changes
Cloudera Search            | search-1.0.0+cdh5.1.0+0         | Tarball | Release notes | Changes
Apache Sentry (incubating) | sentry-1.3.0+cdh5.1.0+155       | Tarball | Release notes | Changes
Apache Solr                | solr-4.4.0+cdh5.1.0+231         | Tarball | Release notes | Changes
Apache Spark               | spark-1.0.0+cdh5.1.0+41         | Tarball | Release notes | Changes
Apache Sqoop               | sqoop-1.4.4+cdh5.1.0+55         | Tarball | Release notes | Changes
Apache Sqoop2              | sqoop2-1.99.3+cdh5.1.0+26       | Tarball | Release notes | Changes
Apache Whirr               | whirr-0.9.0+cdh5.1.0+9          | Tarball | Release notes | Changes
Apache ZooKeeper           | zookeeper-3.4.5+cdh5.1.0+29     | Tarball | Release notes | Changes
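
The package version strings listed above appear to follow the pattern <component>-<upstream version>+cdh<CDH release>+<number>. The Python sketch below splits such a string into its parts. The pattern is inferred from the listed values only; the meaning of the trailing number is not stated here, so the field name build_suffix is an assumption, as are the names PackageVersion and parse_package_version.

```python
import re
from typing import NamedTuple


class PackageVersion(NamedTuple):
    component: str         # e.g. "pig-udf-datafu"
    upstream_version: str  # e.g. "1.1.0"
    cdh_release: str       # e.g. "5.1.0"
    build_suffix: int      # trailing number; its meaning is an assumption


# Pattern inferred from the listed package strings, not from a published spec.
_PATTERN = re.compile(
    r"^(?P<component>.+)-(?P<upstream>\d[\w.]*)"
    r"\+cdh(?P<cdh>[\d.]+)\+(?P<suffix>\d+)$"
)


def parse_package_version(text):
    """Split a string such as 'avro-1.7.5+cdh5.1.0+30' into its parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized package version: {text!r}")
    return PackageVersion(
        match["component"],
        match["upstream"],
        match["cdh"],
        int(match["suffix"]),
    )
```

For example, parse_package_version("hbase-solr-1.5+cdh5.1.0+12") yields the component "hbase-solr", upstream version "1.5", and CDH release "5.1.0".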

What's New in CDH 5.1.0

Operating System Support

CDH 5.1 adds support for version 6.5 of RHEL and related platforms; see Supported Operating Systems.

Apache Crunch

  • CDH 5.1.0 implements Crunch 0.10.0.

Apache Flume

  • CDH 5.1.0 implements Flume 1.5.0.

Apache Hadoop

HDFS

POSIX Access Control Lists: As of CDH 5.1, HDFS supports POSIX Access Control Lists (ACLs), an addition to the traditional POSIX permissions model already supported. ACLs provide fine-grained control of permissions for HDFS files by providing a way to set different permissions for specific named users or named groups. For more information, see Enabling HDFS Extended ACLs.
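
As a rough illustration of what named-user and named-group entries add to the owner/group/other model, the following Python sketch models a simplified POSIX-style ACL check. It is not HDFS code: the Acl class, the check_access function, and the sample names are hypothetical, and details such as default ACLs are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    owner: str
    group: str
    owner_perms: str   # e.g. "rw-"
    group_perms: str
    other_perms: str
    named_users: dict = field(default_factory=dict)   # user -> perms
    named_groups: dict = field(default_factory=dict)  # group -> perms
    mask: str = "rwx"  # upper bound for named entries and the group entry


def _allows(perms, requested, mask="rwx"):
    # A permission is effective only if present in both the entry and the mask.
    effective = {p for p in perms if p != "-" and p in mask}
    return set(requested) <= effective


def check_access(acl, user, groups, requested):
    """Simplified POSIX-style evaluation: owner, named user, groups, other."""
    if user == acl.owner:
        return _allows(acl.owner_perms, requested)  # owner is not masked
    if user in acl.named_users:
        return _allows(acl.named_users[user], requested, acl.mask)
    group_entries = []
    if acl.group in groups:
        group_entries.append(acl.group_perms)
    group_entries += [p for g, p in acl.named_groups.items() if g in groups]
    if group_entries:
        # Any matching group entry may grant the request; otherwise deny.
        return any(_allows(p, requested, acl.mask) for p in group_entries)
    return _allows(acl.other_perms, requested)
```

With a file owned by alice:staff (rw-, r--, ---) plus a named-user entry granting bruce rw-, bruce can write even though he is neither the owner nor a member of staff, which the traditional permission bits alone cannot express.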

NFS Gateway Improvements: CDH 5.1 makes the following improvements to the HDFS NFS gateway capability:
  • Subdirectory mounts:
    • Previously, clients could mount only the HDFS root directory.
    • As of CDH 5.1, a single mount point, configured via the nfs.export.point property in hdfs-site.xml on the NFS gateway node, is available to clients.
  • Improved support for Kerberized clusters (HDFS-5898):
    • Previously the NFS Gateway could connect to a secure cluster, but didn’t support logging in from a keytab.
    • As of CDH 5.1, set the nfs.kerberos.principal and nfs.keytab.file properties in hdfs-site.xml to allow users to log in from a keytab.
  • Support for port monitoring (HDFS-6406):
    • Previously, the NFS Gateway would always accept connections from any client.
    • As of CDH 5.1, set nfs.port.monitoring.disabled to false in hdfs-site.xml to allow connections only from privileged ports (those with root access).
  • Static uid/gid mapping for NFS clients that are not in synch with the NFS Gateway (HDFS-6435):
    • NFS sends UIDs and GIDs over the network from client to server, meaning that the UIDs and GIDs must be in synch between clients and server machines in order for users and groups to be set appropriately for file access and file creation; this is usually but not always the case.
    • As of CDH 5.1, you can configure a static UID/GID mapping file, by default /etc/nfs.map.
    • You can change the default (to use a different file path) by means of the nfs.static.mapping.file property in hdfs-site.xml.
    • The following sample entries illustrate the format of the file:
      uid 10 100 # Map the remote UID 10 to the local UID 100
      gid 11 101 # Map the remote GID 11 to the local GID 101
  • Hadoop portmap, or insecure system portmap, no longer required:
    • Many supported operating systems have portmap bugs, detailed here.
    • CDH 5.1 allows you to circumvent the problems by starting the NFS gateway as root, whether you install CDH from packages or parcels.
      Note: After initially registering with the system portmap as root, the NFS Gateway drops privileges and runs as a regular user.

    • Cloudera Manager starts the gateway as root by default.
  • Support for AIX NFS clients (HDFS-6549):
    • To deploy AIX NFS clients, set nfs.aix.compatibility.mode.enabled to true in hdfs-site.xml.
    • This enables code that handles bugs in the AIX implementation of NFS.
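
The static UID/GID mapping file format shown in the sample entries above can be read with a few lines of Python. This is a sketch based only on those two sample lines (a uid or gid keyword, a remote ID, a local ID, and an optional # comment); the function name parse_static_mapping is hypothetical, and the NFS Gateway's own parser may accept or reject input differently.

```python
def parse_static_mapping(text):
    """Parse 'uid|gid <remote> <local>' lines into two remote-to-local dicts."""
    uid_map, gid_map = {}, {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3 or fields[0] not in ("uid", "gid"):
            raise ValueError(
                f"line {line_number}: expected 'uid|gid <remote> <local>'"
            )
        remote, local = int(fields[1]), int(fields[2])
        target = uid_map if fields[0] == "uid" else gid_map
        target[remote] = local
    return uid_map, gid_map


sample = """\
uid 10 100 # Map the remote UID 10 to the local UID 100
gid 11 101 # Map the remote GID 11 to the local GID 101
"""
uid_map, gid_map = parse_static_mapping(sample)
```

Here uid_map maps remote UID 10 to local UID 100, and gid_map maps remote GID 11 to local GID 101.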

MapReduce and YARN

YARN with Impala supports Dynamic Prioritization.

Apache HBase

  • CDH 5.1.0 implements HBase 0.98.
  • As of CDH 5.1.0, HBase fully supports BucketCache, which was introduced as an experimental feature in CDH 5 Beta 1.
  • HBase now supports access control for EXEC permissions.
  • CDH 5.1.0 HBase introduces a reverse scan API, allowing you to scan a table in reverse.
  • You can now run a MapReduce job over a snapshot from HBase, rather than being limited to live data.
  • A new stateless streaming scanner is available over the REST API.
  • The delete* methods of the Delete class of the HBase Client API now use the timestamp from the constructor, the same behavior as the Put class. (In HBase versions before CDH 5.1, the delete* methods ignored the constructor's timestamp, and used the value of HConstants.LATEST_TIMESTAMP. This behavior was different from the behavior of the add() methods of the Put class.)
  • The SnapshotInfo tool has been enhanced in the following ways:
    • A new option, -list-snapshots, has been added to the SnapshotInfo command. This option allows you to list snapshots on either a local or remote server.
    • You can now pass the -size-in-bytes flag to print the size of snapshot files in bytes rather than the default human-readable format.
    • The size of each snapshot file in bytes is checked against the size reported in the manifest, and if the two sizes differ, the tool reports the file as corrupt.
  • A new -target option for ExportSnapshot allows you to specify a different name for the target cluster from the snapshot name on the source cluster.
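
The SnapshotInfo size check described above amounts to comparing two sets of file sizes. The Python sketch below illustrates the idea only; it is not the SnapshotInfo implementation, and the function name, the dictionary inputs, and the sample file names are hypothetical.

```python
def find_size_mismatches(manifest_sizes, actual_sizes):
    """Return the names of files whose actual size differs from the manifest.

    Both arguments map a file name to a size in bytes. A file that is absent
    from actual_sizes is also reported; that choice is an assumption of this
    sketch.
    """
    return sorted(
        name
        for name, expected in manifest_sizes.items()
        if actual_sizes.get(name) != expected
    )


manifest = {"hfile-a": 1024, "hfile-b": 2048}
on_disk = {"hfile-a": 1024, "hfile-b": 2000}
suspect = find_size_mismatches(manifest, on_disk)
```

In this example only hfile-b would be reported, because its size differs from the manifest entry.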

For more information about these features, see New Features and Changes for HBase in CDH 5.

In addition, Cloudera has fixed some binary incompatibilities between HBase 0.96 and 0.98. As a result, the incompatibilities introduced by HBASE-10452 and HBASE-10339 do not affect CDH 5.1 HBase, as explained below:
  • HBASE-10452 introduced a new exception and error message in setTimeStamp(), for an extremely unlikely event where getting a TimeRange could fail because of an integer overflow. CDH 5.1 suppresses the new exception to retain compatibility with HBase 0.96, but logs the error.
  • HBASE-10339 contained code which inadvertently changed the signatures of the getFamilyMap method. CDH 5.1 restores these signatures to those used in HBase 0.96, to retain compatibility.

Apache Hive

  • Permission inheritance fixes
  • Support for decimal computation, and for reading and writing decimal-format data from and to Parquet and Avro

Hue

CDH 5.1.0 implements Hue 3.6.

New Features:

  • Search App v2:
    • 100% Dynamic dashboard
    • Drag-and-Drop dashboard builder
    • Text, Timeline, Pie, Line, Bar, Map, Filters, Grid and HTML widgets
    • Solr Index creation wizard (from a file)
  • Ability to view compressed Snappy, Avro and Parquet files
  • Impala HA
  • Ability to close Impala and Hive sessions, queries, and commands

Apache Mahout

  • CDH 5.1.0 implements Mahout 0.9.

See also Apache Mahout Incompatible Changes.

Apache Oozie

  • You can now submit Sqoop jobs from the Oozie command line.
  • LAST_ONLY execution mode now works correctly (OOZIE-1319).

Cloudera Search

New Features:

  • A Quick Start script that automates using Search to query data from the Enron Email dataset. The script downloads the data, expands it, moves it to HDFS, indexes, and pushes the results live. The documentation now also includes a companion quick start guide, which describes the tasks the script completes, as well as customization options.
  • Solrctl now has built-in support for schema-less Solr. For more information, see Using Schemaless Mode.
  • Sentry-based document-level security for role-based access control of a collection. Document-level access control associates authorization tokens with each document in the collection, so that Sentry roles can be granted access to sets of documents in a collection.
  • Cloudera Search includes a version of Kite 0.10.0, which includes all morphlines-related backports of all fixes and features in Kite 0.15.0. For additional information on Kite, see:
  • Support for the Parquet file format is included with this version of Kite 0.10.0.
  • Inclusion of hbase-indexer-1.5.1, a new version of the Lily HBase Indexer. This new version of the indexer includes the 0.10.0 version of Kite mentioned above. This 0.10.0 version of Kite includes the backports and fixes included in Kite 0.13.0.
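
To make the document-level security idea above concrete, the Python sketch below filters a list of documents by authorization token. It is a conceptual model only, not how Search or Sentry is implemented or configured: the field name auth_tokens, the function name, and the rule that a single matching role is sufficient are all assumptions.

```python
def visible_documents(documents, user_roles):
    """Keep documents that carry at least one token matching the user's roles."""
    roles = set(user_roles)
    return [
        doc for doc in documents
        if roles & set(doc.get("auth_tokens", ()))
    ]


docs = [
    {"id": "1", "auth_tokens": ["finance"]},
    {"id": "2", "auth_tokens": ["finance", "legal"]},
    {"id": "3", "auth_tokens": ["hr"]},
    {"id": "4"},  # no tokens: hidden from everyone in this sketch
]
legal_view = [doc["id"] for doc in visible_documents(docs, ["legal"])]
```

A user holding only the legal role would see document 2 and nothing else.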

Apache Sentry (incubating)

  • CDH 5.1.0 implements Sentry 1.2. This includes a database-backed Sentry service which uses the more traditional GRANT/REVOKE statements instead of the previous policy file approach, making it easier to maintain and modify privileges.
  • Revised authorization privilege model for Hive and Impala. For more details, see Appendix: Authorization Privilege Model for Hive and Impala.

Apache Spark

  • CDH 5.1.0 implements Spark 1.0.
  • The spark-submit command abstracts across the variety of deployment modes that Spark supports and takes care of assembling the classpath for you.
  • Application History Server (SparkHistoryServer) improves monitoring capabilities.
  • You can launch PySpark applications against YARN clusters. PySpark currently only works in YARN Client mode.
Other improvements include:
  • Streaming integration with Kerberos
  • Addition of more algorithms to MLlib (Sparse Vector Support)
  • Improvements to Avro integration
  • Spark SQL alpha release (new SQL engine). Spark SQL allows you to run SQL statements inside a Spark application that manipulate and produce RDDs.
    Note: Because of its immaturity and alpha status, Cloudera does not currently offer commercial support for Spark SQL, but bundles it with our distribution so that you can try it out.

  • Authentication of all Spark communications

CDH 5.x System Requirements:

Supported Operating Systems

Supported JDK Versions

Supported Databases