
Long term component architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components are standards, you can build long-term architecture on them with confidence.

 

PLEASE NOTE:

With the exception of DSSD support, Cloudera Enterprise 5.6.0 is identical to CDH 5.5.2/Cloudera Manager 5.5.3. If you do not need DSSD support and are already using the latest 5.5.x release, you do not need to upgrade.

 

CDH 5 provides packages for Red-Hat-compatible, SLES, Ubuntu, and Debian systems as described below.

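Operating System | Version | Packages

Red Hat Enterprise Linux (RHEL)-compatible
Red Hat Enterprise Linux | 5.7 | 64-bit
Red Hat Enterprise Linux | 6.2 | 64-bit
Red Hat Enterprise Linux | 6.4 | 64-bit
Red Hat Enterprise Linux | 6.4 in SE Linux mode | 64-bit
Red Hat Enterprise Linux | 6.5 | 64-bit
CentOS | 5.7 | 64-bit
CentOS | 6.2 | 64-bit
CentOS | 6.4 | 64-bit
CentOS | 6.4 in SE Linux mode | 64-bit
CentOS | 6.5 | 64-bit
Oracle Linux with default kernel and Unbreakable Enterprise Kernel | 5.6 (UEK R2) | 64-bit
Oracle Linux with default kernel and Unbreakable Enterprise Kernel | 6.4 (UEK R2) | 64-bit
Oracle Linux with default kernel and Unbreakable Enterprise Kernel | 6.5 (UEK R2, UEK R3) | 64-bit

SLES
SUSE Linux Enterprise Server (SLES) | 11 with Service Pack 2 or later | 64-bit

Ubuntu/Debian
Ubuntu | Precise (12.04) - Long-Term Support (LTS) | 64-bit
Ubuntu | Trusty (14.04) - Long-Term Support (LTS) | 64-bit
Debian | Wheezy (7.0, 7.1) | 64-bit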

Note:

  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera packages, you can also download source tarballs from Downloads.

 

Component | MySQL | SQLite | PostgreSQL | Oracle | Derby (see Note 5)
Oozie | 5.5, 5.6 | - | 8.4, 9.1, 9.2, 9.3 (see Note 2) | 11gR2 | Default
Flume | - | - | - | - | Default (for the JDBC Channel only)
Hue | 5.5, 5.6 (see Note 1) | Default | 8.4, 9.1, 9.2, 9.3 (see Note 2) | 11gR2 | -
Hive/Impala | 5.5, 5.6 (see Note 1) | - | 8.4, 9.1, 9.2, 9.3 (see Note 2) | 11gR2 | Default
Sentry | 5.5, 5.6 (see Note 1) | - | 8.4, 9.1, 9.2, 9.3 (see Note 2) | 11gR2 | -
Sqoop 1 | See Note 3 | - | See Note 3 | See Note 3 | -
Sqoop 2 | See Note 4 | - | See Note 4 | See Note 4 | Default

Note:

  1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and later.
  2. PostgreSQL 9.2 is supported on CDH 5.1 and later. PostgreSQL 9.3 is supported on CDH 5.2 and later.
  3. For the purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  4. Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby.
  5. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.
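
Note 2 can be read as a small lookup from PostgreSQL version to the first CDH 5 release that supports it. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that PostgreSQL 8.4 and 9.1, which the table lists without a caveat, are supported from CDH 5.0 onward.

```python
# Minimum CDH 5 release for each PostgreSQL version, based on the table and Note 2.
# Assumption: 8.4 and 9.1 carry no caveat in the table, so they map to CDH 5.0.
POSTGRESQL_MIN_CDH = {
    "8.4": (5, 0),
    "9.1": (5, 0),
    "9.2": (5, 1),
    "9.3": (5, 2),
}


def parse_version(text):
    """Turn a dotted version such as '5.3.0' into (5, 3, 0) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def postgresql_supported(pg_version, cdh_version):
    """Return True if the PostgreSQL version is listed for the given CDH release."""
    minimum = POSTGRESQL_MIN_CDH.get(pg_version)
    if minimum is None:
        # Versions that do not appear in the table are treated as unsupported.
        return False
    # Compare only major.minor of the CDH release against the minimum.
    return parse_version(cdh_version)[:2] >= minimum
```

For example, `postgresql_supported("9.3", "5.1.0")` is false because Note 2 places PostgreSQL 9.3 support at CDH 5.2 and later.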

CDH 5 is supported with the versions shown in the table that follows.

Table 1. Supported JDK Versions

Latest Certified Version | Minimum Supported Version | Exceptions
1.7.0_67 | 1.7.0_67 | None
1.8.0_11 | 1.8.0_11 | None
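
A host's JDK can be checked against the minimum versions in Table 1 with a short script. This is a minimal sketch with hypothetical function names. It assumes the classic `1.x.y_zz` version string format and treats any later update within the same JDK family as acceptable, which is one reading of "minimum supported version".

```python
import re

# Minimum supported versions from Table 1, keyed by JDK family.
MINIMUM_JDK = {
    "1.7": (1, 7, 0, 67),
    "1.8": (1, 8, 0, 11),
}


def parse_jdk_version(text):
    """Parse a string such as '1.7.0_67' into the tuple (1, 7, 0, 67)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)_(\d+)", text.strip())
    if match is None:
        raise ValueError("unrecognized JDK version string: %r" % text)
    return tuple(int(group) for group in match.groups())


def jdk_supported(text):
    """Return True if the version meets the Table 1 minimum for its JDK family."""
    version = parse_jdk_version(text)
    family = "%d.%d" % version[:2]
    minimum = MINIMUM_JDK.get(family)
    # JDK families that are not listed in Table 1 are treated as unsupported.
    return minimum is not None and version >= minimum
```

Comparing tuples rather than strings matters here: as a string, `1.8.0_5` sorts after `1.8.0_11`, but update 5 is older than update 11.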


CDH requires IPv4. IPv6 is not supported.

See also Configuring Network Names.
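
Because CDH requires IPv4, it can be useful to screen a host inventory before installation. The sketch below uses Python's standard `ipaddress` module. The function names and the host-to-address mapping are hypothetical, and the check looks only at address syntax, not at how a host's network stack is configured.

```python
import ipaddress


def is_ipv4(address):
    """Return True only if the string is a syntactically valid IPv4 address."""
    try:
        return ipaddress.ip_address(address).version == 4
    except ValueError:
        # Not a valid IP address of either family.
        return False


def non_ipv4_hosts(host_addresses):
    """Return the sorted names of hosts whose address is not IPv4.

    host_addresses is a hypothetical mapping of host name to address string.
    """
    return sorted(
        host for host, address in host_addresses.items() if not is_ipv4(address)
    )
```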


What's New in CDH 5.3.0

The following topics describe new features introduced in CDH 5.3.0.

 

Oracle JDK 8 Support

CDH 5.3 supports Oracle JDK 1.8. For important information and requirements, see CDH 5 Requirements and Supported Versions and Upgrading to Oracle JDK 1.8.

 

Apache Hadoop

HDFS

CDH 5.3 provides the following new capabilities:

  • HDFS Data At Rest Encryption - This feature is now ready for use in production environments.

    Important:

    Client hosts may need a more recent version of libcrypto.so. See Apache Hadoop Known Issues for more information.

     

    Important: Cloudera provides two solutions:

    • Navigator Encrypt is production ready and available to Cloudera customers licensed for Cloudera Navigator. Navigator Encrypt operates at the Linux volume level, so it can encrypt cluster data inside and outside HDFS. Consult your Cloudera account team for more information.
    • HDFS Encryption is production ready and operates at the HDFS directory level, enabling encryption to be applied only to HDFS folders where needed.

     

  • Hot Swap - You can add or replace HDFS data volumes without shutting down the DataNode host (HDFS-1362). This capability is not yet fully supported in Cloudera Manager, but you can use it from the command line.

    See Configuring Hot Swap for DataNodes.

  • S3A - S3A is an HDFS implementation of the Simple Storage Service (S3) from Amazon Web Services. It provides functionality similar to S3N, the other implementation of this service. The key difference is that S3A relies on the officially supported AWS Java SDK to communicate with S3, while S3N uses the jets3t library, which is supported on a best-effort basis.
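
From a client's point of view, the two S3 implementations are selected by URI scheme (`s3n://` or `s3a://`). As a rough illustration, the hypothetical helper below rewrites an S3N path to its S3A equivalent. It is a sketch only: it changes the scheme and nothing else, and it does not address credentials or other configuration that S3A needs.

```python
from urllib.parse import urlsplit, urlunsplit


def to_s3a(uri):
    """Rewrite an s3n:// URI to the s3a:// scheme; leave other URIs unchanged."""
    parts = urlsplit(uri)
    if parts.scheme != "s3n":
        return uri
    # Only the scheme changes; bucket name and object path are preserved.
    return urlunsplit(parts._replace(scheme="s3a"))
```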

 

YARN

YARN now provides a way for long-running applications to get new delegation tokens.

See Configuring YARN for Long-running Applications or Configuring YARN Security.

 

Apache Flume

CDH 5.3 provides a Kafka Channel (FLUME-2500).

 

Apache Hive

  • Hive can use multiple HDFS encryption zones.
  • Hive-HBase integration contains many fixes and new features such as reading HBase snapshots.
  • Many Hive Parquet fixes.
  • Hive Server 2 can handle multiple LDAP domains for authentication.

 

Hue

New Features:

  • Hue has been rebased on Hue 3.7.
  • SAML authentication has been revamped.
  • CDH 5.3 simplifies the task of configuring Hue to store data in an Oracle database by bundling the Oracle Install Client. For instructions, see Hue Database.

 

Apache Oozie

  • You can now update the definition and properties of an already running Coordinator. See the documentation for more information.
  • A new poll command in the Oozie client polls a Workflow Job, Coordinator Job, Coordinator Action, or Bundle Job until it finishes. See the documentation for more information.
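
The behavior of a poll command, checking a job repeatedly until it reaches a final state, can be sketched in a few lines of Python. This is not the Oozie client's implementation. The `get_status` callable, the default interval and timeout, and the set of terminal states are all assumptions made for illustration; real Oozie job types have their own status values.

```python
import time

# Assumed terminal states for illustration; actual values depend on the job type.
TERMINAL_STATES = {"SUCCEEDED", "KILLED", "FAILED"}


def poll_until_finished(get_status, interval_seconds=60, timeout_seconds=3600,
                        sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal state.

    get_status is a hypothetical callable that returns the job's current
    status string. Raises TimeoutError once the accumulated wait reaches
    timeout_seconds without the job finishing.
    """
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError("job still %s after %d seconds" % (status, waited))
        sleep(interval_seconds)
        waited += interval_seconds
```

Passing the `sleep` function as a parameter keeps the loop easy to exercise without real delays.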

 

Apache Parquet

  • PARQUET-132: Add type parameter to AvroParquetInputFormat for Spark
  • PARQUET-107: Add option to disable summary metadata files
  • PARQUET-64: Add support for new type annotations (date, time, timestamp, etc.)

 

Cloudera Search

New Features:

  • Cloudera Search includes a version of Kite 0.15.0 that contains backports of all morphlines-related fixes and features in Kite 0.17.1. Morphlines now includes functionality for partially updating documents as well as deleting documents. Partial updates and deletions can target documents by unique ID or by a matching query. For additional information, see the Kite documentation.
  • CrunchIndexerTool now sends a commit to Solr on job success.
  • Added support for deleting documents stored in Solr by unique id as well as by query.

 

Apache Sentry (incubating)

  • Sentry HDFS Plugin - Allows you to configure synchronization of Sentry privileges to HDFS ACLs for specific HDFS directories. This simplifies the process of sharing table data between Hive or Impala and other clients (such as MapReduce, Pig, Spark), by automatically updating the ACLs when a GRANT or REVOKE statement is executed. It also allows all roles and privileges to be managed in a central location (by Sentry).
  • Metrics - CDH 5.3 supports metrics for the Sentry service. These metrics can be reported either through JMX or the console; configure this by setting the property sentry.service.reporter to jmx or console. A Sentry web server, listening by default on port 51000, can expose the metrics in JSON format. Web reporting is disabled by default; enable it by setting sentry.service.web.enable to true. You can configure the port on which the Sentry web server listens by means of the sentry.service.web.port property.
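
The three metrics properties named above can be collected into one configuration fragment. The following sketch generates such a fragment with Python's standard library. The property names and the default port come from the text; the function name is hypothetical, and the Hadoop-style `<configuration>` XML layout is an assumption about where these properties are set in a given deployment.

```python
from xml.etree import ElementTree as ET


def sentry_metrics_config(reporter="jmx", web_enabled=False, web_port=51000):
    """Build a Hadoop-style XML fragment for the Sentry metrics properties."""
    if reporter not in ("jmx", "console"):
        raise ValueError("reporter must be 'jmx' or 'console'")
    settings = [
        ("sentry.service.reporter", reporter),
        ("sentry.service.web.enable", "true" if web_enabled else "false"),
        ("sentry.service.web.port", str(web_port)),
    ]
    root = ET.Element("configuration")
    for name, value in settings:
        # Each setting becomes a <property> with <name> and <value> children.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `sentry_metrics_config("console", web_enabled=True)` yields a fragment that switches reporting to the console and turns on the web server on the default port.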

     

Apache Spark

  • CDH Spark has been rebased on Apache Spark 1.2.0.
  • Spark Streaming can now save incoming data to a WAL (write-ahead log) on HDFS, preventing any data loss on driver failure.

    Important:

    This feature is currently in Beta; Cloudera includes it in CDH Spark but does not support it.

     

  • The YARN back end now supports dynamic allocation of executors. See http://spark.apache.org/docs/latest/job-scheduling.html for more information.
  • Native library paths (set via Spark configuration options) are correctly propagated to executors in YARN mode (SPARK-1719).
  • The Snappy codec should now work out-of-the-box on Linux distributions with older glibc versions such as CentOS 5.
  • Spark SQL now includes the Spark Thrift Server in CDH.

    Important:

    Spark SQL remains an experimental and unsupported feature in CDH.

     

See Apache Spark Incompatible Changes and Apache Spark Known Issues for additional important information.
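
As an illustration of the dynamic allocation feature listed above, the sketch below assembles `--conf` arguments for `spark-submit`. The property names are not given on this page; they are taken from the upstream Spark job-scheduling documentation and should be verified against the Spark version in your CDH release. The function name and the executor counts are hypothetical.

```python
def dynamic_allocation_args(min_executors, max_executors):
    """Return spark-submit --conf arguments that enable dynamic allocation.

    Property names follow the upstream Spark documentation (an assumption
    with respect to this page, which does not list them).
    """
    if min_executors < 0 or max_executors < min_executors:
        raise ValueError("expected 0 <= min_executors <= max_executors")
    settings = [
        ("spark.dynamicAllocation.enabled", "true"),
        ("spark.dynamicAllocation.minExecutors", str(min_executors)),
        ("spark.dynamicAllocation.maxExecutors", str(max_executors)),
        # Dynamic allocation relies on the external shuffle service.
        ("spark.shuffle.service.enabled", "true"),
    ]
    args = []
    for key, value in settings:
        args += ["--conf", "%s=%s" % (key, value)]
    return args
```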

 

Apache Sqoop

  • Sqoop 1:
    • The MySQL connector now fetches on a row-by-row basis.
    • The SQL Server connector now supports upsert (insert or update) (SQOOP-1403).
    • The Oracle direct connector now works with index-organized tables (SQOOP-1632). To use this capability, you must set the chunk method to PARTITION:

      -Doraoop.chunk.method=PARTITION

  • Sqoop 2:
    • FROM/TO re-factoring is now supported (SQOOP-1367).
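
For the Oracle direct connector setting shown under Sqoop 1, the position of the `-D` option matters: generic Hadoop options must follow the tool name and precede the tool-specific arguments. The sketch below builds such a command line as an argument list. The function name and all argument values are hypothetical placeholders.

```python
def oracle_iot_import_command(connect_url, username, table, target_dir):
    """Build a Sqoop 1 import command for an Oracle index-organized table.

    The generic -D option is placed directly after the tool name ('import'),
    ahead of the tool-specific arguments.
    """
    return [
        "sqoop", "import",
        "-Doraoop.chunk.method=PARTITION",
        "--direct",  # selects the direct connector
        "--connect", connect_url,
        "--username", username,
        "--table", table,
        "--target-dir", target_dir,
    ]
```

The resulting list can be passed to `subprocess.run` on a host where the Sqoop client is installed.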
