
Long-term component architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components become standards, you can build long-term architecture on them with confidence.

Thank you for choosing CDH. Your download instructions are below:


CDH 5 Installation Guide


This CDH 5 Installation Guide is for Apache Hadoop developers and system administrators interested in Hadoop installation. It describes how to install and configure version 5 of Cloudera's Distribution Including Apache Hadoop (CDH 5), and how to deploy it on a cluster.

The guide covers the following major topics.

Before You Start:

 

Installation tasks:

    Install CDH 5. Start here for a new installation on a cluster.

    Deploy CDH 5. Do these tasks after installing core Hadoop.

    Install components. Install additional components after installing and deploying HDFS and MapReduce. (Components are listed below.) A sketch of this sequence appears after the note below.

 

  Note:

To install a release earlier than the current CDH 5 release (for example, if you want to add new nodes to a cluster without upgrading the cluster to the latest release), follow these instructions.
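The three installation tasks can be orchestrated from a single script. Below is a minimal sketch in Python, assuming a Red Hat-compatible host using yum; the package names (hadoop-hdfs-namenode and so on) are assumptions based on typical CDH 5 packaging, and the authoritative package lists are in the installation pages referenced above.

    import subprocess

    # Assumed CDH 5 package names; the set actually installed depends on the
    # roles each host plays (NameNode, DataNode, ResourceManager, ...).
    CORE_PACKAGES = ["hadoop-hdfs-namenode", "hadoop-hdfs-datanode",
                     "hadoop-yarn-resourcemanager", "hadoop-yarn-nodemanager"]
    COMPONENT_PACKAGES = ["hive", "oozie"]  # only after core Hadoop is deployed

    def yum_install(packages):
        """Install the given packages with yum."""
        subprocess.check_call(["sudo", "yum", "install", "-y"] + packages)

    yum_install(CORE_PACKAGES)         # 1. Install CDH 5
    # 2. Deploy CDH 5: push site configuration, format HDFS, start services
    #    (site-specific; see the "Deploy CDH 5" instructions above).
    yum_install(COMPONENT_PACKAGES)    # 3. Install components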

 

Upgrade tasks:

    Upgrade from CDH 4 to CDH 5. Use these instructions if you are currently running a CDH 4 release.

    Upgrade from an earlier CDH 5 release. Use these instructions if you are currently running a CDH 5 release.

    Upgrade components. Upgrade all installed components after upgrading core Hadoop. (Components are listed below.) A sketch of choosing the right upgrade path appears after the note below.

  Note: Use these instructions to migrate data from a CDH 4 cluster to a CDH 5 cluster.
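The choice between the two upgrade paths depends only on the major version currently running. A minimal sketch of that dispatch, assuming version strings such as "4.7.1" or "5.0.2":

    def upgrade_path(current_version):
        """Pick the applicable upgrade instructions for a CDH version string
        (assumed to look like "4.7.1" or "5.0.2")."""
        major = int(current_version.split(".")[0])
        if major == 4:
            return "Upgrade from CDH 4 to CDH 5"
        if major == 5:
            return "Upgrade from an earlier CDH 5 release"
        raise ValueError("unsupported CDH major version: %s" % current_version)

    print(upgrade_path("4.7.1"))  # -> Upgrade from CDH 4 to CDH 5
    print(upgrade_path("5.0.1"))  # -> Upgrade from an earlier CDH 5 release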

 

 


Supported Operating Systems

CDH 5 provides packages for Red Hat-compatible, SLES, Ubuntu, and Debian systems, as described below.

 

Operating System | Version | Packages
Red Hat Enterprise Linux (RHEL) | 5.7 | 64-bit
Red Hat Enterprise Linux (RHEL) | 6.2 | 64-bit
Red Hat Enterprise Linux (RHEL) | 6.4 | 64-bit
CentOS | 5.7 | 64-bit
CentOS | 6.2 | 64-bit
CentOS | 6.4 | 64-bit
Oracle Linux with Unbreakable Enterprise Kernel | 5.6 | 64-bit
Oracle Linux with Unbreakable Enterprise Kernel | 6.4 | 64-bit
SUSE Linux Enterprise Server (SLES) | 11 with Service Pack 1 or later | 64-bit
Ubuntu | Precise (12.04) - Long-Term Support (LTS) | 64-bit
Debian | Wheezy (7.0, 7.1) | 64-bit

Note:

  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera's packages, you can also download source tarballs from Downloads.
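The matrix above can be checked mechanically before installing. A minimal sketch, assuming a host that ships /etc/os-release (newer distributions provide it) and using the conventional os-release ID values such as "rhel" and "ol":

    import platform

    # Supported (distribution ID, version) pairs from the table above.
    SUPPORTED = {
        "rhel":   ("5.7", "6.2", "6.4"),
        "centos": ("5.7", "6.2", "6.4"),
        "ol":     ("5.6", "6.4"),   # Oracle Linux with UEK
        "sles":   ("11",),          # 11 with Service Pack 1 or later
        "ubuntu": ("12.04",),       # Precise LTS
        "debian": ("7.0", "7.1"),   # Wheezy
    }

    def parse_os_release(path="/etc/os-release"):
        """Return the ID and VERSION_ID fields of an os-release file."""
        fields = {}
        with open(path) as f:
            for line in f:
                if "=" in line:
                    key, _, value = line.strip().partition("=")
                    fields[key] = value.strip('"')
        return fields.get("ID", ""), fields.get("VERSION_ID", "")

    def is_supported():
        os_id, version = parse_os_release()
        # CDH 5 provides only 64-bit packages, so check the architecture too.
        return platform.machine() == "x86_64" and any(
            version.startswith(v) for v in SUPPORTED.get(os_id, ()))

    print("supported" if is_supported() else "not supported")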

 


Supported Databases

Component | MySQL | SQLite | PostgreSQL | Oracle | Derby (see Note 4)
Oozie | 5.5 | – | 8.4 | 10.2, 11gR2 | Default
Flume | – | – | – | – | Default (for the JDBC Channel only)
Hue | 5.0+ (see Note 1) | Default | 8.4 | 11gR2 | –
Hive | 5.5 | – | 8.4 | 10.2, 11gR2 | Default
Sqoop 1 | See Note 2 | – | See Note 2 | See Note 2 | –
Sqoop 2 | See Note 3 | – | See Note 3 | See Note 3 | Default

Notes

  1. Cloudera's recommendations are:
    • For Red Hat and similar systems:
      • Use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries on Red Hat 5 and similar systems.
      • Use MySQL server version 5.1 (or higher) and version 5.1 client shared libraries on Red Hat 6 and similar systems.

      If you use a higher server version than recommended here (for example, if you use 5.5) make sure you install the corresponding client libraries.

    • For SLES systems, use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries.
    • For Ubuntu systems:
      • Use MySQL server version 5.5 (or higher) and version 5.0 client shared libraries on Precise (12.04).
  2. For connectivity purposes only, Sqoop 1 supports MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, Teradata 13.1, and Netezza TwinFin 5.0. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x version).
  3. Sqoop 2 can transport data to and from MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, and Microsoft SQL Server 2012. The Sqoop 2 repository is supported only on Derby.
  4. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the CDH 5 Installation Guide for recommendations.
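Note 1 pairs a recommended server version with matching client shared libraries, which is easy to get wrong when a newer server is installed. A minimal sketch encoding just those recommendations (the platform keys, such as "rhel6", are hypothetical labels used only for this illustration):

    # Encodes Note 1: (recommended minimum server version, client library
    # version to install alongside it), per platform family.
    MYSQL_RECOMMENDATIONS = {
        "rhel5": ("5.0", "5.0"),
        "rhel6": ("5.1", "5.1"),
        "sles": ("5.0", "5.0"),
        "ubuntu-precise": ("5.5", "5.0"),
    }

    def check_mysql(platform_key, server_version):
        """Return the action Note 1 implies for this server version."""
        server_min, client_libs = MYSQL_RECOMMENDATIONS[platform_key]
        server = tuple(int(x) for x in server_version.split("."))
        minimum = tuple(int(x) for x in server_min.split("."))
        if server < minimum:
            return "upgrade the MySQL server to %s or higher" % server_min
        if server > minimum:
            # e.g. a 5.5 server needs matching 5.5 client libraries
            return "install client libraries matching server %s" % server_version
        return "install the %s client shared libraries" % client_libs

    print(check_mysql("rhel6", "5.5"))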

Supported JDK Versions

CDH 5 is supported with Oracle JDK 1.7.

Table 1. Supported JDK 1.7 Versions

Latest Certified Version | Minimum Supported Version | Exceptions
1.7.0_45 | 1.7.0_25 | None
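A host's JDK can be checked against the minimum in Table 1. A minimal sketch, assuming the usual Oracle banner format ("java -version" writes a line like: java version "1.7.0_45" to stderr) and Python 3.7 or later:

    import re
    import subprocess

    MINIMUM = (1, 7, 0, 25)  # minimum supported Oracle JDK from Table 1

    def installed_jdk_version():
        """Parse (major, minor, micro, update) from "java -version",
        which writes its banner to stderr."""
        result = subprocess.run(["java", "-version"],
                                capture_output=True, text=True)
        match = re.search(r'version "(\d+)\.(\d+)\.(\d+)_(\d+)"', result.stderr)
        if match is None:
            raise RuntimeError("could not parse java -version output")
        return tuple(int(g) for g in match.groups())

    version = installed_jdk_version()
    print("OK" if version >= MINIMUM else "JDK below minimum: %s" % (version,))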


Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.0.6:

  • HDFS-7960 - The full block report should prune zombie storages even if they're not empty
  • HDFS-7278 - Add a command that allows sysadmins to manually trigger full block reports from a DN
  • HDFS-6831 - Inconsistency between hdfs dfsadmin and hdfs dfsadmin -help
  • HDFS-7596 - NameNode should prune dead storages from storageMap
  • HDFS-7208 - NN doesn't schedule replication when a DN storage fails
  • HDFS-7575 - Upgrade should generate a unique storage ID for each volume
  • YARN-570 - Time strings are formatted in different timezone
  • YARN-2251 - Avoid negative elapsed time in JHS/MRAM web UI and services
  • HIVE-8874 - Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster
  • SOLR-6268 - HdfsUpdateLog has a race condition that can expose a closed HDFS FileSystem instance and should close its FileSystem instance if either inherited close method is called.
  • SOLR-6393 - Improve transaction log replay speed on HDFS.
  • SOLR-6403 - TransactionLog replay status logging.

 

 

 

