CDH 5.3.2

Cloudera’s 100% Open Source Hadoop Platform

CDH is Cloudera's open source software distribution and consists of Apache Hadoop and additional key open source projects to ensure you get the most out of Hadoop and your data.

It is the only Hadoop solution to offer unified querying options (including batch processing, interactive SQL, text search, and machine learning) and necessary enterprise security features (such as role-based access controls).

Please note: CDH requires manual installation from the command line.
For a faster, automated installation, download Cloudera Manager.

Installing the Latest CDH 5 Release

  Important:
  • If you use Cloudera Manager, do not use these command-line instructions.
  • This information applies specifically to CDH 5.3.x. If you use an earlier version of CDH, see the documentation for that version located at Cloudera Documentation.

Ways To Install CDH 5

You can install CDH 5 in any of the following ways:

  • Install Cloudera Manager, CDH, and managed services in a Cloudera Manager Deployment.
      Note: Cloudera recommends that you use this automated method if possible.
  • Or use one of the manual methods described below:
    • Download and install the CDH 5 "1-click Install" package; OR
    • Add the CDH 5 repository; OR
    • Build your own CDH 5 repository

    If you use one of these manual methods rather than Cloudera Manager, the first (downloading and installing the "1-click Install" package) is recommended in most cases because it is simpler than building or adding a repository.

  • Install from a CDH 5 tarball — see the next topic, "How Packaging Affects CDH 5 Deployment".

How Packaging Affects CDH 5 Deployment

Installing from Packages

Installing from a Tarball

  Note: The instructions in this Installation Guide are tailored for a package installation, as described in the sections that follow, and do not cover installation or deployment from tarballs.
  • If you install CDH 5 from a tarball, you will install YARN.
  • In CDH 5, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball itself. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory.

Before You Begin Installing CDH 5 Manually

  • The instructions on this page are for new installations. If you need to upgrade from an earlier release, see Upgrading from CDH 4 to CDH 5.
  • For a list of supported operating systems, see CDH 5 Requirements and Supported Versions.
  • These instructions assume that the sudo command is configured on the hosts where you will be doing the installation. If this is not the case, you will need the root user (superuser) to configure it.
  Note:

If you are migrating from MapReduce v1 (MRv1) to MapReduce v2 (MRv2, YARN), see Migrating from MapReduce 1 (MRv1) to MapReduce 2 (MRv2, YARN) for important information and instructions.

  Important: Running Services

When starting, stopping, and restarting CDH components, always use the service(8) command rather than running scripts in /etc/init.d directly. This is important because service sets the current working directory to / and removes most environment variables (passing only LANG and TERM) to create a predictable environment for the service. If you run the scripts in /etc/init.d directly, locally set environment variables could produce unpredictable results. If you install CDH from RPMs, service will be installed as part of the Linux Standard Base (LSB).
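The effect described above can be seen without touching a real service. The snippet below is an illustration only (not part of the install procedure): env -i approximates what service(8) does, scrubbing the environment and starting from /, with MYVAR standing in for any locally set variable.

```shell
# env -i clears the inherited environment; only LANG and TERM are passed
# through, and the script changes to / before running, as service(8) does.
MYVAR=leaks-from-your-shell env -i LANG="${LANG:-C}" TERM="${TERM:-dumb}" \
  sh -c 'cd /; pwd; echo "MYVAR=${MYVAR:-unset}"'
# prints "/" then "MYVAR=unset"
```

Any variable set in your login shell (like MYVAR here) is invisible to the script, which is exactly why service gives predictable results and /etc/init.d scripts may not.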

  Important: Java Development Kit

High Availability

In CDH 5 you can configure high availability both for the NameNode and the JobTracker or Resource Manager.

Steps to Install CDH 5 Manually

Step 1: Add or Build the CDH 5 Repository or Download the "1-click Install" package.

  • If you are installing CDH 5 on a Red Hat system, you can download Cloudera packages using yum or your web browser.
  • If you are installing CDH 5 on a SLES system, you can download the Cloudera packages using zypper or YaST or your web browser.
  • If you are installing CDH 5 on an Ubuntu or Debian system, you can download the Cloudera packages using apt or your web browser.

On Red Hat-compatible Systems

Use one of the following methods to add or build the CDH 5 repository or download the package on Red Hat-compatible systems.
  Note:

Use only one of the three methods.

Do this on all the systems in the cluster.

To download and install the CDH 5 "1-click Install" package:

  1. Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).
    OS Version Click this Link
    Red Hat/CentOS/Oracle 5 Red Hat/CentOS/Oracle 5 link
    Red Hat/CentOS/Oracle 6 Red Hat/CentOS/Oracle 6 link
  2. Install the RPM. For Red Hat/CentOS/Oracle 5:
    $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
    

    For Red Hat/CentOS/Oracle 6 (64-bit):

    $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
    

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

  Note: Make sure your repositories are up to date
Before proceeding, make sure the repositories on each system are up to date:
sudo yum clean all
This ensures that the system repositories contain the latest software (it does not actually install anything).

OR: To add the CDH 5 repository:

Click the entry in the table below that matches your RHEL or CentOS system, navigate to the repo file for your system, and save it in the /etc/yum.repos.d/ directory.

    OS Version Click this Link
    RHEL/CentOS/Oracle 5 RHEL/CentOS/Oracle 5 link
    RHEL/CentOS/Oracle 6 (64-bit) RHEL/CentOS/Oracle 6 link
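For reference, the downloaded repo file is a small yum configuration fragment. The sketch below writes an example of what it typically contains (shown for RHEL/CentOS 6; the exact field values are assumptions, so prefer the file you download from archive.cloudera.com):

```shell
# Sketch only: a minimal CDH 5 yum repo definition. Field values are assumed,
# not authoritative; use the downloaded file in production.
cat > cloudera-cdh5.repo <<'EOF'
[cloudera-cdh5]
name=Cloudera's Distribution for Hadoop, Version 5
baseurl=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/
gpgkey=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1
EOF
# Then copy it where yum looks for repo definitions:
# sudo cp cloudera-cdh5.repo /etc/yum.repos.d/
```

With gpgcheck=1, yum verifies package signatures against the key you add in Step 2.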

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

  Note: Make sure your repositories are up to date
Before proceeding, make sure the repositories on each system are up to date:
sudo yum clean all
This ensures that the system repositories contain the latest software (it does not actually install anything).

OR: To build a Yum repository:

If you want to create your own yum repository, download the appropriate repo file, create the repo, distribute the repo file and set up a web server, as described under Creating a Local Yum Repository.

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

  Note: Make sure your repositories are up to date
Before proceeding, make sure the repositories on each system are up to date:
sudo yum clean all
This ensures that the system repositories contain the latest software (it does not actually install anything).

On SLES Systems

Use one of the following methods to download the CDH 5 repository or package on SLES systems.
  Note:

Use only one of the three methods.

To download and install the CDH 5 "1-click Install" package:

  1. Download the CDH 5 "1-click Install" package.

    Click this link, choose Save File, and save it to a directory to which you have write access (for example, your home directory).

  2. Install the RPM:
    $ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm
    
  3. Update your system package index by running:
    $ sudo zypper refresh
    

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To add the CDH 5 repository:

  1. Run the following command:
    $ sudo zypper addrepo -f http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/cloudera-cdh5.repo
    
  2. Update your system package index by running:
    $ sudo zypper refresh
    

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

  Note: Make sure your repositories are up to date
Before proceeding to the next step, make sure the repositories on each system are up to date:
sudo zypper clean --all
This ensures that the system repositories contain the latest software (it does not actually install anything).

OR: To build a SLES repository:

If you want to create your own SLES repository, create a mirror of the CDH SLES directory, then follow the instructions for creating a SLES repository from that mirror.

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

  Note: Make sure your repositories are up to date
Before proceeding to the next step, make sure the repositories on each system are up to date:
sudo zypper clean --all
This ensures that the system repositories contain the latest software (it does not actually install anything).

On Ubuntu or Debian Systems

Use one of the following methods to download the CDH 5 repository or package.

  Note:
  • Use only one of the three methods.
  • There is an extra step if you are adding a repository on Ubuntu Trusty, as described below.
  • Unless you are adding a repository on Ubuntu Trusty, don't forget to run apt-get update after downloading, adding, or building the repository.

To download and install the CDH 5 "1-click Install" package:

  1. Download the CDH 5 "1-click Install" package:
    OS Version Click this Link
    Wheezy Wheezy link
    Precise Precise link
    Trusty Trusty link
  2. Install the package by doing one of the following:
    • Choose Open with in the download window to use the package manager.
    • Choose Save File, save the package to a directory to which you have write access (for example, your home directory), and install it from the command line. For example:
      sudo dpkg -i cdh5-repository_1.0_all.deb
      
  Note: Make sure your repositories are up to date
Before proceeding to the next step, make sure the repositories on each system are up to date:
sudo apt-get update
This ensures that the system repositories contain the latest software (it does not actually install anything).

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To add the CDH 5 repository:

  • Download the appropriate cloudera.list file by issuing one of the following commands. You can use another HTTP client if wget is not available, but the syntax may be different.
      Important: Ubuntu 14.04 (Trusty)

    If you are running Ubuntu Trusty, you need to perform an additional step after adding the repository. See "Additional Step for Trusty" below.

    OS Version Command
    Debian Wheezy
    $ sudo wget 'http://archive.cloudera.com/cdh5/ubuntu/wheezy/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list 
    
    Ubuntu Precise
    $ sudo wget 'http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list
    
    Ubuntu Lucid
    $ sudo wget 'http://archive.cloudera.com/cdh5/ubuntu/lucid/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list
    
    Ubuntu Trusty
    $ sudo wget 'http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list
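For reference, the downloaded cloudera.list amounts to two apt source lines. The sketch below writes an example of its typical contents (shown for Trusty; the exact line format is an assumption, so prefer the file fetched from archive.cloudera.com):

```shell
# Sketch only: an assumed cloudera.list for Ubuntu Trusty; the real file
# downloaded from the archive is authoritative.
cat > cloudera.list <<'EOF'
deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh trusty-cdh5 contrib
deb-src http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh trusty-cdh5 contrib
EOF
# Then copy it where apt looks for extra sources:
# sudo cp cloudera.list /etc/apt/sources.list.d/
```

The deb line provides binary packages and the deb-src line the corresponding sources; apt picks both up after the next apt-get update.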
    
  Note: Make sure your repositories are up to date
Unless you are adding a repository on Ubuntu Trusty, make sure the repositories on each system are up to date before proceeding to the next step:
sudo apt-get update
This ensures that the system repositories contain the latest software (it does not actually install anything).

Additional step for Trusty

This step ensures that you get the right ZooKeeper package for the current CDH release. You need to prioritize the Cloudera repository you have just added, such that you install the CDH version of ZooKeeper rather than the version that is bundled with Ubuntu Trusty.

To do this, create a file at /etc/apt/preferences.d/cloudera.pref with the following contents:
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501
  Note:

You do not need to run apt-get update after creating this file.

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To build a Debian repository:

If you want to create your own apt repository, create a mirror of the CDH Debian directory and then create an apt repository from the mirror.

  Note: Make sure your repositories are up to date
Before proceeding to the next step, make sure the repositories on each system are up to date:
sudo apt-get update
This ensures that the system repositories contain the latest software (it does not actually install anything).

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

Step 2: Optionally Add a Repository Key

Before installing YARN or MRv1: (Optionally) add a repository key on each system in the cluster. Add the Cloudera Public GPG Key to your repository by executing one of the following commands:

  • For Red Hat/CentOS/Oracle 5 systems:
    $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
    
  • For Red Hat/CentOS/Oracle 6 systems:
    $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
    
  • For all SLES systems:
    $ sudo rpm --import http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
    
  • For Ubuntu or Debian systems:
    OS Version Command
    Debian Wheezy
    $ wget http://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key
    
    Ubuntu Precise
    $ wget http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key
    
    Ubuntu Lucid
    $ wget http://archive.cloudera.com/cdh5/ubuntu/lucid/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key
    
    Ubuntu Trusty
    $ wget http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key
    

This key enables you to verify that you are downloading genuine packages.

Step 3: Install CDH 5 with YARN

  Note:

Skip this step if you intend to use only MRv1. Directions for installing MRv1 are in Step 4.

To install CDH 5 with YARN:

  Note:

If you decide to configure HA for the NameNode, do not install hadoop-hdfs-secondarynamenode. After completing the HA software configuration, follow the installation instructions under Deploying HDFS High Availability.

  1. Install and deploy ZooKeeper.
      Important:

    Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This is a requirement if you are deploying high availability (HA) for the NameNode.

    Follow instructions under ZooKeeper Installation.

  2. Install each type of daemon package on the appropriate system(s), as follows.

    Resource Manager host (analogous to MRv1 JobTracker) running:
      Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-yarn-resourcemanager
      SLES: sudo zypper clean --all; sudo zypper install hadoop-yarn-resourcemanager
      Ubuntu or Debian: sudo apt-get update; sudo apt-get install hadoop-yarn-resourcemanager

    NameNode host running:
      Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-hdfs-namenode
      SLES: sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
      Ubuntu or Debian: sudo apt-get install hadoop-hdfs-namenode

    Secondary NameNode host (if used) running:
      Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
      SLES: sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
      Ubuntu or Debian: sudo apt-get install hadoop-hdfs-secondarynamenode

    All cluster hosts except the Resource Manager running:
      Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
      SLES: sudo zypper clean --all; sudo zypper install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
      Ubuntu or Debian: sudo apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce

    One host in the cluster running:
      Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
      SLES: sudo zypper clean --all; sudo zypper install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
      Ubuntu or Debian: sudo apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver

    All client hosts running:
      Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-client
      SLES: sudo zypper clean --all; sudo zypper install hadoop-client
      Ubuntu or Debian: sudo apt-get install hadoop-client

  Note:

The hadoop-yarn and hadoop-hdfs packages are installed on each system automatically as dependencies of the other packages.
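Because the same role maps to the same package names on every OS, the per-OS commands above differ only in the package manager invoked. As a convenience sketch (not from the guide), pkg_install_cmd below is a hypothetical helper that prints the right command for whichever package manager the local host has; echoing rather than executing keeps it easy to review before running.

```shell
# Hypothetical helper: print the install command appropriate to this host's
# package manager for the given CDH package names.
pkg_install_cmd() {
  if command -v yum >/dev/null 2>&1; then
    echo "sudo yum clean all; sudo yum install $*"
  elif command -v zypper >/dev/null 2>&1; then
    echo "sudo zypper clean --all; sudo zypper install $*"
  else
    echo "sudo apt-get update; sudo apt-get install $*"
  fi
}

# Example: on the Resource Manager host
pkg_install_cmd hadoop-yarn-resourcemanager
```

Run the printed command (or pipe it through sh) on each host after checking it matches the table above.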

Step 4: Install CDH 5 with MRv1

  Note:

If you are also installing YARN, you can skip any packages you have already installed in Step 3: Install CDH 5 with YARN.

Skip this step and go to Step 3: Install CDH 5 with YARN if you intend to use only YARN.

  Important: Before proceeding, you need to decide:
  • Whether to configure High Availability (HA) for the NameNode and/or JobTracker; see High Availability for more information and instructions.
  • Where to deploy the NameNode, Secondary NameNode, and JobTracker daemons. As a general rule:
    • The NameNode and JobTracker run on the same "master" host unless the cluster is large (more than a few tens of nodes); the master host (or hosts) should not run the Secondary NameNode (if used), DataNode, or TaskTracker services.
    • In a large cluster, it is especially important that the Secondary NameNode (if used) runs on a separate machine from the NameNode.
    • Each node in the cluster except the master host(s) should run the DataNode and TaskTracker services.

If you decide to configure HA for the NameNode, do not install hadoop-hdfs-secondarynamenode. After completing the HA software configuration, follow the installation instructions under Deploying HDFS High Availability.

First, install and deploy ZooKeeper.
  Important:

Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This is a requirement if you are deploying high availability (HA) for the NameNode or JobTracker.

Follow instructions under ZooKeeper Installation. Make sure you create the myid file in the data directory, as instructed, if you are starting a ZooKeeper ensemble after a fresh install.
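The myid file mentioned above is a one-line file in each server's data directory containing only that server's id, matching its server.N entry in zoo.cfg. A minimal sketch (paths here are illustrative stand-ins; the packaged default data directory is typically /var/lib/zookeeper):

```shell
# Sketch: write the ensemble id for this host. ./zookeeper-data stands in
# for the real dataDir configured in zoo.cfg.
datadir=./zookeeper-data
mkdir -p "$datadir"
echo 2 > "$datadir/myid"   # this host is server.2 in zoo.cfg
cat "$datadir/myid"        # prints: 2
```

Every server in the ensemble needs a distinct id, and each id must agree with the server.N lines shared across all the servers' zoo.cfg files.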

Next, install packages.

Install each type of daemon package on the appropriate system(s), as follows.
  Note:

On Ubuntu systems, Ubuntu may try to start the service immediately after you install it. This should fail harmlessly, but if you want to prevent it, there is advice here.

JobTracker host running:
  Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-jobtracker
  SLES: sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-jobtracker
  Ubuntu or Debian: sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-jobtracker

NameNode host running:
  Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-hdfs-namenode
  SLES: sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
  Ubuntu or Debian: sudo apt-get install hadoop-hdfs-namenode

Secondary NameNode host (if used) running:
  Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
  SLES: sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
  Ubuntu or Debian: sudo apt-get install hadoop-hdfs-secondarynamenode

All cluster hosts except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts running:
  Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
  SLES: sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
  Ubuntu or Debian: sudo apt-get install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

All client hosts running:
  Red Hat/CentOS compatible: sudo yum clean all; sudo yum install hadoop-client
  SLES: sudo zypper clean --all; sudo zypper install hadoop-client
  Ubuntu or Debian: sudo apt-get install hadoop-client

Step 5: (Optional) Install LZO

If you decide to install LZO (Lempel–Ziv–Oberhumer compression), proceed as follows. For information about choosing a compression format, see Choosing a Data Compression Format.
  Note:
If you are upgrading to a new version of LZO, rather than installing it for the first time, you must first remove the old version; for example, on a RHEL system:
yum remove hadoop-lzo
  1. Add the repository on each host in the cluster. Follow the instructions for your OS version:
    For OS Version Do this
    Red Hat/CentOS/Oracle 5 Navigate to this link and save the file in the /etc/yum.repos.d/ directory.
    Red Hat/CentOS 6 Navigate to this link and save the file in the /etc/yum.repos.d/ directory.
    SLES
    1. Run the following command:
       $ sudo zypper addrepo -f http://archive.cloudera.com/gplextras5/sles/11/x86_64/gplextras/cloudera-gplextras5.repo
      
    2. Update your system package index by running:
       $ sudo zypper refresh
      
    Ubuntu or Debian Navigate to this link and save the file as /etc/apt/sources.list.d/gplextras.list.
      Important: Make sure you do not let the file name default to cloudera.list, as that will overwrite your existing cloudera.list.
  2. Install the package on each host as follows:
    For OS version Install commands
    Red Hat/CentOS compatible
    sudo yum install hadoop-lzo
    
    SLES
    sudo zypper install hadoop-lzo
    
    Ubuntu or Debian
    sudo apt-get install hadoop-lzo
    
  3. Continue with installing and deploying CDH. As part of the deployment, you will need to do some additional configuration for LZO, as shown under Configuring LZO.
      Important: Make sure you do this configuration after you have copied the default configuration files to a custom location and set alternatives to point to it.

Step 6: Deploy CDH and Install Components

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.3.2:

  • AVRO-1630 - Creating Builder from instance loses data
  • AVRO-1628 - Add Schema.createUnion(Schema... type)
  • AVRO-1539 - Add FileSystem-based FsInput Constructor
  • AVRO-1623 - GenericData#validate() of enum: IndexOutOfBoundsException
  • AVRO-1614 - Always getting a value...
  • AVRO-1592 - Java keyword as an enum constant in Avro schema file causes deserialization to fail.
  • AVRO-1619 - Generate better JavaDoc
  • AVRO-1622 - Add missing license headers
  • AVRO-1604 - ReflectData.AllowNull fails to generate schemas when @Nullable is present.
  • AVRO-1407 - NettyTransceiver can cause a infinite loop when slow to connect
  • AVRO-834 - Data File corruption recovery tool
  • AVRO-1596 - Cannot read past corrupted block in Avro data file
  • HADOOP-11350 - The size of header buffer of HttpServer is too small when HTTPS is enabled
  • HDFS-7707 - Edit log corruption due to delayed block removal again
  • HDFS-7718 - Store KeyProvider in ClientContext to avoid leaking key provider threads when using FileContext
  • HDFS-6425 - Large postponedMisreplicatedBlocks has impact on blockReport latency
  • HDFS-7560 - ACLs removed by removeDefaultAcl() will be back after NameNode restart/failover
  • HDFS-7513 - HDFS inotify: add defaultBlockSize to CreateEvent
  • HDFS-7158 - Reduce the memory usage of WebImageViewer
  • HDFS-7497 - Inconsistent report of decommissioning DataNodes between dfsadmin and NameNode webui
  • HDFS-6917 - Add an hdfs debug command to validate blocks, call recoverlease, etc.
  • HDFS-6779 - Add missing version subcommand for hdfs
  • YARN-2697 - RMAuthenticationHandler is no longer useful
  • YARN-2656 - RM web services authentication filter should add support for proxy user
  • YARN-3082 - Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
  • YARN-3079 - Scheduler should also update maximumAllocation when updateNodeResource.
  • YARN-2992 - ZKRMStateStore crashes due to session expiry
  • YARN-2675 - containersKilled metrics is not updated when the container is killed during localization
  • YARN-2715 - Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.
  • MAPREDUCE-6198 - NPE from JobTracker#resolveAndAddToTopology in MR1 cause initJob and heartbeat failure.
  • MAPREDUCE-6196 - Fix BigDecimal ArithmeticException in PiEstimator
  • HBASE-12540 - TestRegionServerMetrics#testMobMetrics test failure
  • HBASE-12533 - staging directories are not deleted after secure bulk load
  • HBASE-12077 - FilterLists create many ArrayList$Itr objects per row.
  • HBASE-12386 - Replication gets stuck following a transient zookeeper error to remote peer cluster
  • HBASE-11979 - Compaction progress reporting is wrong
  • HBASE-12445 - hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT
  • HBASE-12837 - ReplicationAdmin leaks zk connections
  • HIVE-7647 - Beeline does not honor --headerInterval and --color when executing with "-e"
  • HIVE-7733 - Ambiguous column reference error on query
  • HIVE-9303 - Parquet files are written with incorrect definition levels
  • HIVE-8444 - update pom to junit 4.11
  • HIVE-9474 - truncate table changes permissions on the target
  • HIVE-9462 - HIVE-8577 - breaks type evolution
  • HIVE-9482 - Hive parquet timestamp compatibility
  • HIVE-6308 - COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
  • HIVE-9502 - Parquet cannot read Map types from files written with Hive 0.12 or earlier
  • HIVE-9445 - Revert HIVE-5700 - enforce single date format for partition column storage
  • HIVE-9393 - reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
  • HIVE-7800 - Parquet Column Index Access Schema Size Checking
  • HIVE-9330 - DummyTxnManager will throw NPE if WriteEntity writeType has not been set
  • HIVE-9265 - Hive with encryption throws NPE to fs path without schema
  • HIVE-9199 - Excessive exclusive lock used in some DDLs with DummyTxnManager
  • HIVE-6978 - beeline always exits with 0 status, should exit with non-zero status on error
  • HUE-2556 - [core] Cannot update project tags of a document
  • HUE-2528 - Partitions limit gets capped to 1000 despite configuration
  • HUE-2548 - [metastore] Create table then load data does redirect to the table page
  • HUE-2525 - [core] Fix manual install of samples
  • HUE-2501 - [metastore] Creating a table with header files bigger than 64MB truncates it
  • HUE-2484 - [beeswax] Configure support for Hive Server2 LDAP authentication
  • HUE-2532 - [search] Fix share URL on Internet Explorer
  • HUE-2531 - [impala] Autogrow missing result list
  • HUE-2524 - [impala] Sort numerically recent queries tab
  • HUE-2495 - [oozie] Improve dashboards sorting mechanism
  • HUE-2511 - [impala] Infinite scroll keeps fetching results even if finished
  • HUE-2102 - [oozie] Workflow with credentials can't be used with Coordinator
  • HUE-2152 - [pig] Credentials support in editor
  • OOZIE-2131 - Add flag to sqoop action to skip hbase delegation token generation
  • OOZIE-2047 - Oozie does not support Hive tables that use datatypes introduced since Hive 0.8
  • OOZIE-2102 - Streaming actions are broken cause of incorrect method signature
  • PARQUET-173 - StatisticsFilter doesn't handle And properly
  • PARQUET-157 - Divide by zero in logging code
  • PARQUET-142 - parquet-tools doesn't filter _SUCCESS file
  • PARQUET-124 - parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
  • PARQUET-136 - NPE thrown in StatisticsFilter when all values in a string/binary column trunk are null
  • PARQUET-168 - Wrong command line option description in parquet-tools
  • PARQUET-145 - InternalParquetRecordReader.close() should not throw an exception if initialization has failed
  • PARQUET-140 - Allow clients to control the GenericData object that is used to read Avro records
  • SOLR-7033 - RecoveryStrategy should not publish any state when closed / cancelled.
  • SOLR-5961 - Solr gets crazy on /overseer/queue state change
  • SOLR-6640 - Replication can cause index corruption
  • SOLR-5875 - QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
  • SOLR-6919 - Log REST info before executing
  • SOLR-6969 - When opening an HDFSTransactionLog for append we must first attempt to recover it's lease to prevent data loss.
  • SOLR-5515 - NPE when getting stats on date field with empty result on solrcloud
  • SPARK-3778 - newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn
  • SPARK-4835 - Streaming saveAs*HadoopFiles() methods may throw FileAlreadyExistsException during checkpoint recovery
  • SQOOP-2057 - Skip delegation token generation flag during hbase import
  • SQOOP-1779 - Add support for --hive-database when importing Parquet files into Hive
  • IMPALA-1622 - Fix overflow in StringParser::StringToFloatInternal()
  • IMPALA-1614 - Compute stats fails if table name starts with number
  • IMPALA-1623 - unix_timestamp() does not return correct time
  • IMPALA-1535 - Partition pruning with NULL
  • IMPALA-1606 - Impala does not always give short name to Llama
  • IMPALA-1120 - Fetch column statistics using Hive 0.13 bulk API

In addition, CDH 5.3.2 reverts YARN-2713, which has caused problems since its inclusion in CDH 5.3.0.

CDH 5 Requirements and Supported Versions

The following sections describe the requirements and supported versions of operating systems, databases, JDK, and Internet Protocol (IP) for CDH 5.

For the latest information on compatibility across all Cloudera products, see the Product Compatibility Matrix.

Supported Operating Systems

CDH 5 provides packages for Red Hat-compatible, SLES, Ubuntu, and Debian systems as described below.

Operating System: Supported Versions (all packages are 64-bit)
Red Hat Enterprise Linux (RHEL)-compatible
  Red Hat Enterprise Linux: 5.7; 6.2; 6.4; 6.4 in SE Linux mode; 6.5
  CentOS: 5.7; 6.2; 6.4; 6.4 in SE Linux mode; 6.5
  Oracle Linux with default kernel and Unbreakable Enterprise Kernel: 5.6 (UEK R2); 6.4 (UEK R2); 6.5 (UEK R2, UEK R3)
SLES
  SUSE Linux Enterprise Server (SLES): 11 with Service Pack 2 or later
Ubuntu/Debian
  Ubuntu: Precise (12.04) - Long-Term Support (LTS); Trusty (14.04) - Long-Term Support (LTS)
  Debian: Wheezy (7.0, 7.1)
  Note:
  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera packages, you can also download source tarballs from Downloads.

Supported Databases

Component: Supported Databases
Oozie: MySQL 5.5, 5.6; PostgreSQL 8.4, 9.1, 9.2, 9.3 (see Note 2); Oracle 11gR2; Derby (default; see Note 5)
Flume: Derby (default; for the JDBC Channel only)
Hue: MySQL 5.5, 5.6 (see Note 1); SQLite (default); PostgreSQL 8.4, 9.1, 9.2, 9.3 (see Note 2); Oracle 11gR2
Hive/Impala: MySQL 5.5, 5.6 (see Note 1); PostgreSQL 8.4, 9.1, 9.2, 9.3 (see Note 2); Oracle 11gR2; Derby (default; see Note 5)
Sentry: MySQL 5.5, 5.6 (see Note 1); PostgreSQL 8.4, 9.1, 9.2, 9.3 (see Note 2); Oracle 11gR2
Sqoop 1: MySQL, PostgreSQL, and Oracle (see Note 3)
Sqoop 2: MySQL, PostgreSQL, and Oracle (see Note 4); Derby (default; see Note 5)
  Note:
  1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and later.
  2. PostgreSQL 9.2 is supported on CDH 5.1 and later. PostgreSQL 9.3 is supported on CDH 5.2 and later.
  3. For the purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  4. Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby.
  5. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.

Supported JDK Versions

CDH 5 is supported with the JDK versions shown in the table that follows.

Table 1. Supported JDK Versions
Latest Certified Version Minimum Supported Version Exceptions
1.7.0_67 1.7.0_67 None
1.8.0_11 1.8.0_11 None

Supported Internet Protocol

CDH requires IPv4. IPv6 is not supported.

See also Configuring Network Names.