Installing the Latest CDH 5 Release

This page explains how to do an unmanaged deployment of CDH 5 from the command line. For a managed deployment, see Cloudera Manager Deployment.

CDH 5 Installation Options

There are multiple ways to install CDH 5:

  • Managed Deployment: automatically install CDH 5 with a Cloudera Manager Deployment. This is the simplest and preferred method.
  • Unmanaged Deployment:
    • Manually install the CDH 5 package or repository: either add the CDH 5 repository OR build your own CDH 5 repository.
    • Manually install the CDH 5 tarball. See "Package and Tarball Binaries" below.

Package and Tarball Binaries

Installing from Packages

Installing from a Tarball

  • The CDH 5 tarball deploys YARN and includes the MRv1 binaries. There is no separate tarball for MRv1. The MRv1 scripts are in the bin-mapreduce1 directory, and the MRv1 examples are in examples-mapreduce1.

Before You Begin Installing CDH 5 Manually

Steps to Install CDH 5 Manually

Step 1: Add or Build the CDH 5 Repository

  • To install CDH 5 on a RHEL system, download packages with yum or use a web browser.
  • To install CDH 5 on a SLES system, download packages with zypper or YaST or use a web browser.
  • To install CDH 5 on an Ubuntu or Debian system, download packages with apt or use a web browser.

On RHEL-compatible Systems

Use one of the following methods to install CDH 5 on RHEL-compatible systems.

Do this on all the systems in the cluster.

To add the CDH 5 repository:

Download the repo file. Click the link for your RHEL or CentOS system in the table, find the appropriate repo file, and save it in /etc/yum.repos.d/.

  OS Version              Link to CDH 5 Repository
  RHEL/CentOS/Oracle 5    RHEL/CentOS/Oracle 5 link
  RHEL/CentOS/Oracle 6    RHEL/CentOS/Oracle 6 link
  RHEL/CentOS/Oracle 7    RHEL/CentOS/Oracle 7 link
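
On hosts without a web browser, the repo file can also be fetched from the command line. The following is a minimal sketch, not part of the original procedure. It assumes the repo file is named cloudera-cdh5.repo and sits in the same archive directory as the GPG key listed in Step 2; confirm the actual location from the link for your OS version. The function name cdh5_repo_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the CDH 5 repo-file URL for a RHEL-compatible
# major version (5, 6, or 7).
# ASSUMPTION: the file is named cloudera-cdh5.repo and lives beside the
# RPM-GPG-KEY-cloudera file from Step 2; verify against the table link.
cdh5_repo_url() {
    case "$1" in
        5|6|7)
            printf 'https://archive.cloudera.com/cdh5/redhat/%s/x86_64/cdh/cloudera-cdh5.repo\n' "$1"
            ;;
        *)
            echo "unsupported RHEL-compatible major version: $1" >&2
            return 1
            ;;
    esac
}

# Example (run on each host in the cluster; needs root and network access):
#   sudo wget "$(cdh5_repo_url 7)" -O /etc/yum.repos.d/cloudera-cdh5.repo
```

Keeping the URL construction in a function that only prints makes it easy to inspect the result before anything is written under /etc/yum.repos.d/.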

Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.

OR: To build a Yum repository:

Follow the instructions at Creating a Local Yum Repository to create your own yum repository:
  • Download the appropriate repo file
  • Create the repo
  • Distribute the repo and set up a web server.

Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.

On SLES Systems

Use one of the following methods to download the CDH 5 repository or package on SLES systems.

To add the CDH 5 repository:

  1. Run the following command:
    $ sudo zypper addrepo -f https://archive.cloudera.com/cdh5/sles/12/x86_64/cdh/cloudera-cdh.repo
  2. Update your system package index by running:
    $ sudo zypper refresh

Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.

OR: To build a SLES repository:

If you want to create your own SLES repository, create a mirror of the CDH SLES directory, and then follow the instructions for creating a SLES repository from the mirror.

Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.

On Ubuntu or Debian Systems

Use one of the following methods to download the CDH 5 repository or package.

To add the CDH 5 repository:

  • Download the appropriate cloudera.list file by issuing one of the following commands. You can use another HTTP client if wget is not available, but the syntax may be different.
    Debian 8 Jessie:
    $ sudo wget 'https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list
    Debian 7 Wheezy:
    $ sudo wget 'https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list
    Ubuntu 16 Xenial:
    $ sudo wget 'https://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list
    Ubuntu 14 Trusty:
    $ sudo wget 'https://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list
    Ubuntu 12 Precise:
    $ sudo wget 'https://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/cloudera.list' \
        -O /etc/apt/sources.list.d/cloudera.list
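
When the same script has to run across a mixed set of Debian and Ubuntu hosts, the choice of URL can be derived from the distribution name and release codename. This is a sketch only: the function name cdh5_apt_list_url is hypothetical, and it accepts just the five combinations listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the cloudera.list URL for a distribution and
# release codename, restricted to the combinations listed above.
cdh5_apt_list_url() {
    distro=$1
    codename=$2
    case "$distro/$codename" in
        debian/jessie|debian/wheezy|ubuntu/xenial|ubuntu/trusty|ubuntu/precise)
            printf 'https://archive.cloudera.com/cdh5/%s/%s/amd64/cdh/cloudera.list\n' \
                "$distro" "$codename"
            ;;
        *)
            echo "no CDH 5 repository listed for $distro $codename" >&2
            return 1
            ;;
    esac
}

# Example (needs root, network access, and the lsb_release tool):
#   distro=$(lsb_release -si | tr '[:upper:]' '[:lower:]')
#   sudo wget "$(cdh5_apt_list_url "$distro" "$(lsb_release -sc)")" \
#       -O /etc/apt/sources.list.d/cloudera.list
```

The function fails loudly for an unlisted release rather than guessing a URL that may not exist.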

Additional step for Ubuntu Trusty and Debian Jessie

This step ensures that you get the right ZooKeeper package for the current CDH release. You need to prioritize the Cloudera repository you have just added, so that you install the CDH version of ZooKeeper rather than the version that is bundled with Ubuntu Trusty or Debian Jessie.

To do this, create a file at /etc/apt/preferences.d/cloudera.pref with the following contents:
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501
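
The pin file can also be written from a script. A priority of 501 sits just above apt's default priority of 500, which is why packages from the Cloudera origin win over the distribution's own. The sketch below assumes a hypothetical helper name, write_cloudera_pin, and takes the target directory as a parameter so that it can be tried in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write the pin file shown above into a target
# directory. Point it at /etc/apt/preferences.d (as root) for real use.
write_cloudera_pin() {
    target_dir=$1
    cat > "$target_dir/cloudera.pref" <<'EOF'
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501
EOF
}

# Example (as root):
#   write_cloudera_pin /etc/apt/preferences.d
#   apt-get update
#   apt-cache policy zookeeper   # inspect which repository is preferred
```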

Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.

OR: To build a Debian repository:

If you want to create your own apt repository, create a mirror of the CDH Debian directory and then create an apt repository from the mirror.

Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.

Step 2: Optionally Add a Repository Key

Before installing YARN or MRv1, you can optionally add a repository key on each system in the cluster. Add the Cloudera Public GPG Key to your repository by running one of the following commands:

  • For RHEL/CentOS/Oracle 5 systems:
    sudo rpm --import https://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
  • For RHEL/CentOS/Oracle 6 systems:
    sudo rpm --import https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
  • For RHEL/CentOS/Oracle 7 systems:
    sudo rpm --import https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/RPM-GPG-KEY-cloudera
  • For all SLES systems:
    sudo rpm --import https://archive.cloudera.com/cdh5/sles/12/x86_64/cdh/RPM-GPG-KEY-cloudera
  • For Ubuntu or Debian systems:
    Debian 8 Jessie:
    $ wget https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key
    Debian 7 Wheezy:
    $ wget https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key
    Ubuntu 16 Xenial:
    $ wget https://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key
    Ubuntu 14 Trusty:
    $ wget https://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key
    Ubuntu 12 Precise:
    $ wget https://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key -O archive.key
    $ sudo apt-key add archive.key

This key enables you to verify that you are downloading genuine packages.

Step 3: Install CDH 5 with YARN

To install CDH 5 with YARN:

  1. Install and deploy ZooKeeper.

    Follow instructions under ZooKeeper Installation.

  2. Install each type of daemon package on the appropriate system(s), as follows.

    Where to install, followed by the install commands for each OS:

    Resource Manager host (analogous to MRv1 JobTracker) running:
      • RHEL/CentOS compatible:
        sudo yum clean all; sudo yum install hadoop-yarn-resourcemanager
      • SLES:
        sudo zypper clean --all; sudo zypper install hadoop-yarn-resourcemanager
      • Ubuntu or Debian:
        sudo apt-get update; sudo apt-get install hadoop-yarn-resourcemanager

    NameNode host running:
      • RHEL/CentOS compatible:
        sudo yum clean all; sudo yum install hadoop-hdfs-namenode
      • SLES:
        sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
      • Ubuntu or Debian:
        sudo apt-get install hadoop-hdfs-namenode

    Secondary NameNode host (if used) running:
      • RHEL/CentOS compatible:
        sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
      • SLES:
        sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
      • Ubuntu or Debian:
        sudo apt-get install hadoop-hdfs-secondarynamenode

    All cluster hosts except the Resource Manager running:
      • RHEL/CentOS compatible:
        sudo yum clean all; sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
      • SLES:
        sudo zypper clean --all; sudo zypper install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
      • Ubuntu or Debian:
        sudo apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce

    One host in the cluster running:
      • RHEL/CentOS compatible:
        sudo yum clean all; sudo yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
      • SLES:
        sudo zypper clean --all; sudo zypper install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
      • Ubuntu or Debian:
        sudo apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver

    All client hosts running:
      • RHEL/CentOS compatible:
        sudo yum clean all; sudo yum install hadoop-client
      • SLES:
        sudo zypper clean --all; sudo zypper install hadoop-client
      • Ubuntu or Debian:
        sudo apt-get install hadoop-client
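
For scripted rollouts, the mapping from host role to YARN packages can be captured in one place. This is a sketch under stated assumptions: the function name yarn_packages_for_role and the role names (resourcemanager, worker, history, and so on) are invented for illustration, while the package names are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the CDH 5 YARN packages for a host role.
# Role names are made up for this sketch; package names come from the
# install commands in this step.
yarn_packages_for_role() {
    case "$1" in
        resourcemanager)   echo "hadoop-yarn-resourcemanager" ;;
        namenode)          echo "hadoop-hdfs-namenode" ;;
        secondarynamenode) echo "hadoop-hdfs-secondarynamenode" ;;
        worker)            echo "hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce" ;;
        history)           echo "hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
        client)            echo "hadoop-client" ;;
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Example on a RHEL-compatible worker host. The command substitution is
# left unquoted on purpose so that each package becomes its own argument:
#   sudo yum clean all
#   sudo yum install $(yarn_packages_for_role worker)
```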

Step 4: Install CDH 5 with MRv1

First, install and deploy ZooKeeper.

Follow instructions under ZooKeeper Installation. Make sure you create the myid file in the data directory, as instructed, if you are starting a ZooKeeper ensemble after a fresh install.

Next, install packages.

Install each type of daemon package on the appropriate system(s), as follows.

Where to install, followed by the install commands for each OS:

JobTracker host running:
  • RHEL/CentOS compatible:
    sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-jobtracker
  • SLES:
    sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-jobtracker
  • Ubuntu or Debian:
    sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-jobtracker

NameNode host running:
  • RHEL/CentOS compatible:
    sudo yum clean all; sudo yum install hadoop-hdfs-namenode
  • SLES:
    sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
  • Ubuntu or Debian:
    sudo apt-get install hadoop-hdfs-namenode

Secondary NameNode host (if used) running:
  • RHEL/CentOS compatible:
    sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
  • SLES:
    sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
  • Ubuntu or Debian:
    sudo apt-get install hadoop-hdfs-secondarynamenode

All cluster hosts except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts running:
  • RHEL/CentOS compatible:
    sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
  • SLES:
    sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
  • Ubuntu or Debian:
    sudo apt-get install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

All client hosts running:
  • RHEL/CentOS compatible:
    sudo yum clean all; sudo yum install hadoop-client
  • SLES:
    sudo zypper clean --all; sudo zypper install hadoop-client
  • Ubuntu or Debian:
    sudo apt-get install hadoop-client
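
Before running anything on a host, it can help to print the exact MRv1 install command for review. The following dry-run sketch assumes a hypothetical function name, mrv1_install_command, and invented role names; it only prints the install command and deliberately leaves out the cache-cleaning step (yum clean all, zypper clean --all, or apt-get update) shown above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the MRv1 install command
# for a package manager (yum, zypper, apt-get) and a host role.
# Role names are made up for this sketch; package names come from this step.
mrv1_install_command() {
    pm=$1
    role=$2
    case "$role" in
        jobtracker)        pkgs="hadoop-0.20-mapreduce-jobtracker" ;;
        namenode)          pkgs="hadoop-hdfs-namenode" ;;
        secondarynamenode) pkgs="hadoop-hdfs-secondarynamenode" ;;
        worker)            pkgs="hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode" ;;
        client)            pkgs="hadoop-client" ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    case "$pm" in
        yum|zypper|apt-get) echo "sudo $pm install $pkgs" ;;
        *) echo "unknown package manager: $pm" >&2; return 1 ;;
    esac
}

# Example: review the command for a SLES worker host.
#   mrv1_install_command zypper worker
```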

Step 5: (Optional) Install LZO

This section explains how to install LZO (Lempel–Ziv–Oberhumer) compression. For more information, see Choosing and Configuring Data Compression.

  1. Add the repository on each host in the cluster. Follow the instructions for your OS version:
    • RHEL/CentOS/Oracle 5: Go to this link and save the file in the /etc/yum.repos.d/ directory.
    • RHEL/CentOS/Oracle 6: Go to this link and save the file in the /etc/yum.repos.d/ directory.
    • RHEL/CentOS/Oracle 7: Go to this link and save the file in the /etc/yum.repos.d/ directory.
    • SLES:
      1. Run the following command:
         $ sudo zypper addrepo -f https://archive.cloudera.com/gplextras5/sles/12/x86_64/gplextras/cloudera-gplextras5.repo
      2. Update your system package index by running:
         $ sudo zypper refresh
    • Ubuntu or Debian: Go to this link and save the file as /etc/apt/sources.list.d/gplextras.list.
  2. Install the package on each host as follows:
    • RHEL/CentOS compatible:
      sudo yum install hadoop-lzo
    • SLES:
      sudo zypper install hadoop-lzo
    • Ubuntu or Debian:
      sudo apt-get install hadoop-lzo
  3. Continue with installing and deploying CDH. As part of the deployment, you will need to do some additional configuration for LZO, as shown under Configuring LZO.
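
Because the LZO package name is the same on every platform, a single script can serve all hosts once it knows which package manager is present. The sketch below is one possible approach, not part of the original procedure; detect_pkg_manager is a hypothetical name, and the check simply looks for the first of yum, zypper, or apt-get on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report which of the package managers used in this
# guide is available on the PATH, checking yum, zypper, then apt-get.
detect_pkg_manager() {
    for pm in yum zypper apt-get; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    echo "no supported package manager found" >&2
    return 1
}

# Example (after the gplextras repository has been added on the host):
#   pm=$(detect_pkg_manager) && sudo "$pm" install hadoop-lzo
```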

Step 6: Deploy CDH and Install Components