

Thank you for choosing CDH; your download instructions are below.



Installing CDH 5

To install or upgrade to the latest CDH 5 release, use the following topics from our documentation.

 

Ways To Install CDH 5

You can install CDH 5 in any of the following ways:

  • Automated method using Cloudera Manager; instructions here. Cloudera Manager automates the installation and configuration of CDH 5 on an entire cluster if you have root or password-less sudo SSH access to your cluster's machines.

      Note: Cloudera recommends that you use the automated method if possible.

  • Manual methods described below:
    • Download and install the CDH 5 "1-click Install" package
    • Add the CDH 5 repository
    • Build your own CDH 5 repository

    If you use one of these methods rather than Cloudera Manager, the first (downloading and installing the "1-click Install" package) is recommended in most cases because it is simpler than adding or building a repository.

  • Install from a CDH 5 tarball — see the next topic, "How Packaging Affects CDH 5 Deployment".

 

How Packaging Affects CDH 5 Deployment

Installing from Packages

 

Installing from a Tarball

  Note: The instructions in this Installation Guide are tailored for a package installation, as described in the sections that follow, and do not cover installation or deployment from tarballs.

 

  • If you install CDH 5 from a tarball, you will install YARN.
  • In CDH 5, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball itself. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory.

 

Before You Begin Installing CDH 5 Manually

  • The instructions on this page are for new installations. If you need to upgrade from an earlier release, see Upgrading from CDH 4 to CDH 5.
  • For a list of supported operating systems, see CDH 5 Requirements and Supported Versions.
  • These instructions assume that the sudo command is configured on the hosts where you will be doing the installation. If this is not the case, you will need the root user (superuser) to configure it.

  Note:

If you are migrating from MapReduce v1 (MRv1) to MapReduce v2 (MRv2, YARN), see Migrating from MapReduce v1 (MRv1) to MapReduce v2 (MRv2, YARN) for important information and instructions.

 

  Note: Running Services

When starting, stopping, and restarting CDH components, always use the service(8) command rather than running scripts in /etc/init.d directly. This is important because service sets the current working directory to / and removes most environment variables (passing only LANG and TERM) so as to create a predictable environment in which to administer the service. If you run the scripts in /etc/init.d, any environment variables you have set remain in force, and could produce unpredictable results. (If you install CDH from packages, service will be installed as part of the Linux Standard Base (LSB).)
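The scrubbed environment that service(8) creates can be simulated with env(1). This is only an illustration of the behavior described above, not the actual service implementation:

```shell
# Anything you export in your login shell is invisible to the scrubbed
# environment, and the working directory is reset to / -- roughly what
# service(8) does before running an init script.
MY_CUSTOM_VAR=surprise
export MY_CUSTOM_VAR
env -i LANG="$LANG" TERM="$TERM" sh -c 'cd / && echo "MY_CUSTOM_VAR=[$MY_CUSTOM_VAR] pwd=$(pwd)"'
# prints: MY_CUSTOM_VAR=[] pwd=/
```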

 

  Important:

  • Java Development Kit: if you have not already done so, install the Oracle Java Development Kit (JDK); see Java Development Kit Installation.
  • Scheduler defaults: note the following differences between MRv1 and MRv2 (YARN).
    • MRv1:
      • Cloudera Manager sets the default to FIFO.
      • CDH 5 sets the default to FIFO, with FIFO, Fair Scheduler, and Capacity Scheduler on the classpath by default.
    • MRv2 (YARN):
      • Cloudera Manager sets the default to Fair Scheduler.
      • CDH 5 sets the default to Fair Scheduler, with FIFO and Fair Scheduler on the classpath by default.
      • YARN does not support Capacity Scheduler.
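If you want the scheduler choice to be explicit rather than implied by these defaults, it can be pinned in yarn-site.xml. A minimal sketch, assuming the standard Hadoop 2.x property and FairScheduler class names; the snippet file name is a placeholder, and the `<property>` element must be merged into your cluster's real yarn-site.xml:

```shell
# Write a fragment pinning YARN to the Fair Scheduler (the CDH 5 default).
# yarn-scheduler-snippet.xml is a scratch file for illustration only.
cat > yarn-scheduler-snippet.xml <<'EOF'
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
EOF
```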

 

High Availability

In CDH 5 you can configure high availability both for the NameNode and the JobTracker or Resource Manager.

 

Steps to Install CDH 5 Manually

 

Step 1: Add or Build the CDH 5 Repository or Download the "1-click Install" package.

 

  • If you are installing CDH 5 on a Red Hat system, you can download Cloudera packages using yum or your web browser.
  • If you are installing CDH 5 on a SLES system, you can download the Cloudera packages using zypper or YaST or your web browser.
  • If you are installing CDH 5 on an Ubuntu or Debian system, you can download the Cloudera packages using apt or your web browser.

 

On Red Hat-compatible Systems

Use one of the following methods to add or build the CDH 5 repository or download the package on Red Hat-compatible systems.

  Note:

Use only one of the three methods.

Do this on all the systems in the cluster.

To download and install the CDH 5 "1-click Install" package:

  1. Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).
    OS Version                 Click this Link
    Red Hat/CentOS/Oracle 5    Red Hat/CentOS/Oracle 5 link
    Red Hat/CentOS/Oracle 6    Red Hat/CentOS/Oracle 6 link
  2. Install the RPM. For Red Hat/CentOS/Oracle 5:

    $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm

    For Red Hat/CentOS/Oracle 6 (64-bit):

    $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To add the CDH 5 repository:

Click the entry in the table below that matches your Red Hat or CentOS system, navigate to the repo file for your system and save it in the /etc/yum.repos.d/ directory.

OS Version                          Click this Link
Red Hat/CentOS/Oracle 5             Red Hat/CentOS/Oracle 5 link
Red Hat/CentOS/Oracle 6 (64-bit)    Red Hat/CentOS/Oracle 6 link

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To build a Yum repository:

If you want to create your own yum repository, download the appropriate repo file, create the repo, distribute the repo file and set up a web server, as described under Creating a Local Yum Repository.

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

  Note: Make sure your repositories are up to date

Before proceeding, make sure the repositories on each system are up to date:

sudo yum clean all

 

This ensures that the system repositories contain the latest software (it does not actually install anything).

 

On SLES Systems

Use one of the following methods to download the CDH 5 repository or package on SLES systems.

  Note:

Use only one of the three methods.

To download and install the CDH 5 "1-click Install" package:

  1. Download the CDH 5 "1-click Install" package.

    Click this link, choose Save File, and save it to a directory to which you have write access (it can be your home directory).

  2. Install the RPM:

    $ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm

  3. Update your system package index by running:

    $ sudo zypper refresh

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To add the CDH 5 repository:

  1. Run the following command:

    $ sudo zypper addrepo -f http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/cloudera-cdh5.repo

  2. Update your system package index by running:

    $ sudo zypper refresh

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To build a SLES repository:

If you want to create your own SLES repository, create a mirror of the CDH SLES directory by following these instructions that explain how to create a SLES repository from the mirror.

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

  Note: Make sure your repositories are up to date

Before proceeding, make sure the repositories on each system are up to date:

sudo zypper clean --all

 

This ensures that the system repositories contain the latest software (it does not actually install anything).

 

On Ubuntu or Debian Systems

Use one of the following methods to download the CDH 5 repository or package.

  Note:

Use only one of the three methods.

To download and install the CDH 5 "1-click Install" package:

  1. Download the CDH 5 "1-click Install" package:
    OS Version    Click this Link
    Wheezy        Wheezy link
    Precise       Precise link
  2. Install the package. Do one of the following:
    • Choose Open with in the download window to use the package manager.
    • Choose Save File, save the package to a directory to which you have write access (it can be your home directory) and install it from the command line, for example:

      sudo dpkg -i cdh5-repository_1.0_all.deb

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To add the CDH 5 repository:

Create a new file /etc/apt/sources.list.d/cloudera.list with the following contents:

  • For Ubuntu systems:

    deb [arch=amd64] http://archive.cloudera.com/cdh5/<OS-release-arch> <RELEASE>-cdh5 contrib
    deb-src http://archive.cloudera.com/cdh5/<OS-release-arch> <RELEASE>-cdh5 contrib

  • For Debian systems:

    deb http://archive.cloudera.com/cdh5/<OS-release-arch> <RELEASE>-cdh5 contrib
    deb-src http://archive.cloudera.com/cdh5/<OS-release-arch> <RELEASE>-cdh5 contrib

where <OS-release-arch> is debian/wheezy/amd64/cdh or ubuntu/precise/amd64/cdh, and <RELEASE> is the codename of your distribution, which you can find by running lsb_release -c.

For example, to install CDH 5 for 64-bit Ubuntu Precise:

deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh precise-cdh5 contrib
deb-src http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh precise-cdh5 contrib

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

OR: To build a Debian repository:

If you want to create your own apt repository, create a mirror of the CDH Debian directory and then create an apt repository from the mirror.

Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.

  Note: Make sure your repositories are up to date

Before proceeding, make sure the repositories on each system are up to date:

sudo apt-get update

 

This ensures that the system repositories contain the latest software (it does not actually install anything).

 

Step 2: Optionally Add a Repository Key

 

Before installing YARN or MRv1: (Optionally) add a repository key on each system in the cluster. Add the Cloudera Public GPG Key to your repository by executing one of the following commands:

  • For Red Hat/CentOS/Oracle 5 systems:

    $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera

  • For Red Hat/CentOS/Oracle 6 systems:

    $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera

  • For all SLES systems:

    $ sudo rpm --import http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera

  • For Ubuntu Precise systems:

    $ curl -s http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -

  • For Debian Wheezy systems:

    $ curl -s http://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -

This key enables you to verify that you are downloading genuine packages.

 

Step 3: Install CDH 5 with YARN

  Note:

Skip this step if you intend to use only MRv1. Directions for installing MRv1 are in Step 4.

 

To install CDH 5 with YARN:

  Note:

If you decide to configure HA for the NameNode, do not install hadoop-hdfs-secondarynamenode. After completing the HA software configuration, follow the installation instructions under Deploying HDFS High Availability.

 

  1. Install and deploy ZooKeeper.

      Important:

    Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This is a requirement if you are deploying high availability (HA) for the NameNode.

    Follow instructions under ZooKeeper Installation.

  2. Install each type of daemon package on the appropriate system(s), as follows.

    Where to install

    Install commands

    Resource Manager host (analogous to MRv1 JobTracker) running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-yarn-resourcemanager

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-yarn-resourcemanager

    Ubuntu or Debian

    sudo apt-get update; sudo apt-get install hadoop-yarn-resourcemanager

    NameNode host running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-hdfs-namenode

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode

    Ubuntu or Debian

    sudo apt-get install hadoop-hdfs-namenode

    Secondary NameNode host (if used) running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode

    Ubuntu or Debian

    sudo apt-get install hadoop-hdfs-secondarynamenode

    All cluster hosts except the Resource Manager running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce

    Ubuntu or Debian

    sudo apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce

    One host in the cluster running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver

    Ubuntu or Debian

    sudo apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver

    All client hosts running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-client

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-client

    Ubuntu or Debian

    sudo apt-get install hadoop-client

  Note:

The hadoop-yarn and hadoop-hdfs packages are installed on each system automatically as dependencies of the other packages.
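The per-role package sets in the table above can be captured in a small helper script. This is a hypothetical sketch for Red Hat-compatible systems only; the role names ("worker", "history", and so on) are our own convention, not Cloudera's:

```shell
# Map a host role to the CDH 5 YARN package set from the table above.
yarn_packages_for_role() {
  case "$1" in
    resourcemanager)   echo "hadoop-yarn-resourcemanager" ;;
    namenode)          echo "hadoop-hdfs-namenode" ;;
    secondarynamenode) echo "hadoop-hdfs-secondarynamenode" ;;
    worker)            echo "hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce" ;;
    history)           echo "hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
    client)            echo "hadoop-client" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Preview the packages for this host's role, then install them:
pkgs="$(yarn_packages_for_role worker)"
echo "$pkgs"
# sudo yum clean all && sudo yum install -y $pkgs   # uncomment to install
```

The same mapping works on SLES or Ubuntu/Debian by swapping the commented install line for the zypper or apt-get equivalents shown in the table.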

 

Step 4: Install CDH 5 with MRv1

 

  Note:

If you are also installing YARN, you can skip any packages you have already installed in Step 3: Install CDH 5 with YARN.

Skip this step and go to Step 3: Install CDH 5 with YARN if you intend to use only YARN.

 

  Important:

Before proceeding, you need to decide:

  1. Whether to configure High Availability (HA) for the NameNode and/or JobTracker; see the CDH 5 High Availability Guide for more information and instructions.
  2. Where to deploy the NameNode, Secondary NameNode, and JobTracker daemons. As a general rule:
    • The NameNode and JobTracker run on the same "master" host unless the cluster is large (more than a few tens of nodes), and the master host (or hosts) should not run the Secondary NameNode (if used), DataNode or TaskTracker services.
    • In a large cluster, it is especially important that the Secondary NameNode (if used) runs on a separate machine from the NameNode.
    • Each node in the cluster except the master host(s) should run the DataNode and TaskTracker services.

If you decide to configure HA for the NameNode, do not install hadoop-hdfs-secondarynamenode. After completing the HA software configuration, follow the installation instructions under Deploying HDFS High Availability.

 

  1. Install and deploy ZooKeeper.

      Important:

    Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This is a requirement if you are deploying high availability (HA) for the NameNode or JobTracker.

     

    Follow instructions under ZooKeeper Installation.

  2. Install each type of daemon package on the appropriate system(s), as follows.

    Where to install

    Install commands

    JobTracker host running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-jobtracker

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-jobtracker

    Ubuntu or Debian

    sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-jobtracker

    NameNode host running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-hdfs-namenode

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode

    Ubuntu or Debian

    sudo apt-get install hadoop-hdfs-namenode

    Secondary NameNode host (if used) running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode

    Ubuntu or Debian

    sudo apt-get install hadoop-hdfs-secondarynamenode

    All cluster hosts except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

    Ubuntu or Debian

    sudo apt-get install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

    All client hosts running:

     

    Red Hat/CentOS compatible

    sudo yum clean all; sudo yum install hadoop-client

    SLES

    sudo zypper clean --all; sudo zypper install hadoop-client

    Ubuntu or Debian

    sudo apt-get install hadoop-client

 

 

Step 5: (Optional) Install LZO

 

If you decide to install LZO (Lempel–Ziv–Oberhumer compression), proceed as follows.

  Note:

If you are upgrading to a new version of LZO, rather than installing it for the first time, you must first remove the old version; for example, on a RHEL system:

yum remove hadoop-lzo

 

  1. Add the repository on each host in the cluster. Follow the instructions for your OS version:
    For OS Version             Do this
    Red Hat/CentOS/Oracle 5    Navigate to this link and save the file in the /etc/yum.repos.d/ directory.
    Red Hat/CentOS 6           Navigate to this link and save the file in the /etc/yum.repos.d/ directory.
    SLES
    1. Run the following command:

      $ sudo zypper addrepo -f http://archive.cloudera.com/gplextras5/sles/11/x86_64/gplextras/cloudera-gplextras5.repo

    2. Update your system package index by running:

      $ sudo zypper refresh

    Ubuntu or Debian Navigate to this link and save the file as /etc/apt/sources.list.d/gplextras.list.

      Important: Make sure you do not let the file name default to cloudera.list, as that will overwrite your existing cloudera.list.

  2. Install the package on each host as follows:
    For OS version Install commands
    Red Hat/CentOS compatible

    sudo yum install hadoop-lzo

    SLES

    sudo zypper install hadoop-lzo

    Ubuntu or Debian

    sudo apt-get install hadoop-lzo

  3. Continue with installing and deploying CDH. As part of the deployment, you will need to do some additional configuration for LZO, as shown under Configuring LZO.

      Important: Make sure you do this configuration after you have copied the default configuration files to a custom location and set alternatives to point to it.
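As a preview of that configuration, the snippet below sketches the core-site.xml change that registers the LZO codecs. io.compression.codecs is a standard Hadoop property and the LzoCodec/LzopCodec classes ship with the hadoop-lzo package, but treat this as an assumption-laden sketch and follow Configuring LZO for the authoritative steps:

```shell
# Write a core-site.xml fragment registering the LZO codecs alongside the
# default codec. lzo-codec-snippet.xml is a scratch file for illustration;
# merge the <property> element into your custom-location core-site.xml.
cat > lzo-codec-snippet.xml <<'EOF'
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
EOF
```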

 

Step 6: Deploy CDH and Install Components

 

Now proceed with:

 



 


CDH 5 provides packages for Red-Hat-compatible, SLES, Ubuntu, and Debian systems as described below.

Red Hat compatible

  • Red Hat Enterprise Linux (RHEL): 5.7, 6.2, 6.4, 6.5 (64-bit packages)
  • CentOS: 5.7, 6.2, 6.4, 6.5 (64-bit packages)
  • Oracle Linux with default kernel and Unbreakable Enterprise Kernel: 5.6, 6.4, 6.5 (64-bit packages)

SLES

  • SUSE Linux Enterprise Server (SLES): 11 with Service Pack 1 or later (64-bit packages)

Ubuntu/Debian

  • Ubuntu: Precise (12.04) - Long-Term Support (LTS) (64-bit packages)
  • Debian: Wheezy (7.0, 7.1) (64-bit packages)

Note:

  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera's packages, you can also download source tarballs from Downloads.

 

 

Selected tab: SupportedOperatingSystems

Component     MySQL                  SQLite    PostgreSQL    Oracle      Derby (see Note 4)

Oozie         5.5, 5.6               –         8.4, 9.2      11gR2       Default
Flume         –                      –         –             –           Default (for the JDBC Channel only)
Hue           5.5, 5.6 (see Note 1)  Default   8.4, 9.2      11gR2       –
Hive/Impala   5.5, 5.6               –         8.4, 9.2      11gR2       Default
Sqoop 1       See Note 2             –         See Note 2    See Note 2  –
Sqoop 2       See Note 3             –         See Note 3    See Note 3  Default

Notes

  1. Cloudera's recommendations are:
    • For Red Hat and similar systems:
      • Use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries on Red Hat 5 and similar systems.
      • Use MySQL server version 5.1 (or higher) and version 5.1 client shared libraries on Red Hat 6 and similar systems.

      If you use a higher server version than recommended here (for example, if you use 5.5) make sure you install the corresponding client libraries.

    • For SLES systems, use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries.
    • For Ubuntu systems:
      • Use MySQL server version 5.5 (or higher) and version 5.0 client shared libraries on Precise (12.04).
  2. For connectivity purposes only, Sqoop 1 supports MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, Teradata 13.1, and Netezza TwinFin 5.0. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  3. Sqoop 2 can transport data to and from MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, and Microsoft SQL Server 2012. The Sqoop 2 repository is supported only on Derby.
  4. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the CDH 5 Installation Guide for recommendations.
Selected tab: SupportedDatabases

CDH 5 is supported with Oracle JDK 1.7.

Table 1. Supported JDK 1.7 Versions

Latest Certified Version    Minimum Supported Version    Exceptions
1.7.0_55                    1.7.0_55                     None

Selected tab: SupportedJDKVersions
Selected tab: SystemRequirements

What's New in CDH 5.1.0

Operating System Support

CDH 5.1 adds support for version 6.5 of RHEL and related platforms; see Supported Operating Systems.

Apache Crunch

 

  • CDH 5.1.0 implements Crunch 0.10.0.

Apache Flume

  • CDH 5.1.0 implements Flume 1.5.0.

Apache Hadoop

HDFS

POSIX Access Control Lists: As of CDH 5.1, HDFS supports POSIX Access Control Lists (ACLs), an addition to the traditional POSIX permissions model already supported. ACLs provide fine-grained control of permissions for HDFS files by providing a way to set different permissions for specific named users or named groups. For more information, see Enabling HDFS Extended ACLs.

NFS Gateway Improvements: CDH 5.1 makes the following improvements to the HDFS NFS gateway capability:

  • Subdirectory mounts:
    • Previously, clients could mount only the HDFS root directory.
    • As of CDH 5.1, a single mount point, configured via the nfs.export.point property in hdfs-site.xml on the NFS gateway node, is available to clients.
  • Improved support for Kerberized clusters (HDFS-5898):
    • Previously the NFS Gateway could connect to a secure cluster, but didn’t support logging in from a keytab.
    • As of CDH 5.1, set the nfs.kerberos.principal and nfs.keytab.file properties in hdfs-site.xml to allow users to log in from a keytab.
  • Support for port monitoring (HDFS-6406):
    • Previously, the NFS Gateway would always accept connections from any client.
    • As of CDH 5.1, set nfs.port.monitoring.disabled to false in hdfs-site.xml to allow connections only from privileged ports (those with root access).
  • Static uid/gid mapping for NFS clients that are not in sync with the NFS Gateway (HDFS-6435):
    • NFS sends UIDs and GIDs over the network from client to server, so the UIDs and GIDs must be in sync between client and server machines for users and groups to be set appropriately for file access and file creation; this is usually, but not always, the case.
    • As of CDH 5.1, you can configure a static UID/GID mapping file, by default /etc/nfs.map.
    • You can change the default (to use a different file path) by means of the nfs.static.mapping.file property in hdfs-site.xml.
    • The following sample entries illustrate the format of the file:

      uid 10 100 # Map the remote UID 10 to the local UID 100
      gid 11 101 # Map the remote GID 11 to the local GID 101

  • Hadoop portmap, or insecure system portmap, no longer required:
    • Many supported operating systems have portmap bugs, detailed here.
    • CDH 5.1 allows you to circumvent the problems by starting the NFS gateway as root, whether you install CDH from packages or parcels.

        Note:

      After initially registering with the system portmap as root, the NFS Gateway drops privileges and runs as a regular user.

    • Cloudera Manager starts the gateway as root by default.
  • Support for AIX NFS clients (HDFS-6549):
    • To deploy AIX NFS clients, set nfs.aix.compatibility.mode.enabled to true in hdfs-site.xml.
    • This enables code that handles bugs in the AIX implementation of NFS.
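The NFS gateway properties named in the list above can be collected into one hdfs-site.xml fragment. All values below are placeholders (export path, Kerberos principal, keytab, and map file); enable only the features you need, on the NFS gateway node:

```shell
# Write a scratch hdfs-site.xml fragment combining the CDH 5.1 NFS gateway
# properties described above. Every <value> is a placeholder to adjust.
cat > nfs-gateway-snippet.xml <<'EOF'
<property>
  <name>nfs.export.point</name>
  <value>/data</value>                        <!-- subdirectory mount -->
</property>
<property>
  <name>nfs.kerberos.principal</name>
  <value>nfs/_HOST@EXAMPLE.COM</value>        <!-- keytab-based login -->
</property>
<property>
  <name>nfs.keytab.file</name>
  <value>/etc/hadoop/conf/nfs.keytab</value>
</property>
<property>
  <name>nfs.port.monitoring.disabled</name>
  <value>false</value>                        <!-- privileged ports only -->
</property>
<property>
  <name>nfs.static.mapping.file</name>
  <value>/etc/nfs.map</value>                 <!-- static uid/gid mapping -->
</property>
<property>
  <name>nfs.aix.compatibility.mode.enabled</name>
  <value>true</value>                         <!-- only for AIX clients -->
</property>
EOF
```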

MapReduce and YARN

YARN with Impala supports Dynamic Prioritization.

Apache HBase

  • CDH 5.1.0 implements HBase 0.98.
  • As of CDH 5.1.0, HBase fully supports BucketCache, which was introduced as an experimental feature in CDH 5 Beta 1.
  • HBase now supports access control for EXEC permissions.
  • CDH 5.1.0 HBase introduces a reverse scan API, allowing you to scan a table in reverse.
  • You can now run a MapReduce job over a snapshot from HBase, rather than being limited to live data.
  • A new stateless streaming scanner is available over the REST API.
  • The delete* methods of the Delete class of the HBase Client API now use the timestamp from the constructor, the same behavior as the Put class. (In HBase versions before CDH 5.1, the delete* methods ignored the constructor's timestamp, and used the value of HConstants.LATEST_TIMESTAMP. This behavior was different from the behavior of the add() methods of the Put class.)
  • The SnapshotInfo tool has been enhanced in the following ways:
    • A new option, -list-snapshots, has been added to the SnapshotInfo command. This option allows you to list snapshots on either a local or remote server.
    • You can now pass the -size-in-bytes flag to print the size of snapshot files in bytes rather than the default human-readable format.
    • The size of each snapshot file in bytes is checked against the size reported in the manifest, and if the two sizes differ, the tool reports the file as corrupt.
  • A new -target option for ExportSnapshot allows you to specify a different name for the target cluster from the snapshot name on the source cluster.

For more information about these features, see New Features and Changes for HBase in CDH 5.

In addition, Cloudera has fixed some binary incompatibilities between HBase 0.96 and 0.98. As a result, the incompatibilities introduced by HBASE-10452 and HBASE-10339 do not affect CDH 5.1 HBase, as explained below:

  • HBASE-10452 introduced a new exception and error message in setTimeStamp(), for an extremely unlikely event where getting a TimeRange could fail because of an integer overflow. CDH 5.1 suppresses the new exception to retain compatibility with HBase 0.96, but logs the error.
  • HBASE-10339 contained code which inadvertently changed the signatures of the getFamilyMap method. CDH 5.1 restores these signatures to those used in HBase 0.96, to retain compatibility.

Apache Hive

  • Permission inheritance fixes
  • Support for decimal computation, and for reading and writing decimal-format data from and to Parquet and Avro

Hue

CDH 5.1.0 implements Hue 3.6.

New Features:

  • Search App v2:
    • 100% Dynamic dashboard
    • Drag-and-Drop dashboard builder
    • Text, Timeline, Pie, Line, Bar, Map, Filters, Grid and HTML widgets
    • Solr Index creation wizard (from a file)
  • Ability to view compressed Snappy, Avro and Parquet files
  • Impala HA
  • Ability to close Impala and Hive sessions, queries, and commands

Apache Mahout

  • CDH 5.1.0 implements Mahout 0.9.

See also Apache Mahout Incompatible Changes .

Apache Oozie

  • You can now submit Sqoop jobs from the Oozie command line.
  • LAST_ONLY execution mode now works correctly (OOZIE-1319).

Cloudera Search

New Features:

  • A Quick Start script that automates using Search to query data from the Enron Email dataset. The script downloads the data, expands it, moves it to HDFS, indexes it, and pushes the results live. The documentation now also includes a companion quick start guide, which describes the tasks the script completes, as well as customization options.
  • Solrctl now has built-in support for schema-less Solr. For more information, see Using Schemaless Mode.
  • Sentry-based document-level security for role-based access control of a collection. Document-level access control associates authorization tokens with each document in the collection, making it possible to grant Sentry roles access to sets of documents in a collection.
  • Cloudera Search includes a version of Kite 0.10.0, which includes all morphlines-related backports of all fixes and features in Kite 0.15.0. For additional information on Kite, see:
  • Support for the Parquet file format is included with this version of Kite 0.10.0.
  • Inclusion of hbase-indexer-1.5.1, a new version of the Lily HBase Indexer. This new version of the indexer includes the 0.10.0 version of Kite mentioned above. This 0.10.0 version of Kite includes the backports and fixes included in Kite 0.13.0.

Apache Sentry (incubating)

  • CDH 5.1.0 implements Sentry 1.2. This includes a database-backed Sentry service which uses the more traditional GRANT/REVOKE statements instead of the previous policy-file approach, making it easier to maintain and modify privileges.
  • Revised authorization privilege model for Hive and Impala. For more details, see Appendix: Authorization Privilege Model for Hive and Impala.

Apache Spark

  • CDH 5.1.0 implements Spark 1.0.
  • The spark-submit command abstracts across the variety of deployment modes that Spark supports and takes care of assembling the classpath for you.
  • Application History Server (SparkHistoryServer) improves monitoring capabilities.
  • You can launch PySpark applications against YARN clusters. PySpark currently only works in YARN Client mode.
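A hypothetical launch, assuming a gateway host with the Spark 1.0 client installed; app.py is a placeholder for your application. Because PySpark on YARN works only in client mode, the master is yarn-client:

```shell
# Build the spark-submit invocation; preview it before running for real.
APP=app.py
CMD="spark-submit --master yarn-client $APP"
echo "$CMD"
# prints: spark-submit --master yarn-client app.py
# $CMD    # uncomment on a host with the Spark client installed
```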

Other improvements include:

  • Streaming integration with Kerberos
  • Addition of more algorithms to MLlib (sparse vector support)
  • Improvements to Avro integration
  • Spark SQL alpha release (new SQL engine). Spark SQL allows you to run SQL statements inside a Spark application that manipulate and produce RDDs.

      Note:

    Because of its immaturity and alpha status, Cloudera does not currently offer commercial support for Spark SQL, but bundles it with our distribution so that you can try it out.

     

  • Authentication of all Spark communications
Selected tab: WhatsNew
