This is the documentation for CDH 4.6.0.
Documentation for other versions is available at Cloudera Documentation.

Creating a Local Yum Repository

This section explains how to set up a local yum repository which you can then use to install CDH on the machines in your cluster. There are a number of reasons you might want to do this, for example:

  • The computers in your cluster may not have Internet access. You can still use yum to do an installation on those machines by creating a local yum repository.
  • You may want to keep a stable local repository to ensure that any new installations (or re-installations on existing cluster members) use exactly the same bits.
  • Using a local repository may be the most efficient way to distribute the software to the cluster members.

To set up your own internal mirror, do the following.

  Note:

Before You Start

These instructions assume you already have the appropriate Cloudera repo file on the system on which you are going to create the local repository. If this is not the case, follow the instructions under To download and install the CDH4 Package. (Downloading and installing the RPM also downloads the repo file and saves it in /etc/yum.repos.d.)
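To check that the repo file is in place and to see which repository name it defines, a small helper along the following lines may be useful. This is only a sketch: the function name repo_ids is made up for this example, and it assumes the file uses the usual [name] section headers.

```shell
# List the repository IDs (the names in square brackets) defined in a repo file.
# repo_ids is a hypothetical helper name, not part of yum or CDH.
#   $1 = path to a .repo file
repo_ids() {
  # Print only the text between [ and ] on section-header lines.
  sed -n 's/^\[\([^]]*\)\].*$/\1/p' "$1"
}

# Example usage (path taken from the text above):
# repo_ids /etc/yum.repos.d/cloudera-cdh4.repo
```

If the function prints nothing or reports that the file is missing, install the repo file first as described above.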

  1. On a computer that has Internet access, install a web server such as Apache or lighttpd; this machine will serve the RPMs. The default configuration should work. Make sure the firewall on this machine allows HTTP traffic through.
  2. On the same computer as in the previous step, install the yum-utils and createrepo packages if they are not already installed (yum-utils includes the reposync command):
    sudo yum install yum-utils createrepo
  3. On the same computer as in the previous steps, download the yum repository into a temporary location. On Red Hat/CentOS 6, you can use a command such as:
    reposync -r cloudera-cdh4 
      Note:

    cloudera-cdh4 is the name of the repository on your system; the name is usually in square brackets on the first line of the repo file, which in this example is /etc/yum.repos.d/cloudera-cdh4.repo.

  4. Put all the RPMs into a directory served by your web server. This example uses /var/www/html/cdh/4/RPMS/noarch/ (substitute x86_64 or i386 for noarch to match the architecture of the RPMs). Make sure you can access the files in this directory remotely (the URL should look like http://<yourwebserver>/cdh/4/RPMS/).
  5. On your web server, go to /var/www/html/cdh/4/ and type the following command:
    createrepo .
    This will create or update the necessary metadata so yum can understand this new repository (you will see a new directory named repodata).
      Important:

    Check the permissions of the subdirectories and files under /var/www/html/cdh/4/. Make sure they are all readable by your web server user.

  6. Edit the repo file you got from Cloudera (see Before You Start) and replace the line starting with baseurl= or mirrorlist= with the following line: baseurl=http://<yourwebserver>/cdh/4/
  7. Save this modified repo file in /etc/yum.repos.d/, and check that you can install CDH through yum.

Example:

sudo yum update && sudo yum install hadoop
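Steps 3 through 5 above can be sketched as a single shell function. This is an illustration under stated assumptions, not a Cloudera-supplied script: the function name build_mirror and its arguments are made up for this example, it assumes reposync and createrepo are already installed (step 2), and it sorts RPMs into RPMS/<arch>/ using the architecture suffix in each file name.

```shell
# Sketch of steps 3-5: sync the repository, publish the RPMs, build metadata.
# build_mirror and its argument names are hypothetical, chosen for this example.
#   $1 = repository ID, for example cloudera-cdh4
#   $2 = temporary download directory
#   $3 = directory served by the web server, for example /var/www/html/cdh/4
build_mirror() {
  repo_id=$1; tmp_dir=$2; web_root=$3

  # Step 3: download every package in the repository into tmp_dir.
  mkdir -p "$tmp_dir" "$web_root" || return 1
  reposync -r "$repo_id" -p "$tmp_dir" || return 1

  # Step 4: copy each RPM into RPMS/<arch>/, taking <arch> from the file name
  # (name-version-release.<arch>.rpm).
  find "$tmp_dir" -name '*.rpm' | while IFS= read -r rpm; do
    arch=${rpm%.rpm}     # strip the .rpm extension
    arch=${arch##*.}     # keep what follows the last remaining dot
    mkdir -p "$web_root/RPMS/$arch"
    cp "$rpm" "$web_root/RPMS/$arch/"
  done

  # Step 5: create or update the repodata directory.
  (cd "$web_root" && createrepo .) || return 1

  # Make all files and directories readable by the web server user.
  chmod -R a+rX "$web_root"
}

# Example usage (run with enough privileges to write under /var/www/html):
# build_mirror cloudera-cdh4 /tmp/cdh4-sync /var/www/html/cdh/4
```

Rerunning the function after a new sync refreshes the packages and metadata in place.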

Once you have confirmed that your internal mirror works, you can distribute this modified repo file to all your machines, and they should all be able to install CDH without needing access to the Internet. Follow the instructions under CDH4 Installation.
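The repo file edit in step 6 can also be done with sed rather than by hand, which may help when you prepare the file for many machines. The following is a minimal sketch: localize_repo is a hypothetical helper name, mirror.example.com is a placeholder for <yourwebserver>, and the function assumes the file contains a single baseurl= or mirrorlist= line to replace.

```shell
# Print a copy of a repo file whose baseurl= or mirrorlist= line points at the
# local mirror. localize_repo is a hypothetical helper name.
#   $1 = original repo file from Cloudera
#   $2 = host name of your web server (stands in for <yourwebserver>)
localize_repo() {
  # Replace the whole line, allowing optional spaces before the equals sign.
  sed -E "s|^(baseurl|mirrorlist)[[:space:]]*=.*|baseurl=http://$2/cdh/4/|" "$1"
}

# Example usage (mirror.example.com is a placeholder host name):
# localize_repo /etc/yum.repos.d/cloudera-cdh4.repo mirror.example.com > cloudera-cdh4-local.repo
```

Review the generated file before saving it in /etc/yum.repos.d/ in place of the original.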