This is the documentation for Cloudera 5.2.x.

Installing the ZooKeeper Packages

There are two ZooKeeper server packages:

  • The zookeeper base package provides the basic libraries and scripts that are necessary to run ZooKeeper servers and clients. The documentation is also included in this package.
  • The zookeeper-server package contains the init.d scripts necessary to run ZooKeeper as a daemon process. Because zookeeper-server depends on zookeeper, installing the server package automatically installs the base package.
  Note: Install Cloudera Repository

Before using the instructions on this page to install or upgrade, install the Cloudera yum, zypper/YaST, or apt repository, then install or upgrade CDH 5 and make sure it is functioning correctly. For instructions, see Installing the Latest CDH 5 Release and Upgrading Unmanaged CDH Using the Command Line.

Installing the ZooKeeper Base Package

To install ZooKeeper on Red Hat-compatible systems:

$ sudo yum install zookeeper

To install ZooKeeper on Ubuntu and other Debian systems:

$ sudo apt-get install zookeeper

To install ZooKeeper on SLES systems:

$ sudo zypper install zookeeper
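
The three commands above differ only in the package manager. As a rough sketch, the choice could be scripted as follows. The function names detect_pkg_manager and install_command are hypothetical helpers, not part of any Cloudera package, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers; not shipped with CDH.

# Print the name of the first supported package manager found on PATH.
detect_pkg_manager() {
    for pm in yum apt-get zypper; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    echo "no supported package manager found" >&2
    return 1
}

# Print the install command for PACKAGE using package manager PM.
install_command() {
    pm=$1
    package=$2
    case $pm in
        yum|apt-get|zypper) echo "sudo $pm install $package" ;;
        *) echo "unsupported package manager: $pm" >&2; return 1 ;;
    esac
}

# Show (but do not run) the command that applies to this host.
if pm=$(detect_pkg_manager); then
    install_command "$pm" zookeeper
fi
```

Passing zookeeper-server instead of zookeeper as the second argument prints the corresponding command for the server package.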

Installing the ZooKeeper Server Package and Starting ZooKeeper on a Single Server

The instructions provided here deploy a single ZooKeeper server in "standalone" mode. This is appropriate for evaluation, testing and development purposes, but may not provide sufficient reliability for a production application. See Installing ZooKeeper in a Production Environment for more information.

To install a ZooKeeper server on Red Hat-compatible systems:

$ sudo yum install zookeeper-server

To install a ZooKeeper server on Ubuntu and other Debian systems:

$ sudo apt-get install zookeeper-server

To install a ZooKeeper server on SLES systems:

$ sudo zypper install zookeeper-server

To create /var/lib/zookeeper and set permissions:

$ sudo mkdir -p /var/lib/zookeeper
$ sudo chown -R zookeeper /var/lib/zookeeper/

To start ZooKeeper

  Note:

ZooKeeper may start automatically on installation on Ubuntu and other Debian systems. This automatic start will happen only if the data directory exists; otherwise you will be prompted to initialize as shown below.

  • To start ZooKeeper after an upgrade:
$ sudo service zookeeper-server start
  • To start ZooKeeper after a first-time install:
$ sudo service zookeeper-server init
$ sudo service zookeeper-server start
  Note:

If you are deploying multiple ZooKeeper servers after a fresh install, you need to create a myid file in the data directory. You can do this with the --myid option of the init command:

$ sudo service zookeeper-server init --myid=1

Installing ZooKeeper in a Production Environment

In a production environment, you should deploy ZooKeeper as an ensemble with an odd number of nodes. As long as a majority of the servers in the ensemble are available, the ZooKeeper service will be available. The minimum recommended ensemble size is three ZooKeeper servers, and it is recommended that each server run on a separate machine.
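
The preference for an odd number of nodes follows from the majority rule. A minimal sketch of the arithmetic, assuming a strict majority of more than half the servers (the function names are illustrative only):

```shell
#!/bin/sh
# Smallest number of servers that forms a majority of an ensemble of size N.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can be down while a majority is still available.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
    echo "$n servers: quorum $(quorum_size "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
```

An ensemble of four tolerates the same single failure as an ensemble of three, so the fourth server adds no extra availability; a fifth server raises the tolerance to two.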

Deploying a ZooKeeper ensemble requires some additional configuration. The configuration file (zoo.cfg) on each server must include a list of all servers in the ensemble, and each server must also have a myid file in its data directory (by default /var/lib/zookeeper) that identifies it as one of the servers in the ensemble. Proceed as follows on each server.
  1. Use the commands under Installing the ZooKeeper Server Package and Starting ZooKeeper on a Single Server to install zookeeper-server on each host.
  2. Set the Java heap size based on testing with your expected load, so that the system does not swap. Make sure you are well below the threshold at which the system would start swapping; for example, 12 GB for a machine with 16 GB of RAM.
  3. Create a configuration file. This file can be called anything you like, and must specify settings for at least the parameters shown under "Minimum Configuration" in the ZooKeeper Administrator's Guide. You should also configure values for initLimit, syncLimit, and server.n; see the explanations in the administrator's guide. For example:
    tickTime=2000
    dataDir=/var/lib/zookeeper/
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=zoo1:2888:3888
    server.2=zoo2:2888:3888
    server.3=zoo3:2888:3888

    In this example, the final three lines are in the form server.id=hostname:port:port. The first port is the one followers use to connect to the leader; the second is used for leader election. You set id for each server in the next step.

  4. Create a file named myid in the server's dataDir; in this example, /var/lib/zookeeper/myid. The file must contain a single line consisting of a single number between 1 and 255 that is unique within the ensemble; this is the id component mentioned in the previous step. In this example, the server whose hostname is zoo1 must have a myid file that contains only 1.
  5. Start each server as described in the previous section.
  6. Test the deployment by running a ZooKeeper client:
    zookeeper-client -server hostname:port
    For example:
    zookeeper-client -server zoo1:2181
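
Steps 3 and 4 must agree with each other: the n in each server.n line has to match the number in that host's myid file. The following is a rough sketch of how the two pieces could be generated consistently. The functions server_lines and write_myid are hypothetical helpers, and the ports are simply the ones used in the example configuration.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the ZooKeeper packages.

# Print one server.N line per hostname, numbering from 1.
# Ports 2888 and 3888 are taken from the example configuration.
server_lines() {
    id=1
    for host in "$@"; do
        echo "server.$id=$host:2888:3888"
        id=$((id + 1))
    done
}

# Write ID to DATADIR/myid after checking it is an integer from 1 to 255.
write_myid() {
    datadir=$1
    myid=$2
    case $myid in
        ''|*[!0-9]*) echo "myid must be a number: $myid" >&2; return 1 ;;
    esac
    if [ "$myid" -lt 1 ] || [ "$myid" -gt 255 ]; then
        echo "myid out of range (1-255): $myid" >&2
        return 1
    fi
    echo "$myid" > "$datadir/myid"
}

# Example: print the server lines for the three hosts in the example.
server_lines zoo1 zoo2 zoo3
```

On zoo1, for instance, running write_myid /var/lib/zookeeper 1 with sufficient privileges would produce the myid file that matches the server.1 line.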

For more information on configuring a multi-server deployment, see Clustered (Multi-Server) Setup in the ZooKeeper Administrator's Guide.

Setting Up a Supervisory Process for the ZooKeeper Server

The ZooKeeper server is designed to be both highly reliable and highly available. This means that:

  • If a ZooKeeper server encounters an error it cannot recover from, it will "fail fast" (the process will exit immediately).
  • When the server shuts down, the ensemble remains active and continues serving requests.
  • Once restarted, the server rejoins the ensemble without any further manual intervention.

Cloudera recommends that you fully automate this process by configuring a supervisory service to manage each server, and restart the ZooKeeper server process automatically if it fails. See the ZooKeeper Administrator's Guide for more information.
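
To illustrate the idea only, the loop below sketches what a supervisor does: run a foreground command and start it again whenever it exits with an error. This is not a Cloudera-provided tool, and the supervise function and its arguments are assumptions made for the example; a production deployment would normally rely on an established supervisory service rather than a hand-written loop.

```shell
#!/bin/sh
# supervise MAX_RESTARTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical sketch: runs COMMAND in the foreground and restarts it after
# each non-zero exit, up to MAX_RESTARTS times, waiting DELAY_SECONDS between
# attempts. A zero exit status is treated as a deliberate shutdown.
supervise() {
    max_restarts=$1
    delay=$2
    shift 2
    restarts=0
    while :; do
        "$@"
        status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after $restarts restart(s)" >&2
            return "$status"
        fi
        restarts=$((restarts + 1))
        echo "exit status $status; restart $restarts of $max_restarts" >&2
        sleep "$delay"
    done
}

# Usage (placeholder command; substitute whatever runs the server in the
# foreground on your system):
#   supervise 10 5 run_zookeeper_foreground
```

The restart limit keeps a server that fails immediately on every start from looping forever, which would otherwise hide a configuration problem.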