Setting Up Apache ZooKeeper Using the Command Line

Apache ZooKeeper is a highly reliable and available service that provides coordination between distributed processes.

Initialize a Single ZooKeeper Server

The instructions provided here deploy a single ZooKeeper server in standalone mode. This is appropriate for evaluation, testing and development purposes, but does not provide sufficient reliability for a production application. See Initialize Multiple ZooKeeper Servers in a Production Environment for more information.

  1. Create /var/lib/zookeeper and set permissions:
    sudo mkdir -p /var/lib/zookeeper
    sudo chown -R zookeeper /var/lib/zookeeper
  2. Initialize ZooKeeper:
    sudo service zookeeper-server init

    This outputs the following message:

    No myid provided, be sure to specify it in /var/lib/zookeeper/myid if using non-standalone

    For a single ZooKeeper server in standalone mode, you can safely ignore this message.

  3. Start ZooKeeper:
    • RHEL 7 compatible:
      sudo systemctl start zookeeper-server
    • RHEL 6 compatible, SLES, Ubuntu:
      sudo service zookeeper-server start
  4. Continue to Setting up Supervisory Process for the ZooKeeper Server.
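
One quick way to confirm that the server came up is ZooKeeper's ruok four-letter command, to which a healthy server replies imok. The sketch below assumes that nc (netcat) is installed and that the server listens on the default client port 2181. On ZooKeeper 3.5.3 and later, ruok may also need to be enabled through the 4lw.commands.whitelist setting. The helper name zk_reply_ok is illustrative and is not part of ZooKeeper.

```shell
# Succeeds only if the reply read from stdin is exactly "imok".
# The server sends "imok" without a trailing newline, so a failed
# read at end of input is tolerated.
zk_reply_ok() {
  reply=
  read -r reply || true
  [ "$reply" = "imok" ]
}

# Usage against a local standalone server (requires nc):
#   echo ruok | nc localhost 2181 | zk_reply_ok && echo "ZooKeeper is running"
```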

Initialize Multiple ZooKeeper Servers in a Production Environment

In a production environment, deploy ZooKeeper as an ensemble with an odd number of servers. As long as a majority of the servers in the ensemble are available, the ZooKeeper service will be available. The minimum recommended ensemble size is three ZooKeeper servers, and Cloudera recommends that each server run on a separate machine. In addition, the ZooKeeper server process should have its own dedicated disk storage if possible.
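
The majority rule also explains the preference for an odd number of servers. As a minimal sketch of the arithmetic (the function names are illustrative), the following computes the majority size and the number of server failures an ensemble can tolerate:

```shell
# Smallest number of servers that forms a majority of an ensemble of size $1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still available.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

An ensemble of four servers tolerates one failure, the same as an ensemble of three, so the fourth server adds no fault tolerance. Five servers tolerate two failures.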

Deploying a ZooKeeper ensemble requires some additional configuration. The configuration file (zoo.cfg) on each server must include a list of all servers in the ensemble, and each server must also have a myid file in its data directory (/var/lib/zookeeper by default) that identifies it as one of the servers in the ensemble.

Perform the following steps on each ZooKeeper server host.
  1. Create a configuration file. This file can be called anything you like, and must specify settings for at least the parameters shown under "Minimum Configuration" in the ZooKeeper Administrator's Guide. You should also configure values for initLimit, syncLimit, and server.n; see the explanations in the administrator's guide. For example:
    tickTime=2000
    dataDir=/var/lib/zookeeper/
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=zoo1:2888:3888
    server.2=zoo2:2888:3888
    server.3=zoo3:2888:3888

    In this example, the final three lines are in the form server.id=hostname:port:port. The first port is the one that followers use to connect to the leader; the second is used for leader election. You set id for each server in the next step.

  2. Create a file named myid in the server's dataDir; in this example, /var/lib/zookeeper/myid. The file must contain only a single line, and that line must consist of a single unique number between 1 and 255; this is the id component mentioned in the previous step. In this example, the server whose hostname is zoo1 must have a myid file that contains only 1.
  3. Start each server as described in the previous section.
  4. Test the deployment by running a ZooKeeper client:
    zookeeper-client -server hostname:port
    For example:
    zookeeper-client -server zoo1:2181
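
Because the myid file must agree with the server.id lines in the configuration file, one option is to derive it from that file instead of typing it by hand. The following is a sketch only: the helper names myid_for_host and write_myid are hypothetical, and the configuration path in the usage comment is an assumption that may differ on your system.

```shell
# Print the id N from the "server.N=host:port:port" line whose host is $2,
# reading the configuration file $1.
myid_for_host() {
  awk -F= -v h="$2" '
    $1 ~ /^server\.[0-9]+$/ {
      split($2, addr, ":")
      if (addr[1] == h) { sub(/^server\./, "", $1); print $1 }
    }' "$1"
}

# Write the myid file into the data directory $3 for host $2, using config $1.
write_myid() {
  id=$(myid_for_host "$1" "$2")
  # Refuse anything that is not exactly one number (no match, or several).
  case $id in
    ''|*[!0-9]*) echo "no unique server.N entry for $2 in $1" >&2; return 1 ;;
  esac
  if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
    echo "id $id is outside the range 1-255" >&2
    return 1
  fi
  echo "$id" > "$3/myid"
}

# Usage (assumed config path; run as a user that can write to the data directory):
#   write_myid /etc/zookeeper/conf/zoo.cfg zoo1 /var/lib/zookeeper
```

The hostname passed in must match the hostname in the configuration file exactly, so a server listed as zoo1 will not match zoo1.example.com.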

For more information on configuring a multi-server deployment, see Clustered (Multi-Server) Setup in the ZooKeeper Administrator's Guide.

Setting up Supervisory Process for the ZooKeeper Server

The ZooKeeper server is designed to be both highly reliable and highly available. This means that:

  • If a ZooKeeper server encounters an error it cannot recover from, it will "fail fast" (the process will exit immediately).
  • When a server shuts down, the rest of the ensemble remains active and continues serving requests.
  • Once restarted, the server rejoins the ensemble without any further manual intervention.

Cloudera recommends that you fully automate this process by configuring a supervisory service to manage each server, and restart the ZooKeeper server process automatically if it fails. See the ZooKeeper Administrator's Guide for more information.
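
To illustrate what such a supervisor does, here is a minimal restart loop. It is a sketch, not a substitute for a real process supervisor such as systemd or daemontools. The function name supervise and its arguments are hypothetical, and it assumes the supervised command runs in the foreground and exits with a non-zero status on failure.

```shell
# Run a command in the foreground and restart it whenever it exits with an error.
# $1 = maximum number of restarts, $2 = seconds to wait before each restart,
# remaining arguments = the command to run.
# Returns 0 if the command exits cleanly, or its last status after giving up.
supervise() {
  max_restarts=$1
  delay=$2
  shift 2
  restarts=0
  while :; do
    if "$@"; then
      return 0
    else
      status=$?
    fi
    if [ "$restarts" -ge "$max_restarts" ]; then
      echo "giving up after $restarts restarts (last status $status)" >&2
      return "$status"
    fi
    restarts=$((restarts + 1))
    echo "process exited with status $status; restart $restarts of $max_restarts" >&2
    sleep "$delay"
  done
}

# Usage (the script location is an assumption; start-foreground keeps the
# server attached to the calling shell so its exit status can be observed):
#   supervise 5 10 /usr/lib/zookeeper/bin/zkServer.sh start-foreground
```

Capping the number of restarts keeps a server that fails immediately on every start, for example because of a bad configuration file, from restarting in a tight loop indefinitely.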