Maintaining a ZooKeeper Server

The ZooKeeper server continually saves znode snapshot files and, optionally, transactional logs in a Data Directory to enable you to recover data. It's a good idea to back up the ZooKeeper Data Directory periodically. Although ZooKeeper is highly reliable because a persistent copy is replicated on each server, recovering from backups may be necessary if a catastrophic failure or user error occurs.

When you use the default configuration, the ZooKeeper server does not remove the snapshots and log files, so they will accumulate over time. You will need to clean up this directory occasionally, taking into account on your backup schedules and processes. To automate the cleanup, a zkCleanup.sh script is provided in the bin directory of the zookeeper base package. Modify this script as necessary for your situation. In general, you want to run this as a cron task based on your backup schedule.

The data directory is specified by the dataDir parameter in the ZooKeeper configuration file, and the data log directory is specified by the dataLogDir parameter.

For more information, see Ongoing Data Directory Cleanup.