Apache ZooKeeper Known Issues

Adding New ZooKeeper Servers Can Lead to Data Loss

If you add more new ZooKeeper servers than already exist in the ZooKeeper service (for example, increasing the number of servers from 1 to 3) and immediately issue a Start command to the service, the new servers can form a quorum on their own, which causes data loss on the existing servers.
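The arithmetic behind this condition can be sketched as follows. This is a minimal illustration, assuming the usual ZooKeeper majority rule (a quorum is more than half of the configured servers); the function name is hypothetical and is not part of ZooKeeper or Cloudera Manager.

```python
def new_servers_can_form_quorum(existing: int, added: int) -> bool:
    """Return True if the added servers alone make up a majority of the ensemble."""
    total = existing + added
    majority = total // 2 + 1  # smallest count that is more than half
    return added >= majority


# Growing from 1 to 3 servers: the 2 new servers are a majority of 3.
print(new_servers_can_form_quorum(existing=1, added=2))  # True
# Growing from 3 to 5 servers: the 2 new servers are not a majority of 5.
print(new_servers_can_form_quorum(existing=3, added=2))  # False
```

Under this rule, the added servers reach a majority exactly when they outnumber the existing ones, which matches the condition described above.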

Users of the following versions of Cloudera Manager are affected:

5.0.0–5.0.5, 5.1.0–5.1.4, 5.2.0–5.2.4, and 5.3.0–5.3.2

Workaround: If you use one of the Cloudera Manager versions listed above, upgrade to the next maintenance release within your minor version that contains the bug fix, or upgrade to Cloudera Manager 5.4.
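A quick way to check a version string against the affected ranges is sketched below. The table is transcribed from the ranges listed above; the function name is hypothetical, and the sketch assumes the version is given in plain 'major.minor.maintenance' form.

```python
# Last affected maintenance release for each affected minor version,
# transcribed from the ranges listed above.
LAST_AFFECTED = {(5, 0): 5, (5, 1): 4, (5, 2): 4, (5, 3): 2}


def is_affected(version: str) -> bool:
    """Return True if a 'major.minor.maintenance' version is in an affected range."""
    major, minor, maintenance = (int(part) for part in version.split(".")[:3])
    last = LAST_AFFECTED.get((major, minor))
    return last is not None and 0 <= maintenance <= last


for v in ("5.1.4", "5.1.5", "5.3.2", "5.4.0"):
    print(v, is_affected(v))
```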

ZooKeeper Server Cannot Be Migrated from Version 3.4 to 3.3 and Back to 3.4 Without User Intervention

Upgrading from 3.3 to 3.4 is supported, as is downgrading from 3.4 to 3.3. However, moving from 3.4 to 3.3 and back to 3.4 fails. Version 3.4 checks the datadir for the acceptedEpoch and currentEpoch files and compares them against the snapshot and log files in the same directory. These epoch files are new in 3.4.

As a result:

1) Upgrading from 3.3 to 3.4 works: the *Epoch files do not exist, and the server creates them.
2) Downgrading from 3.4 to 3.3 also works, because version 3.3 ignores the *Epoch files.
3) Going from 3.4 to 3.3 and then back to 3.4 fails because 3.4 finds invalid *Epoch files in the datadir. Version 3.3 ignored them and applied changes to the snapshot and log files without updating the *Epoch files.
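The failure in case 3) can be illustrated with a simplified model. It assumes that the epoch is stored in the high 32 bits of a 64-bit zxid, and that version 3.4 rejects a data directory whose recorded current epoch is older than the epoch of the last logged transaction. The function names are hypothetical, and the check is a sketch of the idea rather than ZooKeeper's actual code.

```python
from typing import Optional


def epoch_of(zxid: int) -> int:
    """Extract the epoch, assumed to be the high 32 bits of a 64-bit zxid."""
    return zxid >> 32


def epoch_files_consistent(last_zxid: int, current_epoch: Optional[int]) -> bool:
    """Model the consistency check between the epoch file and the transaction data.

    current_epoch is None when the epoch file does not exist; the model then
    treats the directory as consistent, because the server can create the
    file from the transaction data.
    """
    if current_epoch is None:
        return True
    return epoch_of(last_zxid) <= current_epoch


# Case 1: upgraded from 3.3, no epoch file yet.
print(epoch_files_consistent(last_zxid=(2 << 32) | 7, current_epoch=None))  # True
# Case 3: 3.3 advanced the logs to epoch 3 but left the file at epoch 2.
print(epoch_files_consistent(last_zxid=(3 << 32) | 1, current_epoch=2))  # False
```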


Anticipated Resolution: See workaround

Workaround: If this situation occurs, delete the *Epoch files. The version 3.4 server recreates them, as in case 1) above.
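The cleanup could be scripted along the following lines. This is a sketch under stated assumptions: the ZooKeeper server is stopped before the files are removed, and the epoch files are located by searching the whole datadir recursively, since their exact location inside it is not stated here. The function name and the dry_run parameter are hypothetical.

```python
from pathlib import Path

# File names taken from the description above.
EPOCH_FILE_NAMES = ("acceptedEpoch", "currentEpoch")


def remove_epoch_files(data_dir: str, dry_run: bool = True) -> list:
    """Find the epoch files under data_dir and delete them unless dry_run is set.

    Returns the sorted list of paths that were found.
    """
    root = Path(data_dir)
    found = sorted(
        path
        for name in EPOCH_FILE_NAMES
        for path in root.rglob(name)
        if path.is_file()
    )
    if not dry_run:
        for path in found:
            path.unlink()  # snapshot and log files are left untouched
    return found
```

Calling `remove_epoch_files(datadir)` first, with the default `dry_run=True`, lists what would be deleted; passing `dry_run=False` performs the deletion. Backing up the datadir beforehand is a sensible precaution.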