Apache ZooKeeper Known Issues
ZooKeeper JMX did not support TLS when managed by Cloudera Manager
The ZooKeeper service optionally exposes a JMX port used for reporting and metrics. By default, Cloudera Manager enables this port, but prior to Cloudera Manager 6.1.0 it did not support mutual TLS authentication on this connection. JMX has a password-based authentication mechanism that Cloudera Manager enables by default, but weaknesses have been found in that mechanism, and Oracle now advises enabling mutual TLS authentication on JMX connections in addition to password-based authentication. A successful attack may leak data, cause denial of service, or even allow arbitrary code execution in the Java process that exposes the JMX port. Beginning with Cloudera Manager 6.1.0, mutual TLS authentication can be configured on ZooKeeper's JMX port.
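For reference, mutual TLS on a JMX port is driven by standard JVM system properties. The sketch below shows those properties; the port number, keystore/truststore paths, and passwords are illustrative placeholders, and how Cloudera Manager actually injects these flags into the ZooKeeper process is not shown here.

```shell
# Standard com.sun.management.jmxremote.* properties for a JMX port with
# password authentication plus mutual TLS (client certificate required).
# All paths, the port, and the passwords are example placeholders.
ZK_JMX_OPTS="
  -Dcom.sun.management.jmxremote.port=9010
  -Dcom.sun.management.jmxremote.authenticate=true
  -Dcom.sun.management.jmxremote.password.file=/etc/zookeeper/jmxremote.password
  -Dcom.sun.management.jmxremote.ssl=true
  -Dcom.sun.management.jmxremote.ssl.need.client.auth=true
  -Djavax.net.ssl.keyStore=/etc/zookeeper/server.jks
  -Djavax.net.ssl.keyStorePassword=changeit
  -Djavax.net.ssl.trustStore=/etc/zookeeper/truststore.jks
  -Djavax.net.ssl.trustStorePassword=changeit"
```

With `ssl.need.client.auth=true`, the server rejects JMX clients that cannot present a certificate trusted by the configured truststore, which is the mutual-TLS protection described above.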
Products affected: ZooKeeper
Releases affected: Cloudera Manager 6.1.0 and lower, Cloudera Manager 5.16 and lower
Users affected: All
Date/time of detection: June 7, 2018
Severity (Low/Medium/High): 9.8 High (CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)
Impact: Remote code execution
Addressed in release/refresh/patch: Cloudera Manager 6.1.0
Adding New ZooKeeper Servers Can Lead to Data Loss
If the number of ZooKeeper servers being added exceeds the number already in the ZooKeeper service (for example, increasing the server count from 1 to 3), and a Start command is issued to the service immediately afterward, the new servers can form a quorum on their own, causing data loss on the existing servers.
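The quorum arithmetic behind this failure can be sketched as follows. The function name is illustrative, not part of ZooKeeper's API; it only models the majority rule ZooKeeper uses for leader election.

```python
# Minimal sketch of ZooKeeper's majority-quorum rule, showing why adding
# more new (empty) servers than existing ones is dangerous.

def quorum_size(ensemble_size: int) -> int:
    """ZooKeeper requires a strict majority of the ensemble for a quorum."""
    return ensemble_size // 2 + 1

# Growing a 1-server ensemble to 3: the 2 empty new servers already meet
# the quorum of 2, so they can elect a leader and establish an empty
# history before the original, data-bearing server participates.
existing, new = 1, 2
total = existing + new
print(quorum_size(total))         # majority needed for the 3-node ensemble
print(new >= quorum_size(total))  # the new servers alone can form it
```

Because the original server is outnumbered, it syncs from the newly elected leader and its data is discarded.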
Users of the following versions of Cloudera Manager are affected:
5.0.0–5.0.5, 5.1.0–5.1.4, 5.2.0–5.2.4, and 5.3.0–5.3.2
Workaround: If you use a version of Cloudera Manager listed above, upgrade to the next available maintenance release with the bug fix (within the minor version), or to Cloudera Manager 5.4.
The ZooKeeper server cannot be migrated from version 3.4 to 3.3, then back to 3.4, without user intervention.
Upgrading from 3.3 to 3.4 is supported, as is downgrading from 3.4 to 3.3. However, moving from 3.4 to 3.3 and then back to 3.4 fails: on startup, version 3.4 checks the datadir for the acceptedEpoch and currentEpoch files and compares them against the snapshot and log files in the same directory. These epoch files are new in 3.4.
As a result:
1) Upgrading from 3.3 to 3.4 works: the *Epoch files do not exist, and the server creates them.
2) Downgrading from 3.4 to 3.3 also works, because version 3.3 ignores the *Epoch files.
3) Going from 3.4 to 3.3 and then back to 3.4 fails, because 3.4 finds invalid *Epoch files in the datadir: 3.3 ignored them and applied changes to the snapshot and log files without updating the *Epoch files.
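The three cases can be modeled with a small sketch. This is a hypothetical simplification, not ZooKeeper source code: it reduces the 3.4 startup check to comparing the persisted currentEpoch against the epoch implied by the newest snapshot/log files.

```python
# Hypothetical model of the 3.4 startup sanity check described above.

def check_epoch_files(current_epoch, latest_log_epoch):
    """The persisted epoch must not lag behind the newest snapshot/log
    files; if it does, a 3.4 server refuses to start."""
    if current_epoch < latest_log_epoch:
        raise RuntimeError("currentEpoch is older than the latest log epoch")

# Case 1/2: after a clean 3.3 -> 3.4 upgrade (or a plain downgrade), the
# epoch files match the logs, so the check passes.
check_epoch_files(current_epoch=6, latest_log_epoch=6)

# Case 3: while downgraded, 3.3 advanced the logs (say to epoch 6) but
# never touched currentEpoch (still 5); returning to 3.4 then fails.
try:
    check_epoch_files(current_epoch=5, latest_log_epoch=6)
except RuntimeError as e:
    print("startup refused:", e)
```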
Cloudera Bug: CDH-5272
Anticipated Resolution: See workaround
Workaround: If this situation occurs, delete the *Epoch files; the version 3.4 server will recreate them, as in case 1) above.
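Applied by hand, the workaround amounts to the commands below. This is a sketch under stated assumptions: the dataDir path is an example only, and it assumes a 3.4 layout where the epoch files sit in the version-2 subdirectory next to the snapshots and transaction logs. Stop the ZooKeeper server before removing them.

```shell
# Remove the stale epoch files so a 3.4 server recreates them on startup.
# DATADIR is an assumed example path -- substitute your configured dataDir.
DATADIR="${DATADIR:-/var/lib/zookeeper}"
rm -f "$DATADIR/version-2/acceptedEpoch" "$DATADIR/version-2/currentEpoch"
```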