Troubleshooting HDFS Encryption

This topic contains HDFS Encryption-specific troubleshooting information in the form of issues you might face when encrypting HDFS files/directories and their workarounds.

KMS server jute buffer exception

Description

You see the following error when the KMS (for example, as a ZooKeeper client) jute buffer size is insufficient to hold all the tokens:
2017-01-31 21:23:56,416 WARN org.apache.zookeeper.ClientCnxn: Session 0x259f5fb3c1000fb for server example.cloudera.com/10.172.0.1:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4196356 is out of range! 

Solution

Increase the jute buffer size and restart the KMS. In Cloudera Manager, go to the KMS Configuration page, and in the Additional Java Configuration Options for KMS (kms_java_opts) field, enter -Djute.maxbuffer=<number_of_bytes>. Restart the KMS.

Retrieval of encryption keys fails

Description

You see the following error when trying to list encryption keys
user1@example-sles-4:~> hadoop key list
Cannot list keys for KeyProvider: KMSClientProvider[https: //example-sles-2.example.com:16000/kms/v1/]: Retrieval of all keys failed.

Solution

Make sure your truststore has been updated with the relevant certificate(s), such as the Key Trustee server certificate.

DistCp between unencrypted and encrypted locations fails

Description

By default, DistCp compares checksums provided by the filesystem to verify that data was successfully copied to the destination. However, when copying between unencrypted and encrypted locations, the filesystem checksums will not match since the underlying block data is different.

Solution

Specify the -skipcrccheck and -update distcp flags to avoid verifying checksums.

Cannot move encrypted files to trash

Description

With HDFS encryption enabled, you cannot move encrypted files or directories to the trash directory.

Solution

To remove encrypted files/directories, use the following command with the -skipTrash flag specified to bypass trash.
rm -r -skipTrash /testdir

NameNode - KMS communication fails after long periods of inactivity

Description

Encrypted files and encryption zones cannot be created if a long period of time (by default, 20 hours) has passed since the last time the KMS and NameNode communicated.

Solution

For lower CDH 5 releases, there are two possible workarounds to this issue :
  • You can increase the KMS authentication token validity period to a very high number. Since the default value is 10 hours, this bug will only be encountered after 20 hours of no communication between the NameNode and the KMS. Add the following property to the kms-site.xmlSafety Valve:
    <property>
    <name>hadoop.kms.authentication.token.validity</name> 
    <value>SOME VERY HIGH NUMBER</value> 
    </property>
  • You can switch the KMS signature secret provider to the string secret provider by adding the following property to the kms-site.xml Safety Valve:
    <property>
    <name>hadoop.kms.authentication.signature.secret</name> 
    <value>SOME VERY SECRET STRING</value> 
    </property>