Configuring Encryption

The goal of encryption is to ensure that only authorized users can view, use, or contribute to a data set. These security controls add another layer of protection against potential threats from end users, administrators, and other potentially malicious actors on the network. Data protection can be applied at a number of levels within Hadoop:
  • OS filesystem-level - Encryption can be applied at the Linux operating system filesystem level to cover all files in a volume. An example of this approach is Cloudera Navigator Encrypt (formerly Gazzang zNcrypt), which is available to Cloudera customers licensed for Cloudera Navigator. Navigator Encrypt operates at the Linux volume level, so it can encrypt cluster data inside and outside HDFS, such as temp/spill files, configuration files, and metadata databases (to be used only for data related to a CDH cluster). Navigator Encrypt must be used with Cloudera Navigator Key Trustee Server (formerly Gazzang zTrustee).

    CDH components such as Impala, MapReduce, YARN, and HBase can also encrypt data that lives temporarily on the local filesystem outside HDFS. To enable this feature, see Configuring Encryption for Data Spills.

  • Network-level - Data can be encrypted just before it is sent across a network and decrypted just after receipt. In Hadoop, this covers data sent from client user interfaces as well as service-to-service communication such as remote procedure calls (RPCs). This protection uses industry-standard protocols such as TLS/SSL.
  • HDFS-level - Encryption is applied by the HDFS client software. HDFS Transparent Encryption operates at the HDFS folder level, allowing you to encrypt some folders and leave others unencrypted. It cannot encrypt any data outside HDFS. To ensure reliable key storage (so that data is not lost), use Cloudera Navigator Key Trustee Server; the default Java keystore can be used for test purposes. See Enabling HDFS Encryption Using Cloudera Navigator Key Trustee Server for more information.

    Unlike OS-level and network-level encryption, HDFS transparent encryption is end-to-end. That is, it protects data both at rest and in transit, which makes it more efficient than implementing a combination of OS-level and network-level encryption.
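
To make the folder-level scope of HDFS Transparent Encryption more concrete, the following is a minimal sketch that only assembles and prints the command lines typically used to set up one encrypted folder (an encryption zone) with the standard `hadoop key` and `hdfs crypto` tools. The key name `example-key`, the path `/data/secure`, and the helper function `encryption_zone_commands` are hypothetical and chosen for illustration. The sketch assumes a key provider (such as a KMS backed by Key Trustee Server) is already configured, that the target folder is empty, and that the commands would be run by a user with the required key and HDFS administrative privileges.

```python
import shlex
from typing import List


def encryption_zone_commands(key_name: str, zone_path: str) -> List[List[str]]:
    """Build (but do not run) the CLI calls for one HDFS encryption zone.

    key_name and zone_path are illustrative inputs, not values from the
    documentation above.
    """
    if not zone_path.startswith("/"):
        raise ValueError("zone_path must be an absolute HDFS path")
    return [
        # 1. Create the encryption key in the configured key provider.
        ["hadoop", "key", "create", key_name],
        # 2. Create the folder that will become the encryption zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Turn the (empty) folder into an encryption zone bound to the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in encryption_zone_commands("example-key", "/data/secure"):
        print(shlex.join(argv))
```

Files written under the zone folder are then encrypted by the HDFS client, while folders outside the zone remain unencrypted. Printing the commands rather than executing them keeps the sketch safe to run on a machine without a cluster.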