SSH and HTTPS in the Hadoop Cluster
SSH and HTTPS can be used to transmit information securely:
- SSH (Secure Shell) is a protocol for secure remote login and command execution. It provides its own encrypted transport layer (it does not run on top of SSL/TLS) and supports both password and public-key authentication. It is a more secure alternative to rlogin and telnet.
- HTTPS (HTTP Secure) is HTTP running on top of SSL/TLS, which adds encryption and server authentication to standard HTTP communication.
It is a good idea to use SSH, rather than rlogin, for remote administration. Note, however, that SSH is not used to secure communication among the elements of a Hadoop cluster (DataNode, NameNode, TaskTracker or YARN NodeManager, JobTracker or YARN ResourceManager), nor by the /etc/init.d scripts that start daemons locally.
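Hadoop components do use SSH in some cases. For example, in an HDFS High Availability configuration, the sshfence fencing method connects to the previously active NameNode over SSH and kills its process; the shell fencing method does not require SSH.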
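One place where Hadoop does rely on SSH is HDFS High Availability fencing with the sshfence method. A minimal sketch, assuming you want to generate the relevant hdfs-site.xml entries programmatically: the property names dfs.ha.fencing.methods and dfs.ha.fencing.ssh.private-key-files are standard HDFS HA settings, while the key path and the build_configuration helper are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET

# Standard HDFS HA fencing properties; the private key path is a
# hypothetical example and must point at a key that can log in to the
# NameNode hosts without a password.
FENCING_PROPERTIES = {
    "dfs.ha.fencing.methods": "sshfence",
    "dfs.ha.fencing.ssh.private-key-files": "/home/hdfs/.ssh/id_rsa",
}


def build_configuration(properties: dict[str, str]) -> str:
    """Render a Hadoop-style <configuration> document (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_configuration(FENCING_PROPERTIES))
```

The printed fragment can be merged into hdfs-site.xml by hand; generating it from a dictionary simply keeps the name/value pairs in one place.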
Some communication within Hadoop can be configured to use HTTPS. Implementing this requires generating valid certificates and configuring clients to use those certificates. The HTTPS functionality that can be configured in CDH4 is:
- Encrypted MapReduce Shuffle (both MRv1 and YARN).
- Encrypted Web UIs; the configuration parameters that enable Encrypted MapReduce Shuffle also enable Encrypted Web UIs.
These features are discussed under Configuring Encrypted Shuffle, Encrypted Web UIs, and Encrypted HDFS Transport.
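Once the web UIs are served over HTTPS, any client that reads them must trust the cluster's certificates. Hadoop's own daemons are Java processes and are configured through their own truststore settings; the sketch below only illustrates the general client-side idea for an external Python script, assuming the cluster's CA (or self-signed) certificates are available as a PEM file. The make_cluster_tls_context helper and the ca_file path are hypothetical.

```python
import ssl
from typing import Optional


def make_cluster_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that verifies the server's certificate.

    ca_file is a hypothetical path to a PEM bundle holding the certificates
    used by the cluster's web UIs; when it is None, the system trust store
    is used instead.
    """
    # create_default_context turns on certificate verification
    # (CERT_REQUIRED) and hostname checking for server authentication.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy SSL/TLS protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    context = make_cluster_tls_context()
    print(context.verify_mode, context.check_hostname)
```

The resulting context can be passed to `urllib.request.urlopen(url, context=ctx)`; with verification enabled, a connection to a UI whose certificate is not signed by a trusted CA fails instead of silently proceeding.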