Networking and Security Requirements

CDH and Cloudera Manager Supported Transport Layer Security Versions

The following components support the indicated versions of Transport Layer Security (TLS):

Components Supported by TLS

Component         Role                     Name                                       Port   Version
Cloudera Manager  Cloudera Manager Server                                             7182   TLS 1.2
Cloudera Manager  Cloudera Manager Server                                             7183   TLS 1.2
Flume                                                                                 9099   TLS 1.2
Flume                                      Avro Source/Sink                                  TLS 1.2
Flume                                      Flume HTTP Source/Sink                            TLS 1.2
HBase             Master                   HBase Master Web UI Port                   60010  TLS 1.2
HDFS              NameNode                 Secure NameNode Web UI Port                50470  TLS 1.2
HDFS              Secondary NameNode       Secure Secondary NameNode Web UI Port      50495  TLS 1.2
HDFS              HttpFS                   REST Port                                  14000  TLS 1.1, TLS 1.2
Hive              HiveServer2              HiveServer2 Port                           10000  TLS 1.2
Hue               Hue Server               Hue HTTP Port                              8888   TLS 1.2
Impala            Impala Daemon            Impala Daemon Beeswax Port                 21000  TLS 1.2
Impala            Impala Daemon            Impala Daemon HiveServer2 Port             21050  TLS 1.2
Impala            Impala Daemon            Impala Daemon Backend Port                 22000  TLS 1.2
Impala            Impala StateStore        StateStore Service Port                    24000  TLS 1.2
Impala            Impala Daemon            Impala Daemon HTTP Server Port             25000  TLS 1.2
Impala            Impala StateStore        StateStore HTTP Server Port                25010  TLS 1.2
Impala            Impala Catalog Server    Catalog Server HTTP Server Port            25020  TLS 1.2
Impala            Impala Catalog Server    Catalog Server Service Port                26000  TLS 1.2
Oozie             Oozie Server             Oozie HTTPS Port                           11443  TLS 1.1, TLS 1.2
Solr              Solr Server              Solr HTTP Port                             8983   TLS 1.1, TLS 1.2
Solr              Solr Server              Solr HTTPS Port                            8985   TLS 1.1, TLS 1.2
Spark             History Server                                                      18080  TLS 1.2
YARN              ResourceManager          ResourceManager Web Application HTTP Port  8090   TLS 1.2
YARN              JobHistory Server        MRv1 JobHistory Web Application HTTP Port  19890  TLS 1.2
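
One way to confirm which TLS version a listed port actually negotiates is to attempt a handshake pinned to a single protocol version. The following is a minimal Python sketch using only the standard library, not a Cloudera-provided tool. The function names pinned_context and accepts, and the host name cm.example.com in the usage note, are hypothetical. Certificate verification is turned off because the sketch probes the protocol version only; do not reuse that setting for real client traffic. Depending on how the local OpenSSL build is configured, a TLS 1.1 probe may fail on the client side even when the server would accept it.

```python
import socket
import ssl


def pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client context that offers exactly one TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Probe only: skip certificate checks so a self-signed certificate
    # does not hide the protocol-version result.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def accepts(host: str, port: int, version: ssl.TLSVersion, timeout: float = 5.0) -> bool:
    """Return True if host:port completes a handshake at the given version."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with pinned_context(version).wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:  # covers refused connections, timeouts, and ssl.SSLError
        return False
```

For example, accepts("cm.example.com", 7183, ssl.TLSVersion.TLSv1_2) would be expected to return True for a Cloudera Manager Server that has TLS enabled on that port.
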

CDH and Cloudera Manager Networking and Security Requirements

The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:

  • Networking Protocols Support
    CDH requires IPv4. IPv6 is not supported and must be disabled.

    See also Configure Network Names.

  • Multihoming Support

    Multihoming CDH or Cloudera Manager is not supported outside specifically certified Cloudera partner appliances. Cloudera finds that current Hadoop architectures combined with modern network infrastructures and security practices remove the need for multihoming. Multihoming, however, is beneficial internally in appliance form factors to take advantage of high-bandwidth InfiniBand interconnects.

    Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming. In addition, unknown issues may arise because multihoming is not covered by our test matrix outside the Cloudera-certified partner appliances.

  • Entropy

    Data at rest encryption requires sufficient entropy to ensure randomness.

    See entropy requirements in Data at Rest Encryption Requirements.

  • Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:
    • Contain consistent information about hostnames and IP addresses across all hosts
    • Not contain uppercase hostnames
    • Not contain duplicate IP addresses

    Cluster hosts must not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:

    127.0.0.1 localhost.localdomain localhost
    192.168.1.1 cluster-01.example.com cluster-01
    192.168.1.2 cluster-02.example.com cluster-02
    192.168.1.3 cluster-03.example.com cluster-03 
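
Two of the /etc/hosts rules above (no uppercase hostnames, no duplicate IP addresses) can be checked mechanically on each host. The following Python sketch is illustrative only; check_hosts is a hypothetical helper name, and the sketch does not cover consistency across hosts, the alias rule, or DNS forward and reverse resolution.

```python
import ipaddress


def check_hosts(text: str) -> list[str]:
    """Return a list of problems found in /etc/hosts-style text."""
    problems = []
    seen_ips = {}  # IP address -> line number where it first appeared
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Drop trailing comments, then split the entry on whitespace.
        entry = line.split("#", 1)[0].split()
        if not entry:
            continue
        ip, names = entry[0], entry[1:]
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            problems.append(f"line {lineno}: invalid IP address {ip!r}")
            continue
        if ip in seen_ips:
            problems.append(
                f"line {lineno}: duplicate IP {ip} (first seen on line {seen_ips[ip]})"
            )
        else:
            seen_ips[ip] = lineno
        for name in names:
            if name != name.lower():
                problems.append(f"line {lineno}: uppercase hostname {name!r}")
    return problems
```

Running check_hosts(open("/etc/hosts").read()) on a host returns an empty list when neither rule is violated.
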
  • In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.

    Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete.

  • The Cloudera Manager Agent runs as root so that it can ensure that the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs and mapred users).
  • Security-Enhanced Linux (SELinux) must not block Cloudera Manager or CDH operations.
  • Firewalls (such as iptables and firewalld) must be disabled or configured to allow access to ports used by Cloudera Manager, CDH, and related services.
  • For RHEL and CentOS, the /etc/sysconfig/network file on each host must contain the correct hostname.
  • Cloudera Manager and CDH use several user accounts and groups to complete their tasks. The set of user accounts and groups varies according to the components you choose to install. Do not delete these accounts or groups and do not modify their permissions and rights. Ensure that no existing systems prevent these accounts and groups from functioning. For example, if you have scripts that delete user accounts not in a whitelist, add these accounts to the list of permitted accounts. Cloudera Manager, CDH, and managed services create and use the following accounts and groups:

Users and Groups

Component (Version)              Unix User ID  Groups          Functionality
Cloudera Manager (all versions)  cloudera-scm  cloudera-scm    Clusters managed by Cloudera Manager run Cloudera Manager Server, monitoring roles, and other Cloudera Server processes as cloudera-scm. Requires a keytab file named cmf.keytab because the name is hard-coded in Cloudera Manager.
Apache Accumulo                  accumulo      accumulo        Accumulo processes run as this user.
Apache Flume                     flume         flume           The sink that writes to HDFS as this user must have write privileges.
Apache HBase                     hbase         hbase           The Master and the RegionServer processes run as this user.
HDFS                             hdfs          hdfs, hadoop    The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.
Apache Hive                      hive          hive            The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml.
Apache HCatalog                  hive          hive            The WebHCat service (for REST access to Hive functionality) runs as the hive user.
HttpFS                           httpfs        httpfs          The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.
Hue                              hue           hue             Hue services run as this user.
Hue Load Balancer                apache        apache          The Hue Load Balancer has a dependency on the apache2 package that uses the apache user name. Cloudera Manager does not run processes using this user ID.
Impala                           impala        impala, hive    Impala services run as this user.
Apache Kafka                     kafka         kafka           Kafka brokers and mirror makers run as this user.
Java KeyStore KMS                kms           kms             The Java KeyStore KMS service runs as this user.
Key Trustee KMS                  kms           kms             The Key Trustee KMS service runs as this user.
Key Trustee Server               keytrustee    keytrustee      The Key Trustee Server service runs as this user.
Kudu                             kudu          kudu            Kudu services run as this user.
MapReduce                        mapred        mapred, hadoop  Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos.
Apache Oozie                     oozie         oozie           The Oozie service runs as this user.
Parquet                          ~             ~               No special users.
Apache Pig                       ~             ~               No special users.
Cloudera Search                  solr          solr            The Solr processes run as this user.
Apache Spark                     spark         spark           The Spark History Server process runs as this user.
Apache Sentry                    sentry        sentry          The Sentry service runs as this user.
Apache Sqoop                     sqoop         sqoop           This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.
YARN                             yarn          yarn, hadoop    Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos.
Apache ZooKeeper                 zookeeper     zookeeper       The ZooKeeper processes run as this user. It is not configurable.
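
To confirm that a cleanup script or other existing system has not removed any of the accounts in the table above, the local user and group databases can be queried directly. The following Python sketch uses the standard pwd and grp modules (Unix only). The EXPECTED mapping is a hand-copied subset of the table, and the names EXPECTED and missing_accounts are hypothetical. The sketch checks only that each user and group exists, not group membership, and it is meaningful only for components that are actually installed on the host.

```python
import grp
import pwd

# Subset of the Users and Groups table: user -> groups listed for it.
EXPECTED = {
    "hdfs": ["hdfs", "hadoop"],
    "mapred": ["mapred", "hadoop"],
    "yarn": ["yarn", "hadoop"],
    "impala": ["impala", "hive"],
}


def _user_exists(name):
    """True if the local user database has an entry for name."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def _group_exists(name):
    """True if the local group database has an entry for name."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


def missing_accounts(expected, user_exists=_user_exists, group_exists=_group_exists):
    """Return (missing_users, missing_groups) as sorted lists."""
    missing_users = sorted(u for u in expected if not user_exists(u))
    missing_groups = sorted(
        {g for groups in expected.values() for g in groups if not group_exists(g)}
    )
    return missing_users, missing_groups
```

Calling missing_accounts(EXPECTED) on a cluster host returns two empty lists when every listed user and group is present. The lookup functions are parameters so that the check can be exercised without touching the real account databases.
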