Operating System Known Issues

YARN failure due to Debian 8 kernel limitation

The Linux kernel for Debian 8 (Jessie) does not support the cgroup setting for cpu.cfs_period_us and causes failures such as that shown in the log message below:
...DATE TIME FATAL NodeManager
Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor

at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:269)

at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
...

Bug: Debian 805498

Linux kernel security patch and CDH services crashes

After applying a recent Linux kernel security patch for CVE-2017-1000364, CDH services that use the JSVC set of libraries crash with a Java Virtual Machine (JVM) error such as:
A fatal error has been detected by the Java Runtime Environment:
SIGBUS (0x7) at pc=0x00007fe91ef6cebc, pid=30321, tid=0x00007fe930c67700

Cloudera services for HDFS and Impala cannot start after applying the patch.

Commonly used Linux distributions are shown in the table below. However, the issue affects any CDH release that runs on RHEL, CentOS, Oracle Linux, SUSE Linux, or Ubuntu and that has had the Linux kernel security patch for CVE-2017-1000364 applied.

If you have already applied the patch for your OS according to the advisories for CVE-2017-1000364, apply the kernel update that contains the fix for your operating system (some of which are listed in the table). If you cannot apply the kernel update, you can workaround the issue by increasing the Java thread stack size as detailed in the steps below.

Distribution Advisories for CVE-2017-1000364 Advisory updates
Oracle Linux 6 ELSA-2017-1486 Oracle has fixed this problem in ELSA-2017-1723.
Oracle Linux 7 ELSA-2017-1484 Oracle has also added the fix for Oracle Linux 7 in ELBA-2017-1674.
RHEL 6 RHSA-2017-1486 RedHat has fixed this problem for RHEL 6, marked this as outdated and superseded by RHSA-2017-1723.
RHEL 7 RHSA-2017-1484 RedHat has fixed this problem for RHEL 7 and has marked this patch as outdated and superseded by RHBA-2017-1674.
SUSE CVE-2017-1000364 SUSE has also fixed this problem and the patch names are included in this same advisory.
Ubuntu 14.04 USN-3335-1 Fixed in USN-3338-2.

Workaround

If you cannot apply the kernel update, you can set the Java thread stack size to -Xss1280k for the affected services using the appropriate Java configuration option or the environment advanced configuration snippet, as detailed below.

For role instances that have specific Java configuration options properties:

  1. Log in to Cloudera Manager Admin Console.
  2. Select Clusters > Impala, and then click the Configuration tab.
  3. Type java in the search field to display Java related configuration parameters. The Java Configuration Options for Catalog Server property field displays. Type -Xss1280k in the entry field, adding to any existing settings.
  4. Click Save Changes.
  5. Navigate to the HDFS service by selecting Clusters > HDFS.
  6. Click the Configuration tab.
  7. Click the Scope filter DataNode. The Java Configuration Options for DataNode field displays among the properties listed. Enter -Xss1280k into the field, adding to any existing properties.
  8. Click Save Changes.
  9. Select the Scope filter NFS Gateway. The Java Configuration Options for NFS Gateway field displays among the properties listed. Enter -Xss1280k into the field, adding to any existing properties.
  10. Click Save Changes.
  11. Restart the affected roles (or configure the safety valves in next section and restart when finished with all configurations).

For role instances that do not have specific Java configuration options:

  1. Log in to Cloudera Manager Admin Console.
  2. Select Clusters > Impala, and then click the Configuration tab.
  3. Click the Scope filter Impala Daemon and Category filter Advanced.
  4. Type impala daemon environment in the search field to find the safety valve entry field.
  5. In the Impala Daemon Environment Advanced Configuration Snippet (Safety Valve), enter:
    JAVA_TOOL_OPTIONS=-Xss1280K
  6. Click Save Changes.
  7. Click the Scope filter Impala StateStore and Category filter Advanced.
  8. In the Impala StateStore Environment Advanced Configuration Snippet (Safety Valve), enter:
    JAVA_TOOL_OPTIONS=-Xss1280K
  9. Click Save Changes.
  10. Restart the affected roles.

The table below summarizes the parameters that can be set for the affected services:

Service Settable Java Configuration Option
HDFS DataNode Java Configuration Options for DataNode
HDFS NFS Gateway Java Configuration Options for NFS Gateway
Impala Catalog Server Java Configuration Options for Catalog Server
Impala Daemon Impala Daemon Environment Advanced Configuration Snippet (Safety Valve)
JAVA_TOOL_OPTIONS=-Xss1280K
Impala StateStore Impala StateStore Environment Advanced Configuration Snippet (Safety Valve)
JAVA_TOOL_OPTIONS=-Xss1280K

Some OS versions reject weak ciphers for TLS

For some OS configurations, the curl command cannot connect to certain CDH services that run on Apache Tomcat when the cluster has been configured for TLS/SSL. Specifically, HttpFS, KMS, Oozie, and Solr services reject connection attempts because the default cipher configuration uses weak temporary server keys (based on Diffie-Hellman key exchange protocol).

Affected OS/version: Linux distributions that use network security services (nss) library version 3.28.4-3.el6_9, including Oracle Linux 6.x, RHEL 6.x, CentOS 6.x. These conflict with Tomcat 6.0.48, which is included with CDH 5.11. The issue has been resolved in CDH 5.11.1.

Workaround: Override the default cipher configuration using all the ciphers from the list below. Only commas separate each cipher, and there are no line endings or other characters in this list.

  • For clusters not managed by Cloudera Manager, enter this list in the appropriate configuration file for the service (see the "Not managed" column in the table below). Restart the service.
    TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA
    
  • For clusters managed by Cloudera Manager, follow the steps below.
  Cluster managed by Cloudera Manager Not managed
Service Configuration property Environment variable Configuration file
HttpFS HttpFS Environment Advanced Configuration Snippet (Safety Valve) HTTPFS_SSL_CIPHERS /etc/hadoop-httpfs/conf/httpfs-env.sh
KMS Key Management Server Environment Advanced Configuration Snippet (Safety Valve) KMS_SSL_CIPHERS /etc/hadoop-kms/conf/kms-env.sh
Oozie Oozie Server Environment Advanced Configuration Snippet OOZIE_HTTPS_CIPHERS /etc/oozie/conf/oozie-env.sh
Solr Solr Server Environment Advanced Configuration Snippet SOLR_CIPHERS_CONFIG /etc/default/solr

To override the ciphers on Cloudera Manager clusters:

  1. Log into Cloudera Manager Admin Console.
  2. Search for the advanced configuration snippet name ("Configuration property") for each respective service role.
  3. Enter the appropriate environment variable in the entry field.
  4. Copy the list from here:
    TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA
    
  5. Paste as the setting for the environment variable. For example, overriding HttpFS is as follows:



  6. Restart the service.

SUSE Linux Enterprise Server 12 (SLES12) and Kerberos Credential Cache Issue

Cloudera has become aware of a conflict in the handling of the Kerberos credential cache in SLES12 that can result in the following error:
Credential cache directory /run/user/uid does not exist

The issue is caused by a change in SLES12 systemd and the location of Kerberos credentials cache for user accounts. This issue may emerge when password-less accounts, such as impala or flume, for example, are executed using sudo -u user or su - user and the system attempts to retrieve the credential from the cache.

Workaround:

To work around this issue, add the following line to your system's Kerberos configuration file (/etc/krb5.conf), in the [libdefaults] section:
[libdefaults]
default_ccache_name = FILE:/tmp/krb5cc_%{uid}
...

For clusters managed by Cloudera Manager Server:

  1. Log in to Cloudera Manager Admin Console.
  2. Click the Administration tab and select Settings from the drop-down menu.
  3. Select Kerberos for the Category filter.
  4. Type the word "safety" in the Search field to find the Advanced Configuration Snippet (Safety Valve) settings for Kerberos.
  5. Enter the name=value pair for the property as shown here:

  6. Click Save Changes.

Random JVM Hangs on RHEL 6.6 (Kernels 3.14-3.17)

User Java processes—HBase RegionServers, HDFS DataNodes—can deadlock and hang for no apparent reason.

Impact: A Java application or daemon hangs. A jstack, or attaching a debugger, can unblock the process.

Workaround:

Upgrade the operating system to one with kernel version 3.18 or later; for example, RHEL 6.6z and later. For information on checking your RHEL kernel version, see http://www.cyberciti.biz/faq/centsos-redhat-rhel-6-kernel-version/. In RHEL, the issue has been fixed recently in kernel-2.6.32-504.16.2.el6 update (April 21, https://rhn.redhat.com/errata/RHSA-2015-0864.html). The following example shows how to check for the fix:

% rpm -qp --changelog kernel-2.6.32-504.16.2.el6.x86_64.rpm | grep 'Ensure get_futex_key_refs() always implies a barrier'

Other distributions running kernel versions 3.14-3.17 may be affected.

For more information, see https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64

Leap-Second Events

Impact: After a leap-second event, Java applications (including CDH services) using older Java and Linux kernel versions, may consume almost 100% CPU. See https://access.redhat.com/articles/15145.

Leap-second events are tied to the time synchronization methods of the Linux kernel, the Linux distribution and version, and the Java version used by applications running on affected kernels.

Although Java is increasingly agnostic to system clock progression (and less susceptible to a kernel's mishandling of a leap-second event), using JDK 7 or 8 should prevent issues at the CDH level (for CDH components that use the Java Virtual Machine).

Immediate action required:

(1) Ensure that the kernel is up to date.

  • RHEL5/6/7, CentOS 5/6/7 - 2.6.32-298 or higher

  • Ubuntu Trusty 14.04 LTS - No action required

  • Ubuntu Precise 12.04 LTS - 3.2.0-29.46 or higher

  • Debian Jessie (8.x)- No action required

  • Debian Wheezy (7.x) - 3.2.29-1 or higher

  • Oracle Enterprise Linux (OEL) - Kernels built in 2013 or later

  • SLES11 (SP2, SP3 and SP4)

    • SP2 - 3.0.38-0.5 or higher

    • SP3 and SP4 - No action required

  • SLES12 - No action required.

(2) Ensure that your Java JDKs are current (especially if the kernel is not up to date and cannot be upgraded).
  • Java 7 - At least jdk-1.7.0_65

  • Java 8 - No action required.

(3) Ensure that your systems use either NTP or PTP synchronization.

For systems not using time synchronization, update both the OS tzdata and Java tzdata packages to the tzdata-2016g version, at a minimum. For OS tzdata package updates, contact OS support or check updated OS repositories. For Java tzdata package updates, see Oracle's Timezone Updater Tool.