This is the documentation for Cloudera 5.2.x.
Documentation for other versions is available at Cloudera Documentation.

Apache Hadoop Known Issues

— Deprecated Properties

In Hadoop 2.0.0 and later, a number of Hadoop and HDFS properties have been deprecated. (The change dates from Hadoop 0.23.1, on which the Beta releases of CDH 4 were based). A list of deprecated properties and their replacements can be found at http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html.

HDFS

HDFS Encryption Known Issues

  Important: The HDFS Data at Rest Encryption feature included in CDH 5.2.0 has several known limitations. Therefore, Cloudera does not currently support this feature and it is not recommended for production use. If you're interested in trying the feature out in a test environment, contact your account team.

— DistCp between unencrypted and encrypted locations fails

By default, DistCp compares checksums provided by the filesystem to verify that data was successfully copied to the destination. However, when copying between unencrypted and encrypted locations, the filesystem checksums will not match since the underlying block data is different.

Severity: Low

Workaround: Specify the -skipcrccheck and -update distcp flags to avoid verifying checksums.
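For example, a copy from an unencrypted directory into an encryption zone might look like the following (both paths are placeholders; note that -skipcrccheck is only valid together with -update):

```shell
# Placeholder paths; -update plus -skipcrccheck copies the data without
# comparing filesystem checksums across the encryption boundary.
hadoop distcp -update -skipcrccheck /datasets/raw /secure/zone/raw
```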

— NameNode - KMS communication fails after long periods of inactivity

Encrypted files and encryption zones cannot be created if a long period of time (by default, 20 hours) has passed since the last time the KMS and NameNode communicated.

Bug: HADOOP-11187

Severity: Low

Workaround: There are two possible workarounds to this issue:
  • You can increase the KMS authentication token validity period (hadoop.kms.authentication.token.validity, specified in seconds) to a very high number. Since the default value is 10 hours, this bug is only encountered after 20 hours of no communication between the NameNode and the KMS. Add the following property to the kms-site.xml Safety Valve:
    <property>
      <name>hadoop.kms.authentication.token.validity</name>
      <value>SOME VERY HIGH NUMBER</value>
    </property>
  • You can switch the KMS signature secret provider to the string secret provider by adding the following property to the kms-site.xml Safety Valve:
    <property>
      <name>hadoop.kms.authentication.signature.secret</name>
      <value>SOME VERY SECRET STRING</value>
    </property>

— Spark fails when the KMS is configured to use SSL

Severity: Medium

Workaround: None. You must disable SSL support for the KMS to proceed.

— Files inside encryption zones cannot be read in Hue

Hue uses either WebHDFS or HttpFS to access files. Both act as proxy-user clients of the KMS, and the KMS client library does not currently handle proxy users correctly.

Bug: HADOOP-11176

Severity: High

Workaround: None

— Cannot move encrypted files to trash

With HDFS encryption enabled, you cannot move encrypted files or directories to the trash directory.

Bug: HDFS-6767

Severity: Low

Workaround: To remove encrypted files or directories, bypass the trash by specifying the -skipTrash flag:
hadoop fs -rm -r -skipTrash /testdir

— If you install CDH using packages, HDFS NFS gateway works out of the box only on RHEL-compatible systems

Because of a bug in native versions of portmap/rpcbind, the HDFS NFS gateway does not work out of the box on SLES, Ubuntu, or Debian systems if you install CDH from the command line using packages. It does work on supported versions of RHEL-compatible systems on which rpcbind-0.2.0-10.el6 or later is installed, and it does work if you use Cloudera Manager to install CDH, or if you start the gateway as root.

Bug: 731542 (Red Hat), 823364 (SLES), 594880 (Debian)

Severity: High

Workarounds and caveats:
  • On Red Hat and similar systems, make sure rpcbind-0.2.0-10.el6 or later is installed.
  • On SLES, Debian, and Ubuntu systems, do one of the following:
    • Install CDH using Cloudera Manager; or
    • As of CDH 5.1, start the NFS gateway as root; or
    • Start the NFS gateway without using packages; or
    • You can use the gateway by running rpcbind in insecure mode, using the -i option, but keep in mind that this allows anyone from a remote host to bind to the portmap.
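As a sketch of the last option, on a Debian or Ubuntu host the portmapper can be restarted in insecure mode roughly as follows (service names vary by distribution):

```shell
# Run as root on the NFS gateway host. The -i flag puts rpcbind into
# insecure mode, which lets non-root processes (and remote hosts)
# register with the portmapper.
service rpcbind stop
rpcbind -i
```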

— Upgrade Requires an HDFS Upgrade

Upgrading from any release earlier than CDH 5.2.0 to CDH 5.2.0 or later requires an HDFS Upgrade. See Upgrading Unmanaged CDH Using the Command Line for further information.

— No error when changing permission to 777 on .snapshot directory

Snapshots are read-only; running chmod 777 on the .snapshot directory does not change this, but it also does not produce an error (though other illegal operations do).

Bug: HDFS-4981

Severity: Low

Workaround: None

— Snapshot operations are not supported by ViewFileSystem

Bug: None

Severity: Low

Workaround: None

— Snapshots do not retain directories' quota settings

Bug: HDFS-4897

Severity: Medium

Workaround: None

— NameNode cannot use wildcard address in a secure cluster

In a secure cluster, you cannot use a wildcard for the NameNode's RPC or HTTP bind address. For example, dfs.namenode.http-address must be a real, routable address and port, not 0.0.0.0:<port>. This should affect you only if you are running a secure cluster and your NameNode needs to bind to multiple local addresses.

Bug: HDFS-4448

Severity: Medium

Workaround: None

— Permissions for dfs.namenode.name.dir incorrectly set.

Hadoop daemons should set permissions for the dfs.namenode.name.dir (or dfs.name.dir) directories to drwx------ (700), but in fact these permissions are set to the file-system default, usually drwxr-xr-x (755).

Bug: HDFS-2470

Severity: Low

Workaround: Use chmod to set permissions to 700. See Configuring Local Storage Directories for Use by HDFS for more information and instructions.
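For example (the sketch below uses a scratch directory so it can be run anywhere; in production, run chmod against your configured dfs.namenode.name.dir, such as /data/1/dfs/nn, as the user the NameNode runs as):

```shell
# Scratch directory standing in for dfs.namenode.name.dir.
NN_DIR=$(mktemp -d)
chmod 700 "$NN_DIR"
ls -ld "$NN_DIR"    # permissions now read drwx------
```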

— hadoop fsck -move does not work in a cluster with host-based Kerberos

Bug: None

Severity: Low

Workaround: Use hadoop fsck -delete

— HttpFS cannot get delegation token without prior authenticated request.

A request to obtain a delegation token cannot initiate an SPNEGO authentication sequence; it must be accompanied by an authentication cookie from a prior SPNEGO authentication sequence.

Bug: HDFS-3988

Severity: Low

Workaround: Make another WebHDFS request (such as GETHOMEDIR) to initiate an SPNEGO authentication sequence and then make the delegation token request.
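Sketched with curl against HttpFS (the hostname is a placeholder; 14000 is the default HttpFS port, and the REST operation name for the home-directory call is GETHOMEDIRECTORY). The first request completes SPNEGO and saves the cookie that the second request needs:

```shell
# Placeholder host; requires a valid Kerberos ticket (kinit) on the client.
# Step 1: any authenticated call establishes the SPNEGO session cookie.
curl --negotiate -u : -c cookies.txt \
  "http://httpfs.example.com:14000/webhdfs/v1/?op=GETHOMEDIRECTORY"
# Step 2: reuse the cookie to request the delegation token.
curl --negotiate -u : -b cookies.txt \
  "http://httpfs.example.com:14000/webhdfs/v1/?op=GETDELEGATIONTOKEN&renewer=hdfs"
```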

— DistCp does not work between a secure cluster and an insecure cluster in some cases

See the upstream bug reports for details.

Bugs: HDFS-7037, HADOOP-10016, HADOOP-8828

Severity: High

Workaround: None

— Using DistCp with Hftp on a secure cluster using SPNEGO requires that the dfs.https.port property be configured

In order to DistCp using Hftp from a secure cluster using SPNEGO, you must configure the dfs.https.port property on the client to use the HTTP port (50070 by default).

Bug: HDFS-3983

Severity: Low

Workaround: Configure dfs.https.port to use the HTTP port on the client
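For example, in the client-side hdfs-site.xml (50070 is the default NameNode HTTP port; substitute yours if it differs):

    <property>
      <name>dfs.https.port</name>
      <value>50070</value>
    </property>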

— Non-HA DFS clients do not attempt reconnects

Because non-HA DFS clients do not retry the connection, streams cannot survive a NameNode restart or a network interruption that lasts longer than the time it takes to write a block.

Bug: HDFS-4389

— Offline Image Viewer (OIV) tool regression: Delimited output is missing

Bugs: HDFS-6673, HDFS-5952

Severity: Medium

Workaround: Set dfs.namenode.legacy-oiv-image.dir to an appropriate directory on the secondary NameNode (or the standby NameNode in an HA configuration), and use hdfs oiv_legacy to process the legacy-format OIV fsimage.
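The second step might look like the following (the fsimage and output paths are placeholders):

```shell
# Placeholder paths; -p Delimited selects the Delimited output processor
# available in the legacy tool.
hdfs oiv_legacy -i /mnt/legacy-oiv/fsimage_0000000000000042 \
  -o /tmp/fsimage-delimited.txt -p Delimited
```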

MapReduce

— Starting an unmanaged ApplicationMaster may fail

Starting a custom Unmanaged ApplicationMaster may fail due to a race in getting the necessary tokens.

Bug: YARN-1577

Severity: Low

Workaround: Try to get the tokens again; the custom unmanaged ApplicationMaster should be able to fetch the necessary tokens and start successfully.

— Job movement between queues does not persist across ResourceManager restart

CDH 5 adds the capability to move a submitted application to a different scheduler queue. This queue placement is not persisted across ResourceManager restart or failover; after a restart, the application resumes in its original queue.

Bug: YARN-1558

Severity: Medium

Workaround: After ResourceManager restart, re-issue previously issued move requests.

— No JobTracker becomes active if both JobTrackers are migrated to other hosts

If JobTrackers in a High Availability configuration are shut down, migrated to new hosts, and then restarted, no JobTracker becomes active. The logs show a Mismatched address exception.

Bug: None

Severity: Low

Workaround: After shutting down the JobTrackers on the original hosts, and before starting them on the new hosts, delete the ZooKeeper state using the following command:
$ zkCli.sh rmr /hadoop-ha/<logical name>

— Hadoop Pipes may not be usable in an MRv1 Hadoop installation done through tarballs

Under MRv1, MapReduce's C++ interface, Hadoop Pipes, may not be usable with a Hadoop installation done through tarballs unless you build the C++ code on the operating system you are using.

Bug: None

Severity: Medium

Workaround: Build the C++ code on the operating system you are using. The C++ code is present under src/c++ in the tarball.

— Task-completed percentage may be reported as slightly under 100% in the web UI, even when all of a job's tasks have successfully completed.

Bug: None

Severity: Low

Workaround: None

— Spurious warning in MRv1 jobs

The mapreduce.client.genericoptionsparser.used property is not correctly checked by JobClient and this leads to a spurious warning.

Bug: None

Severity: Low

Workaround: MapReduce jobs using GenericOptionsParser or implementing Tool can remove the warning by setting this property to true.
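For example, the property can be set at submit time (the jar and class names are placeholders):

```shell
# Placeholder jar/class; the -D flag is consumed by GenericOptionsParser,
# so this applies only to jobs that use it (directly or via Tool/ToolRunner).
hadoop jar myjob.jar com.example.MyTool \
  -Dmapreduce.client.genericoptionsparser.used=true input output
```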

— Oozie workflows will not be recovered in the event of a JobTracker failover on a secure cluster

Delegation tokens created by clients (via JobClient#getDelegationToken()) do not persist when the JobTracker fails over. This limitation means that Oozie workflows will not be recovered successfully in the event of a failover on a secure cluster.

Bug: None

Severity: Medium

Workaround: Re-submit the workflow.

— Encrypted shuffle in MRv2 does not work if used with LinuxContainerExecutor and encrypted web UIs.

In MRv2, if the LinuxContainerExecutor is used (usually as part of Kerberos security), and hadoop.ssl.enabled is set to true (See Configuring Encrypted Shuffle, Encrypted Web UIs, and Encrypted HDFS Transport), then the encrypted shuffle does not work and the submitted job fails.

Bug: MAPREDUCE-4669

Severity: Medium

Workaround: Use encrypted shuffle with Kerberos security without encrypted web UIs, or use encrypted shuffle with encrypted web UIs without Kerberos security.

— Link from ResourceManager to Application Master does not work when the Web UI over HTTPS feature is enabled.

In MRv2 (YARN), if hadoop.ssl.enabled is set to true (use HTTPS for web UIs), then the link from the ResourceManager to the running MapReduce Application Master fails with an HTTP Error 500 because of a PKIX exception.

A job can still be run successfully, and, when it finishes, the link to the job history does work.

Bug: YARN-113

Severity: Low

Workaround: Don't use encrypted web UIs.

— Hadoop client JARs don't provide all the classes needed for clean compilation of client code

The compilation does succeed, but you may see warnings as in the following example:
 $ javac -cp '/usr/lib/hadoop/client/*' -d wordcount_classes WordCount.java
org/apache/hadoop/fs/Path.class(org/apache/hadoop/fs:Path.class): warning: Cannot find annotation method 'value()' 
in type 'org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning 
  Note: This means that the example at the bottom of the page on managing Hadoop API dependencies (see "Using the CDH 4 Maven Repository" under CDH Version and Packaging Information) will produce a similar warning.

Bug:

Severity: Low

Workaround: None

— The ulimit settings in /etc/security/limits.conf are applied to the wrong user if security is enabled.

Bug: https://issues.apache.org/jira/browse/DAEMON-192

Severity: Low

Anticipated Resolution: None

Workaround: To increase the ulimits applied to DataNodes, you must change the ulimit settings for the root user, not the hdfs user.

— Must set yarn.resourcemanager.scheduler.address to a routable host:port when submitting a job from the ResourceManager

When you submit a job from the ResourceManager, yarn.resourcemanager.scheduler.address must be set to a real, routable address, not the wildcard 0.0.0.0.

Bug: None

Severity: Low

Workaround: Set the address, in the form host:port, either in the client-side configuration, or on the command line when you submit the job.
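For example, on the command line (the hostname, jar, and class names are placeholders; 8030 is the default scheduler port):

```shell
# Pass a routable scheduler address with -D instead of relying on the
# 0.0.0.0 wildcard default.
hadoop jar myjob.jar com.example.MyJob \
  -Dyarn.resourcemanager.scheduler.address=rm01.example.com:8030 input output
```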

— Amazon S3 copy may time out

The Amazon S3 filesystem does not support renaming files, and performs a copy operation instead. If the file to be moved is very large, the operation can time out because S3 does not report progress to the TaskTracker during the operation.

Bug: MAPREDUCE-972

Severity: Low

Workaround: Use -Dmapred.task.timeout=15000000 to increase the MR task timeout. The value is in milliseconds; 15000000 ms is roughly four hours.