This is the documentation for CDH 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Apache Hadoop Incompatible Changes

HDFS

The following incompatible changes have been introduced in CDH 5:

  • The getSnapshottableDirListing() method returns null when there are no snapshottable directories. This is a change from CDH 5 Beta 2, where the method returned an empty array instead.
  • HDFS-5138 - The -finalize NameNode startup option has been removed. To finalize an in-progress upgrade, you should instead use the hdfs dfsadmin -finalizeUpgrade command while your NameNode is running, or while both NameNodes are running in a High Availability setup.
  • HDFS-2832 - The HDFS internal layout version has changed between CDH 5 Beta 1 and CDH 5 Beta 2, so a file system upgrade is required to move an existing Beta 1 cluster to Beta 2.
  • HDFS-4997 - libhdfs functions now return correct error codes in errno in case of an error, instead of always returning 255.
  • HDFS-4451 - The HDFS balancer command now returns exit code 0 on success, instead of 1.
  • HDFS-4659 - Support for setting the execution bit on regular files.
    • Impact: In CDH 5, files copied out of HDFS with copyToLocal may now have the executable bit set if it was set when they were created in, or copied into, HDFS.
  • HDFS-4594 - WebHDFS open sets the Content-Length header to what is specified by the length parameter rather than to how much data is actually returned.
    • Impact: In CDH 5, the Content-Length header contains the number of bytes actually returned, rather than the requested length.
  • HADOOP-10020 - Symlink support is temporarily disabled.
  • Files named .snapshot or .reserved must not exist within HDFS.
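Two of the changes above affect scripts: the removed -finalize startup option (HDFS-5138) and the balancer's new exit-code convention (HDFS-4451). The sketch below, a hypothetical wrapper rather than anything shipped with CDH, shows how a script might adapt; HDFS_BIN is an assumed variable standing in for the hdfs launcher on your PATH.

```shell
#!/bin/sh
# Assumption: HDFS_BIN points at the `hdfs` CLI of a running CDH 5 cluster.
HDFS_BIN=${HDFS_BIN:-hdfs}

# HDFS-5138: the -finalize NameNode startup option is gone; finalize an
# in-progress upgrade with the dfsadmin command instead, e.g.:
#   "$HDFS_BIN" dfsadmin -finalizeUpgrade

# HDFS-4451: in CDH 5 the balancer exits 0 on success (it used to exit 1),
# so scripts can now use the conventional exit-code check.
run_balancer() {
  "$HDFS_BIN" balancer -threshold "${1:-10}"
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "balancer: success"
  else
    echo "balancer: failed with status $status" >&2
  fi
  return "$status"
}
```

A caller would then invoke, for example, run_balancer 5 and branch on its return code rather than treating exit 1 as success.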

Change in High-Availability Support

In CDH 5, the only high-availability (HA) implementation is Quorum-based storage; shared storage using NFS is no longer supported.
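With Quorum-based storage, the NameNodes share edits through a set of JournalNodes rather than an NFS-mounted directory. A minimal hdfs-site.xml sketch follows; the JournalNode hostnames, port, and nameservice ID are placeholders to be replaced with your own values.

```xml
<!-- Sketch: Quorum-based storage points the shared edits directory at a
     quorum of JournalNodes instead of an NFS path. Hostnames and the
     nameservice ID ("mycluster") are placeholders. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
```

An odd number of JournalNodes (typically three) is used so that a majority can still be reached if one fails.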

MapReduce

  Important: There is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball itself. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory. You need to do some additional configuration; follow the directions below.
To use MRv1 from a tarball installation, proceed as follows:
  1. Extract the files from the tarball.
      Note: In the steps that follow, install_dir is the name of the directory into which you extracted the files.
  2. Create a symbolic link as follows:
    ln -s install_dir/bin-mapreduce1 install_dir/share/hadoop/mapreduce1/bin
  3. Create a second symbolic link as follows:
    ln -s install_dir/etc/hadoop-mapreduce1 install_dir/share/hadoop/mapreduce1/conf
  4. Set the HADOOP_HOME and HADOOP_CONF_DIR environment variables in your execution environment as follows:
    $ export HADOOP_HOME=install_dir/share/hadoop/mapreduce1
    $ export HADOOP_CONF_DIR=$HADOOP_HOME/conf 
  5. Copy your existing start-dfs.sh and stop-dfs.sh scripts to install_dir/bin-mapreduce1
  6. For convenience, add install_dir/bin to the PATH variable in your execution environment.
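The steps above can be sketched as a single script. INSTALL_DIR stands in for the directory you extracted the tarball into (here it defaults to a temporary directory so the sketch runs standalone; point it at your real extraction directory). The mkdir line only recreates the subdirectories a real extraction would already contain.

```shell
#!/bin/sh
# Assumption: INSTALL_DIR is the directory the tarball was extracted into.
INSTALL_DIR=${INSTALL_DIR:-$(mktemp -d)}

# Stand-in for the extracted tree so this sketch runs on its own;
# a real tarball extraction already provides these directories.
mkdir -p "$INSTALL_DIR/bin-mapreduce1" \
         "$INSTALL_DIR/etc/hadoop-mapreduce1" \
         "$INSTALL_DIR/bin" \
         "$INSTALL_DIR/share/hadoop/mapreduce1"

# Steps 2-3: link the MRv1 scripts and configuration into place.
ln -s "$INSTALL_DIR/bin-mapreduce1" "$INSTALL_DIR/share/hadoop/mapreduce1/bin"
ln -s "$INSTALL_DIR/etc/hadoop-mapreduce1" "$INSTALL_DIR/share/hadoop/mapreduce1/conf"

# Step 4: point the Hadoop environment at the MRv1 layout.
export HADOOP_HOME="$INSTALL_DIR/share/hadoop/mapreduce1"
export HADOOP_CONF_DIR="$HADOOP_HOME/conf"

# Step 5 (not shown): copy your existing start-dfs.sh and stop-dfs.sh
# scripts into "$INSTALL_DIR/bin-mapreduce1".

# Step 6: put install_dir/bin on PATH for convenience.
export PATH="$INSTALL_DIR/bin:$PATH"
```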

Apache MapReduce 2.0 (YARN) Incompatible Changes

The following incompatible changes occurred for Apache MapReduce 2.0 (YARN) between CDH 4.x and CDH 5 Beta 2:
  • The CATALINA_BASE variable no longer determines whether a component is configured for YARN or MRv1. Use the alternatives command instead, and make sure CATALINA_BASE is not set; see the Oozie and Sqoop2 configuration sections for instructions.
  • YARN-1288 - YARN Fair Scheduler ACL change. The root queue defaults to everybody, and other queues default to nobody.
  • YARN High Availability configurations have changed. Configuration keys have been renamed, among other changes.
  • The YARN_HOME property has been changed to HADOOP_YARN_HOME.
  • Note the following changes to configuration properties in yarn-site.xml:
    • The value of yarn.nodemanager.aux-services should be changed from mapreduce.shuffle to mapreduce_shuffle.
    • yarn.nodemanager.aux-services.mapreduce.shuffle.class has been renamed to yarn.nodemanager.aux-services.mapreduce_shuffle.class
    • yarn.resourcemanager.resourcemanager.connect.max.wait.secs has been renamed to yarn.resourcemanager.connect.max-wait.secs
    • yarn.resourcemanager.resourcemanager.connect.retry_interval.secs has been renamed to yarn.resourcemanager.connect.retry-interval.secs
    • yarn.resourcemanager.am.max-retries has been renamed to yarn.resourcemanager.am.max-attempts
    • The YARN_HOME environment variable used in yarn.application.classpath has been renamed to HADOOP_YARN_HOME. Make sure you include $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/* in the classpath. For more information, see Step 2: Configure YARN daemons in the instructions for deploying CDH with YARN in the CDH 5 Installation Guide.
  • A CDH 4 client cannot be used against a CDH 5 cluster, and vice versa. Note that YARN in CDH 4 is experimental and suffers from the following major incompatibilities.
    • Almost all of the proto files have been renamed.
    • Several user-facing APIs have been modified as part of an API stabilization effort.
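Putting the yarn-site.xml renames above together, a CDH 5 configuration would look roughly like the following sketch (property names are from the list above; the ShuffleHandler class value is the standard one for the MapReduce shuffle service):

```xml
<!-- Sketch of the renamed shuffle properties in a CDH 5 yarn-site.xml. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <!-- was mapreduce.shuffle in CDH 4 -->
  <value>mapreduce_shuffle</value>
</property>
<property>
  <!-- was yarn.nodemanager.aux-services.mapreduce.shuffle.class -->
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```

A NodeManager started with the old mapreduce.shuffle value will fail to load the shuffle service, so both properties should be updated together.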
Page generated September 3, 2015.