Issues Fixed in CDH 5 Beta Releases
Issues Fixed in CDH 5 Beta 2
ResourceManager High Availability does not work on secure clusters
If JobTrackers in an High Availability configuration are shut down, migrated to new hosts, then restarted, no JobTracker becomes active. The logs show a Mismatched address exception.
Default port conflicts
By default, the Shuffle Handler (which runs inside the YARN NodeManager), the REST server, and many third-party applications, all use port 8080. This will result in conflicts if you deploy more than one of them without reconfiguring the default port.
Workaround: Make sure at most one service uses port 8080. To reconfigure the REST server, follow these instructions. To change the default port for the Shuffle Handler, set the value of mapreduce.shuffle.port in mapred-site.xml to an unused port.
JobTracker memory leak
The JobTracker has a memory leak caused by subtleties in the way UserGroupInformation interacts with the file-system cache. The number of cached file system objects can grow without bound.
Workaround: Set keep.failed.task.files to true, which will sidestep the memory leak but require job staging directories to be cleaned out manually.
Running a Hive Beeswax metastore on the same host as the Hue server will result in Simple Authentication and Security Layer (SASL) authentication failures on a Kerberos-enabled cluster
Workaround: The simple workaround is to run the metastore server remotely on a different host and make sure all Hive and Hue configurations properly refer to it. A more complex workaround is to adjust network configurations to ensure that reverse DNS properly resolves the host's address to its fully qualified-domain name (FQDN) rather than localhost.
The Pig shell does not work when NameNode uses a wildcard address
The Pig shell does not work from Hue if you use a wildcard for the NameNode's RPC or HTTP bind address. For example, dfs.namenode.http-address must be a real, routable address and port, not 0.0.0.0.<port>.
Workaround: Use a real, routable address and port, not 0.0.0.0.<port>, for the NameNode; or use the Pig application directly, rather than from Hue.
Oozie and Sqoop 2 may need additional configuration to work with YARN
In CDH 5, MRv2 (YARN) MapReduce 2.0 is recommended over the Hadoop 0.20-based MRv1. The default configuration may not reflect this in Oozie and Sqoop 2 in CDH 5 Beta 2, however, unless you are using Cloudera Manager.
Workaround: Check the value of CATALINA_BASE in /etc/oozie/conf/oozie-env.sh (if you are running an Oozie server) and /etc/default/sqoop2-server (if you are using a Sqoop 2 server). You should also ensure that CATALINA_BASE is correctly set in your environment if you are invoking /usr/bin/sqoop2-server directly instead of using the service init scripts. For Oozie, CATALINA_BASE should be set to /usr/lib/oozie/oozie-server for YARN, or /usr/lib/oozie/oozie-server-0.20 for MRv1. For Sqoop 2, CATALINA_BASE should be set to /usr/lib/sqoop2/sqoop-server for YARN, or /usr/lib/sqoop2/sqoop-server-0.20 on MRv1.
Apache Sentry (incubating)
Sentry allows unauthorized access to a directory whose name includes the scratch directory name as a prefix
As an example, if the scratch directory path is /tmp/hive, and you create a directory /tmp/hive-data, Sentry allows unauthorized read/write access to /tmp/hive-data.
Workaround: For external tables or data export location, do not use a pathname that includes the scratch directory name as a prefix. For example, if the scratch directory is /tmp/hive, do not locate external tables or exported data in /tmp/hive-data or any directory whose path uses "/tmp/hive-" as a prefix.
Oozie Hive action against HiveServer2 fails on a secure cluster