Known Issues and Workarounds in Cloudera Manager 4
The following sections describe the current known issues and fixed issues in each Cloudera Manager release.
Known Issues in the Current Release
The following sections describe the current known issues and fixed issues in each Cloudera Manager 4 release. For known issues and workarounds for Cloudera Backup and Disaster Recovery, see Known Issues for Cloudera Backup and Disaster Recovery.
Cloudera Manager reports a confusing version number if you have oozie-client, but not oozie installed on a CDH4.4 node
In CDH versions before 4.4, the metadata identifying Oozie was placed in the client package rather than the server package. Consequently, if the client package is not installed but the server is, Cloudera Manager reports that Oozie is present, but as coming from CDH 3 instead of CDH 4.
Workaround: Either install the oozie-client package, or upgrade to at least CDH 4.4. Parcel-based installations are unaffected.
On CDH 4.1 secure clusters managed by Cloudera Manager 4.8.1 and higher, the Impala Catalog server needs safety valve update
Impala queries fail on CDH 4.1 when Hive "Bypass Hive Metastore Server" option is selected.
Workaround: Add the following to Impala catalog server safety valve for hive-site.xml, replacing <Hive_Metastore_Server_Host> with the host name of your Hive Metastore Server:
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<Hive_Metastore_Server_Host>:9083</value>
</property>
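As a minimal sketch, the two properties above could be generated with a small Python helper so that the host name is substituted consistently. The function names and the example host ms1.example.com are hypothetical. Only the property names and port 9083 come from the text above.

```python
import xml.etree.ElementTree as ET


def property_xml(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


def catalog_server_safety_valve(metastore_host: str, port: int = 9083) -> str:
    """Build the hive-site.xml snippet for the Impala catalog server safety valve.

    metastore_host is the host name of your Hive Metastore Server.
    """
    return "\n".join([
        property_xml("hive.metastore.local", "false"),
        property_xml("hive.metastore.uris", f"thrift://{metastore_host}:{port}"),
    ])


# Hypothetical host name, for illustration only.
snippet = catalog_server_safety_valve("ms1.example.com")
print(snippet)
```

The printed text can then be pasted into the safety valve field. The helper only formats text and does not contact Cloudera Manager.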
"Cancel Query" does not work for Impala if SSL is configured.
When Impala 1.2 is configured to use SSL, the "Cancel Query" function in the Impala Query monitor (accessed from the Activities menu) does not work.
Impala 1.1 is not supported in Cloudera Manager 4.8
Cloudera Manager 4.8.0 supports Impala 1.2 or later, and does not support Impala 1.1.1 or earlier. This is due to the introduction of the Impala Catalog Server with Impala 1.2, which is incompatible with Impala 1.1. If you upgrade Cloudera Manager to 4.8, you must also upgrade Impala to version 1.2. See Upgrading Impala for instructions. If you do not upgrade, you will get a validation warning after the installation and whenever you try to start the Impala service.
Workaround: Upgrade Impala to version 1.2.
Error reading .zip file created with the Collect Diagnostic Data command.
After collecting diagnostic data and using the Download Diagnostic Data button to download the resulting zip file to the local system, the zip file cannot be opened using the Firefox browser on a Macintosh. This is because the file is created as a Zip64 file, and the unzip utility included with Macs does not support Zip64. The unzip utility must be version 6.0 or later. You can determine the unzip version with unzip -v.
Workaround: Update the unzip utility to a version that supports Zip64.
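The version check can also be scripted. The sketch below assumes that the first line of the unzip -v output has the form "UnZip 6.00 of ...", which is how Info-ZIP builds identify themselves. The function names are illustrative.

```python
import re
import subprocess


def parse_unzip_version(banner: str):
    """Return (major, minor) parsed from `unzip -v` output, or None if not found."""
    match = re.search(r"UnZip (\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_zip64(banner: str) -> bool:
    """True if the reported unzip version is 6.0 or later."""
    version = parse_unzip_version(banner)
    return version is not None and version >= (6, 0)


def local_unzip_supports_zip64() -> bool:
    """Run `unzip -v` on this machine and check the reported version."""
    try:
        result = subprocess.run(
            ["unzip", "-v"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return False  # unzip is not installed or not on the PATH
    return supports_zip64(result.stdout)
```

Calling local_unzip_supports_zip64() on the affected machine indicates whether the installed unzip is new enough to open the downloaded file.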
After an upgrade from Cloudera Manager 4.6.3, Impala does not start.
After an upgrade from Cloudera Manager 4.6.3 to 4.7 or 4.8 when Navigator is used, Impala will fail to start because the Audit Log Directory property has not been set by the upgrade procedure.
Workaround: Manually set the property to /var/log/impalad/audit. See the Service Auditing Properties section of the Cloudera Navigator Installation and User Guide for more information.
Enabling wildcarding in a secure environment causes NameNode to fail to start.
In a secure cluster, you cannot use a wildcard for the NameNode's RPC or HTTP bind address, or the NameNode will fail to start. For example, dfs.namenode.http-address must be a real, routable address and port, not 0.0.0.0:<port>. In Cloudera Manager, the "Bind NameNode to Wildcard Address" property must not be enabled. This should affect you only if you are running a secure cluster and your NameNode needs to bind to multiple local addresses.
Workaround: Disable the "Bind NameNode to Wildcard Address" property found under the Configuration tab for the NameNode role group.
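When auditing configuration values by hand, a quick check such as the following can flag a wildcard bind address. This is only a sketch: the function name and the example addresses are hypothetical, and it inspects a host:port string rather than reading any Cloudera Manager setting.

```python
import ipaddress


def is_wildcard_address(host_port: str) -> bool:
    """Return True if the host part of 'host:port' is a wildcard (unspecified) IP."""
    host, _, _port = host_port.rpartition(":")
    host = host.strip("[]")  # tolerate bracketed IPv6 literals such as [::]:50070
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal (for example a host name), so not a wildcard.
        return False


# Example values, for illustration only.
for address in ("0.0.0.0:50070", "nn1.example.com:50070", "10.0.0.5:8020"):
    print(address, is_wildcard_address(address))
```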
Upgrade from 4.6.0 with HA enabled may cause HDFS restarts/failovers to fail.
When upgrading from an installation of Cloudera Manager 4.6.0 to 4.6.1 or later with HDFS High Availability enabled, you must set the value of the NameNode Service RPC port (dfs.namenode.servicerpc-address) to 8022, or else HDFS failover or restart will fail. Restart the HDFS service after you have changed the property value.
- Go to the HDFS service, Configuration tab.
- Type servicerpc in the search field to find the property.
- Change its value to 8022 and Save Changes.
- Restart the HDFS service.
Federation setup workflow may result in failure of NameNode format step.
When adding a Nameservice using the "Add Nameservice" workflow (and not choosing the "Enable NFS High Availability" option) to an HDFS service that has a Nameservice configured to use JournalNodes, the NameNode formatting step will fail because the new Nameservice is incorrectly configured to use the same journal name as the existing Nameservice.
Workaround: Configure the new NameNodes (via the safety valve) with a QuorumJournal URL that has a different journal name from the original Nameservice, and then manually perform the rest of the steps in the "Add Nameservice" workflow.
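To illustrate what a QuorumJournal URL with a different journal name looks like, the sketch below builds URIs in the qjournal:// form that appears elsewhere in this document. The helper names, the journal name journalhdfs2, and the default port argument are assumptions made for the example.

```python
def qjournal_uri(journal_nodes, journal_name, port=8485):
    """Build a qjournal:// URI from JournalNode host names and a journal name."""
    hosts = ";".join(f"{host}:{port}" for host in journal_nodes)
    return f"qjournal://{hosts}/{journal_name}"


def journal_name(uri: str) -> str:
    """Extract the journal name (the final path component) from a qjournal URI."""
    return uri.rsplit("/", 1)[1]


nodes = ["jn1HostName", "jn2HostName", "jn3HostName"]
existing = qjournal_uri(nodes, "journalhdfs1")
new = qjournal_uri(nodes, "journalhdfs2")  # hypothetical name for the new Nameservice

# The new Nameservice must not reuse the existing journal name.
assert journal_name(new) != journal_name(existing)
print(new)
```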
Impala log file is not rolling over per the max log size setting.
Impala logging uses two loggers -- GLog and log4j -- to perform logging into a single log file. GLog correctly rolls its logging to a new file per the Impala Daemon Max Log Size property, but log4j ignores that setting and continues to log into the original log file.
After JobTracker failover, complete jobs from the previous active JobTracker are not visible.
When a JobTracker failover occurs and a new JobTracker becomes active, the new JobTracker UI does not show the completed jobs from the previously active JobTracker (that is now the standby JobTracker). For these jobs the "Job Details" link does not work.
After JobTracker failover, information about rerun jobs is not updated in Activity Monitor.
When a JobTracker failover occurs while there are running jobs, jobs are restarted by the new active JobTracker by default. For the restarted jobs the Activity Monitor will not update the following: 1) The start time of the restarted job will remain the start time of the original job. 2) Any Map or Reduce task that had finished before the failure happened will not be updated with information about the corresponding task that was rerun by the new active JobTracker.
Impala Query Monitor shows queries as running even when they have finished.
Due to a problem in Hue, queries issued from the Impala query application in Hue will appear as running in Cloudera Manager's Impala Query Monitor and as Active in the Impala Daemon web UI even after they have finished and have been marked "expired" in Hue.
Secure bulk loading in HBase fails after upgrade if no coprocessors are configured.
In order to perform an upgrade of HBase to CDH 4.3 in a secure cluster, the org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint entry is required in the HBase Coprocessor Region Classes configuration property. By default Cloudera Manager leaves this property empty. If you do not configure this property, secure bulk loading jobs will fail after the upgrade to CDH 4.3.
Workaround: Add org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint to the HBase Coprocessor Region Classes property in every RegionServer Role Group. Then re-deploy the client configuration before upgrading.
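If you maintain the property value in a script, the class can be appended without disturbing coprocessors that are already listed. The sketch below assumes the property value is a comma-separated list of class names. The function name is illustrative.

```python
SECURE_BULK_LOAD = "org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint"


def with_secure_bulk_load(current_value: str) -> str:
    """Return the coprocessor class list with SecureBulkLoadEndpoint included once."""
    classes = [c.strip() for c in current_value.split(",") if c.strip()]
    if SECURE_BULK_LOAD not in classes:
        classes.append(SECURE_BULK_LOAD)
    return ",".join(classes)


# An empty property (the Cloudera Manager default) yields just the endpoint class.
print(with_secure_bulk_load(""))
```

The function is idempotent, so applying it to a value that already contains the class leaves the value unchanged.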
WebHCat role logs cannot be written in CDH 4.2.
When using WebHCat with the default configuration on CDH4.2, role logs cannot be written due to a permission error on /var/log/hcatalog, because it is owned by user hcatalog, not user hive.
Resolution: Fixed in CDH4.3, where /var/log/hcatalog is owned by the hive user by default.
Workaround: Change the owner of /var/log/hcatalog (using chown) to the process user used by the Hive service, which is hive by default. Alternatively, change the WebHCat log directory.
Errors when using HiveServer2 with CDH 4.1.
Cloudera Manager 4.5 or later supports HiveServer2 with CDH 4.2 only. While it is possible to add the HiveServer2 role in Cloudera Manager when using CDH4.1, you may experience errors such as missing log files or other problems.
Anticipated Resolution: Fixed in CDH 4.2. No plans to fix for CDH 4.1.
Workaround: Upgrade to CDH4.2 or later, or run HiveServer2 outside of Cloudera Manager.
Impala Queries fail when "Bypass Hive Metastore Server" option is selected.
Impala queries fail on CDH4.1 when Hive "Bypass Hive Metastore Server" option is selected. You can work around this by using the Impala Safety Valve for hive-site.xml, replacing <hive_metastore_server_host> with the name of your Hive metastore server host.
Anticipated Resolution: Fixed in CDH4.2. No plans to fix for CDH4.1.
Workaround: See the detailed instructions for the safety valve configuration in Installing Impala with Cloudera Manager.
Hive Table Stats configuration recommended for optimal performance.
Configuring Hive Table Stats is highly recommended when using Impala. It allows Impala to make optimizations that can result in significant (over 10x) performance improvements for some joins. If these are not available, Impala will still function, but at lower performance.
Workaround: See Installing Impala with Cloudera Manager in the Cloudera Manager Installation Guide for information on configuring Hive Table Stats.
Health Check for Navigator and Reports appears in the API results even if those roles are not configured.
The Cloudera Manager Navigator health check appears as "Not Available" in the Cloudera Manager API health results for the MGMT service, even if no Navigator role is configured. The same is true of the Reports Manager role. This can occur if you are running the Cloudera Standard version of Cloudera Manager. This can be safely ignored and may be removed in a future release.
Upgrading a secure CDH3 cluster to CDH4 fails due to missing HTTP principal in NameNode's keytab.
If you have set up a secure CDH3 cluster using a Cloudera Manager version before 4.5, upgrading the cluster to CDH4 will fail because the NameNode's hdfs.keytab file does not contain the HTTP principal that is required in CDH4 HDFS.
If using a custom keytab generating script with Cloudera Manager, the script should be modified to include the HTTP principal for CDH3 NameNodes to enable an upgrade to CDH4.
Severity: High if you used a pre-4.5 CM to set up a secure CDH3 cluster and want to upgrade it to CDH4. Otherwise N/A.
- Upgrade to Cloudera Manager 4.5 or later.
- From the Administration menu, select Kerberos.
- Select the NameNode's credentials and press the Regenerate button. This will cause the HTTP principal to be included in the NameNode's hdfs.keytab.
Note that if you set up a secure CDH3 cluster using Cloudera Manager 4.5, this workaround is not necessary and the bug does not manifest.
Java 6 GC bug leads to a memory leak.
Java 6 has a bug with finalizers that leads to a memory leak when using -XX:+UseConcMarkSweepGC. This bug is fixed in Java 6u32. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034. To work around this JVM bug, Cloudera Manager configures processes with both -XX:+UseConcMarkSweepGC and -XX:-CMSConcurrentMTEnabled. This workaround has a slight performance penalty.
Workaround: As described above. If you have a JVM that does not exhibit this bug, you can remove -XX:-CMSConcurrentMTEnabled by configuring the JVM arguments for your services.
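If you edit the JVM argument string outside the Admin Console, removing the flag amounts to dropping one token. A minimal sketch, where the function name and the sample argument string are made up for illustration:

```python
def drop_jvm_flag(jvm_args: str, flag: str = "-XX:-CMSConcurrentMTEnabled") -> str:
    """Return the space-separated JVM arguments with every copy of flag removed."""
    return " ".join(arg for arg in jvm_args.split() if arg != flag)


# Sample argument string, for illustration only.
print(drop_jvm_flag("-Xmx1g -XX:-CMSConcurrentMTEnabled -verbose:gc"))
```

Only do this on a JVM that is known not to exhibit the bug, as stated in the workaround.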
After upgrade from 4.5 beta to 4.5 GA, Hive service lacks a mandatory dependency on a MapReduce service.
After upgrading from 4.5 beta to 4.5 GA, the Hive service lacks a mandatory dependency on a MapReduce service.
Workaround: Navigate to the Hive service's configuration page (and dismiss any error popups in the meantime), and set the "MapReduce Service" configuration.
After upgrade to Cloudera Manager 4.5, the new Hive Metastore Server fails to start if Hue/Beeswax uses a Derby metastore.
When you upgrade to Cloudera Manager 4.5, it creates a new Hive service to capture the Hive dependency of an existing Hue service. If Hue/Beeswax uses a Derby metastore, Hue will keep working, but the new Hive Metastore Server will fail to start because a Derby metastore cannot be shared between multiple services. This is harmless, but you should consider migrating away from a Derby metastore.
On a Cloudera Manager managed cluster, the NameNode doesn't listen on loopback by default.
By default, wildcards are disabled. To have a role (for example, NameNode) listen on loopback, enable wildcards for that role.
Workaround: To have a role (for example, NameNode) listen on loopback, enable wildcards for that role. This can be done from the Configuration tab for all HDFS roles and for the JobTracker.
If HDFS NameNode is configured to bind to a wildcard address, some Hue applications won't work.
If HDFS NameNode is configured to bind to a wildcard address using the property "Bind NameNode to Wildcard Address," certain Hue applications will not work. The Oozie, JobDesigner, FileBrowser, and Pig Shell applications fail if the NameNode is configured to bind to a wildcard address.
Workaround: Disable NameNode's configuration to bind to wildcard address.
Installing on AWS, you must use private EC2 hostnames.
When installing on an AWS instance, and adding hosts using their public names, the installation will fail when the hosts fail to heartbeat.
- Use the Back button in the wizard to return to the original screen, where it prompts for a license.
- Rerun the wizard, but choose "Use existing hosts" instead of searching for hosts. Now those hosts show up with their internal EC2 names.
- Continue through the wizard and the installation should succeed.
After removing and then re-adding a service, the alternatives settings are incorrect.
After deleting a Cloudera Manager service, the alternatives settings are not cleaned up. If you then re-add the service, it will be given a new instance name, and a new set of configuration settings is added. However, because both the new and old (deleted) instances have the same alternatives priority, the original one will be used rather than the newer one.
Workaround: The simplest way to fix this is:
- Go to the Configuration tab for the new service instance in Cloudera Manager.
- Search for "alternatives".
- Raise the priority value and Save your setting.
- Redeploy your client configuration (from the Actions menu).
New schema extensions have been introduced for Oozie in CDH4.1
In CDH4.1, Oozie introduced new versions for Hive, Sqoop and workflow schema. To use them, you must add the new schema extensions to the Oozie SchemaService Workflow Extension Schemas configuration property in Cloudera Manager.
Workaround: In Cloudera Manager, do the following:
- Go to the CDH4 Oozie service page.
- Go to the Configuration tab, View and Edit.
- Search for "Oozie Schema". This should show the Oozie SchemaService Workflow Extension Schemas property.
- Add the following to the Oozie SchemaService Workflow Extension Schemas property:
shell-action-0.2.xsd hive-action-0.3.xsd sqoop-action-0.3.xsd
- Save these changes.
Stop dependent HBase services before enabling HDFS Automatic Failover.
When enabling HDFS Automatic Failover, you need to first stop any dependent HBase services. The Automatic Failover configuration workflow restarts both NameNodes, which could cause HBase to become unavailable.
On Ubuntu 10.04, the Cloudera Manager agent will not run with an upgraded system python.
On Ubuntu 10.04, the Cloudera Manager agent will not run if the system python is upgraded to 2.6.5-1ubuntu6.1. (2.6.5-1ubuntu6 works correctly.) If you have upgraded, you must also rebuild your pre-prepared virtualenv.
Workaround: Run the following commands:
# apt-get install python-virtualenv
# virtualenv /usr/lib64/cmf/agent/build/env
Cloudera Manager does not support encrypted shuffle.
Encrypted shuffle has been introduced in CDH4.1, but it is not currently possible to enable it through Cloudera Manager.
Enabling or disabling High Availability requires Hive Metastore modifications.
Enabling or disabling High Availability for the HDFS NameNode requires the Hive Metastore to be modified. This is necessary if the cluster includes services that depend on Hive, such as Impala and Hue. Before enabling or disabling HDFS High Availability, see the Known Issue "Tables created in Hive/Beeswax before HDFS is converted to HA become inaccessible after a failover" in the CDH4 Release Notes for more information.
Workaround: Run the "Update Hive Metastore NameNodes" command under the Hive service.
Impala cannot be used with Federated HDFS
If your cluster is configured to use Federated HDFS, Impala queries will fail.
Links from the HBase Master Web UI to RegionServer Web UIs may be incorrect.
For the links from the HBase Master Web UI to the RegionServer Web UIs to be correct, all the RegionServer Web UI ports must be the same. They can differ from the default value of 60030, but all must use the same port number. For the RegionServer Web UI port configuration, the role type and role level values should all be the same.
Workaround: Links from Cloudera Manager to the RegionServer Web UIs will be correct, and can be used to access the RegionServer Web UIs if the web ports cannot be the same.
If HDFS uses Quorum-based Storage without HA enabled, the SecondaryNameNode cannot checkpoint.
If HDFS is set up in non-HA mode, but with Quorum-based storage configured, the dfs.namenode.edits.dir is automatically configured to the Quorum-based Storage URI. However, the SecondaryNameNode cannot currently read the edits from a Quorum-based Storage URI, and will be unable to do a checkpoint.
Workaround: Add to the NameNode's safety valve the dfs.namenode.edits.dir property with both the value of the Quorum-based Storage URI as well as a local directory, and restart the NameNode. For example,
<property>
  <name>dfs.namenode.edits.dir</name>
  <value>qjournal://jn1HostName:8485;jn2HostName:8485;jn3HostName:8485/journalhdfs1,file:///dfs/edits</value>
</property>
Changing the rack configuration may temporarily cause mis-replicated blocks to be reported.
A rack re-configuration will cause HDFS to report mis-replicated blocks until HDFS rebalances the system, which may take some time. This is a normal side-effect of changing the configuration.
Starting HDFS with HA and Automatic Failover enabled, one of the NameNodes might not start.
When starting an HDFS service with High Availability and Automatic Failover enabled, one of the NameNodes might not start up.
Workaround: After the remaining HDFS roles have started, start the NameNode that failed to start.
Cannot use '/' as a mount point with a Federated HDFS Nameservice.
A Federated HDFS Service doesn't support nested mount points, so it is impossible to mount anything at '/'. Note that because of this issue, the root directory will always be read-only, and any client application that requires a writeable root directory will fail.
- In the CDH4 HDFS Service > Configuration tab of the Cloudera Manager Admin Console, search for "nameservice".
- In the Mountpoints field, change the mount point from "/" to a list of mount points that are in the namespace that the Nameservice will manage. (You can enter this as a comma-separated list - for example, "/hbase, /tmp, /user" or by clicking the plus icon to add each mount point in its own field.) You can determine the list of mount points by running the command hadoop fs -ls / from the CLI on the NameNode host.
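The comma-separated list for the Mountpoints field can be derived from the hadoop fs -ls / output. The sketch below assumes the usual eight-column listing format (permissions, replication, owner, group, size, date, time, path) and paths without spaces. The sample listing is invented for illustration and the function name is hypothetical.

```python
def mount_points_from_ls(ls_output: str) -> str:
    """Turn `hadoop fs -ls /` output into a comma-separated list of directories."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Directory entries start with 'd' in the permissions column
        # and carry the path in the last column.
        if len(fields) >= 8 and fields[0].startswith("d") and fields[-1].startswith("/"):
            paths.append(fields[-1])
    return ", ".join(sorted(paths))


# Invented sample listing, for illustration only.
sample = """Found 3 items
drwxr-xr-x   - hbase hbase               0 2013-05-01 10:00 /hbase
drwxrwxrwt   - hdfs  supergroup          0 2013-05-01 10:00 /tmp
drwxr-xr-x   - hdfs  supergroup          0 2013-05-01 10:00 /user"""

print(mount_points_from_ls(sample))
```

Review the resulting list before entering it, since it includes every top-level directory in the listing.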
In the HDFS service, the default value for the Superuser Group setting has changed.
The default value for the Superuser Group setting (dfs.permissions.supergroup and dfs.permissions.superusergroup) has changed. In Cloudera Manager 3.7, the default value was hadoop. In Cloudera Manager 4.0, the default value is now superuser.
Workaround: If necessary, you can change the value for the Superuser Group by setting it in the HDFS service > Configuration tab of the Cloudera Manager Admin Console.
After upgrading to CM 4.1, roles may need to be restarted for Log Directory Monitoring to work.
After upgrading to Cloudera Manager 4.1, directory monitoring may show status "UNKNOWN" until roles are restarted. You can either restart the roles, or just ignore the unknown status until the next planned restart.
Historical disk usage reports do not work with federated HDFS.
(Applies to CDH4 only) Activity monitoring does not work on YARN activities.
(Applies to CDH3 only) Uninstalling Oozie components in the wrong order will cause the uninstall to fail.
If you uninstall hue-oozie-auth-plugin (which was originally installed with Cloudera Manager 3.7) after uninstalling Oozie, the uninstall hue-oozie-auth-plugin operation will fail and the hue-oozie-auth-plugin package will not be uninstalled.
Workaround: Uninstall hue-oozie-auth-plugin before uninstalling Oozie. If you already attempted to uninstall hue-oozie-auth-plugin after Oozie, you must reinstall Oozie, uninstall hue-oozie-auth-plugin, and then uninstall Oozie again.
HDFS monitoring configuration applies to all Nameservices
The monitoring configurations at the HDFS level apply to all Nameservices. So, if there are two federated Nameservices, it's not possible to disable a check on one but not the other. Likewise, it's not possible to have different thresholds for the two Nameservices.
Task details don't appear for CDH4 MR jobs in the Activity Monitor.
In the Activity Monitor, clicking on the Task details for a job sometimes returns "No results found. Try expanding the time range". This is because there is a time lag between when the Activity information appears and its Task details are available.
Workaround: Wait a bit and try again; results can take up to a full minute to appear.
In CDH 4.0 and 4.1, for secure clusters only, Hue cannot connect to the Hive Metastore Server.
Anticipated Resolution: Fixed in CDH4.2.
Workaround: There are three workarounds:
- Upgrade to CDH4.2.
- Use Hue's safety valve for hive-site.xml to configure Hue to directly connect to the Hive Metastore database. These configurations can easily be found by going to the Hive service, selecting a Hive Metastore Server, navigating to the processes page, expanding "show", then clicking on hive-site.xml. You should include the following:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>JDBC_URL</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>DRIVER_NAME</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>HIVE_DB_USER</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>HIVE_DB_PASSWORD</value>
</property>
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.metadata.validate</name>
  <value>false</value>
</property>
<property>
  <name>hive.warehouse.subdir.inherit.perms</name>
  <value>true</value>
</property>
- Select the "Bypass Hive Metastore" option in Hive service configuration, in the Advanced group. This is not the preferred solution because this configures any Hive CLI to bypass the Hive Metastore Server, even though Hive CLI works with Hive Metastore Server.
Known Issues for Cloudera Backup and Disaster Recovery
Temp files created during metadata export for Hive replication are not deleted.
When doing a Hive replication, BDR creates temporary metadata export files in /tmp on the source cluster on the host where the metadata export task runs. These files are not automatically deleted after the replication has succeeded. The size of these files depends on the number of table partitions being replicated, but they can be as large as 500 MB, in which case they can fill the /tmp filesystem. The file names have the form chive*.export. The workaround is to periodically remove these files manually. Do this at a time when no Hive replications are in progress, to ensure that you do not delete a file for a replication that has not completed.
Workaround: Find a time when no Hive replication tasks are in progress, and delete the chive*.export files from the /tmp directory.
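The periodic cleanup could be scripted along the following lines. This is a sketch, not a Cloudera-provided tool: the function name, the dry-run default, and the optional minimum-age guard are assumptions added for safety. The script cannot tell whether a replication is in progress, so you must still run it only when none is.

```python
import time
from pathlib import Path


def remove_hive_export_files(directory="/tmp", min_age_seconds=0, dry_run=True):
    """Delete chive*.export files in directory and return the matching paths.

    With dry_run=True (the default) nothing is deleted; the matches are only
    reported. Files modified less than min_age_seconds ago are skipped.
    """
    now = time.time()
    matched = []
    for path in sorted(Path(directory).glob("chive*.export")):
        if not path.is_file():
            continue
        if min_age_seconds > 0 and now - path.stat().st_mtime < min_age_seconds:
            continue  # too recent; may belong to a replication still running
        if not dry_run:
            path.unlink()
        matched.append(path)
    return matched
```

For example, remove_hive_export_files(min_age_seconds=24 * 3600) lists candidate files older than one day, and passing dry_run=False actually deletes them.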
Cannot replicate from CDH 5 to CDH 4 in CM 4.
It is not possible to replicate from a CDH 5 cluster (the source) managed by Cloudera Manager 5 to a CDH 4 cluster (the target) managed by Cloudera Manager 4. Note that the target cluster is always a cluster managed by the Cloudera Manager that you are logged into, and Cloudera Manager 4 does not support CDH 5.
Replication between encrypted and unencrypted clusters will fail.
Replication may fail if one cluster has encryption enabled and the other does not. This is due to a problem with how HDFS negotiates a secure connection with the NameNode.
Anticipated Resolution: Requires a fix in CDH.
Workaround: Both source and target clusters must have the same encryption status.
Hive replication fails if "Force Overwrite" is not set.
The Force Overwrite option, if checked, forces overwriting data in the target metastore if there are incompatible changes detected. For example, if the target metastore was modified and a new partition was added to a table, this option would force deletion of that partition, overwriting the table with the version found on the source. If the Force Overwrite option is not set, recurring replications may fail.
Workaround: Set the Force Overwrite option.
Cannot add a Peer cluster that is running Cloudera Manager Free Edition.
Replication is not supported with Cloudera Manager Free Edition (or Cloudera Standard), so attempting to add a cluster managed by a Free Edition Cloudera Manager server as a peer will fail. As of Cloudera Manager 4.6, the Add Peer function will succeed, but this is not a supported configuration.
Replication between clusters in different Kerberos Realms may fail after upgrade.
Upon upgrade to Cloudera Manager 4.6.x from CM 4.5.x, existing and new replication jobs between clusters in different Kerberos realms may start failing. Due to a Java 6 bug with cross-realm authentication, the underlying functionality for replication between clusters in different Kerberos realms has changed in Cloudera Manager 4.6. Upon upgrade to Cloudera Manager 4.6, you should follow the steps at Enabling Replication Between Clusters in Different Kerberos Realms to set up replication between secure clusters in different realms.
Workaround: Follow the instructions at Enabling Replication Between Clusters in Different Kerberos Realms to set up replication between clusters in different Kerberos realms.
During HDFS replication, tasks may fail due to DataNode timeouts.
In CDH4.2, during an HDFS replication job (using Cloudera Manager's Backup and Disaster Recovery product), individual tasks in the replication job may fail due to DataNode timeouts. If enough of these timeouts occur, the replication task may slow down, and the entire replication job could time out and fail.