This is the documentation for Cloudera Manager 4.8.2.
Documentation for other versions is available at Cloudera Documentation.

Upgrade Cloudera Manager 4 to the Latest Cloudera Manager

Upgrading from an earlier version of Cloudera Manager 4 (either Free or Enterprise Edition) to the latest version of Cloudera Manager is a relatively simple process that primarily involves upgrading the Cloudera Manager Server packages. This process applies to upgrading Cloudera Manager 4.0.x, 4.1.x, 4.5.x, and 4.6.x to the latest available version of Cloudera Manager.

In most cases it is possible to complete the following upgrade without shutting down the CDH services, although you may need to stop some dependent services. CDH daemons can continue running, unaffected, while Cloudera Manager is upgraded.

Review Warnings and Notes

  Warning: Impala

Cloudera Manager 4.8 supports Impala 1.2.1, and does not support Impala 1.1.1 or earlier versions of Impala. The upgrade instructions below will work, but once the upgrade has completed, you will see a validation warning for your Impala Service. You will not be able to restart your Impala (or Hue) services until you upgrade your Impala service to 1.2.1. If you want to continue to use Impala 1.1.1 or earlier, do not upgrade to Cloudera Manager 4.8.

  Important: Hive

When upgrading from a version of Cloudera Manager prior to 4.5, Cloudera Manager automatically creates new Hive service(s) to capture the previous implicit Hive dependency from Hue and Impala. Your previous services will continue to function without impact.

Note that if Hue was using a Hive metastore of type Derby, the newly created Hive service will also use Derby. Because Derby does not allow concurrent connections, Hue will continue to work but the new Hive Metastore Server will fail to run. The failure is harmless (because nothing uses this new Hive Metastore Server at this point) and intentional, to preserve the set of cluster functionality as it was before the upgrade. Cloudera discourages the use of a Derby metastore due to its limitations. You should consider switching to a different supported database type.

Cloudera Manager provides a Hive configuration option to bypass the Hive Metastore Server. When this configuration is enabled, Hive clients, Hue, and Impala connect directly to the Hive Metastore database. Prior to Cloudera Manager 4.5, Hue and Impala talked directly to the Hive Metastore database, so bypass mode is enabled by default when upgrading to Cloudera Manager 4.5 or later. This ensures that the upgrade does not disrupt your existing setup. You should plan to disable bypass mode, especially when using CDH 4.2 or later. Using the Hive Metastore Server is the recommended configuration, and the WebHCat Server role requires that the Hive Metastore Server not be bypassed. To disable bypass mode, see Disabling Bypass Mode.

Cloudera Manager 4.5 or later also supports HiveServer2 with CDH4.2. HiveServer2 is not added by default, but can be added as a new role under the Hive service (see Adding Role Instances).

  Note:
  • As of Cloudera Manager 4.6, the former Cloudera Manager Free Edition is known as Cloudera Standard, and includes a number of features that were previously available only with Cloudera Manager Enterprise Edition. Specifically, service and activity monitoring features are now available, and require databases to be set up for their use. Thus, upon upgrading to Cloudera Manager 4.6, you will be asked for database information for these services. (You will have the option to use the embedded PostgreSQL database for this.)
  • When an upgraded Cloudera Manager provides support for a new feature (for example, sqoop2, WebHCat, and so on), it does not install the software on which the new feature depends. If you installed from packages, you must add the package to your managed hosts first, before adding a service or role that supports the new feature.

Stop Selected Services as Needed

Stop the Cloudera Management Service, if it is running, and stop any services that depend on the Hive metastore.

If you are upgrading from the Enterprise Edition, you must stop the Cloudera Management service before upgrades can occur.

To stop the Cloudera Management Service:

  1. From the Services tab select All Services in the Cloudera Manager Admin Console.
  2. Choose Stop on the Actions menu for the Cloudera Management Services.

If you are upgrading from Cloudera Manager 4.5 to a newer version, and you are using the embedded PostgreSQL database, you must stop the services that have a dependency on the Hive Metastore (Hive, Hue, and Impala). You will not be able to stop the Cloudera Manager server's database while these services are running.
  • Choose Stop on the Actions menus for the Hive and Hue services. Do the same for Impala if you have it running.
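
The stop rules above can be summarized as a small decision helper. The following is a minimal sketch, not part of Cloudera Manager: the function name services_to_stop and its two arguments are hypothetical, and it assumes that "upgrading from Cloudera Manager 4.5" covers 4.5 and any later 4.x release. It only prints the services to stop; stopping them is still done from the Admin Console.

```shell
# Sketch only: prints the services to stop before upgrading Cloudera Manager.
# Arguments (hypothetical conventions, not Cloudera Manager options):
#   $1  version being upgraded from, for example 4.5.2
#   $2  "yes" if the embedded PostgreSQL database is in use, otherwise "no"
services_to_stop() (
    from_version=$1
    embedded_db=$2

    # The Cloudera Management Service is stopped in every case (if running).
    echo "Cloudera Management Service"

    # Split a version such as 4.5.2 into major=4 and minor=5.
    major=${from_version%%.*}
    rest=${from_version#*.}
    minor=${rest%%.*}

    # Assumption: "upgrading from 4.5" is read as 4.5 or any later 4.x release.
    if [ "$embedded_db" = "yes" ] && [ "$major" -eq 4 ] && [ "$minor" -ge 5 ]; then
        echo "Hive"
        echo "Hue"
        echo "Impala (if running)"
    fi
)
```

For example, services_to_stop 4.1.3 yes prints only the Cloudera Management Service, because the Hive service did not exist before Cloudera Manager 4.5.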

Upgrade the Cloudera Manager Server Software

In this step, you upgrade the Cloudera Manager Server packages to the latest version. The Agents' packages will be updated in Deploy the Upgraded Agent Software.

  1. Stop the server on the Cloudera Manager Server host using the following command:
    $ sudo service cloudera-scm-server stop
  2. If you are using the embedded PostgreSQL database for Cloudera Manager, stop the database on the Cloudera Manager Server host:
    $ sudo service cloudera-scm-server-db stop
    If you are not using the embedded database, you should skip this step.
  3. Install the new version of the server. To install the new version, you can upgrade from Cloudera's repository at http://archive.cloudera.com/cm4/. Alternately, you can create your own repository, as described in Understanding Custom Installation Solutions. Creating your own repository is necessary if you are upgrading a cluster that does not have access to the Internet.
    1. Find Cloudera's repo file for your distribution by starting at http://archive.cloudera.com/cm4/ and navigating to the directory that matches your operating system. For example, for Red Hat or CentOS 6, you would navigate to http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/. Within that directory, find the repo file that contains information including the repository's base URL and gpgkey. For example, the contents of the cloudera-manager.repo file for Red Hat or CentOS 5 might appear as follows:
      [cloudera-manager]
      # Packages for Cloudera Manager, Version 4, on RedHat or CentOS 5 x86_64
      name=Cloudera Manager
      baseurl=http://archive.cloudera.com/cm4/redhat/5/x86_64/cm/4/
      gpgkey = http://archive.cloudera.com/cm4/redhat/5/x86_64/cm/RPM-GPG-KEY-cloudera 
      gpgcheck = 1
      
      For Ubuntu or Debian systems, the repo file can be found by navigating to the appropriate directory, for example, http://archive.cloudera.com/cm4/debian/squeeze/amd64/cm. The repo file, in this case, cloudera.list, may appear as follows:
      # Packages for Cloudera's Distribution for Hadoop, Version 4, on Debian 6.0 x86_64
      deb http://archive.cloudera.com/cm4/debian/squeeze/amd64/cm squeeze-cm4 contrib
      deb-src http://archive.cloudera.com/cm4/debian/squeeze/amd64/cm squeeze-cm4 contrib
      

      Copy this repo file to the configuration location for the package management software for your system. For example, with Red Hat 6, you would copy the cloudera-manager.repo file to /etc/yum.repos.d/. For SLES, you would copy the cloudera-manager.repo file to /etc/zypp/repos.d/. For Ubuntu/Debian, you would copy the cloudera.list file to /etc/apt/sources.list.d/.

    2. After verifying that you have the correct repo file, run the following commands:
      Commands by operating system:
      RHEL
      $ sudo yum clean all
      $ sudo yum update 'cloudera-*' 
        Note:
      • yum clean all cleans up yum's cache directories, ensuring that you download and install the latest versions of the packages.
      • If your system is not up to date, and any underlying system components need to be upgraded before this yum update can succeed, yum will tell you what those are.
      SLES
      $ sudo zypper clean --all
      $ sudo zypper up -r http://archive.cloudera.com/cm4/sles/11/x86_64/cm/4/

      To download from your own repository:

      $ sudo zypper clean --all
      $ sudo zypper rr cm 
      $ sudo zypper ar -t rpm-md http://myhost.example.com/path_to_cm_repo/ cm  
      $ sudo zypper up -r http://myhost.example.com/path_to_cm_repo
      Ubuntu or Debian
      Use the following commands to clean cached repository information and update Cloudera Manager components:
      $ sudo apt-get clean
      $ sudo apt-get update
      $ sudo apt-get install cloudera-manager-server cloudera-manager-agent cloudera-manager-daemons

      As this process proceeds, you may be prompted concerning your configuration file version:

      Configuration file `/etc/cloudera-scm-agent/config.ini'
      ==> Modified (by you or by a script) since installation.
      ==> Package distributor has shipped an updated version.
      What would you like to do about it ? Your options are:
      Y or I : install the package maintainer's version
      N or O : keep your currently-installed version
      D : show the differences between the versions
      Z : start a shell to examine the situation
      The default action is to keep your current version.

      You will receive a similar prompt for /etc/cloudera-scm-server/db.properties. Answer N to both these prompts.
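
The server-side steps above can be sketched as a dry-run helper that prints the commands for a given platform instead of running them. The function names and the rhel, sles, and debian keywords are assumptions made for this example. One addition that is not part of the original procedure: on Ubuntu or Debian the sketch passes dpkg's --force-confold option, which keeps the installed configuration files without prompting and so corresponds to answering N.

```shell
# Dry-run sketch: prints commands, runs nothing. The function names and the
# os_family keywords (rhel, sles, debian) are made up for this example.

# Directory that receives the repo file for a given package manager.
cm_repo_dest() {
    case "$1" in
        rhel)   echo "/etc/yum.repos.d/" ;;
        sles)   echo "/etc/zypp/repos.d/" ;;
        debian) echo "/etc/apt/sources.list.d/" ;;
        *)      echo "unknown os_family: $1" >&2; return 1 ;;
    esac
}

# Ordered commands for the server-side upgrade.
#   $1  os_family: rhel, sles, or debian
#   $2  "yes" if the embedded PostgreSQL database is in use
cm_server_upgrade_plan() {
    case "$1" in
        rhel|sles|debian) ;;
        *) echo "unknown os_family: $1" >&2; return 1 ;;
    esac

    echo "sudo service cloudera-scm-server stop"
    if [ "$2" = "yes" ]; then
        echo "sudo service cloudera-scm-server-db stop"
    fi

    case "$1" in
        rhel)
            echo "sudo yum clean all"
            echo "sudo yum update 'cloudera-*'"
            ;;
        sles)
            echo "sudo zypper clean --all"
            echo "sudo zypper up -r http://archive.cloudera.com/cm4/sles/11/x86_64/cm/4/"
            ;;
        debian)
            echo "sudo apt-get clean"
            echo "sudo apt-get update"
            # --force-confold keeps the installed config files, which matches
            # answering N at the dpkg prompts.
            echo "sudo apt-get -o Dpkg::Options::=--force-confold install cloudera-manager-server cloudera-manager-agent cloudera-manager-daemons"
            ;;
    esac
}
```

For example, cm_server_upgrade_plan debian yes prints five commands, beginning with the two stop commands. Review the printed plan before running any of it by hand.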

At the end of this process you should have the following packages, corresponding to the version of Cloudera Manager you installed, on the host that will become the Cloudera Manager Server host. For example, for CentOS:

$ rpm -qa 'cloudera-manager-*'
  cloudera-manager-agent.x86_64 0:4.8.0-1.cm480.p0.26.el6
  cloudera-manager-daemons.x86_64 0:4.8.0-1.cm480.p0.26.el6
  cloudera-manager-server.x86_64 0:4.8.0-1.cm480.p0.26.el6

For Ubuntu or Debian, you should have packages similar to those shown below.

~# dpkg-query -l 'cloudera-manager-*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                                       Version                                                    Description
+++-============================================-=================================================-==========================================================================================================
ii  cloudera-manager-daemons                     4.8.0-1.cm480.p0.175~squeeze-cm4.6.1              Provides daemons for monitoring Hadoop and related tools.
ii  cloudera-manager-repository                  4.0                                               Cloudera Manager
ii  cloudera-manager-server                      4.8.0-1.cm480.p0.175~squeeze-cm4.6.1              The Cloudera Manager Server
ii  cloudera-manager-server-db                   4.8.0-1.cm480.p0.175~squeeze-cm4.6.1              This package configures an "embedded" PostgreSQL server,running as user cloudera-scm on port 7432.

You may also see an entry for the cloudera-manager-server-db if you are using the embedded database, and additional packages for plugins, depending on what was previously installed on the Server host. If the commands to update the server complete without errors, you can assume the upgrade has completed as desired. For additional assurance, you will have the option to check that the server versions have been updated after you start the server. The process of checking the server version is described in Verify the Upgrade Succeeded.
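
As a complement to reading the package listings, the comparison can be scripted. The sketch below is a hypothetical helper, not a Cloudera tool: it reads "package version" pairs on standard input and succeeds only when the server, agent, and daemons packages are all present at one shared version. It assumes the agent package is also installed on the Server host, as in the CentOS listing.

```shell
# Sketch: reads "package version" pairs on stdin and succeeds only if the
# server, agent, and daemons packages are all present at one shared version.
# The function name is hypothetical.
cm_versions_consistent() {
    awk '
        $1 == "cloudera-manager-server"  ||
        $1 == "cloudera-manager-agent"   ||
        $1 == "cloudera-manager-daemons" {
            if (!($1 in seen))     { seen[$1] = 1; found++ }
            if (!($2 in versions)) { versions[$2] = 1; distinct++ }
        }
        END { exit !(found == 3 && distinct == 1) }
    '
}

# Possible ways to feed it (not executed here):
#   rpm -q --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' \
#       cloudera-manager-server cloudera-manager-agent cloudera-manager-daemons \
#       | cm_versions_consistent && echo "versions match"
#   dpkg-query -W -f='${Package} ${Version}\n' 'cloudera-manager-*' \
#       | cm_versions_consistent && echo "versions match"
```

Other packages in the input, such as cloudera-manager-repository or cloudera-manager-server-db, are ignored by the check.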

Start the Server

On the Cloudera Manager Server host (the system on which you installed the cloudera-manager-server package) do the following:

If you are using the embedded PostgreSQL database for Cloudera Manager:

$ sudo service cloudera-scm-server-db start

This will set up the new database for Cloudera Navigator.

  Note: The sudo service cloudera-scm-server-db start command is not necessary if you are not using the embedded PostgreSQL database.

Then start the Cloudera Manager Server:

$ sudo service cloudera-scm-server start

You should see the following:

Starting cloudera-scm-server:                              [  OK  ]

  Note: If you have problems starting the server, such as database permissions problems, you can use the server's log /var/log/cloudera-scm-server/cloudera-scm-server.log to troubleshoot the problem.
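
When troubleshooting a failed start, it can help to pull only the most recent error lines out of the server log. The following is a minimal sketch under stated assumptions: the function name cm_recent_errors is hypothetical, and it assumes the log level (ERROR or FATAL) appears as a separate uppercase word on each log line.

```shell
# Sketch: prints the most recent ERROR or FATAL lines from a server log.
#   $1  log file (defaults to the server log path named above)
#   $2  maximum number of lines to print (defaults to 20)
# Assumption: the log level appears as a separate uppercase word on each line.
cm_recent_errors() {
    log_file=${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}
    max_lines=${2:-20}
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    grep -E '(^|[[:space:]])(ERROR|FATAL)([[:space:]]|$)' "$log_file" | tail -n "$max_lines"
}
```

Called with no arguments, the function reads the default log path; reading that file may require sudo, depending on its permissions.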

Deploy the Upgraded Agent Software

Cloudera Manager can automatically upgrade existing Agents. After you upgrade Cloudera Manager, when it is started for the first time, it checks for any older versions of agents. If older agents are detected, Cloudera Manager provides the opportunity to automatically update agents, which is recommended.

  Important: All hosts in the cluster must have access to the Internet if you plan to use archive.cloudera.com as the source for installation files. If you do not have Internet access, create a custom repository.
  1. Log in to the Cloudera Manager Admin Console. If you have just restarted the Cloudera Manager server, you may need to log in again.
  2. On the Welcome screen, select whether you want to:
    • Install Cloudera Standard,
    • Try Cloudera Enterprise with a 60-day trial license, or
    • Install a license you have purchased for Cloudera Enterprise.
  3. After you upload the Cloudera Manager license, or if you have elected to use a Trial license, restart the Cloudera Manager server.
    $ sudo service cloudera-scm-server restart
    As the Cloudera Manager server restarts, the UI indicates its progress, and presents the login page when the restart has completed.
  4. Click Continue to proceed to the Upgrade cluster hosts screen.
  5. On the Upgrade cluster hosts screen, click Start Upgrade to upgrade the existing managed hosts. Click Skip Host Upgrades to skip this step.
  6. Select the release of the Cloudera Manager Agent to install. Normally, this will be the Matched Release for this Cloudera Manager Server. However, if you used a custom repository for the Cloudera Manager server, select Custom Repository and provide the required information. Click Continue to proceed.
  7. Provide credentials for authenticating with hosts.
    1. Select root or enter the user name for an account that has password-less sudo permissions.
    2. Select an authentication method.
      • If you choose to use password authentication, enter and confirm the password.
      • If you choose to use public-key authentication, provide a passphrase and path to the required key files.
      • You can choose to specify an alternate SSH port. The default value is 22.
      • You can specify the maximum number of host installations to run at once. The default value is 10.
  8. Click Start Installation to install and start Cloudera Manager Agents. The status of installation on each host is displayed on the page that appears after you click Start Installation. You can also click the Details link for individual hosts to view detailed information about the installation and error messages if installation fails on any hosts.
      Note: If you click the Abort Installation button while installation is in progress, it will halt any pending or in-progress installations and roll back any in-progress installations to a clean state. The Abort Installation button does not affect host installations that have already completed successfully or already failed.

    If installation fails on a host, you can click the Retry link next to the failed host to try installation on that host again. To retry installation on all failed hosts, click Retry Failed Hosts at the bottom of the screen.

  9. When the Continue button appears at the bottom of the screen, the installation process is complete. If the installation has completed successfully on some hosts but failed on others, you can click Continue if you want to skip installation on the failed hosts and continue to the next screen to start installing the Cloudera Management services on the successful hosts.
  10. The Host Inspector runs to inspect your managed hosts for correct versions and configurations. If there are problems, you can make changes and then re-run the inspector. When you are satisfied with the inspection results, click Continue to install the Cloudera Management services.
  11. On the next page, select the hosts where the Hive Metastore Server role should be installed.

    If you are upgrading from a version of Cloudera Manager prior to 4.5, this step will be skipped; the Hive Metastore will already be set up.

    The Hive service is now managed by Cloudera Manager; you must select the host for the Hive Metastore Server. You should assign the Hive Metastore server to a single host.
  12. Review the configuration values for your Hive roles, and click Accept to continue.
      Note:

    If Hue is using a Hive metastore of type Derby (the default), then the newly created Hive service will also use Derby. However, since Derby does not allow concurrent connections, the new Hive Metastore Server will fail to start. The failure is harmless (the Hive Metastore Server is not used at this point) and intentional, to preserve the cluster functionality that existed before the upgrade.

    If you are upgrading to Cloudera Manager 4.5 or later from a release prior to 4.5 (that is, 4.1 or earlier), Hive's metastore bypass mode is enabled by default. You should plan to disable the Bypass Hive Metastore Server mode, especially when using CDH 4.2 or later. Using the Hive Metastore Server is the recommended configuration. After changing this configuration, you must re-deploy your client configurations, restart Hive, and restart any Hue or Impala services configured to use that Hive service.

  13. Your services (except for Hive and the services you stopped in Stop Selected Services as Needed) should now be running.

Verify the Upgrade Succeeded

If the commands to update and start the server complete without errors, you can assume the upgrade has completed as desired. For additional assurance, you can check that the server versions have been updated:
  1. In the Cloudera Manager Admin console, click the Hosts tab.
  2. Click Host Inspector. On large clusters, the host inspector may take some time to finish running. You must wait for the process to complete before proceeding to the next step.
  3. Click Show Inspector Results.

    All results from the host inspector process are displayed including the currently installed versions. If this includes listings of current component versions, the installation completed as expected.

Add Hive Gateway Roles

You must add Hive Gateway roles to any hosts where Hive clients should run.

  Note: This step only applies if you are upgrading from a release prior to Cloudera Manager 4.5. If you are upgrading from 4.5 or later and you have Hive gateway roles already installed, you will not need to add them again.
To add Hive gateway roles:
  1. In the Cloudera Manager Admin console, pull down the Services tab and select the Hive service.
  2. Go to the Instances tab, and click the Add button. This opens the Add Role Instances page.
  3. Select the hosts on which you want a Hive Gateway role to run. This will ensure that the Hive client configurations are deployed on these hosts.

Upgrade the Impala Service

Cloudera Manager 4.8 does not support Impala 1.1; you must upgrade Impala to version 1.2. You will not be able to restart the Impala service (or the Hue service) until this is done.

To upgrade Impala, follow the instructions in Upgrading Impala. This includes instructions for doing the upgrade using either parcels or packages. Note that if your CDH was installed with packages, you must upgrade Impala using packages; you cannot mix parcels and packages in the same deployment.

Restart Services

You must restart the Management Service and any other services (Hive, Hue, Impala) that you stopped at the beginning of this procedure. You should also restart the MapReduce service, or certain functions on MapReduce roles will fail.

In addition, as of Cloudera Manager 4.1, health checks were introduced for the ZooKeeper service. If you are upgrading from a Cloudera Manager version older than 4.1 and have ZooKeeper installed, those new health checks will fail until you restart the ZooKeeper service.

To restart the ZooKeeper service:

  1. From the Services tab select All Services in the Cloudera Manager Admin Console.
  2. Choose Restart on the Actions menu for the ZooKeeper Service.
  Note: If for some reason you do not want to restart the ZooKeeper service at this point, you can disable the alerts for the failing health checks, or disable the health checks themselves. See Configuring Monitoring Settings. However, be sure to re-enable any checks you have disabled when you eventually restart the service. It is strongly recommended that you restart the service as soon as possible.

To start the services you stopped in Stop Selected Services as Needed:

  1. From the Services tab select All Services in the Cloudera Manager Admin Console.
  2. Choose Start on the Actions menu for each service you need to start.

To start the Cloudera Management service:

  1. From the Services tab select All Services in the Cloudera Manager Admin Console.
  2. Choose Start on the Actions menu for the Cloudera Management Services.
  Note: If you change the hostname or port where the Cloudera Manager is running, or you enable TLS security, you must restart the Cloudera Management Services to update the URL to the Server.

To restart the MapReduce service:

  1. From the Services tab in the Cloudera Manager Admin Console, select the MapReduce service.
  2. Choose Restart on the Actions menu for the MapReduce service.

    If you do not restart MapReduce after an upgrade to Cloudera Manager 4.6, certain functions such as rolling restart, decommissioning TaskTrackers, or refreshing the JobTracker will fail. Once MapReduce has been restarted, these functions will work correctly from then on.
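
The restart rules in this section depend on the version you upgraded from. They can be sketched as a checklist generator; this is an illustration only, the function name services_to_restart and its arguments are hypothetical, and the restarts themselves are still performed from the Admin Console.

```shell
# Sketch: prints the services to restart after the upgrade.
#   $1  Cloudera Manager version that was upgraded from, for example 4.0.4
#   $2  "yes" if a ZooKeeper service is installed
# The function name and argument conventions are hypothetical.
services_to_restart() (
    from_version=$1
    has_zookeeper=$2

    echo "Cloudera Management Service"
    echo "Services stopped before the upgrade (Hive, Hue, Impala)"
    echo "MapReduce"

    # Split a version such as 4.0.4 into major=4 and minor=0.
    major=${from_version%%.*}
    rest=${from_version#*.}
    minor=${rest%%.*}

    # ZooKeeper health checks arrived in 4.1, so only older sources need this.
    if [ "$has_zookeeper" = "yes" ]; then
        if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 1 ]; }; then
            echo "ZooKeeper"
        fi
    fi
)
```

For example, services_to_restart 4.0.4 yes adds ZooKeeper to the list, while services_to_restart 4.1.0 yes does not.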

Test the Installation

When you have finished the upgrade to Cloudera Manager, you can test the installation to verify that the monitoring features are working as expected; follow instructions under Testing the Installation.

Enable Cloudera Navigator Auditing

If you have upgraded to Cloudera Enterprise or are running the 60-day Trial and want to try Cloudera Navigator, you must add it as a role under the management service. For information on Cloudera Navigator, see Cloudera Navigator documentation.

  1. From the Services page, select the Cloudera Manager management service.
  2. Go to the Instances tab, and click the Add button.
  3. In the table presented, scroll to the end and select the host where you want the Navigator Server role to be hosted, and click Continue.
  4. Because Cloudera Navigator is separately licensed, you are presented with a license statement. Click Accept to enable the trial license for this feature.
  5. Enter the credentials for the database to be used by the Navigator Server. Assuming you have not set up an external database, you can use the Embedded Database option. Click Test Connection to verify connectivity to the database, then click Continue.
  6. Review and accept any configuration changes (typically there are none). Click Accept. This returns you to the Instances page.
  7. The Navigator Server role is added but not started. To start the role:
    1. Click the checkbox next to the role.
    2. From the Actions for Selected menu, click Start, and confirm that you want to start the role.
  Note: After you add the Cloudera Navigator role or you upgrade Cloudera Manager and already have Cloudera Navigator set up to audit services, you must restart the audited services.

Deploy Updated Client Configurations

During upgrades between major versions, resource locations may change. To ensure clients have current information about resources, update client configuration as described in Deploying Client Configuration Files.

(Optional) Upgrade CDH

Cloudera Manager 4.x can manage both CDH3 and CDH4, so upgrading existing CDH3 installations is not required, but to get the benefits of CDH4, you may want to upgrade to the latest version. See the following topics for more information on upgrading CDH: