Backing Up Before Upgrading from CDH 4 to CDH 5

Backing up CDH components before you upgrade Cloudera Manager and CDH lets you roll back the upgrade if necessary. This topic describes how to back up your cluster so that you can restore it to its pre-upgrade state.

Backup Steps

Preparing to Back Up

Because many storage locations are configurable, you may need to use the Cloudera Manager Admin Console to determine the location of files you need to back up. Where applicable, the parameter names that specify these locations are provided in the backup sections for each component. If you have not changed these values for your cluster, use the provided default values. To find these parameter values:
  1. Open the Cloudera Manager Admin Console.
  2. Go to the service where you need to look up a parameter (for example, HDFS, HBase, or ZooKeeper).
  3. Click the Configuration tab.
  4. Enter the name of the parameter in the search box.

    The parameter and its value display on the right.

For some services, you back up data stored in relational databases such as Oracle, MariaDB, MySQL, or PostgreSQL. See the documentation for those products to learn how to back up and restore the databases.

Stopping the Cluster

Stop the CDH cluster before performing the backup:
  1. Go to the Home page.
  2. In the drop-down list next to your cluster, select Stop.

Backing Up CDH 4 and Cloudera Manager Repository Files

If your cluster was installed using packages, back up the CDH 4 and Cloudera Manager repository files in the system repository directory on all hosts. If your cluster was installed using Cloudera parcels, back up only the Cloudera Manager repository file. The typical locations of the repository directory are:
Operating System      Path
RHEL                  /etc/yum.repos.d
SLES                  /etc/zypp/repos.d
Ubuntu or Debian      /etc/apt/sources.list.d

For example, on a RHEL or similar system, back up the files in /etc/yum.repos.d that have cloudera as part of their name.
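The following is a minimal bash sketch of this step, not a Cloudera-provided tool. It assumes bash with standard coreutils; the function name backup_cloudera_repos and the destination directory you pass to it are hypothetical. It copies only files whose names contain cloudera, matching the RHEL example, and works the same way for the SLES and Ubuntu or Debian directories listed in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every repository file whose name contains
# "cloudera" from a repository directory (for example /etc/yum.repos.d)
# into a backup directory, preserving mode and timestamps (cp -p).
backup_cloudera_repos() {
  local repo_dir="$1" backup_dir="$2" f copied=0
  mkdir -p "$backup_dir" || return 1
  for f in "$repo_dir"/*cloudera*; do
    [ -e "$f" ] || continue          # the glob matched nothing
    cp -p "$f" "$backup_dir"/ || return 1
    copied=$((copied + 1))
  done
  echo "copied $copied file(s) from $repo_dir to $backup_dir"
}
```

For example, as root on a RHEL host: backup_cloudera_repos /etc/yum.repos.d /root/cdh4-backup/repos (the destination is an arbitrary example path). Repeat on every host, because repository files are configured per host.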

Backing Up ZooKeeper

On all ZooKeeper hosts, back up the ZooKeeper data directory specified with the dataDir property in the ZooKeeper configuration. The default location is /var/lib/zookeeper.

Record the permissions of the files and directories; you will need these to roll back ZooKeeper.
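A minimal sketch of the ZooKeeper backup, assuming bash, tar with gzip support, and a destination directory of your choosing; backup_dir_with_perms is a hypothetical helper name. It archives the data directory and also writes an ls -lanR listing, which serves as the record of permissions needed for the rollback. Archive names are prefixed with the host name reported by uname -n so that backups gathered from several ZooKeeper hosts stay distinguishable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive a data directory with tar and write a
# recursive permission listing next to the archive, so that modes and
# ownership can be checked when rolling back.
backup_dir_with_perms() {
  local src="$1" dest_dir="$2" name
  name="$(uname -n)-$(basename "$src")"
  mkdir -p "$dest_dir" || return 1
  # tar records mode and owner for every entry; -C keeps paths relative.
  tar -czf "$dest_dir/$name.tar.gz" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Human-readable permission record with numeric UID/GID.
  ls -lanR "$src" > "$dest_dir/$name.perms.txt"
}
```

For example, as root on each ZooKeeper host: backup_dir_with_perms /var/lib/zookeeper /root/cdh4-backup/zookeeper. When you later extract the archive as root with tar -xpzf, tar reapplies the stored modes and ownership, and the listing lets you verify them. The same helper also fits the Solr data directory described under Backing Up Search.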

Backing Up HDFS (With High Availability)

Follow this procedure to back up an HDFS deployment that has been configured for high availability.

  1. On both NameNode hosts, back up one of the NameNode data directories specified with the dfs.namenode.name.dir property.
  2. On each JournalNode, back up the JournalNode edits directory specified by the dfs.journalnode.edits.dir property. Note which JournalNode host the backup comes from.
  3. On each DataNode, back up the VERSION file and note which DataNode it came from. A DataNode may have multiple data directories, specified with the dfs.datanode.data.dir property, but you need to back up the VERSION file from only one of them. The VERSION file is in the current subdirectory of the data directory; for example, with the default path: /data/dfs/dn/current/VERSION. During the rollback you use this file to look up the DataNode's storageID. Only the storageID is needed, so copying the whole VERSION file is simply a convenience.
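The following bash sketch covers the NameNode and JournalNode steps. It assumes you copy the property values by hand from the Cloudera Manager Admin Console; the helper names first_local_dir and archive_for_host are hypothetical. Because dfs.namenode.name.dir can list several comma-separated directories, which may be written as file:// URIs, first_local_dir picks the first entry, matching the instruction to back up only one of them. archive_for_host embeds the host name in the archive name to record which NameNode or JournalNode each backup came from.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first entry of a comma-separated
# directory property value (such as dfs.namenode.name.dir), with any
# file:// prefix removed.
first_local_dir() {
  local first="${1%%,*}"
  first="${first#file://}"
  printf '%s\n' "$first"
}

# Hypothetical helper: tar one directory into DEST_DIR, naming the archive
# <host>-<role>.tar.gz so backups from different hosts are not confused.
# Prints the path of the archive it wrote.
archive_for_host() {
  local src="$1" dest_dir="$2" role="$3"
  local out="$dest_dir/$(uname -n)-$role.tar.gz"
  mkdir -p "$dest_dir" || return 1
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  printf '%s\n' "$out"
}
```

For example, as root on each NameNode host: archive_for_host "$(first_local_dir '<value of dfs.namenode.name.dir>')" /root/cdh4-backup/hdfs namenode. On each JournalNode host, pass the dfs.journalnode.edits.dir directory and the role name journalnode instead. The destination path is only an example.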

Backing Up HDFS (Without High Availability)

Use this procedure to back up an HDFS deployment that has not been configured for high availability.

  1. On the NameNode host, back up one of the NameNode data directories specified with the dfs.namenode.name.dir property.
  2. On each DataNode, back up the VERSION file and note which DataNode it came from. A DataNode may have multiple data directories, specified with the dfs.datanode.data.dir property, but you need to back up the VERSION file from only one of them. The VERSION file is in the current subdirectory of the data directory; for example, with the default path: /data/dfs/dn/current/VERSION. During the rollback you use this file to look up the DataNode's storageID. Only the storageID is needed, so copying the whole VERSION file is simply a convenience.
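For the DataNode step, a small bash sketch can save the VERSION file under a host-specific name and print the storageID it contains, so you can record the ID next to the DataNode's name. It assumes the VERSION file is a Java properties file containing a storageID=... line; read_storage_id and save_version_file are hypothetical helper names, and the destination directory is your choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the storageID key from a
# DataNode VERSION file (key=value lines; '#' lines are comments).
read_storage_id() {
  awk -F= '$1 == "storageID" { print substr($0, index($0, "=") + 1); exit }' "$1"
}

# Hypothetical helper: copy a VERSION file into DEST_DIR as
# <host>-VERSION, then print the storageID it contains.
save_version_file() {
  local version_file="$1" dest_dir="$2"
  mkdir -p "$dest_dir" || return 1
  cp -p "$version_file" "$dest_dir/$(uname -n)-VERSION" || return 1
  read_storage_id "$version_file"
}
```

For example, as root on each DataNode: save_version_file /data/dfs/dn/current/VERSION /root/cdh4-backup/hdfs, substituting a directory from your dfs.datanode.data.dir value for the default path shown.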

Backing Up HBase

Because the rollback procedure also rolls back HDFS, where HBase stores its data, HBase data is rolled back with it. In addition, HBase metadata stored in ZooKeeper is recovered as part of the ZooKeeper rollback procedure.

If your cluster is configured to use HBase replication, Cloudera recommends that you document all replication peers. If necessary (for example, because the HBase znode has been deleted), you can roll back HBase as part of the HDFS rollback without the ZooKeeper metadata. This metadata can be reconstructed in a fresh ZooKeeper installation, with the exception of the replication peers, which you must add back. For information on enabling HBase replication, listing peers, and adding a peer, see HBase Replication in the CDH 4 documentation.

Backing Up Hive

Back up the database that backs the Hive metastore. See Backing up Databases.

Backing Up Oozie

Back up the Oozie database. See Backing up Databases.

Backing Up Search

On each Solr node, back up the contents of the Solr data directory and record the permissions for the directory. This location is specified with the Solr Data Directory property. The default location is:

/var/lib/solr

Search data on ZooKeeper is restored as part of the ZooKeeper rollback.

Backing Up Sqoop 2

If you are not using the default embedded Derby database for Sqoop 2, back up the database you have configured for Sqoop 2. Otherwise, back up the repository subdirectory of the Sqoop 2 metastore directory. This location is specified with the Sqoop 2 Server Metastore Directory property. The default location is: /var/lib/sqoop2. For this default location, Derby database files are located in /var/lib/sqoop2/repository.

Backing Up Hue

  1. Back up the Hue database. See Backing up Databases.
  2. Back up the app registry file, <HUE_HOME>/app.reg, where HUE_HOME is the location of your Hue installation. For package installs, this is usually /usr/lib/hue; for parcel installs, this is usually /opt/cloudera/parcels/<parcel version>/lib/hue/.
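A bash sketch for the app.reg step, assuming the package and parcel locations given above; backup_hue_app_reg is a hypothetical name. When called with only a destination, it checks /usr/lib/hue and every /opt/cloudera/parcels/*/lib/hue directory; you can instead pass one or more HUE_HOME paths explicitly. Each copy is named after its source path so that copies from different parcel directories do not overwrite each other, and the function returns a non-zero status if it finds no app.reg at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy <HUE_HOME>/app.reg into DEST_DIR for each
# candidate HUE_HOME. Usage: backup_hue_app_reg DEST_DIR [HUE_HOME...]
backup_hue_app_reg() {
  local dest_dir="$1" home found=0
  shift
  # With no explicit HUE_HOME values, try the usual package and parcel paths.
  [ "$#" -gt 0 ] || set -- /usr/lib/hue /opt/cloudera/parcels/*/lib/hue
  mkdir -p "$dest_dir" || return 1
  for home in "$@"; do
    home="${home%/}"                 # drop a trailing slash
    [ -f "$home/app.reg" ] || continue
    # Encode the source path in the name, e.g. app.reg_usr_lib_hue.
    cp -p "$home/app.reg" "$dest_dir/app.reg${home//\//_}" || return 1
    found=$((found + 1))
  done
  [ "$found" -gt 0 ]                 # non-zero status if nothing was copied
}
```

For example, as root on the Hue server host: backup_hue_app_reg /root/cdh4-backup/hue. If /opt/cloudera/parcels also contains a symbolic link to a parcel directory, the glob can match the same app.reg twice under different names, which is harmless.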

Backing Up Cloudera Manager

  1. Stop the Cloudera Management Service using Cloudera Manager:
    1. Select Clusters > Cloudera Management Service.
    2. Select Actions > Stop.
  2. Stop Cloudera Manager Server by running the following command on the Cloudera Manager Server host:
    sudo service cloudera-scm-server stop
  3. On the host where Cloudera Manager Server is running, back up the /etc/cloudera-scm-server/db.properties file.
  4. On the host where the Event Server role is configured to run, back up the contents of the directory specified with the Event Server Index Directory property (the default value is /var/lib/cloudera-scm-eventserver).
  5. Back up the /etc/cloudera-scm-agent/config.ini file on each host in the cluster.
  6. Back up the following Cloudera Manager-related databases; see Backing up Databases:
    • Cloudera Manager Server
    • Activity Monitor (depending on your deployment, this role may not be installed)
    • Reports Manager
    • Service Monitor
    • Host Monitor
    • Navigator Audit Server
    • Navigator Metadata Server
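Steps 3 to 5 copy plain files and one directory, so they can be sketched as a single bash function that you run on every host. backup_cm_files and the override variables CM_DB_PROPERTIES, CM_AGENT_CONFIG, and CM_EVENTSERVER_DIR are hypothetical, added so the sketch can follow non-default locations such as a changed Event Server Index Directory. It assumes config.ini exists on every host, and copies db.properties and the Event Server index directory only where they are present, since they are expected only on the Cloudera Manager Server and Event Server hosts. Run it after stopping Cloudera Manager Server in step 2; the databases in step 6 still need separate backups.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the per-host Cloudera Manager files from
# steps 3-5 into DEST_ROOT/<host>. Default paths are the ones listed in
# the steps; the CM_* variables are illustrative overrides.
backup_cm_files() {
  local dest="$1/$(uname -n)"
  local db_props="${CM_DB_PROPERTIES:-/etc/cloudera-scm-server/db.properties}"
  local agent_ini="${CM_AGENT_CONFIG:-/etc/cloudera-scm-agent/config.ini}"
  local event_dir="${CM_EVENTSERVER_DIR:-/var/lib/cloudera-scm-eventserver}"
  mkdir -p "$dest" || return 1
  # Every host runs an agent, so a missing config.ini is treated as an error.
  cp -p "$agent_ini" "$dest/" || return 1
  # db.properties is expected only on the Cloudera Manager Server host.
  if [ -f "$db_props" ]; then cp -p "$db_props" "$dest/" || return 1; fi
  # The index directory is expected only on the Event Server host.
  if [ -d "$event_dir" ]; then
    tar -czf "$dest/eventserver-index.tar.gz" \
      -C "$(dirname "$event_dir")" "$(basename "$event_dir")" || return 1
  fi
}
```

For example, as root on each host: backup_cm_files /root/cdh4-backup/cm, which writes into a subdirectory named after the host.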

Backing Up Other CDH Components

No backups are required for the following components:
  • MapReduce
  • YARN
  • Spark
  • Pig
  • Sqoop
  • Impala

Backing up Databases

Several steps in the backup procedures require you to back up various databases used in a CDH cluster. The steps for backing up and restoring databases differ depending on the database vendor and version you select for your cluster and are beyond the scope of this document.