Backing Up CDH

This topic describes how to back up a CDH cluster managed by Cloudera Manager prior to upgrading the cluster. These procedures do not back up the data stored in the cluster. Cloudera recommends that you maintain regular backups of your data using the Backup and Disaster Recovery features of Cloudera Manager. See Backup and Disaster Recovery.

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)


Backing up CDH before you upgrade allows you to roll back the upgrade if necessary.

The following CDH components do not require backups:
  • MapReduce
  • YARN
  • Spark
  • Pig
  • Impala

Back Up HDFS Metadata on the NameNode

[Not required for CDH maintenance release upgrades.]

Back up HDFS metadata using the following command:

hdfs dfsadmin -fetchImage myImageName
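The command above can be wrapped in a small script that writes the fsimage to a dated file. This is a minimal sketch: `BACKUP_DIR` is a placeholder path, and because `hdfs dfsadmin -fetchImage` needs the `hdfs` CLI and a reachable NameNode, the sketch only prints a note on hosts without it.

```shell
#!/bin/sh
# Sketch: fetch the current fsimage into a dated file under BACKUP_DIR.
# BACKUP_DIR is a placeholder; override it with your backup location.
BACKUP_DIR=${BACKUP_DIR:-/tmp/cdh-backup}
mkdir -p "$BACKUP_DIR"
if command -v hdfs >/dev/null 2>&1; then
  # -fetchImage downloads the most recent fsimage from the NameNode
  hdfs dfsadmin -fetchImage "$BACKUP_DIR/fsimage-$(date +%F)-CM-CDH"
else
  echo "hdfs CLI not found; run this on a NameNode or gateway host"
fi
```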

Back Up the Repository Files

Back up the repository files on all cluster hosts:
RHEL / CentOS
sudo -E tar -cf CDH_BACKUP_DIR/repository-CM-CDH.tar /etc/yum.repos.d
SLES
sudo -E tar -cf CDH_BACKUP_DIR/repository-CM-CDH.tar /etc/zypp/repos.d
Debian / Ubuntu
sudo -E tar -cf CDH_BACKUP_DIR/repository-CM-CDH.tar /etc/apt/sources.list.d
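The three per-distribution commands can be combined into one sketch that picks whichever repository directory exists on the host. `BACKUP_DIR` is a placeholder; run it with privileges sufficient to read the repository directory.

```shell
#!/bin/sh
# Sketch: pick the package-repository directory present on this host
# (one of the three listed above) and tar it into BACKUP_DIR.
repo_dir() {
  for d in /etc/yum.repos.d /etc/zypp/repos.d /etc/apt/sources.list.d; do
    [ -d "$d" ] && { echo "$d"; return 0; }
  done
  return 1
}
BACKUP_DIR=${BACKUP_DIR:-/tmp/cdh-backup}   # placeholder backup location
mkdir -p "$BACKUP_DIR"
dir=$(repo_dir) || dir=$(mktemp -d)         # empty dir so the sketch still runs
tar -cf "$BACKUP_DIR/repository-CM-CDH.tar" "$dir"
echo "Saved $dir to $BACKUP_DIR/repository-CM-CDH.tar"
```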

Back Up Databases

Gather the following information:

  • Type of database (PostgreSQL, Embedded PostgreSQL, MySQL, MariaDB, or Oracle)
  • Hostnames of the databases
  • Credentials for the databases
Open the Cloudera Manager Admin Console to find the database information for any of the following services you have deployed in your cluster:
  • Sqoop, Oozie, and Hue – Go to Cluster Name > Configuration > Database Settings.
  • Hive Metastore – Go to the Hive service, select Configuration, and select the Hive Metastore Database category.
  • Sentry – Go to the Sentry service, select Configuration, and select the Sentry Server Database category.

To back up the databases

Perform the following steps for each database you back up:
  1. If not already stopped, stop the service.
    1. On the Home > Status tab, click to the right of the service name and select Stop.
    2. Click Stop in the next screen to confirm. When you see a Finished status, the service has stopped.
  2. Back up the database. Substitute the database name, hostname, port, user name, and backup directory path and run the following command:
    MySQL
    mysqldump --databases database_name --host=database_hostname --port=database_port -u database_username -p > backup_directory_path/database_name-backup-`date +%F`-CM.sql
    PostgreSQL/Embedded
    pg_dump -h database_hostname -U database_username -W -p database_port database_name > backup_directory_path/database_name-backup-`date +%F`-CM.sql
    Oracle
    Work with your database administrator to ensure databases are properly backed up.

    For additional information about backing up databases, see your database vendor's documentation.

  3. Start the service.
    1. On the Home > Status tab, click to the right of the service name and select Start.
    2. Click Start in the next screen to confirm. When you see a Finished status, the service has started.
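Because the same dump invocation repeats for each service database, it can help to generate the commands once and review them before running anything. A dry-run sketch for MySQL: the host, port, user, and database names are placeholders; substitute the values you recorded from Cloudera Manager.

```shell
#!/bin/sh
# Sketch (dry run): generate one mysqldump command per service database.
# DB_HOST, DB_PORT, DB_USER, and the database names are placeholders --
# substitute the values recorded from Cloudera Manager.
DB_HOST=db.example.com; DB_PORT=3306; DB_USER=backup_user
BACKUP_DIR=${BACKUP_DIR:-/tmp/cdh-backup}
mkdir -p "$BACKUP_DIR"
for db in hue oozie metastore sentry; do
  echo "mysqldump --databases $db --host=$DB_HOST --port=$DB_PORT" \
       "-u $DB_USER -p > $BACKUP_DIR/$db-backup-$(date +%F)-CM.sql"
done > "$BACKUP_DIR/dump-commands.txt"
cat "$BACKUP_DIR/dump-commands.txt"   # review, then run each line manually
```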

Back Up ZooKeeper

On all ZooKeeper hosts, back up the ZooKeeper data directory specified with the dataDir property in the ZooKeeper configuration. The default location is /var/lib/zookeeper. For example:
cp -rp /var/lib/zookeeper/ /var/lib/zookeeper-CM-CDH

Record the permissions of the files and directories; you will need these to roll back ZooKeeper.
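Both steps (the copy and the permissions record) can be sketched together. The directory below is a stand-in created by the script so the sketch runs anywhere; on a real ZooKeeper host, point `ZK_DATA` at the configured dataDir and drop the stand-in `mkdir` line.

```shell
#!/bin/sh
# Sketch: copy the dataDir and record its permissions for rollback.
# ZK_DATA is a stand-in path; on a ZooKeeper host set it to /var/lib/zookeeper
# (or whatever dataDir is configured) and remove the stand-in mkdir below.
ZK_DATA=${ZK_DATA:-/tmp/zk-demo}
mkdir -p "$ZK_DATA/version-2"                         # stand-in data only
cp -rp "$ZK_DATA" "${ZK_DATA}-CM-CDH"                 # the backup copy
ls -lR "$ZK_DATA" > "${ZK_DATA}-CM-CDH.permissions"   # owners/modes for rollback
```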

Back Up HDFS

Follow this procedure to back up an HDFS deployment.

  1. If high availability is enabled for HDFS, run the following command on all hosts running the JournalNode role:
    cp -rp /dfs/jn /dfs/jn_-CM-CDH
  2. On all NameNode hosts, back up the NameNode runtime directory. Run the following commands:
    mkdir -p /etc/hadoop/conf.rollback.namenode
    cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-NAMENODE\$" | head -1`
    cp -rp * /etc/hadoop/conf.rollback.namenode/
    rm -rf /etc/hadoop/conf.rollback.namenode/log4j.properties
    cp -rp /etc/hadoop/conf.cloudera.HDFS_service_name/log4j.properties /etc/hadoop/conf.rollback.namenode/

    These commands create a temporary rollback directory. Later backup steps require you to modify files in this directory.

  3. Back up the runtime directory for all DataNodes. Run the following commands on all DataNodes:
    mkdir -p /etc/hadoop/conf.rollback.datanode/
    cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-DATANODE\$" | head -1`
    cp -rp * /etc/hadoop/conf.rollback.datanode/
    rm -rf /etc/hadoop/conf.rollback.datanode/log4j.properties
    cp -rp /etc/hadoop/conf.cloudera.HDFS_service_name/log4j.properties /etc/hadoop/conf.rollback.datanode/
  4. If high availability is not enabled for HDFS, back up the runtime directory of the Secondary NameNode. Run the following commands on all Secondary NameNode hosts:
    mkdir -p /etc/hadoop/conf.rollback.secondarynamenode/
    cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-SECONDARYNAMENODE\$" | head -1`
    cp -rp * /etc/hadoop/conf.rollback.secondarynamenode/
    rm -rf /etc/hadoop/conf.rollback.secondarynamenode/log4j.properties
    cp -rp /etc/hadoop/conf.cloudera.HDFS_service_name/log4j.properties /etc/hadoop/conf.rollback.secondarynamenode/
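Steps 2 through 4 repeat the same pattern for different roles, so they can be expressed as one function. A sketch, assuming the agent's process-directory layout shown above; `PROC_DIR` and `DEST_BASE` default to the real paths and are overridable only so the sketch can be exercised outside a cluster host. The log4j.properties replacement from the steps above still applies afterwards.

```shell
#!/bin/sh
# Sketch: copy the newest runtime directory for a given role (NAMENODE,
# DATANODE, SECONDARYNAMENODE) into a rollback directory, as in steps 2-4.
# PROC_DIR and DEST_BASE default to the real paths; both are overridable.
backup_role_conf() {
  role=$1
  proc_dir=${PROC_DIR:-/var/run/cloudera-scm-agent/process}
  dest=${DEST_BASE:-/etc/hadoop}/conf.rollback.$(echo "$role" | tr 'A-Z' 'a-z')
  latest=$(ls -t1 "$proc_dir" | grep -e "-$role\$" | head -1)
  [ -n "$latest" ] || { echo "no $role process dir found"; return 1; }
  mkdir -p "$dest"
  cp -rp "$proc_dir/$latest/." "$dest/"
  # As in the steps above, replace log4j.properties in $dest afterwards.
}
```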

Back Up Key Trustee Server and Clients

Back Up HSM KMS

When the HSM KMS runs in high availability mode, each of the two nodes serves as an online ("hot") backup of the other: if one node fails, the single remaining active node can federate a replacement into the service. That is, you can bring a cluster node that is not running HSM KMS role instances into the service by assigning it an HSM KMS proxy role instance and an HSM KMS metastore role instance. In many cases, this is sufficient. However, if you also want a manual ("cold") backup of the files needed to restore the service from scratch, you can create one as well.

To create a backup, copy the /var/lib/hsmkp and /var/lib/hsmkp-meta directories on one or more of the nodes running HSM KMS role instances.

To restore from a backup: bring up a completely new instance of the HSM KMS service, and copy the /var/lib/hsmkp and /var/lib/hsmkp-meta directories from the backup onto the file system of the restored nodes before starting HSM KMS for the first time.
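The cold backup described above can be captured as a single dated tarball of the two directories. A sketch: `BACKUP_DIR` is a placeholder, and on hosts without the HSM KMS directories the script only warns.

```shell
#!/bin/sh
# Sketch: cold backup of the HSM KMS directories into one dated tarball.
# BACKUP_DIR is a placeholder; run on a host carrying HSM KMS role instances.
BACKUP_DIR=${BACKUP_DIR:-/tmp/cdh-backup}
mkdir -p "$BACKUP_DIR"
for d in /var/lib/hsmkp /var/lib/hsmkp-meta; do
  [ -d "$d" ] || echo "warning: $d not found on this host"
done
# "|| true" keeps the sketch usable on hosts where the directories are absent
tar -cf "$BACKUP_DIR/hsmkms-backup-$(date +%F)-CM.tar" \
    /var/lib/hsmkp /var/lib/hsmkp-meta 2>/dev/null || true
```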

Back Up Navigator Encrypt

It is recommended that you back up the Navigator Encrypt configuration directory after installation, and again after any configuration updates.
  1. To manually back up the Navigator Encrypt configuration directory (/etc/navencrypt):
    $ zip -r --encrypt nav-encrypt-conf.zip /etc/navencrypt

    The --encrypt option prompts you to create a password used to encrypt the zip file. This password is also required to decrypt the file. Ensure that you protect the password by storing it in a secure location.

  2. Move the backup file (nav-encrypt-conf.zip) to a secure location.

Back Up HBase

Because the rollback procedure also rolls back HDFS, the data in HBase is also rolled back. In addition, HBase metadata stored in ZooKeeper is recovered as part of the ZooKeeper rollback procedure.

If your cluster is configured to use HBase replication, Cloudera recommends that you document all replication peers. If necessary (for example, because the HBase znode has been deleted), you can roll back HBase as part of the HDFS rollback without the ZooKeeper metadata. This metadata can be reconstructed in a fresh ZooKeeper installation, with the exception of the replication peers, which you must add back. For information on enabling HBase replication, listing peers, and adding a peer, see HBase Replication in the CDH 5 documentation.
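One way to document the replication peers is to capture the output of the HBase shell's `list_peers` command. A sketch: it requires the `hbase` CLI, so on hosts without it the script records a reminder note instead.

```shell
#!/bin/sh
# Sketch: record configured HBase replication peers before upgrading.
# Needs the hbase CLI; elsewhere it just writes a reminder note.
OUT=/tmp/hbase-peers-CM-CDH.txt
if command -v hbase >/dev/null 2>&1; then
  echo "list_peers" | hbase shell > "$OUT"   # prints all replication peers
else
  echo "hbase CLI not found; run on an HBase gateway host" > "$OUT"
fi
cat "$OUT"
```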

Back Up Search

On each Solr node, back up the contents of the Solr Data directory and record the permissions for the directory. This location is specified with the Solr Data Directory property. The default location is:

/var/lib/solr

Search data on ZooKeeper is restored as part of the ZooKeeper rollback.

Back Up Sqoop 2

If you are not using the default embedded Derby database for Sqoop 2, back up the database you have configured for Sqoop 2. Otherwise, back up the repository subdirectory of the Sqoop 2 metastore directory. This location is specified with the Sqoop 2 Server Metastore Directory property. The default location is: /var/lib/sqoop2. For this default location, Derby database files are located in /var/lib/sqoop2/repository.

Back Up Hue

  1. On all hosts running the Hue Server role, back up the app registry file:
    Parcel installations
    cp -rp /opt/cloudera/parcels/CDH/lib/hue/app.reg /opt/cloudera/parcels_backup/app.reg_-CM-CDH
    Package installations
    cp -rp /usr/lib/hue/app.reg /usr/lib/hue_backup/app.reg_-CM-CDH