Cloudera Manager Snapshot Policies

Minimum Required Role: BDR Administrator (also provided by Full Administrator)

Cloudera Manager enables the creation of snapshot policies that define the directories or tables to be snapshotted, the intervals at which snapshots should be taken, and the number of snapshots that should be kept for each snapshot interval. For example, you can create a policy that takes both daily and weekly snapshots, and specify that 7 daily snapshots and 5 weekly snapshots should be maintained.

Managing Snapshot Policies

To create a snapshot policy:

  1. Click the Backup tab in the top navigation bar and select Snapshots.

    Existing snapshot policies are shown in a list organized by service. Currently running policies (if any) are shown in the Running Policies area.

  2. To create a new policy, click Create. If no policies currently exist, click the Create snapshot policy link. This displays the Create Snapshot Policy pop-up.
  3. Select the service for which you want to create a policy from the pull-down list.
  4. Provide a name for the policy and optionally a description.
  5. Specify the directories or tables that should be included in the snapshot.
    • For an HDFS service, select the paths of the directories that you want to include in the snapshot. The pull-down list shows only directories that have been enabled for snapshotting. If no directories have been enabled for snapshotting, a warning is displayed.

      Click the add icon to add another path, or the remove icon to remove a path.

    • For an HBase service, list the tables you want included in your snapshot. You can use a Java regular expression to specify a set of tables. For example, finance.* matches all tables whose names start with finance.
  6. Specify the snapshot schedule. You can schedule snapshots hourly, daily, weekly, monthly, or yearly, or any combination of those. Depending on the frequency you've selected, you can specify the time of day to take the snapshot, the day of the week, day of the month, or month of the year, and the number of snapshots to keep at each interval. Each time unit in the schedule is shared by every selected interval to which it applies: the minute value is shared by all the selected schedules, the hour value by all the schedules for which an hour applies, and so on. For example, if you specify that hourly snapshots are taken at the half hour and daily snapshots at hour 20, the daily snapshot occurs at 20:30.
    • To select an interval, check its box. The description will then display the current schedule and the number of snapshots to retain.
    • To edit the schedule (time of day, day of week, and so on, as relevant) and the number of snapshots to keep, click the edit icon that appears at the end of the description once you check its box. This opens an area with fields you can edit. When you have made your changes, click the Close button at the bottom of this area. Your changes are reflected in the schedule description.
  7. Click More Options to specify whether alerts should be generated for various state changes in the snapshot workflow. You can alert on failure, on start, on success, or when the snapshot workflow is aborted.
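To preview which tables a pattern such as finance.* would select, the Python sketch below filters a list of table names. It assumes the policy applies the regular expression to the whole table name, as Java's String.matches() does; select_tables is a hypothetical helper, not part of any Cloudera API, and namespace-qualified HBase table names are not considered.

```python
import re


def select_tables(pattern: str, tables: list[str]) -> list[str]:
    """Return the table names that the pattern matches in full.

    Assumption: the regex is applied to the entire table name, like Java's
    String.matches(). For simple patterns such as finance.*, Python's
    re.fullmatch gives the same result.
    """
    compiled = re.compile(pattern)
    return [name for name in tables if compiled.fullmatch(name)]


if __name__ == "__main__":
    tables = ["finance", "finance_q1", "hr_finance", "sales"]
    print(select_tables("finance.*", tables))  # ['finance', 'finance_q1']
```

Under full-match semantics, hr_finance is excluded because the match must start at the beginning of the name.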
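The shared-time-unit rule in step 6 can be modeled in a few lines. This is a hypothetical sketch of the behavior described above (ScheduleFields and fire_time are illustrative names, not Cloudera Manager internals): the minute field is shared by every interval, and the hour field by every interval of daily granularity or coarser.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ScheduleFields:
    """Hypothetical model of the shared time fields in the schedule form."""
    minute: int = 0  # shared by every selected interval
    hour: int = 0    # shared by daily, weekly, monthly, and yearly intervals


def fire_time(fields: ScheduleFields, interval: str) -> str:
    """Describe when a snapshot for the given interval is taken."""
    if interval == "HOURLY":
        # Hourly snapshots use only the shared minute value.
        return f"every hour at :{fields.minute:02d}"
    if interval in ("DAILY", "WEEKLY", "MONTHLY", "YEARLY"):
        # Coarser intervals combine the shared hour and minute values.
        return time(fields.hour, fields.minute).strftime("%H:%M")
    raise ValueError(f"unknown interval: {interval}")


if __name__ == "__main__":
    fields = ScheduleFields(minute=30, hour=20)
    print(fire_time(fields, "HOURLY"))  # every hour at :30
    print(fire_time(fields, "DAILY"))   # 20:30
```

This reproduces the example from step 6: hourly snapshots at the half hour plus daily snapshots at hour 20 yield a daily snapshot at 20:30.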
To edit or delete a snapshot policy:
  1. Click the Backup tab in the top navigation bar and select Snapshots.
  2. Click the Actions menu shown next to a policy and select Edit or Delete.

Orphaned Snapshots

When a snapshot policy includes a limit on the number of snapshots to keep, Cloudera Manager checks the total number of stored snapshots each time a new snapshot is added, and automatically deletes the oldest existing snapshot if necessary. When a snapshot policy is edited or deleted, files, directories, or tables that were previously included but are no longer part of the policy can leave "orphaned" snapshots behind. These snapshots are no longer associated with a current snapshot policy, so they are not deleted automatically: Cloudera Manager selects snapshots for deletion only when the policy creates a new snapshot containing those files, directories, or tables, and that never happens once they have been removed from the policy.
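Why orphans survive can be seen in a simplified model of count-based retention. The Python sketch below is hypothetical (RetentionModel is not Cloudera Manager code): pruning runs only for the path and interval that just received a new snapshot, so a path dropped from the policy keeps its snapshots indefinitely.

```python
from collections import defaultdict


class RetentionModel:
    """Simplified, hypothetical model of count-based snapshot retention."""

    def __init__(self, keep: dict[str, int]) -> None:
        self.keep = keep  # snapshots to retain per interval, e.g. {"DAILY": 7}
        # (path, interval) -> snapshot names, oldest first
        self.history = defaultdict(list)

    def take_snapshot(self, path: str, interval: str, name: str) -> list[str]:
        """Record a new snapshot and return the names pruned as a result."""
        names = self.history[(path, interval)]
        names.append(name)
        excess = len(names) - self.keep[interval]
        deleted = names[:excess] if excess > 0 else []
        del names[:len(deleted)]
        return deleted

    def run_policy(self, paths: list[str], interval: str, stamp: str) -> list[str]:
        """Snapshot every path currently in the policy; return pruned names."""
        deleted = []
        for path in paths:
            deleted += self.take_snapshot(path, interval, f"{interval}_{stamp}")
        return deleted


if __name__ == "__main__":
    model = RetentionModel({"DAILY": 2})
    model.run_policy(["/data/a", "/data/b"], "DAILY", "d1")
    model.run_policy(["/data/a", "/data/b"], "DAILY", "d2")
    # Policy edited: /data/b removed. Its two snapshots are now orphaned.
    model.run_policy(["/data/a"], "DAILY", "d3")
    print(model.history[("/data/b", "DAILY")])  # ['DAILY_d1', 'DAILY_d2']
```

After the edit, /data/a continues to rotate while /data/b is never revisited, which matches the behavior described above.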

Unwanted snapshots can be deleted manually through the Cloudera Manager interface or by a command-line script that uses the HDFS or HBase snapshot commands. Orphaned snapshots can be hard to locate for manual deletion. Each snapshot policy is automatically assigned a prefix consisting of cm-auto- followed by a globally unique identifier (GUID), and this prefix is prepended to the names of all snapshots created by that policy. To locate all the snapshots for a specific policy, search for snapshot names that start with that policy's cm-auto-<guid> prefix.
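A command-line cleanup script might start by selecting snapshot names by policy prefix. The Python sketch below builds, but does not run, hdfs dfs -deleteSnapshot commands for one policy's snapshots. It assumes you have already collected the snapshot names under each snapshottable directory (for example, the last path component of each entry listed by hdfs dfs -ls <dir>/.snapshot); delete_commands is a hypothetical helper. Review the output before executing anything.

```python
import shlex


def delete_commands(snapshots_by_dir: dict[str, list[str]], guid: str) -> list[str]:
    """Build, but do not run, HDFS commands that delete one policy's snapshots.

    snapshots_by_dir maps each snapshottable directory to the snapshot names
    found under it. Only names starting with cm-auto-<guid> are selected.
    """
    prefix = f"cm-auto-{guid}"
    commands = []
    for directory in sorted(snapshots_by_dir):
        for name in sorted(snapshots_by_dir[directory]):
            if name.startswith(prefix):
                # Usage: hdfs dfs -deleteSnapshot <snapshotDir> <snapshotName>
                commands.append(
                    shlex.join(["hdfs", "dfs", "-deleteSnapshot", directory, name])
                )
    return commands


if __name__ == "__main__":
    guid = "f9299438-a6eb-4f6c-90ac-5e86e5b2e283"
    found = {"/user/oozie": [f"cm-auto-{guid}_HOURLY_2013-11-05_05-25-04", "manual-backup"]}
    for cmd in delete_commands(found, guid):
        print(cmd)
```

Snapshots that do not carry the policy prefix, such as manually created ones, are left alone.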

To avoid orphaned snapshots, delete them before editing or deleting the associated snapshot policy, or make a note of the identifying prefix of the snapshots you want to delete. The prefix is displayed in the summary of the policy in the policy list and in the delete dialog box. Recording the prefix is necessary because the prefix associated with a policy cannot be determined once the policy has been deleted, and snapshot names do not contain a recognizable reference to the policy, such as its name.

Viewing Snapshot History

  • To view the history of scheduled snapshot jobs, click a policy. This displays a list of the snapshot jobs, and their status.
  • Click a snapshot job to view an expanded status for that job. (Click the return icon to go back to the previous view.)
  • From the expanded status, click the details link to view the details for the command. From there you can view error logs, or click Download Result Data to download a JSON file named summary.json that captures information about the snapshot. For example:
    { "createdSnapshotCount" : 1,
      "createdSnapshots" : [ { "creationTime" : null,
            "path" : "/user/oozie",
            "snapshotName" : "cm-auto-f9299438-a6eb-4f6c-90ac-5e86e5b2e283_HOURLY_2013-11-05_05-25-04",
            "snapshotPath" : "/user/oozie/.snapshot/cm-auto-f9299438-a6eb-4f6c-90ac-5e86e5b2e283_HOURLY_2013-11-05_05-25-04"
          } ],
      "creationErrorCount" : 0,
      "creationErrors" : [  ],
      "deletedSnapshotCount" : 0,
      "deletedSnapshots" : [  ],
      "deletionErrorCount" : 0,
      "deletionErrors" : [  ],
      "processedPathCount" : 1,
      "processedPaths" : [ "/user/oozie" ],
      "unprocessedPathCount" : 0,
      "unprocessedPaths" : [  ]
    }
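A downloaded summary.json can be checked with a short script. The Python sketch below assumes only the field names shown in the example above; because the format of individual error entries is not shown, it prints them as-is. It also flags any count field that disagrees with the length of its list. The problems function and the command-line handling are illustrative.

```python
import json
import sys

# Count fields and the lists they describe, taken from the example summary.json.
COUNT_FIELDS = {
    "createdSnapshotCount": "createdSnapshots",
    "creationErrorCount": "creationErrors",
    "deletedSnapshotCount": "deletedSnapshots",
    "deletionErrorCount": "deletionErrors",
    "processedPathCount": "processedPaths",
    "unprocessedPathCount": "unprocessedPaths",
}


def problems(summary: dict) -> list[str]:
    """Return error entries, unprocessed paths, and count/list mismatches."""
    found = []
    for count_key, list_key in COUNT_FIELDS.items():
        items = summary.get(list_key, [])
        if summary.get(count_key, len(items)) != len(items):
            found.append(f"{count_key} does not match len({list_key})")
    for key in ("creationErrors", "deletionErrors", "unprocessedPaths"):
        found.extend(f"{key}: {item}" for item in summary.get(key, []))
    return found


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "summary.json"
    with open(path) as f:
        summary = json.load(f)
    for snap in summary.get("createdSnapshots", []):
        print("created", snap["snapshotPath"])
    for problem in problems(summary):
        print("PROBLEM:", problem)
```

For the example above, the script reports one created snapshot and no problems.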

See Managing HDFS Snapshots and Managing HBase Snapshots for more information about managing snapshots.