Enabling Erasure Coding

Before You Begin

Before you enable Erasure Coding (EC), perform the following tasks:

  • Verify that the clusters run CDH 6.0 or higher.
  • Determine which EC policy you want to use.
  • Determine whether you want to use EC for existing data or for new data.

Enabling Erasure Coding

Enable EC using the Cloudera Manager Admin Console:

  1. Select Clusters and choose the HDFS cluster you want to enable EC for.
  2. Navigate to the Configuration tab and select the Erasure Coding category.
  3. Configure the EC properties:
    • DataNode Striped Read Timeout: Timeout, in milliseconds, for striped reads that the DataNode performs during reconstruction.
    • DataNode Striped Read Threads: Number of threads used by the DataNode to read striped blocks during background reconstruction work.
    • Erasure Coding Reconstruction Weight: Relative weight of the resources used by EC background recovery tasks compared to replicated block recovery. EC recovery requires reading multiple blocks (6 in the case of RS-6-3-1024k), whereas replicated block recovery requires reading only a single replica. The number of blocks that must be read to complete recovery is multiplied by this weight to determine the total weight of the recovery task; for example, with a weight of 0.5, a task that reads 6 blocks has a total weight of 3. These units of weight count against the limit set in the dfs.namenode.replication.max-streams property, so higher values allow fewer reconstruction tasks to run concurrently.
    • Default Policy when Setting Erasure Coding: The erasure coding policy used when enabling erasure coding for a directory without specifying a policy.
    • Erasure Coding Enabled: Allows erasure coding policies to be enabled and set for directories. Note that erasure coding is currently not supported and is experimental only.
  4. Optionally, you can view the supported EC policies with the following command:
    hdfs ec -listPolicies
  5. Enable a supported EC policy from step 4:
    hdfs ec -enablePolicy -policy <policyName>
  6. Set the EC policy for a directory with the following command:
    hdfs ec -setPolicy -path <directory> [-policy <policyName>]
    • path. Required. Specify the HDFS directory you want to apply the EC policy to.
    • policy. Optional. The EC policy you want to use for the directory you specified. If you omit this parameter, the policy set in step 3 for the Default Policy when Setting Erasure Coding property is used.
    This command applies the EC policy to data written to the directory after the command is run. It does not apply EC policies to existing data. See Using Erasure Coding for Existing Data for information about how to use EC with existing data.
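
Steps 4 through 6 can be sketched as a small shell function. This is a minimal sketch, not a definitive procedure: the function name enable_ec_for_dir is hypothetical, the policy and directory in the usage line are placeholder values, and the -enablePolicy -policy form and the closing -getPolicy check follow the Apache Hadoop 3 hdfs ec syntax rather than anything stated on this page.

```shell
# Hypothetical helper that chains steps 4-6 for one directory.
# Arguments: <policyName> <directory> (both supplied by the caller).
enable_ec_for_dir() {
  local policy="$1" dir="$2"

  # Step 4: list the policies the cluster knows about.
  hdfs ec -listPolicies || return 1

  # Step 5: enable the chosen policy.
  hdfs ec -enablePolicy -policy "$policy" || return 1

  # Step 6: apply the policy to the directory.
  hdfs ec -setPolicy -path "$dir" -policy "$policy" || return 1

  # Extra check (not one of the steps above): print the policy now in effect.
  hdfs ec -getPolicy -path "$dir"
}

# Example call with placeholder values:
#   enable_ec_for_dir RS-6-3-1024k /data/cold
```

The function stops at the first command that fails, so a policy is never set on the directory unless it was enabled first.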

Using Erasure Coding for Existing Data

To use EC with existing data, copy that data into a directory that has an EC policy set. You can copy the data with the distcp tool or with Cloudera Manager's Backup and Disaster Recovery (BDR).
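
A copy with distcp might look like the following sketch. The function name copy_into_ec_dir and both paths in the usage line are hypothetical. The -update and -skipcrccheck options are an assumption about what the copy needs: file checksums of replicated and erasure-coded files can differ, so a checksum comparison between source and destination may fail without them. Check the DistCp documentation for your release before relying on this.

```shell
# Hypothetical helper: copy existing data into a directory that already
# has an EC policy set. Arguments: <sourcePath> <ecDirectory>.
copy_into_ec_dir() {
  local src="$1" ec_dir="$2"

  # Print the destination's policy so it can be confirmed before copying.
  hdfs ec -getPolicy -path "$ec_dir" || return 1

  # Copy the data; files written under ec_dir are erasure coded on write.
  # -skipcrccheck is only accepted together with -update.
  hadoop distcp -update -skipcrccheck "$src" "$ec_dir"
}

# Example call with placeholder paths:
#   copy_into_ec_dir /data/replicated/logs /data/cold/logs
```

The source data is left in place; remove it separately once the copy has been verified.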

Using Erasure Coding for New Data

To use EC with new data, set the destination for the data to a directory that has an EC policy set. No further action is required. Data written to the directory is erasure coded according to the policy you set.
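
As an illustration, the sketch below writes a local file into such a directory and then prints the policy reported for the new file. The function name write_to_ec_dir and the paths in the usage line are hypothetical; hdfs dfs -put and hdfs ec -getPolicy are standard Hadoop 3 commands.

```shell
# Hypothetical helper: upload one local file to an EC directory and
# show the policy of the resulting HDFS file.
# Arguments: <localFile> <ecDirectory>.
write_to_ec_dir() {
  local local_file="$1" ec_dir="$2"

  # Write the file; no EC-specific option is needed on the write itself.
  hdfs dfs -put "$local_file" "$ec_dir/" || return 1

  # Print the EC policy of the file that was just written.
  hdfs ec -getPolicy -path "$ec_dir/$(basename "$local_file")"
}

# Example call with placeholder paths:
#   write_to_ec_dir /tmp/events.log /data/cold
```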