Using EBS Volumes for Cloudera Manager and CDH

Cloudera Director 2.2 and higher supports the use of Amazon EBS volumes with Cloudera Manager and CDH cluster instances. EBS volumes provide additional storage that you can use, for example, to store HDFS data, to stage data for processing, or to install other applications. EBS can provide an efficient and cost-effective alternative to S3 or other storage mechanisms.

EBS Volume Types

Cloudera Director supports the Amazon EBS volume types gp2, st1, and sc1, described in the following table:

EBS Volume Type | Size Range (Minimum - Maximum) | Usage
gp2             | 1 GiB - 16 TiB                 | General purpose SSD (solid state drive) volume that balances price and performance for a wide variety of transactional workloads.
st1             | 500 GiB - 16 TiB               | Low cost HDD (hard disk drive) volume designed for frequently accessed, throughput-intensive workloads.
sc1             | 500 GiB - 16 TiB               | Lowest cost HDD (hard disk drive) volume designed for less frequently accessed workloads.

For more information about EBS volume types, see Amazon EBS Volume Types.

Amazon EC2 Instance Stores

Instance stores provide another kind of block storage for EC2 instances, but Cloudera Director does not use them together with EBS volumes on the same instance. Instance store volumes are located on disks that are physically attached to the host computer, and are optionally included with many EC2 instance types.

If an instance type has instance store volumes and you do not specify EBS volumes, Cloudera Director automatically mounts all the instance store volumes that are available. If you do specify EBS volumes, then Cloudera Director does not mount instance store volumes.
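
The rule above can be modeled in a few lines of Python. This is only an illustrative sketch of the decision as described here, not Cloudera Director code; the function name and the returned labels are hypothetical.

```python
def storage_to_mount(instance_store_count, ebs_volume_count):
    """Model the mounting rule: requested EBS volumes replace instance stores.

    Returns a (kind, count) tuple. The labels "ebs", "instance-store", and
    "none" are placeholders chosen for this sketch.
    """
    if ebs_volume_count > 0:
        # EBS volumes were specified, so instance store volumes are not mounted.
        return ("ebs", ebs_volume_count)
    if instance_store_count > 0:
        # No EBS volumes specified: all available instance store volumes are mounted.
        return ("instance-store", instance_store_count)
    return ("none", 0)
```

For example, storage_to_mount(2, 0) selects the two instance store volumes, while storage_to_mount(2, 4) selects the four EBS volumes and ignores the instance stores.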

For more information on EC2 instance stores, see Amazon EC2 Instance Stores in the AWS documentation.

Configuring EBS Volumes

EBS volumes are configured in the instance template in the web UI or in the instance section of the configuration file for clusters launched with the CLI and bootstrap-remote. To configure EBS, provide the following information:
  • Number of EBS volumes you want
  • Type of the EBS volumes (gp2, st1, or sc1). All EBS volumes for an instance must be of the same type.
  • Size of the volumes. Specifying a size outside the allowable size range shown in the table above will cause cluster deployment to fail.
  • Encryption
    • Whether or not to encrypt data in the EBS volume
    • Whether to use the default KMS key for the EBS service or use a custom KMS key
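
Because an out-of-range size causes deployment to fail, it can help to check the settings before launching a cluster. The following Python sketch applies the size ranges from the table above, assuming 1 TiB = 1024 GiB. The function and constant names are hypothetical and are not part of Cloudera Director.

```python
GIB_PER_TIB = 1024

# Allowed size ranges in GiB, taken from the volume type table in this document.
EBS_SIZE_LIMITS_GIB = {
    "gp2": (1, 16 * GIB_PER_TIB),
    "st1": (500, 16 * GIB_PER_TIB),
    "sc1": (500, 16 * GIB_PER_TIB),
}


def validate_ebs_settings(volume_count, volume_type, volume_size_gib):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if volume_count < 0:
        problems.append("volume count must not be negative")
    if volume_type not in EBS_SIZE_LIMITS_GIB:
        # Without a known type there is no size range to check against.
        problems.append(f"unsupported volume type: {volume_type}")
        return problems
    low, high = EBS_SIZE_LIMITS_GIB[volume_type]
    if not low <= volume_size_gib <= high:
        problems.append(
            f"{volume_type} size must be between {low} and {high} GiB, "
            f"got {volume_size_gib}"
        )
    return problems
```

Calling validate_ebs_settings(2, "st1", 100) reports a size problem, because st1 volumes must be at least 500 GiB according to the table.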

EBS volumes for a Cloudera Manager or CDH cluster instance have the same lifecycle as the instance. This means that the EBS volumes are deleted when the instance is terminated. Repairing an instance does not remount its existing EBS volumes; new volumes are created for the replacement instance.

EBS Volume Encryption

Optionally, the data within EBS volumes can be encrypted at rest. There are two properties for configuring EBS encryption:
  • enableEbsEncryption: Labeled Enable EBS Encryption in the web UI. Set to true or false. If this value is set to true, the data on EBS volumes created with this instance template will be encrypted.
  • ebsKmsKeyId: Labeled EBS KMS Key ID in the web UI. The key used to encrypt data in the EBS volumes. KMS includes a default master key for each service that supports encryption, including EBS. If you leave this field empty, Cloudera Director configures the EBS volumes to use the KMS default master key for EBS. Alternatively, you can import a custom master key from your own key management infrastructure into KMS and specify it here to be used for the EBS service. To specify a custom master key, enter the full Amazon Resource Name (ARN) of the custom master key that you have stored in KMS, in the form arn:aws:kms:region:account-id:key/key-id. For example:
    arn:aws:kms:us-west-1:635144601417:key/39b8cdf2-923e-721b-9c6c-652a7e517d72
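
A malformed ARN is easy to catch before deployment. The following Python sketch checks a value against the shape of the example above. It assumes the standard aws partition, a 12-digit account ID, and a UUID-style key ID; ARNs in other partitions or with other key ID formats would need a looser pattern. The names KMS_KEY_ARN and parse_kms_key_arn are hypothetical.

```python
import re

# Pattern modeled on the example ARN in this document (an assumption, not an
# official grammar): arn:aws:kms:<region>:<12-digit account>:key/<UUID>
KMS_KEY_ARN = re.compile(
    r"^arn:aws:kms:"
    r"(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):"
    r"key/(?P<key_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})$"
)


def parse_kms_key_arn(arn):
    """Return the ARN's region, account, and key_id as a dict, or None if it does not match."""
    match = KMS_KEY_ARN.match(arn)
    return match.groupdict() if match else None
```

Applied to the example ARN above, parse_kms_key_arn returns the region us-west-1, the account 635144601417, and the key ID; a placeholder such as arn:aws:kms:REPLACE-ME returns None.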

For more information about EBS encryption, see Amazon EBS Encryption in the AWS documentation. For more information about KMS, see AWS Key Management Service Details in the AWS documentation.

Configuring EBS Volumes with the Web UI

To configure EBS volumes in the web UI, provide the required values in the Advanced Options section of the instance template.

Configuring EBS Volumes with the Configuration File

To configure EBS volumes in the configuration file for launching clusters with bootstrap-remote, uncomment the properties in the EBS Volumes section of the file and provide the required values:
  #
  # EBS Volumes
  #
  # Director can create and attach additional EBS volumes to the instance. These volumes
  # will be automatically deleted when the associated instance is terminated. These
  # properties don't apply to the root volume.
  #
  # See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumes.html
  #
  # ebsVolumeCount : 0
  # ebsVolumeType: st1 # specify either st1, sc1 or gp2 volume type
  # ebsVolumeSizeGiB: 500
  #
  # EBS Volume Encryption
  #
  # Encryption can be enabled on the additional EBS volumes. An optional CMK can
  # be specified for volume encryption. Not setting a CMK means the default CMK
  # for EBS will be used. The encryption here does not apply to the root volume.
  #
  # See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html
  #
  # enableEbsEncryption: false
  # ebsKmsKeyId: arn:aws:kms:REPLACE-ME  # full ARN of the KMS CMK
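
If you generate configuration files from a script, the uncommented properties can be produced programmatically. The following Python sketch emits the property lines shown above with values filled in. The function name is hypothetical, and wrapping the ARN in double quotes is a cautious choice made for this sketch (the sample file shows the value unquoted).

```python
def render_ebs_properties(volume_count, volume_type, volume_size_gib,
                          encrypt=False, kms_key_id=None):
    """Render the EBS properties from the sample file as uncommented lines."""
    lines = [
        f"ebsVolumeCount: {volume_count}",
        f"ebsVolumeType: {volume_type}",
        f"ebsVolumeSizeGiB: {volume_size_gib}",
        f"enableEbsEncryption: {'true' if encrypt else 'false'}",
    ]
    # Omit ebsKmsKeyId to fall back to the default key for EBS.
    if encrypt and kms_key_id:
        lines.append(f'ebsKmsKeyId: "{kms_key_id}"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_ebs_properties(2, "st1", 500, encrypt=True))
```

The output is meant to be pasted into the instance section of the configuration file in place of the commented defaults.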