Amazon Web Services (AWS) Security

Amazon Web Services (AWS) is Amazon's cloud solution that offers compute, storage, networking, and other infrastructure services that can be used for Cloudera cluster deployments, whether completely cloud-based or in combination with on-premises clusters.

For example, Amazon Elastic Compute Cloud (EC2) can be used for the instances that make up the nodes of a Cloudera cluster deployed to the AWS cloud. Amazon's cloud-based storage solution, Amazon Simple Storage Service (Amazon S3), can be used by both on-premises and AWS-cloud-based clusters in various ways, including as storage for Impala tables, for direct use by Hue and Hive, and for use by other CDH components such as the HDFS client, Hive, Impala, and MapReduce.

As of release 5.11, Cloudera Manager supports IAM role-based access to Amazon S3 storage, in addition to its prior support for authentication with an AWS access key and secret key. See How to Configure AWS Credentials for details.

For any AWS service, including Amazon S3, you must have an Amazon Web Services account and appropriate access to the AWS Management Console to set up the services you want. Assuming you have an AWS account, you must configure AWS credentials to provide access from your Cloudera cluster to Amazon S3 storage.

Getting Started with Amazon Web Services

To get started with AWS, including Amazon S3, you must have:
  1. An Amazon Web Services account. Both Amazon and Cloudera recommend that you do not use your primary Amazon account (known as the root account) for working with Amazon S3 and other AWS services. See the AWS IAM documentation for details about how to set up your AWS account.
  2. Access to the AWS Management Console and appropriate permissions to create and configure the AWS services needed for your use case, such as the following:
    1. Amazon Elastic Compute Cloud (EC2) to deploy your cluster to the AWS cloud.
    2. AWS Identity and Access Management (IAM) to set up users and groups, or to set up an IAM role.
    3. Amazon S3 and the specific storage bucket (or buckets) for use with your cluster.
    4. Amazon DynamoDB to enable the database needed by Cloudera S3Guard, if you plan to enable S3Guard for your cluster. Cloudera S3Guard augments Amazon S3 with a database to track metadata changes so that the 'eventual consistency' model inherent to Amazon S3 does not pose a problem for transactions or other use cases in which changes may not be apparent to each other in real time. See Configuring and Managing S3Guard in Cloudera Administration for details. To use S3Guard, you will also need to set up the appropriate access policy (create table, read, write) to DynamoDB for the same AWS identity that owns the Amazon S3 storage.
    5. AWS Key Management Service (AWS KMS) to create encryption keys for your Amazon S3 bucket if you plan to use SSE-KMS for server-side encryption. AWS KMS is not necessary for SSE-S3 encryption. See How to Configure Encryption for Amazon S3 for details.
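
The DynamoDB access policy mentioned for S3Guard (create table, read, write) could be drafted along the following lines. This is only a minimal sketch: the function name, the region, account ID, and table name are hypothetical placeholders, and the exact set of DynamoDB actions is an assumption that should be checked against Configuring and Managing S3Guard before use.

```python
import json


def s3guard_dynamodb_policy(region, account_id, table_name):
    """Build an IAM policy document (as a dict) scoped to one DynamoDB table.

    The action list below is an assumed minimal set covering table creation,
    reads, and writes; it is not taken from the Cloudera documentation.
    """
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3GuardTableAccess",
                "Effect": "Allow",
                "Action": [
                    # create table
                    "dynamodb:CreateTable",
                    "dynamodb:DescribeTable",
                    # read
                    "dynamodb:GetItem",
                    "dynamodb:BatchGetItem",
                    "dynamodb:Query",
                    # write
                    "dynamodb:PutItem",
                    "dynamodb:BatchWriteItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                ],
                "Resource": table_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    policy = s3guard_dynamodb_policy(
        "us-west-2", "123456789012", "example-s3guard-table"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the same AWS identity that owns the Amazon S3 storage. Scoping the resource to a single table ARN keeps the grant narrow.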

Configuration Properties Reference

This table provides reference documentation for the core-site.xml properties relevant for use with AWS and Amazon S3.

Property
    Description

fs.s3a.server-side-encryption-algorithm
    Enables server-side encryption for the Amazon S3 storage bucket associated with the cluster. Allowable values:
      • AES256: Specifies SSE-S3 server-side encryption for Amazon S3.
      • SSE-KMS: Specifies SSE-KMS server-side encryption for Amazon S3. Requires adding the fs.s3a.server-side-encryption-key property with a valid value.

fs.s3a.server-side-encryption-key
    Specify the ARN, ARN plus alias, alias, or globally unique ID of the key created in AWS Key Management Service for use with SSE-KMS.

fs.s3a.awsAccessKeyId
    Specify the AWS access key ID. This property is not used when accessing Amazon S3 storage from a cluster launched using an IAM role.

fs.s3a.awsSecretAccessKey
    Specify the AWS secret key provided by Amazon. This property is not used when accessing Amazon S3 storage from a cluster launched using an IAM role.

fs.s3a.endpoint
    Set this property only if the endpoint differs from the standard endpoint (s3.amazonaws.com), such as endpoints for regions in China or in AWS GovCloud (US). See AWS regions and endpoints documentation for details.

fs.s3a.connection.ssl.enabled
    Enables (true) or disables (false) TLS/SSL connections to Amazon S3. Default is true.
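
To illustrate how the encryption properties fit together, the following minimal sketch builds the name/value pairs for server-side encryption and renders them as core-site.xml property elements. The helper names and the example key alias are hypothetical. The only rules it encodes are the ones stated above: the allowable values AES256 and SSE-KMS, and the requirement that SSE-KMS be accompanied by fs.s3a.server-side-encryption-key.

```python
import xml.etree.ElementTree as ET

ALGORITHM_PROP = "fs.s3a.server-side-encryption-algorithm"
KEY_PROP = "fs.s3a.server-side-encryption-key"


def s3a_encryption_properties(algorithm, kms_key=None):
    """Return core-site.xml name/value pairs for S3 server-side encryption."""
    if algorithm not in ("AES256", "SSE-KMS"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    props = {ALGORITHM_PROP: algorithm}
    if algorithm == "SSE-KMS":
        # SSE-KMS is only valid together with a key reference.
        if not kms_key:
            raise ValueError(f"SSE-KMS requires {KEY_PROP}")
        props[KEY_PROP] = kms_key
    return props


def to_core_site_xml(props):
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "alias/example-key" is a placeholder, not a real KMS key.
    print(to_core_site_xml(s3a_encryption_properties("SSE-KMS", "alias/example-key")))
```

Calling s3a_encryption_properties("AES256") yields only the algorithm property, which matches SSE-S3 needing no KMS key. In a Cloudera Manager deployment these values would normally be entered through the appropriate configuration safety valve, not by editing the generated file directly.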