Overview of Cloudera Altus
Cloudera Altus is a cloud service platform with services that enable you to use CDH to analyze and process data at scale within a public cloud infrastructure. It is designed to provision clusters quickly and to make it easy for you to build and run your data workloads in the cloud.
Altus offers multiple distributed processing engine options, including MapReduce2 (MR2), Hive on MR2, Spark, and Hive on Spark (HoS), for different data engineering workloads. These engines let you run workloads such as ETL, machine learning, and large-scale data processing.
Altus works within the cloud service provider architecture. Altus creates clusters in a VPC in your AWS account, and Altus jobs read input from and write output to Amazon S3.
Altus offers a command line interface (CLI) as well as a web user interface, the Altus console. You use the Altus console or the CLI to perform tasks such as creating clusters and running jobs on those clusters. The Altus console also provides tools to facilitate administrative tasks, such as environment and account setup.
Altus provides a Data Engineering service that enables you to create clusters and run jobs specifically for data science and engineering workloads, including batch processing jobs. Altus also offers the Cloudera Altus SDK for Java, which you can use to connect to the Altus Data Engineering service from your application to create and manage environments and clusters and to run jobs.
- An Altus environment defines the resources in your AWS account that are used for Altus clusters and jobs. You can set up and assign separate Altus environments to different users so they can securely access only the AWS account and resources that you allow them to use.
- Ability to specify AWS accounts and resources for cluster deployment
- Ability to provision clusters into multiple AWS accounts from a single Altus account
- Ability to specify rights to an Altus environment on a per-user basis
- User Management
- You can assign roles to users to manage their Altus privileges and access to resources. Altus provides pre-defined roles that you can assign to data engineers and designated
- User authorization and access management
- Administrator and data engineer roles
- Altus provisions single-user, transient clusters in your AWS account. You can easily configure and create a cluster with the compute engine that you require to process your jobs.
- Fast and easy provisioning of data engineering clusters
- Support for the following compute engines: Hive, Spark, MapReduce2 (MR2), Hive on Spark
- Job queue per cluster
- Workflows with a single pipeline per cluster
- You can submit jobs to run on a cluster in your AWS account that contains the service you need.
- Job-centered model: submit, troubleshoot, clone, terminate, view history
- Optional single API call to create a cluster, process jobs, and terminate the cluster
- Optional ability to terminate job pipeline on error
- Reading from and writing to Amazon S3 for customer data processing
- Troubleshooting and monitoring:
- Access to Cloudera Manager monitoring, metrics, and job history servers
- Server and workload log files archived to Amazon S3
- Workload Analytics: monitor and optimize job performance and troubleshoot issues
- Ability to use Spot instances for compute worker nodes
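
The job-related features listed above (a job queue per cluster, optional termination of the job pipeline on error, and a create, process, terminate flow) can be pictured with a small in-memory model. This is only an illustrative sketch and does not use the Altus CLI or SDK: the names `Job`, `TransientCluster`, `terminate_on_error`, and the status strings are hypothetical and are not taken from the Altus documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical stand-in for a job submitted to a cluster."""
    name: str
    action: Callable[[], None]  # raises an exception to signal failure
    status: str = "QUEUED"


@dataclass
class TransientCluster:
    """Hypothetical model of a transient cluster with one job queue."""
    name: str
    terminate_on_error: bool = True  # assumption: mirrors the optional setting
    queue: List[Job] = field(default_factory=list)
    state: str = "CREATED"

    def submit(self, job: Job) -> None:
        # Jobs are queued and later processed in submission order.
        self.queue.append(job)

    def run_all(self) -> List[str]:
        halted = False
        for job in self.queue:
            if halted:
                # Remaining jobs are skipped once the pipeline is halted.
                job.status = "TERMINATED"
                continue
            try:
                job.action()
                job.status = "COMPLETED"
            except Exception:
                job.status = "FAILED"
                halted = self.terminate_on_error
        # The model always shuts the cluster down after the queue is processed.
        self.state = "TERMINATED"
        return [job.status for job in self.queue]


def failing_step() -> None:
    raise RuntimeError("simulated job failure")


cluster = TransientCluster("etl-demo")
cluster.submit(Job("extract", lambda: None))
cluster.submit(Job("transform", failing_step))
cluster.submit(Job("load", lambda: None))
statuses = cluster.run_all()
print(statuses)  # ['COMPLETED', 'FAILED', 'TERMINATED']
```

With `terminate_on_error=False`, the same three jobs would end as `COMPLETED`, `FAILED`, `COMPLETED`, because the model keeps processing the queue after a failure.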