Overview of Cloudera Altus
Cloudera Altus is a cloud platform whose services enable you to use CDH to analyze and process data at scale within a public cloud infrastructure, such as Amazon Web Services (AWS) or Microsoft Azure. Altus provisions clusters quickly and makes it easy to build and run your data workloads in the cloud.
Altus works within the cloud service provider architecture. You can choose the cloud service provider on which Altus creates your clusters and runs your jobs. On AWS, Altus creates clusters in a VPC in your AWS account, and Altus jobs read input from and write output to Amazon S3. On Azure, Altus creates clusters in a virtual network (VNet) in your Azure subscription, and Altus jobs read input from and write output to Azure Data Lake Store (ADLS).
Altus offers a web user interface (the Altus console), a command line interface (CLI), and the Altus SDK for Java. You can use the console, CLI, or SDK to create and manage environments and clusters, run jobs, and perform other tasks in Altus. The Altus console also provides tools for administrative tasks, such as setting up an environment and generating access keys. The Altus SDK for Java enables you to connect to the Altus Data Engineering service and perform Altus tasks from your application.
- Altus Data Engineering service
The Altus Data Engineering service enables you to create clusters and run jobs specifically for data science and engineering workloads. Altus offers multiple distributed processing engine options, including Hive, Spark, Hive on Spark, and MapReduce2 (MR2), which allow you to manage workloads in ETL, machine learning, and large scale data processing.
- Altus Data Warehouse service
The Altus Data Warehouse service enables you to create clusters running the Impala SQL engine to access data in your cloud storage for business analysis and reporting. You can use the query editor in the Altus console to query the data or use standard business intelligence tools with ODBC or JDBC to connect to Data Warehouse clusters to query the data.
- Altus Shared Data Experience (SDX) service
The Altus Shared Data Experience (SDX) service provides a consistent view of data for CDH clusters and workloads running on the cloud. The Altus SDX namespace externalizes cluster metadata into a shared, long-running service available to multiple clusters and workloads running on the cloud. You can use the Altus SDX namespace with workloads that run on CDH clusters on the cloud accessing data in Amazon S3 or Azure Data Lake Store (ADLS).
An Altus environment identifies the resources in your AWS account or Azure subscription to be used for Altus clusters and jobs. The environment allows you to create clusters in multiple AWS accounts or Azure subscriptions from a single Altus account.
You can set up and assign separate Altus environments to different users so they can access only the resources that you allow them to use.
- User authorization and access management
You can assign roles to users to manage their Altus privileges and access to resources. Altus provides pre-defined roles that you can assign to users and designated administrators.
- Clusters for Altus Data Engineering workloads
The Altus Data Engineering service provisions single-user, transient clusters in your AWS account or Azure subscription. You can easily configure and create a cluster with the compute engine that you require to process your jobs: Hive, Spark, Hive on Spark, or MapReduce2 (MR2). You can also create a cluster that supports multiple compute engines to run your jobs: Hive, Spark, and MapReduce2.
Each cluster has a job queue to manage the jobs that run on the cluster and supports a workflow with a single pipeline.
- Clusters for Altus Data Warehouse workloads
The Altus Data Warehouse service provisions clusters in your AWS account or Azure subscription that can be accessed by multiple Altus users. You can easily configure and create a cluster running the Impala SQL engine so that you can iteratively query the data in your cloud object storage for analysis and reporting.
- Altus Data Engineering Jobs
You can submit jobs to run on a cluster that contains the service you need. On AWS, Altus jobs read input from and write output to Amazon S3. On Azure, Altus jobs read input from and write output to Azure Data Lake Store.
The Altus Data Engineering workflow centers on the job. You can submit a job, create a cluster on which to run the job, and terminate the cluster when the job completes, all in a single process. You can access Cloudera Manager, where you can view the job history servers for the cluster. You can also generate reports in Workload Analytics to monitor and optimize job performance and troubleshoot issues.
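The job-centered workflow described above (submit a job, create a cluster for it, terminate the cluster when the job completes) can be illustrated with a small, self-contained Python model. This is a conceptual sketch only: the `TransientCluster` class, its methods, and the job and cluster names are hypothetical and are not part of the Altus CLI or the Altus SDK for Java. It also assumes that the cluster's job queue runs jobs one at a time in submission order, which the text does not state explicitly.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TransientCluster:
    """Hypothetical stand-in for a single-user, transient cluster."""
    name: str
    engine: str                                  # for example "SPARK", "HIVE", or "MR2"
    state: str = "CREATED"
    queue: deque = field(default_factory=deque)  # jobs waiting to run
    history: list = field(default_factory=list)  # jobs that have finished

    def submit(self, job_name: str) -> None:
        # A terminated cluster no longer accepts jobs.
        if self.state == "TERMINATED":
            raise RuntimeError("cannot submit to a terminated cluster")
        self.queue.append(job_name)

    def run_all(self) -> None:
        # Assumption: queued jobs run one at a time, in submission order.
        while self.queue:
            self.history.append(self.queue.popleft())

    def terminate(self) -> None:
        self.state = "TERMINATED"


def run_job_on_new_cluster(job_name: str, engine: str) -> TransientCluster:
    """Create a cluster, run one job on it, then terminate the cluster."""
    cluster = TransientCluster(name=f"{job_name}-cluster", engine=engine)
    cluster.submit(job_name)
    cluster.run_all()
    cluster.terminate()
    return cluster


finished = run_job_on_new_cluster("nightly-etl", "SPARK")
print(finished.state, finished.history)
```

The point of the model is that the cluster exists only for the lifetime of the job: the caller never manages the cluster separately, and nothing keeps running after the job finishes.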
- Altus SDX Namespaces
The Altus SDX namespace points to a database that stores metadata for data accessed by CDH clusters on the cloud, providing a common and consistent view of the data to the clusters. When an Altus SDX namespace is shared across multiple Altus clusters that access the same data, the clusters can immediately access the metadata without the need for each cluster to recreate the metadata.
You can use an Altus SDX namespace for clusters that access data in Amazon S3 or in Azure Data Lake Store (ADLS).
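The benefit of a shared namespace can be shown with a minimal Python model in which two clusters hold a reference to the same metadata store. This is an illustration of the idea only: the `Namespace` and `Cluster` classes, the table name, and the bucket path are hypothetical and do not reflect how Altus SDX is implemented or configured.

```python
class Namespace:
    """Hypothetical shared metadata store, standing in for an SDX namespace."""

    def __init__(self, name):
        self.name = name
        self.tables = {}  # table name -> storage location


class Cluster:
    """Hypothetical cluster that keeps its metadata in a shared namespace."""

    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace  # a reference to the shared store, not a copy

    def create_table(self, table, location):
        # The definition is written to the namespace, not to the cluster.
        self.namespace.tables[table] = location

    def locate(self, table):
        return self.namespace.tables[table]


shared = Namespace("analytics")

etl = Cluster("etl-cluster", shared)
etl.create_table("sales", "s3a://example-bucket/warehouse/sales")

# A cluster created later can resolve the table without redefining it.
reporting = Cluster("reporting-cluster", shared)
print(reporting.locate("sales"))
```

Because the metadata lives outside any single cluster, it also outlives a transient cluster: terminating `etl-cluster` in this model would not remove the `sales` definition.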
Cloudera Altus provides the following user interfaces:
- Altus console
- Command line interface (CLI)
- Altus SDK for Java
The Altus console is the web user interface that provides a visual way to administer Altus users and environments and to create clusters and run jobs.
If you are an Altus administrator, you can perform all tasks on the Altus console. If you are not an administrator, the role and environment assigned to you determine the areas of the console that you can access.
To access the Altus console, go to the following URL: https://console.altus.cloudera.com/
My Account Page
You can view information about your Altus account in the My Account page. To get to the My Account page, click your user name and select My Account.
If you are an Altus administrator, you can change the roles and resources assigned to you. You can create access keys and delete or deactivate the access keys that you have created. Deactivate a key if you do not want the key to be used to access Altus. You can reactivate a key at any time.
If you are not an administrator, you can view your user account information and your access keys.
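The access key states described above (active, deactivated, reactivated, deleted) can be summarized in a short Python model. This is a sketch under stated assumptions: `AccessKeyStore` and its methods are hypothetical names, the key IDs are random placeholders rather than real Altus key formats, and the model assumes that a deleted key cannot be restored.

```python
import secrets


class AccessKeyStore:
    """Hypothetical in-memory model of one user's access keys."""

    def __init__(self):
        self._active = {}  # key ID -> True (active) or False (deactivated)

    def create(self) -> str:
        key_id = secrets.token_hex(8)  # placeholder ID, not a real key format
        self._active[key_id] = True
        return key_id

    def deactivate(self, key_id: str) -> None:
        self._require(key_id)
        self._active[key_id] = False

    def reactivate(self, key_id: str) -> None:
        self._require(key_id)
        self._active[key_id] = True

    def delete(self, key_id: str) -> None:
        self._require(key_id)
        del self._active[key_id]

    def can_authenticate(self, key_id: str) -> bool:
        # Deactivated keys and unknown (deleted) keys are both rejected.
        return self._active.get(key_id, False)

    def _require(self, key_id: str) -> None:
        if key_id not in self._active:
            raise KeyError(f"no such access key: {key_id}")


store = AccessKeyStore()
key = store.create()

store.deactivate(key)
print(store.can_authenticate(key))  # deactivated keys cannot be used

store.reactivate(key)
print(store.can_authenticate(key))  # reactivated keys work again

store.delete(key)
print(store.can_authenticate(key))  # deleted keys are gone
```

The distinction the model draws is that deactivation is reversible while deletion is not, which is why deactivating a key is the safer choice when you only want to suspend its use.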
Altus provides a command-line interface (CLI) through a Python client. If you are an Altus administrator, you can use the CLI to perform all tasks in Altus. If you are not an administrator, the role and environment assigned to you determine the commands that you can run.
For more information about setting up the CLI, see Cloudera Altus Client Setup.
Altus SDK for Java
You can use the Cloudera Altus SDK for Java to programmatically access Altus services, create and manage environments and clusters, and run jobs from your Java application.
For more information about using the Cloudera Altus SDK for Java, see Using the Altus SDK for Java.
The Cloudera Altus console is tested and validated against the latest version, and supports recent versions, of the following browsers:
- Google Chrome
- Mozilla Firefox
- Internet Explorer 11
- Safari 9 or higher
- Microsoft Edge