Cloudera Primary User Personas

This topic describes the primary user personas that Cloudera has defined. Each persona is a character based on real people and represents a user type. Together, the personas help define the goals and activities of typical users of Cloudera products. Defining personas for software products is a moving target because user types change over time. This collection is the result of a 2018 study that collected data from about fifteen leaders in Cloudera product management and engineering. The primary personas are being validated with some customers to ensure their accuracy and will be updated as needed.

Infrastructure
Data Ingest, ETL, and Metadata Management
Analytics and Machine Learning

Infrastructure

The personas in this group use either Cloudera Manager or Altus to manage CDH clusters on-premises or in the cloud.

Jim — Senior Hadoop Administrator

Skills and Background

Very strong knowledge of HDFS and Linux administration
Understanding of:
- Distributed/grid computing
- VMs and their capabilities
- Racks, disk topologies, and RAID
- Hadoop architecture
Proficiency in Java

Tools:

Cloudera

Cloudera Manager/CDH
Navigator
BDR
Workload XM

Third-party Tools: Configuration management and log monitoring tools such as Splunk, Puppet, Chef, Ganglia, or Grafana

Goals:

Achieve consistent high availability and performance on Hadoop clusters
Administer users, including creating new users and updating access control rights on demand

Typical Tasks:

Monitor cluster performance to ensure a high percentage of uptime
Back up and replicate appropriate files to ensure disaster recovery
Schedule and perform cluster upgrades
Enable and check the status of security services and configurations
Analyze query performance with Workload XM to ensure optimum cluster performance
Provision new clusters

Jen — Junior Hadoop Administrator

Skills and Background

Basic knowledge of HDFS
Limited knowledge of Linux (shell scripting mostly)
General understanding of:
- Distributed/grid computing
- VMs and their capabilities
- Racks, disk topologies, and RAID
- Hadoop architecture

Tools:

Cloudera

Cloudera Manager/CDH
Navigator
Workload XM

Third-party Tools: Configuration management and log monitoring tools such as Splunk, Puppet, Chef, Ganglia, or Grafana

Goals:

Maintain high availability and performance of Hadoop clusters

Typical Tasks:

Perform basic procedures to ensure clusters are up and running
Perform maintenance workflows

Sarah — Cloud Administrator

Skills and Background

Understands public cloud primitives (for example, Virtual Private Cloud)
Understands security access policies (for example, Identity and Access Management)
Proficiency in Java

Tools:

Cloudera

Altus

Third-party Tools: Amazon Web Services, Microsoft Azure

Goals:

Maintain correct access to cloud resources
Maintain correct allocation of cloud resources, such as account limits

Typical Tasks:

Create the Altus environment for the organization

Data Ingest, ETL, and Metadata Management

The personas in this group typically use Navigator, Workload XM, HUE, Hive, Impala, and Spark.

Terence — Enterprise Data Architect or Modeler

Skills and Background

Experience with:
- ETL process
- Data munging
- Wide variety of data wrangling tools

Tools:

Cloudera

Navigator
Workload XM
HUE
Hive
Impala
Spark

Third-party Tools: ETL and other data wrangling tools

Goals:

Maintain an organized and optimized enterprise data architecture that supports business needs
Ensure that data models support improved data management and consumption
Maintain efficient schema design

Typical Tasks:

Organize data at the macro level: set architectural principles, create data models, create key entity diagrams, and create a data inventory to support business processes and architecture
Organize data at the micro level: create data models for specific applications
Map organization use cases to execution engines (Impala, Spark, Hive)
Provide logical data models for the most important data sets, consuming applications, and data quality rules
Provide data entity descriptions
Ingest new data into the system: use ingest tools, monitor ingestion rate, data formatting, and partitioning strategies

Kara — Data Steward and Data Curator

Skills and Background

Experience with:
- ETL process
- Data wrangling tools

Tools:

Cloudera

Navigator
HUE data catalog

Third-party Tools: ETL and other data wrangling tools

Goals:

Maintain metadata (technical and custom)
Maintain data policies to support business processes
Maintain data lifecycle at Hadoop scale
Maintain data access permissions

Typical Tasks:

Manage technical metadata
Classify data at Hadoop scale
Create and manage custom and business metadata using policies or third-party tools that integrate with Navigator

Analytics and Machine Learning

The personas in this group typically use Cloudera Data Science Workbench (CDSW), HUE, HDFS, and HBase.

Song — Data Scientist

Skills and Background

Statistics
Related scripting tools, for example, R
Machine learning models
SQL
Basic programming

Tools:

Cloudera

CDSW
HUE to build and test queries before adding to CDSW
HDFS
HBase

Third-party Tools: R, SAS, SPSS, Tableau, Qlik, and others; command-line scripting languages such as Scala and Python, and some Java

Goals:

Solve business problems by applying advanced analytics and machine learning in an ad hoc manner

Typical Tasks:

Access, explore, and prepare data by joining and cleaning it
Define data features and variables to solve business problems (data feature engineering)
Select and adapt machine learning models or write algorithms to answer business questions
Tune data model features and hyperparameters while running experiments
Publish the optimized model as an API so that BI Analysts or Data Owners can use it in their reporting
Publish data model results to answer business questions for consumption by Data Owners and BI Analysts

Jason — Machine Learning Engineer

Skills and Background

Machine learning and big data skills
Software engineering

Tools:

Cloudera

Spark
HUE to build and test queries before adding to application
CDSW

Third-party Tools: Java

Goals:

Build and maintain production machine learning applications

Typical Tasks:

Set up big data machine learning projects at companies such as Facebook

Cory — Data Engineer

Skills and Background

Software engineering
SQL mastery
ETL design and big data skills
Machine learning skills

Tools:

Cloudera

CDSW
Spark/MapReduce
Hive
Oozie
Altus Data Engineering
HUE
Workload XM

Third-party Tools: IDE, Java, Python, Scala

Goals:

Create data pipelines (about 40% of working time)
Maintain data pipelines (about 60% of working time)

Typical Tasks:

Create data workflow paths
Create code repository check-ins
Create XML workflows for production system launches

Sophie — Application Developer

Skills and Background

Deep knowledge of software engineering to build real-time applications

Tools:

Cloudera

HBase

Third-party Tools: Various software development tools

Goals:

Develop applications that run and successfully send workloads to the cluster, for example, an application that connects a front end to HBase on the cluster

Typical Tasks:

Develop application features. Sophie does not write the SQL workload; she writes the application that sends workloads to the cluster.
Test applications to ensure they run successfully

Abe — SQL Expert/SQL Developer

Skills and Background

Deep knowledge of SQL dialects and schemas

Tools:

Cloudera

HUE
Cloudera Manager to monitor Hive queries
Hive via command line or HUE
Impala via HUE, another BI tool, or the command line
Navigator via HUE
Sentry via HUE
Workload XM via HUE

Third-party Tools: SQL Studio, TOAD

Goals:

Create workloads that perform well and that return the desired results

Typical Tasks:

Create query workloads that applications send to the cluster
Ensure optimal performance of query workloads by monitoring the query model and partitioning strategies
Prepare and test queries before they are added to applications

Kiran — SQL Analyst/SQL User

Skills and Background

Has a high-level grasp of SQL concepts, but prefers to drag and drop query elements
Good at data visualization, but prefers pre-populated tables and queries

Tools:

Cloudera

HUE
Cloudera Manager to monitor queries
Oozie to schedule workloads
Impala (rather than Hive)

Third-party Tools: Reporting and business intelligence tools such as Cognos and Crystal Reports

Goals:

Answer business questions and solve business problems based on data

Typical Tasks:

Create query workloads that applications send to the cluster
Ensure optimal performance of queries (query model, partitioning strategies)

Christine — BI Analyst

Skills and Background

Ability to:
- View reports and drill down into results of interest
- Tag, save, and share reports and results

Tools:

Cloudera

HUE
Navigator via HUE

Third-party Tools: SQL query tools, Tableau, Qlik, Excel

Goals:

Apply data preparation and analytic skills to solve recurring business problems, for example, creating a weekly sales report
Provide reports for the Business/Data Owner

Typical Tasks:

Access, explore, and prepare data by joining and cleaning it
Create reports to satisfy requests from business stakeholders to solve business problems
