Introduction
Cloudera Data Platform (CDP) leverages the best tools for data security and governance - Apache Atlas and Apache Ranger. Administrators can easily define security policies based on Atlas metadata tags and apply a security policy in real-time to the entire hierarchy of entities, including databases, tables, and columns.
You will learn how to classify your data, who can access the data and how to mask the data.
Prerequisites
- Have access to Cloudera Data Platform (CDP) Public Cloud
- Be familiar with Cloudera Essentials for CDP (A Tour of the CDP User Interface)
Outline
- Environment Setup
- Create Classification (Atlas)
- Create Tag Based Policy (Ranger)
- Summary
- Further Reading
There are two ways to watch the tutorial-video:
- Watch entire video using the link below
- Watch section-by-section using the link under each related section
Our environment consists of
- One Hive table (employee_data) - focus on salary column.
- Three users (your environment will be different)
- gdeleon, administrator; associated with group cdp_sandbox-default
- joe_analyst, user; not associated with any group
- ivanna_eu_hr, user; not associated with any group
Let's begin:
Select Data Warehouse from Cloudera Data Platform (CDP) home page
Open DAS by first locating your virtual warehouse, then:
- Click on
- Open DAS

From Data Analytics Studio (DAS):
- Click Compose
- Enter the following code in the Worksheet
CREATE DATABASE IF NOT EXISTS dbgr; CREATE TABLE IF NOT EXISTS dbgr.employee_data ( id INT, first_name STRING, last_name STRING, email STRING, title STRING, salary DECIMAL(10,2) ); INSERT INTO dbgr.employee_data SELECT INLINE(array( struct(1 , "Patty" , "Harvison" , "PattyHarvison@somewhere.com" , "Accountant I" , 48532.04) ,struct(2 , "Abbey" , "Ledingham" , "AbbeyLedingham@somewhere.com" , "Marketing Assistant" , 58700.35) ,struct(3 , "Tricia" , "Budgey" , "TriciaBudgey@somewhere.com" , "Nuclear Power Engineer" , 48081.25) ,struct(4 , "Saraann" , "Corwin" , "SaraannCorwin@somewhere.com" , "Professor" , 49246.32) ,struct(5 , "Reese" , "Bownes" , "ReeseBownes@somewhere.com" , "Marketing Manager" , 70615.84) ,struct(6 , "Jennee" , "Hawson" , "JenneeHawson@somewhere.com" , "Clinical Specialist" , 61017.10) ,struct(7 , "Malinde" , "Kabsch" , "MalindeKabsch@somewhere.com" , "Developer I" , 48767.52) ,struct(8 , "Darline" , "Wagstaffe" , "DarlineWagstaffe@somewhere.com" , "Quality Engineer" , 61330.88) ,struct(9 , "Rhona" , "Damarell" , "RhonaDamarell@somewhere.com" , "Legal Assistant" , 42030.92) ,struct(10 , "Dagmar" , "Sandom" , "DagmarSandom@somewhere.com" , "Staff Scientist" , 74302.82) ,struct(11 , "Debora" , "Bielfelt" , "DeboraBielfelt@somewhere.com" , "Assistant Media Planner" , 59329.91) ,struct(12 , "Yule" , "Morigan" , "YuleMorigan@somewhere.com" , "Systems Administrator II" , 72053.94) ,struct(13 , "Clarette" , "Naptine" , "ClaretteNaptine@somewhere.com" , "GIS Technical Architect" , 74593.99) ,struct(14 , "Leonard" , "Petrik" , "LeonardPetrik@somewhere.com" , "Financial Analyst" , 49876.08) ,struct(15 , "Colver" , "Scudamore" , "ColverScudamore@somewhere.com" , "Media Manager IV" , 55048.58) ));
- EXECUTE

Have each user (gdeleon, joe_analyst and ivanna_eu_hr) run the query below. It should be successful for everyone.
SELECT * FROM dbgr.employee_data;

Open Atlas for your tenant:
Beginning from CDP home page > Data Warehouse:
- Click on Overview
- Search for your Database Catalog
- Click on
- Open Atlas

Let's create a new classification:
- Click on CLASSIFICATION
- Select PLUS symbol
Create a new classification, sensitive, with the following attributes:
- Name
sensitive
- Description
holds sensitive data
Search for the table we want to assign this new classification.
Use the following search criteria:
- Basic search
- Search By Type
hive_table
- Search By Text
employee_data
Click on Search
- Click on table name - employee_data

Let's assign our new classification, sensitive, to column salary:
- Click on Schema
- Click on + sign, next to column salary
- Select sensitive and Propagate option
- Click Add

Open Ranger for your tenant:
Beginning from CDP home page > Data Warehouse:
- Click on Overview
- Search for your Database Catalog
- Click on
- Open Ranger

Let's create a tag-based policy, also known as, Access-Based Attribute Control (ABAC).
- Click on Access Manager
- Select Tag Based Policies
- Click on cm_tag to edit existing service
Note: Your service name may be different from ours.

Access policies allow us to place restrictions on data columns that are specially marked. In this example, we will restrict our sensitive classified columns only to users in group cdp_sandbox-default and joe_analyst. No one else should be able to access or read data marked as sensitive.
Select Access tab, then Add New Policy.
Add a new policy using:
- Policy Type Access
- Policy Name
sensitive_access
- TAG
sensitive
- Description
access to sensitive classified columns
- Audit Logging YES
- enabled
- Allow Conditions #1: > Select Group > cdp_sandbox-default
- Allow Conditions #1: > Component Permissions > hive(all permissions)
- Allow Conditions #2: > Select User > joe_analyst
- Allow Conditions #2: > Component Permissions > hive(only select permissions)
- Deny All Other Accesses True
- click on Add

Have each user (gdeleon, joe_analyst and ivanna_eu_hr) re-run the query below.
SELECT * FROM dbgr.employee_data;
User gdeleon belongs to group cdp_sandbox-default, therefore it successfully ran.
User joe_analyst was explicitly given select access, therefore it successfully ran.
It failed for ivanna_eu_hr - Permission denied: user [ivanna_eu_hr] does not have [SELECT] privilege. This user does not belong to group cdp_sandbox-default nor was given select access.
Using the select statement below, let's modify the query by removing the sensitive column (salary); statement now runs successfully.
select id,first_name,last_name,email,title from dbgr.employee_data;
Knowledge growth questions/problems:
- Disable/Enable the policy, what happens?
- Modify the policy to allow ivanna_eu_hr select privileges
Masking Policy
We are going place viewing restrictions on our sensitive classified columns. Although a user may have access to the sensitive data, we may want mask the real data.
Only users in group cdp_sandbox-default should see real data. All others should see masked data.
Select Masking tab, then Add New Policy.
Add a new policy using:
- Policy Type Masking
- Policy Name
sensitive_masking
- TAG
sensitive
- Description
mask sensitive data
- Audit Logging YES
- enabled
- Mask Conditions #1: > Select Group > cdp_sandbox-default
- Mask Conditions #1: > Access Types > hive(select)
- Mask Conditions #1: > Select Masking Option > Unmasked(retain original value)
- Mask Conditions #2: > Select User > joe_analyst
- Mask Conditions #2: > Access Types > hive(select)
- Mask Conditions #2: > Select Masking Option > Nullify
- Click on Add

Have user (gdeleon) re-run the query below. It runs successfully - showing all data; no masking.
SELECT * FROM dbgr.employee_data;

Have user (joe_analyst) re-run the query below. It runs successfully. However, salary data is masked with nulls.
SELECT * FROM dbgr.employee_data;

Knowledge growth questions/problems:
- Disable/Enable the mask policy, what happens?
- Modify the masking policy to conceal data with a different option, other than nulls.
Visit Cloudera's Collections-SDX library of videos. They provide a great overview of Cloudera's Shared Data Experience (SDX). Here are two that related to this tutorial:
Cloudera OnDemand provides world-class training - anywhere, anytime.
Controlling and Auditing Data Access explains how CDP leverages the best tools for security and data governance, providing many example and scenarios.