Cloudera Tutorials

Introduction

Cloudera Data Platform (CDP) uses Apache Atlas and Apache Ranger for data security and governance. Administrators can define security policies based on Atlas metadata tags and apply a policy in real time to an entire hierarchy of entities, including databases, tables, and columns.

You will learn how to classify your data, control who can access it, and mask it.

 

Prerequisites

 

  • Have access to Cloudera Data Platform (CDP) Public Cloud
  • Be familiar with Cloudera Essentials for CDP (A Tour of the CDP User Interface)

 


There are two ways to watch the tutorial video:

  1. Watch the entire video using the link below
  2. Watch it section by section using the link under each related section

 

Environment Setup

 

Our environment consists of:

  • One Hive table (employee_data); we focus on the salary column.
  • Three users (your environment will be different)
    • gdeleon, administrator; associated with group cdp_sandbox-default
    • joe_analyst, user; not associated with any group
    • ivanna_eu_hr, user; not associated with any group

 

Let's begin:
Select Data Warehouse from the Cloudera Data Platform (CDP) home page.

cdp-dw.PNG

 

Open DAS by first locating your virtual warehouse, then:

  1. Click on the menu icon shown in the screenshot below
  2. Open DAS
open-das

 

From Data Analytics Studio (DAS):

  1. Click Compose
  2. Enter the following code in the Worksheet
CREATE DATABASE IF NOT EXISTS dbgr;
CREATE TABLE IF NOT EXISTS dbgr.employee_data (
  id INT,
  first_name  STRING,
  last_name   STRING,
  email       STRING,
  title       STRING,
  salary      DECIMAL(10,2)
);

INSERT INTO dbgr.employee_data
  SELECT INLINE(array(
 struct(1  ,  "Patty"     ,  "Harvison"   ,  "PattyHarvison@somewhere.com"     ,  "Accountant I"              ,  48532.04)
,struct(2  ,  "Abbey"     ,  "Ledingham"  ,  "AbbeyLedingham@somewhere.com"    ,  "Marketing Assistant"       ,  58700.35)
,struct(3  ,  "Tricia"    ,  "Budgey"     ,  "TriciaBudgey@somewhere.com"      ,  "Nuclear Power Engineer"    ,  48081.25)
,struct(4  ,  "Saraann"   ,  "Corwin"     ,  "SaraannCorwin@somewhere.com"     ,  "Professor"                 ,  49246.32)
,struct(5  ,  "Reese"     ,  "Bownes"     ,  "ReeseBownes@somewhere.com"       ,  "Marketing Manager"         ,  70615.84)
,struct(6  ,  "Jennee"    ,  "Hawson"     ,  "JenneeHawson@somewhere.com"      ,  "Clinical Specialist"       ,  61017.10)
,struct(7  ,  "Malinde"   ,  "Kabsch"     ,  "MalindeKabsch@somewhere.com"     ,  "Developer I"               ,  48767.52)
,struct(8  ,  "Darline"   ,  "Wagstaffe"  ,  "DarlineWagstaffe@somewhere.com"  ,  "Quality Engineer"          ,  61330.88)
,struct(9  ,  "Rhona"     ,  "Damarell"   ,  "RhonaDamarell@somewhere.com"     ,  "Legal Assistant"           ,  42030.92)
,struct(10 ,  "Dagmar"    ,  "Sandom"     ,  "DagmarSandom@somewhere.com"      ,  "Staff Scientist"           ,  74302.82)
,struct(11 ,  "Debora"    ,  "Bielfelt"   ,  "DeboraBielfelt@somewhere.com"    ,  "Assistant Media Planner"   ,  59329.91)
,struct(12 ,  "Yule"      ,  "Morigan"    ,  "YuleMorigan@somewhere.com"       ,  "Systems Administrator II"  ,  72053.94)
,struct(13 ,  "Clarette"  ,  "Naptine"    ,  "ClaretteNaptine@somewhere.com"   ,  "GIS Technical Architect"   ,  74593.99)
,struct(14 ,  "Leonard"   ,  "Petrik"     ,  "LeonardPetrik@somewhere.com"     ,  "Financial Analyst"         ,  49876.08)
,struct(15 ,  "Colver"    ,  "Scudamore"  ,  "ColverScudamore@somewhere.com"   ,  "Media Manager IV"          ,  55048.58)
));




  3. Click EXECUTE
das-create-table

 

Have each user (gdeleon, joe_analyst and ivanna_eu_hr) run the query below. It should succeed for everyone, because no policy has been defined yet.

SELECT * FROM dbgr.employee_data;
das-select-nopolicy

 

Create Classification (Atlas)

 

Open Atlas for your tenant:

Beginning from CDP home page > Data Warehouse:

  1. Click on Overview
  2. Search for your Database Catalog
  3. Click on the menu icon shown in the screenshot below
  4. Open Atlas
atlas-ranger-open

 

Let's create a new classification:

  1. Click on CLASSIFICATION
  2. Select PLUS symbol
atlas-add-classification

 

Create a new classification, sensitive, with the following attributes:

  1. Name sensitive
  2. Description holds sensitive data
atlas-create-sensitive-classification

 

Search for the table to which we want to assign the new classification.

Use the following search criteria:

  1. Basic search
  2. Search By Type hive_table
  3. Search By Text employee_data
  4. Click on Search
  5. Click on table name - employee_data
atlas-search-table

 

Let's assign our new classification, sensitive, to column salary:

  1. Click on Schema
  2. Click on + sign, next to column salary
  3. Select sensitive and Propagate option
  4. Click Add
atlas-add-sensitive-to-column

 

Create Tag Based Policy (Ranger)

Open Ranger for your tenant:

Beginning from CDP home page > Data Warehouse:

  1. Click on Overview
  2. Search for your Database Catalog
  3. Click on the menu icon shown in the screenshot below
  4. Open Ranger
atlas-ranger-open

 

Let's create a tag-based policy, a form of Attribute-Based Access Control (ABAC).

  1. Click on Access Manager
  2. Select Tag Based Policies
  3. Click on cm_tag to edit existing service

Note: Your service name may be different from ours.

ranger-tag-based-policy

 

We have two policy types to choose from: Access and Masking. Let's look at both.

 

Access Policy

 

Access policies let us restrict access to columns that carry a given classification. In this example, we will limit access to columns classified as sensitive to members of group cdp_sandbox-default and to user joe_analyst. No one else should be able to access or read data marked as sensitive.


Select Access tab, then Add New Policy.

 

Add a new policy using:

  1. Policy Type Access
  2. Policy Name sensitive_access
  3. TAG sensitive
  4. Description access to sensitive classified columns
  5. Audit Logging YES
  6. enabled
  7. Allow Conditions #1: > Select Group > cdp_sandbox-default
  8. Allow Conditions #1: > Component Permissions > hive(all permissions)
  9. Allow Conditions #2: > Select User > joe_analyst
  10. Allow Conditions #2: > Component Permissions > hive(only select permissions)
  11. Deny All Other Accesses True
  12. Click on Add
ranger-access-policy
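
The decision logic configured above can be sketched in a few lines of Python. This is only a toy model for reasoning about the expected results, not Ranger's implementation: the class names, the `check` helper, the `NOT_DECIDED` outcome, and the shortened permission list `HIVE_ALL` are all assumptions made for illustration. The users, group, and tag come from this tutorial.

```python
from dataclasses import dataclass

# Illustrative subset of Hive permissions (assumption); Ranger's real list is longer.
HIVE_ALL = frozenset({"select", "update", "create", "drop", "alter"})


@dataclass(frozen=True)
class AllowCondition:
    """One 'Allow Conditions' row: who it covers and what it grants."""
    groups: frozenset = frozenset()
    users: frozenset = frozenset()
    permissions: frozenset = frozenset()

    def matches(self, user, user_groups, permission):
        covers_user = user in self.users or bool(self.groups & user_groups)
        return covers_user and permission in self.permissions


@dataclass(frozen=True)
class TagAccessPolicy:
    """Toy stand-in for a tag-based access policy."""
    tag: str
    allow_conditions: tuple
    deny_all_other_accesses: bool = True
    enabled: bool = True

    def decide(self, user, user_groups, permission):
        """Return 'ALLOW', 'DENY', or 'NOT_DECIDED' for data carrying this tag."""
        if not self.enabled:
            return "NOT_DECIDED"
        for condition in self.allow_conditions:
            if condition.matches(user, user_groups, permission):
                return "ALLOW"
        return "DENY" if self.deny_all_other_accesses else "NOT_DECIDED"


# The sensitive_access policy as entered in the form above.
SENSITIVE_ACCESS = TagAccessPolicy(
    tag="sensitive",
    allow_conditions=(
        AllowCondition(groups=frozenset({"cdp_sandbox-default"}), permissions=HIVE_ALL),
        AllowCondition(users=frozenset({"joe_analyst"}), permissions=frozenset({"select"})),
    ),
)

# Group membership from the Environment Setup section.
USER_GROUPS = {
    "gdeleon": frozenset({"cdp_sandbox-default"}),
    "joe_analyst": frozenset(),
    "ivanna_eu_hr": frozenset(),
}


def check(user, permission):
    """Evaluate the toy policy for one user and one Hive permission."""
    return SENSITIVE_ACCESS.decide(user, USER_GROUPS[user], permission)


if __name__ == "__main__":
    for name in USER_GROUPS:
        print(name, "select ->", check(name, "select"))
```

In this model, a user covered by no allow condition is denied only because Deny All Other Accesses is set; with that flag off, or with the policy disabled, the model leaves the decision to whatever other policies exist.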

 

Have each user (gdeleon, joe_analyst and ivanna_eu_hr) re-run the query below.

SELECT * FROM dbgr.employee_data;

User gdeleon belongs to group cdp_sandbox-default, so the query ran successfully.
User joe_analyst was explicitly given select access, so the query ran successfully.

 

It failed for ivanna_eu_hr - Permission denied: user [ivanna_eu_hr] does not have [SELECT] privilege. This user neither belongs to group cdp_sandbox-default nor was given select access.
Let's modify the query by removing the sensitive column (salary), as in the statement below; the statement now runs successfully.

select id,first_name,last_name,email,title from dbgr.employee_data;
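
Why the second statement succeeds can be illustrated with a small Python sketch: the tag policy only comes into play when a query references a column that carries the sensitive classification. This is a simplified model under stated assumptions, not how Hive or Ranger parse queries; the names `TABLE_COLUMNS`, `COLUMN_TAGS`, `SELECT_ON_SENSITIVE`, `referenced_columns`, and `denied_columns` are hypothetical.

```python
# Columns of dbgr.employee_data, in table order.
TABLE_COLUMNS = ["id", "first_name", "last_name", "email", "title", "salary"]

# Classifications per column; only salary was tagged in this tutorial.
COLUMN_TAGS = {"salary": {"sensitive"}}

# Users who may select sensitive data, per the results described above.
SELECT_ON_SENSITIVE = {"gdeleon", "joe_analyst"}


def referenced_columns(select_list):
    """Expand '*' into every table column; otherwise keep the listed columns."""
    return list(TABLE_COLUMNS) if select_list == ["*"] else list(select_list)


def denied_columns(user, select_list):
    """Return the referenced columns that this user is not allowed to read."""
    if user in SELECT_ON_SENSITIVE:
        return []
    return [
        column
        for column in referenced_columns(select_list)
        if "sensitive" in COLUMN_TAGS.get(column, set())
    ]


if __name__ == "__main__":
    print(denied_columns("ivanna_eu_hr", ["*"]))
    print(denied_columns("ivanna_eu_hr", ["id", "first_name", "last_name", "email", "title"]))
```

An empty result means the query touches no restricted column for that user, which matches the behavior observed for ivanna_eu_hr once salary is dropped from the select list.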

 

Knowledge growth questions/problems:

  • Disable the policy, then enable it again. What happens?
  • Modify the policy to allow ivanna_eu_hr select privileges

 

Masking Policy

 

We are going to place viewing restrictions on our sensitive classified columns. Although a user may have access to the sensitive data, we may want to mask the real values.

Only users in group cdp_sandbox-default should see real data. All others should see masked data.

Select Masking tab, then Add New Policy.

 

Add a new policy using:

  1. Policy Type Masking
  2. Policy Name sensitive_masking
  3. TAG sensitive
  4. Description mask sensitive data
  5. Audit Logging YES
  6. enabled
  7. Mask Conditions #1: > Select Group > cdp_sandbox-default
  8. Mask Conditions #1: > Access Types > hive(select)
  9. Mask Conditions #1: > Select Masking Option > Unmasked(retain original value)
  10. Mask Conditions #2: > Select User > joe_analyst
  11. Mask Conditions #2: > Access Types > hive(select)
  12. Mask Conditions #2: > Select Masking Option > Nullify
  13. Click on Add
ranger-mask-policy
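
The effect of the two mask conditions can be sketched in Python as follows. This is a hedged toy model, not Ranger code: it assumes conditions are checked in the order listed and that the first matching condition decides the masking option. The names `MASK_CONDITIONS`, `masking_option`, and `apply_mask` are hypothetical; the users, group, and sample row come from this tutorial.

```python
# Group membership from the Environment Setup section.
USER_GROUPS = {
    "gdeleon": {"cdp_sandbox-default"},
    "joe_analyst": set(),
    "ivanna_eu_hr": set(),
}

# The two rows of the sensitive_masking policy, in the order entered above.
MASK_CONDITIONS = [
    {"groups": {"cdp_sandbox-default"}, "users": set(), "option": "UNMASKED"},
    {"groups": set(), "users": {"joe_analyst"}, "option": "NULLIFY"},
]

# Columns carrying the sensitive classification.
TAGGED_COLUMNS = {"salary"}


def masking_option(user):
    """Return the option of the first condition covering the user, else None."""
    groups = USER_GROUPS.get(user, set())
    for condition in MASK_CONDITIONS:
        if user in condition["users"] or condition["groups"] & groups:
            return condition["option"]
    return None


def apply_mask(row, user):
    """Return a copy of the row as this user would see it under the toy model."""
    if masking_option(user) == "NULLIFY":
        return {col: (None if col in TAGGED_COLUMNS else val) for col, val in row.items()}
    # UNMASKED, or no matching condition: values are left unchanged by this policy.
    return dict(row)


if __name__ == "__main__":
    first_row = {"id": 1, "first_name": "Patty", "salary": 48532.04}
    print(apply_mask(first_row, "gdeleon"))
    print(apply_mask(first_row, "joe_analyst"))
```

Only the tagged column changes; every other column in the row is returned as stored.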

 

Have user (gdeleon) re-run the query below. It runs successfully - showing all data; no masking.

SELECT * FROM dbgr.employee_data;
das-select-gdeleon-nomasking

 

Have user (joe_analyst) re-run the query below. It runs successfully. However, salary data is masked with nulls.

SELECT * FROM dbgr.employee_data;
das-select-masked

 

Knowledge growth questions/problems:

  • Disable the mask policy, then enable it again. What happens?
  • Modify the masking policy to conceal data with a different option, other than nulls.

 

Summary

Great job! You have learned to classify your data, created an access policy to restrict access, and created a masking policy to prevent users from seeing sensitive data.

 

Further Reading

Visit Cloudera's Collections-SDX library of videos. They provide a great overview of Cloudera's Shared Data Experience (SDX). Here are two that relate to this tutorial:

 

Cloudera OnDemand provides world-class training - anywhere, anytime.
