Tag Based Policies with Apache Ranger and Apache Atlas

Overview
  1. Setting up the environment
  2. Assigning Tag Based policies with Atlas

 

NOTICE

 

As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.

Introduction

In this section of the tutorial you will begin assigning policies to the users of our sandbox. You will be introduced to the available user accounts, and then you will assign permissions on data based on each persona's role.

Prerequisites

Outline

Concepts

The Sandbox's Hive policies are such that when a new table is created, everyone has access to it. This is convenient for us because the data in the tables we create is fictitious; however, imagine a scenario where a Hive table holds sensitive information (e.g. SSN or birthplace). We should be able to govern the data and give access only to authorized users. In this section we will recreate a scenario where certain users do not have access to sensitive data; however, Raj, our cluster operator, has been approved to access the data, so we will create Tag Based Policies to grant him granular access to the sensitive data.

Access Without Tag Based Policies

In this section you will create a brand new Hive table called employee in the default database of our Sandbox.

Keep in mind, for this new table, no policies have been created to authorize what our sandbox users can access within this table and its columns.

1. Go to Data Analytics Studio (DAS) and click on the Data Analytics Studio UI, or go directly to sandbox-hdp.hortonworks.com:30800.

2. Create the employee table:

create table employee (ssn string, name string, location string)
row format delimited
fields terminated by ','
stored as textfile;

Then, click the green Execute button.

create-hive-table

3. Verify the table was created successfully by going to Database>Table tab:

list-hive-table

4. Now we will populate this table with data.

5. Enter the HDP Sandbox's CentOS command line interface by using the Web Shell Client at:

sandbox-hdp.hortonworks.com:4200

Login credentials are:

username = root password = hadoop

Note: hadoop is the initial password, but you will be asked to change it after the first sign in.

6. Create the employee file with the following data using the command:

printf "111-111-111,James,San Jose\\n222-222-222,Christian,Santa Clara\\n333-333-333,George,Fremont" > employeedata.txt

7. Copy the employeedata.txt file from your CentOS file system to HDFS. The file will be stored in the employee table directory of the Hive warehouse:

hdfs dfs -copyFromLocal employeedata.txt /warehouse/tablespace/managed/hive/employee

8. Go back to DAS and verify that the Hive table employee has been populated with data:

select * from employee;

Execute the Hive query to display the loaded data.

load-employee-data

Notice you have an employee data table in Hive with ssn, name and location as part of its columns.

The ssn and location columns hold sensitive information, and most users should not have access to them.
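Before uploading, it can help to confirm that the sample file has the shape the table expects (three comma-separated fields per record). The sketch below is only an illustration: the helper name check_employee_file is hypothetical, and the final listing runs only where the hdfs client is installed, such as on the sandbox.

```shell
# Same data as step 6 (single quotes, so printf itself interprets \n).
printf '111-111-111,James,San Jose\n222-222-222,Christian,Santa Clara\n333-333-333,George,Fremont' > employeedata.txt

# check_employee_file FILE (hypothetical helper): succeeds only if the file
# is non-empty and every record has exactly three comma-separated fields.
check_employee_file() {
  awk -F',' 'NF != 3 { bad = 1 } END { if (bad || NR == 0) exit 1 }' "$1"
}

if check_employee_file employeedata.txt; then
  echo "employeedata.txt: every record has 3 fields"
else
  echo "employeedata.txt: malformed record found" >&2
fi

# Optional: list the table directory, but only if the hdfs client is on the PATH.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls /warehouse/tablespace/managed/hive/employee
fi
```

A record with a missing or extra comma would load into Hive with shifted or NULL columns, so catching it locally is cheaper than debugging the query result later.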

Create a Ranger Policy to Limit Access of Hive Data

Your goal is to create a Ranger policy that gives general users access to the name column while denying them access to the ssn and location columns.

This policy will be assigned to maria_dev and raj_ops.
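To picture what the restricted view should contain, the following local sketch projects only the name field out of the sample rows. It is a stand-in run on plain text with cut, not a Ranger or Hive feature, and the function name visible_columns is hypothetical.

```shell
# Local copy of the tutorial's sample rows: ssn,name,location
rows='111-111-111,James,San Jose
222-222-222,Christian,Santa Clara
333-333-333,George,Fremont'

# visible_columns (hypothetical helper): keep only field 2 (name);
# field 1 (ssn) and field 3 (location) are dropped.
visible_columns() {
  cut -d',' -f2
}

printf '%s\n' "$rows" | visible_columns
```

This prints the three names and nothing else, which is the data maria_dev and raj_ops should still be able to read once the policy is in place.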

1. Go to the Ranger UI and click on sandbox_hive:

sandbox-hdp.hortonworks.com:6080

Table 2: Ranger Login credentials

Username Password
admin hortonworks1

The Ranger UI homepage should look similar to the image below:

ranger-homepage-admin

2. Select grant-XXXXXXXXXXX (the policy value varies per sandbox)

ranger-maria-dev-permission

3. Ensure that the policy for table default is disabled as shown in the image below, then Save the changes.

ranger-maria-dev-permission-disabled

4. Now select Add New Policy:

new-sandbox-hive-policies

5. In the Policy Details field, enter the following values:

Policy Name - Policy to Restrict Employee Data
Database - default
table - employee
Hive Column - ssn, location (NOTE : Do NOT forget to EXCLUDE these columns)
Description - Any description

6. In the Allow Conditions, it should have the following values:

Select Group – blank, no input
Select User – raj_ops, maria_dev
Permissions – Click on the + sign next to Add Permissions, check select, and then click the green tick mark.

add-permission

You should have your policy configured like this, then click on Add.

policy-restrict

7. You can see the list of policies that are present in Sandbox_hive.

employee-policy-added-admin

8. Disable the all - global Policy to take away raj_ops and maria_dev access to the employee table's ssn and location column data.

hive-global-policy-admin

Open this policy. To the right of Policy Name there is an enable button that can be toggled to disabled. Toggle it, then click Save.

disabled-access

Verify Ranger Policy is in Effect

We are going to verify if maria_dev has access to the Hive employee table. To do this we will use beeline.

1. Login to Ambari with user/password: maria_dev/maria_dev

maria-dev-ambari-login

2. Go to Hive > HIVESERVER2 JDBC URL and click on the clipboard icon at the end of the JDBC URL. This will copy the HiveServer2 JDBC URL.

maria-hive-jdbc-url

3. Go to Shell-in-Box at:

sandbox-hdp.hortonworks.com:4200

username = root password = hadoop

Note: hadoop is the initial password, but you will be asked to change it after the first sign in.

4. Run the following command to start beeline, pasting the JDBC URL in between the quotes.

beeline -u "Paste the JDBC URL here" -n maria_dev

beeline-maria-dev-user

5. Enter the command below in beeline:

select * from employee;

6. An authorization error will appear. This is expected, as the users maria_dev and raj_ops do not have access to two columns in this table (ssn and location).

load-data-authorization-error

7. For further verification, you can view the Audit tab in Ranger. Go back to Ranger and click on Audits=>Access and select user => maria_dev. You will see the entry of Access Denied for maria_dev. maria_dev tried to access data she didn't have authorization to view.

new-policy-audit

8. Return to beeline and try running a query to access the name column from the employee table. maria_dev should be able to access that data.

SELECT name FROM employee;

beeline-mariadev-access-successful

The query runs successfully. Even the raj_ops user cannot see the ssn and location columns. We will give this user access to all columns later via Atlas Ranger Tag Based Policies.
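The beeline checks in the steps above can also be run non-interactively with beeline's -e option, which executes a single query and exits. The sketch below is a hedged example: the JDBC_URL value is a placeholder to be replaced with the URL copied from Ambari, and run_as and DRY_RUN are hypothetical names introduced here. By default it only prints the commands instead of running them.

```shell
# Placeholder (assumption): replace with the HiveServer2 JDBC URL copied from Ambari.
JDBC_URL='jdbc:hive2://<copied-from-ambari>'

# run_as USER QUERY (hypothetical helper): run one query through beeline as USER.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
run_as() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'beeline -u "%s" -n %s -e "%s"\n' "$JDBC_URL" "$1" "$2"
  else
    beeline -u "$JDBC_URL" -n "$1" -e "$2"
  fi
}

# Expected to be denied for maria_dev (touches ssn and location).
run_as maria_dev 'select * from employee;'
# Expected to succeed for maria_dev (name column only).
run_as maria_dev 'select name from employee;'
```

On the sandbox, setting DRY_RUN=0 before calling run_as executes the queries for real; the same helper can be reused later with raj_ops as the user.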

Create Atlas Tag to Classify Data

The goal of this section is to classify all data in the ssn and location columns with a PII tag, so that later, when we create a Ranger Tag Based Policy, users who are associated with the PII tag can override permissions established in the Ranger Resource Based policy.

1. Reset Admin user password:

If you haven't already reset your Ambari Admin password, do so now; we will use it to log into Atlas.

1. Log in to the Atlas UI:

sandbox-hdp.hortonworks.com:21000

username & password : admin/admin123

atlas_login

2. Go to Classification and press the + Create Tag button to create a new tag.

  • Name the tag: PII
  • Add Description: Personally Identifiable Information

create_new_tag

Press the Create button. Then you should see your new tag displayed on the Classification page.

atlas-pii-tag-created

3. Go to the Search tab. In Search By Type, write hive_table

search-hive-tables

4. employee table should appear. Select it.

employee-table-atlas

  • How does Atlas get Hive employee table?

Hive communicates information through Kafka, which then is transmitted to Atlas. This information includes the Hive tables created and all kinds of data associated with those tables.

5. View the details of the employee table by clicking on its name.

hive-employee-atlas-properties

6. View the Schema associated with the table. It'll list all columns of this table.

hive-employee-atlas-schema

7. Press the green + button to assign the PII tag to the ssn column. Click save.

add-pii-tag-to-ssn

8. Repeat the same process to add the PII tag to the location column.

added-pii-tag-to-ssn-and-location

We have classified all data in the ssn and location columns as PII.

Create Ranger Tag Based Policy

Head back to the Ranger UI and log in using

Username/Password: admin/hortonworks1

The tag and entity (ssn, location) relationship will be automatically inherited by Ranger. In Ranger, we can create a tag based policy by accessing it from the top menu. Go to Access Manager → Tag Based Policies.

ranger-tag-based-policies

You will see a folder called TAG that does not have any repositories yet.

Click the + button to create a new tag repository.

new-tag-rajops

Name it Sandbox_tag and click Add.

add-sandbox-tag-rajops

Click on Sandbox_tag to add a policy.

added-sandbox-tag-rajops

Click on the Add New Policy button.

add-new-policy-rajops

Enter the following details:

Policy Name – PII column access policy
Tag – PII
Description – Any description
Audit logging – Yes

pii-column-access-policy-rajops

In the Allow Conditions, it should have the following values:

Select Group - blank
Select User - raj_ops
Component Permissions - Select hive

You can select the component permissions through the following popup. Check the checkbox to the left of the word component to give raj_ops permission to perform select, update, create, drop, alter, index, lock, all, read, write, repladmin, service admin, and temporary udf admin operations against the columns of the Hive table employee that carry the PII tag.

new-allow-permissions

Please verify that the Allow Conditions section looks like the image below:

allow-conditions-rajops

This signifies that only raj_ops is allowed to do any operation on the columns that are specified by PII tag. Click Add.

pii-policy-created-rajops

Now click on Access Manager > Resource Based Policies and edit Sandbox_hive repository by clicking on the button next to it.

editing-sandbox-hive

Click on Select Tag Service and select Sandbox_tag. Click on Save.

new-edited-sandbox-hive

The Ranger tag based policy is now enabled for the raj_ops user. You can test it by running a query on all columns of the employee table in beeline.

In beeline, type !q to quit the maria_dev session, then run the beeline command below, pasting the JDBC URL in between the quotes.

!q
beeline -u "Paste the JDBC URL here" -n raj_ops

beeline-rajops-user

select * from employee;

rajops-has-access-to-employee

The query executes successfully. The query can be checked in the Ranger Audit log, which will show that access was granted and which policy granted it.

Clear the existing query and select User > raj_ops in the search bar.

audit-results-rajops

NOTE: Two policies provided access to the raj_ops user: one is a tag based policy and the other is a Hive resource based policy. The associated tag (PII) is also shown in the tags column of the audit record.
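To see why two policies appear in the audit record, the following toy model reproduces the outcome of this tutorial in plain shell. It is not Ranger's evaluation engine: all function names are hypothetical, the policies are hard-coded from the values entered above, and checking the tag policy before the resource policy is an assumption of the sketch.

```shell
# column_tag COLUMN: prints the Atlas tag assigned in this tutorial, if any.
column_tag() {
  case "$1" in
    ssn|location) echo "PII" ;;
    *) echo "" ;;
  esac
}

# Models "Policy to Restrict Employee Data": raj_ops and maria_dev may
# select every employee column EXCEPT ssn and location.
resource_policy_allows() {
  case "$1" in
    raj_ops|maria_dev)
      case "$2" in
        ssn|location) return 1 ;;
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Models "PII column access policy": only raj_ops may touch PII-tagged columns.
tag_policy_allows() {
  [ "$1" = "raj_ops" ] && [ "$(column_tag "$2")" = "PII" ]
}

# can_select USER COLUMN...: allowed only if every column is covered by a policy.
can_select() {
  user=$1
  shift
  for col in "$@"; do
    tag_policy_allows "$user" "$col" || resource_policy_allows "$user" "$col" || return 1
  done
  return 0
}

can_select raj_ops ssn name location && echo "raj_ops: select * allowed"
can_select maria_dev ssn name location || echo "maria_dev: select * denied"
can_select maria_dev name && echo "maria_dev: select name allowed"
```

In this model raj_ops reaches ssn and location through the tag policy and name through the resource policy, which mirrors the two policies listed in the audit entry.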

Summary

Ranger traditionally provided group or user based authorization for resources such as a table or column in Hive, or a file in HDFS. With the new Atlas - Ranger integration, administrators can conceptualize security policies based on data classification, and not necessarily in terms of tables or columns. Data stewards can easily classify data in Atlas and use the classification in Ranger to create security policies. This represents a paradigm shift in security and governance in Hadoop, benefiting customers with mature Hadoop deployments as well as customers looking to adopt Hadoop and big data infrastructure for the first time.
