Tag Based Policies with Apache Ranger and Apache Atlas
Overview
NOTICE
As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.
Please visit recommended tutorials:
- How to Create a CDP Private Cloud Base Development Cluster
- All Cloudera Data Platform (CDP) related tutorials
Introduction
In this section of the tutorial you will begin assigning policies to the users of our sandbox. You will be introduced to the available user accounts, and then you will assign permissions on data based on each persona's role.
Prerequisites
- Downloaded and deployed the Hortonworks Data Platform (HDP) Sandbox
- Learning the Ropes of the HDP Sandbox
Outline
- Concepts
- Sandbox User Personas Policy
- Access Without Tag Based Policies
- Create a Ranger Policy to Limit Access of Hive Data
- Create Atlas Tag to Classify Data
- Create Ranger Tag Based Policy
- Summary
- Further Reading
Concepts
The Sandbox's Hive policies are such that when a new table is created, everyone has access to it. This is convenient for us because the data in the tables we create is fictitious; however, imagine a scenario where a Hive table holds sensitive information (e.g. SSN or birthplace). We should be able to govern that data and only give access to authorized users. In this section we will recreate a scenario where certain users do not have access to sensitive data; however, Raj, our cluster operator, has been approved to access it, so we will create tag-based policies to granularly grant him access to the sensitive data.
Access Without Tag Based Policies
In this section you will create a brand new Hive table called employee in the default database of our Sandbox.
Keep in mind, for this new table, no policies have been created to authorize what our sandbox users can access within this table and its columns.
1. Go to Data Analytics Studio or DAS and click on the Data Analytics Studio UI or go to sandbox-hdp.hortonworks.com:30800.
2. Create the employee table:
create table employee (ssn string, name string, location string)
row format delimited
fields terminated by ','
stored as textfile;
Then, click the green Execute button.
3. Verify the table was created successfully by going to Database>Table tab:
4. Now we will populate this table with data.
5. Enter the HDP Sandbox's CentOS command line interface by using the Web Shell Client at:
sandbox-hdp.hortonworks.com:4200
Login credentials are:
username = root
password = hadoop
Note: hadoop is the initial password, but you will be asked to change it after first sign-in.
6. Create the employeedata.txt file with the following data using the command:
printf "111-111-111,James,San Jose\n222-222-222,Christian,Santa Clara\n333-333-333,George,Fremont" > employeedata.txt
7. Copy the employeedata.txt file from your CentOS file system to HDFS. The file will be stored in the Hive warehouse's employee table directory:
hdfs dfs -copyFromLocal employeedata.txt /warehouse/tablespace/managed/hive/employee
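Optionally, you can confirm the copy landed where Hive expects it, using only standard HDFS shell commands:

# List the employee table's warehouse directory to confirm the file arrived
hdfs dfs -ls /warehouse/tablespace/managed/hive/employee
# Print the file contents straight from HDFS
hdfs dfs -cat /warehouse/tablespace/managed/hive/employee/employeedata.txt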
8. Go back to DAS and verify the Hive table employee has been populated with data:
select * from employee;
Execute the Hive query to load the data.
Notice you have an employee data table in Hive with ssn, name and location as part of its columns.
The ssn and location columns hold sensitive information and most users should not have access to it.
Create a Ranger Policy to Limit Access of Hive Data
Your goal is to create a Ranger policy which allows general users access to the name column while excluding access to the ssn and location columns.
This policy will be assigned to maria_dev and raj_ops.
1. Go to the Ranger UI at sandbox-hdp.hortonworks.com:6080 and click on sandbox_hive.
Table 2: Ranger Login credentials
Username | Password |
---|---|
admin | hortonworks1 |
The Ranger UI homepage should look similar to the image below:
2. Select grant-XXXXXXXXXXX (the policy name will vary per sandbox).
3. Ensure that the policy for table default is disabled as shown in the image below, then Save the changes.
4. Now select Add New Policy:
5. In the Policy Details field, enter the following values:
Policy Name - Policy to Restrict Employee Data
Database - default
Table - employee
Hive Column - ssn, location (NOTE: Do NOT forget to EXCLUDE these columns)
Description - Any description
6. The Allow Conditions should have the following values:
Select Group – blank, no input
Select User – raj_ops, maria_dev
Permissions – Click on the + sign next to Add Permissions, click on select, and then click the green tick mark.
You should have your policy configured like this; then click on Add.
7. You can see the list of policies that are present in Sandbox_hive.
8. Disable the all - global policy to take away raj_ops and maria_dev access to the employee table's ssn and location column data. Go inside this policy; to the right of Policy Name there is an enabled button that can be toggled to disabled. Toggle it, then click Save.
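If you would rather confirm the policy state from the command line, Ranger also exposes a public REST API. The following is a minimal sketch, assuming the sandbox's default Ranger endpoint, the admin credentials from Table 2, and that the Hive service is registered under the name sandbox_hive shown in the UI:

# List all policies defined for the sandbox_hive service as JSON
curl -u admin:hortonworks1 \
  "http://sandbox-hdp.hortonworks.com:6080/service/public/v2/api/service/sandbox_hive/policy"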
Verify Ranger Policy is in Effect
We are going to verify whether maria_dev has access to the Hive employee table. To do this we will use beeline.
1. Login to Ambari with user/password: maria_dev/maria_dev
2. Go to Hive > HIVESERVER2 JDBC URL and click on the clipboard icon at the end of the JDBC URL. This will copy the HiveServer2 JDBC URL.
3. Go to Shell-in-Box at:
sandbox-hdp.hortonworks.com:4200
username = root
password = hadoop
Note: hadoop is the initial password, but you will be asked to change it after first sign-in.
4. Type the following command to launch beeline, pasting the JDBC URL in between the quotes:
beeline -u "Paste the JDBC URL here" -n maria_dev
5. Enter the command below in beeline:
select * from employee;
6. An authorization error will appear. This is expected, as the users maria_dev and raj_ops do not have access to two columns in this table (ssn and location).
7. For further verification, you can view the Audit tab in Ranger. Go back to Ranger, click on Audits => Access and select User => maria_dev. You will see the Access Denied entry for maria_dev, who tried to access data she was not authorized to view.
8. Return to beeline and try running a query to access the name column from the employee table. maria_dev should be able to access that data.
SELECT name FROM employee;
The query runs successfully. Note that even the raj_ops user cannot see the ssn and location columns. We will grant this user access to all columns later via Atlas and Ranger tag-based policies.
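As an aside, if you want to repeat this check without an interactive session, beeline can run a single statement via its -e flag; a minimal sketch, reusing the JDBC URL placeholder from above:

# Run one query as maria_dev and exit; paste the real JDBC URL between the quotes
beeline -u "Paste the JDBC URL here" -n maria_dev -e "SELECT name FROM employee;"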
Create Atlas Tag to Classify Data
The goal of this section is to classify all data in the ssn and location columns with a PII tag, so that when we later create a Ranger tag-based policy, users associated with the PII tag can override the permissions established in the Ranger resource-based policy.
1. Log into the Atlas UI. If you haven't already reset your Ambari admin password, do so now, since it is used to log into Atlas:
sandbox-hdp.hortonworks.com:21000
username & password: admin/admin123
2. Go to Classification and press the + Create Tag button to create a new tag.
- Name the tag: PII
- Add Description: Personal Identifiable Information
Press the Create button. Then you should see your new tag displayed on the Classification page.
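The same tag could also be created programmatically through Atlas's REST API. A minimal sketch, assuming the Atlas endpoint and credentials above (the UI steps are all this tutorial requires):

# Create the PII classification (tag) via the Atlas v2 typedefs endpoint
curl -u admin:admin123 -X POST \
  -H "Content-Type: application/json" \
  -d '{"classificationDefs":[{"name":"PII","description":"Personal Identifiable Information","superTypes":[],"attributeDefs":[]}],"entityDefs":[],"enumDefs":[],"structDefs":[]}' \
  "http://sandbox-hdp.hortonworks.com:21000/api/atlas/v2/types/typedefs"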
3. Go to the Search tab. In Search By Type, write hive_table.
4. The employee table should appear. Select it.
- How does Atlas get the Hive employee table? Hive communicates information through Kafka, which is then transmitted to Atlas. This information includes the Hive tables created and all kinds of data associated with those tables. You can also query this metadata outside the UI; see the sketch below.
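A minimal sketch of such a query, assuming the same Atlas endpoint and credentials as before:

# Search Atlas for hive_table entities matching "employee"
curl -u admin:admin123 \
  "http://sandbox-hdp.hortonworks.com:21000/api/atlas/v2/search/basic?typeName=hive_table&query=employee"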
5. View the details of the employee table by clicking on its name.
6. View the Schema associated with the table. It'll list all columns of this table.
7. Press the green + button to assign the PII tag to the ssn column. Click Save.
8. Repeat the same process to add the PII tag to the location column.
We have classified all data in the ssn and location columns as PII.
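For reference, the same association can be made over REST. The sketch below is partly hypothetical: the GUID placeholders must be replaced with the real column entity GUIDs, which differ per sandbox and can be looked up from the basic-search response shown earlier:

# Attach the PII classification to column entities by GUID
curl -u admin:admin123 -X POST \
  -H "Content-Type: application/json" \
  -d '{"classification":{"typeName":"PII"},"entityGuids":["<GUID-of-ssn-column>","<GUID-of-location-column>"]}' \
  "http://sandbox-hdp.hortonworks.com:21000/api/atlas/v2/entity/bulk/classification"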
Create Ranger Tag Based Policy
Head back to the Ranger UI and log in using
Username/Password: admin/hortonworks1
The tag and entity (ssn, location) relationship will be automatically inherited by Ranger. In Ranger, we can create a tag-based policy by accessing it from the top menu. Go to Access Manager → Tag Based Policies.
You will see a folder called TAG that does not have any repositories yet.
Click the + button to create a new tag repository. Name it Sandbox_tag and click Add.
Click on Sandbox_tag to add a policy.
Click on the Add New Policy button.
Enter the following details:
Policy Name – PII column access policy
Tag – PII
Description – Any description
Audit logging – Yes
The Allow Conditions should have the following values:
Select Group - blank
Select User - raj_ops
Component Permissions - Select hive
You can select the component permissions through the following popup. Check the checkbox to the left of the word component to give raj_ops permission to perform select, update, create, drop, alter, index, lock, all, read, write, repladmin, service admin, and temporary udf admin operations against the employee table columns tagged PII.
Please verify that the Allow Conditions section looks like the image below:
This signifies that only raj_ops is allowed to do any operation on the columns specified by the PII tag. Click Add.
Now click on Access Manager > Resource Based Policies and edit the Sandbox_hive repository by clicking on the button next to it.
Click on Select Tag Service and select Sandbox_tag. Click on Save.
The Ranger tag-based policy is now enabled for the raj_ops user. You can test it by querying all columns of the employee table in beeline. Quit your current beeline session, then reconnect as raj_ops, pasting the JDBC URL in between the quotes:
!q
beeline -u "Paste the JDBC URL here" -n raj_ops
select * from employee;
The query executes successfully. It can be checked in the Ranger Audit log, which will show the access granted and the associated policy which granted it.
Clear the existing query and select User > raj_ops in the search bar.
NOTE: There are two policies which provided access to the raj_ops user: one is a tag-based policy and the other is a Hive resource-based policy. The associated tag (PII) is also denoted in the Tags column of the audit record.
Summary
Ranger traditionally provided group- or user-based authorization for resources such as a table or column in Hive, or a file in HDFS. With the new Atlas-Ranger integration, administrators can conceptualize security policies based on data classification, and not necessarily in terms of tables or columns. Data stewards can easily classify data in Atlas and use the classification in Ranger to create security policies. This represents a paradigm shift in security and governance in Hadoop, benefiting customers with mature Hadoop deployments as well as customers looking to adopt Hadoop and big data infrastructure for the first time.
Further Reading
- For more information on Ranger and Solr Audit integration, refer to Install and Configure Solr For Ranger Audits
- How Ranger provides Authorization for Services within Hadoop, refer to Ranger FAQ
- HDP Security Doc
- Integration of Atlas and Ranger Classification-Based Security Policies