Enabling Sentry Authorization for Search using the Command Line
Sentry enables role-based, fine-grained authorization for Cloudera Search. Sentry can apply a range of restrictions to various tasks, such as accessing data or creating collections. These restrictions are consistently applied, regardless of the way users attempt to complete actions. For example, restricting access to data in a collection restricts that access whether queries come from the command line, a browser, Hue, or the admin console.
- You can use either Cloudera Manager or the following command-line instructions to complete this configuration.
- This information applies specifically to CDH 5.4.x. If you use an earlier version of CDH, see the documentation for that version located at Cloudera Documentation.
For information on enabling Sentry authorization using Cloudera Manager, see Configuring Sentry Policy File Authorization Using Cloudera Manager.
This document describes configuring Sentry for Cloudera Search. For information about alternate ways to configure Sentry or for information about installing Sentry for other services, see:
Roles and Collection-Level Privileges
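Sentry uses a role-based privilege model. A role is a set of rules for accessing a given Solr collection. Access to each collection is governed by privileges: Query, Update, or All (*). A role can contain several rules, separated by commas. For example, the following role grants the Query privilege on the hive_logs and hbase_logs collections and the Update privilege on the current_bugs collection: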
engineer_role = collection=hive_logs->action=Query, collection=hbase_logs->action=Query, collection=current_bugs->action=Update
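To make the rule syntax concrete, the following is a minimal Python sketch that splits a role definition into (collection, action) pairs. It is an illustration only, not Sentry's own parser; the function name parse_role is hypothetical, and treating a rule without an explicit action as All (*) is an assumption based on the "syntactic sugar" comment in the sample policy file.

```python
def parse_role(line):
    """Split 'role = collection=X->action=Y, ...' into (role, [(collection, action), ...])."""
    role, _, rules = line.partition("=")
    privileges = []
    for rule in rules.split(","):
        fields = {}
        for part in rule.split("->"):
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
        # Assumption: a rule with no explicit action means All (*).
        privileges.append((fields["collection"], fields.get("action", "*")))
    return role.strip(), privileges


role, privileges = parse_role(
    "engineer_role = collection=hive_logs->action=Query, "
    "collection=hbase_logs->action=Query, collection=current_bugs->action=Update"
)
print(role)
for collection, action in privileges:
    print(collection, action)
```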
Users and Groups
- A user is an entity that is permitted by the Kerberos authentication system to access the Search service.
- A group connects the authentication system with the authorization system. It is a set of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
- A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups.
For example:
dev_ops = dev_role, ops_role
Here the group dev_ops is granted the roles dev_role and ops_role. The members of this group can complete searches that are allowed by these roles.
User to Group Mapping
You can configure Sentry to use either Hadoop groups or groups defined in the policy file.
To configure Hadoop groups:
Set the sentry.provider property in sentry-site.xml to org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider.
By default, this uses local shell groups. See the Group Mapping section of the HDFS Permissions Guide for more information.
In this case, Sentry uses the Hadoop configuration described in Configuring LDAP Group Mappings. Cloudera Manager automatically uses this configuration. In a deployment not managed by Cloudera Manager, manually set these configuration parameters in the hadoop-conf file that is passed to Solr.
To configure local groups:
- Define local groups in a [users] section of the Sentry Policy file. For example:
[users]
user1 = group1, group2, group3
user2 = group2, group3
- In sentry-site.xml, set sentry.provider as follows:
<property>
  <name>sentry.provider</name>
  <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
</property>
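As a rough illustration of how a [users] section maps users to local groups, the following Python sketch reads the section with the standard configparser module. It is not how Sentry itself reads the file; the helper name groups_for is hypothetical.

```python
import configparser

POLICY = """
[users]
user1 = group1, group2, group3
user2 = group2, group3
"""


def groups_for(policy_text, user):
    """Return the local groups listed for a user in the [users] section."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # avoid configparser's default lowercasing of keys
    parser.read_string(policy_text)
    if not parser.has_option("users", user):
        return []
    return [group.strip() for group in parser.get("users", user).split(",")]


print(groups_for(POLICY, "user2"))  # ['group2', 'group3']
```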
Setup and Configuration
This release of Sentry stores the configuration as well as privilege policies in files. The sentry-site.xml file contains configuration options such as privilege policy file location. The Policy File contains the privileges and groups. It has a .ini file format and should be stored on HDFS.
Sentry is automatically installed when you install Cloudera Search for CDH or Cloudera Search 1.1.0 or later.
The sections that follow contain notes on creating and maintaining the policy file.
Storing the Policy File
Considerations for storing the policy file(s) include:
- Replication count - Because the file is read for each query, you should increase this; 10 is a reasonable value.
- Updating the file - Updates to the file are only reflected when the Solr process is restarted.
role1 = privilege1
role1 = privilege2
This section provides a sample configuration.
[groups]
# Assigns each Hadoop group to its set of roles
engineer = engineer_role
ops = ops_role
dev_ops = engineer_role, ops_role
hbase_admin = hbase_admin_role

[roles]
# The following grants all access to source_code.
# "collection = source_code" can also be used as syntactic
# sugar for "collection = source_code->action=*"
engineer_role = collection = source_code->action=*

# The following imply more restricted access.
ops_role = collection = hive_logs->action=Query
dev_ops_role = collection = hbase_logs->action=Query

#give hbase_admin_role the ability to create/delete/modify the hbase_logs collection
hbase_admin_role = collection=admin->action=*, collection=hbase_logs->action=*
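The following Python sketch shows one way to reason about what the sample policy grants. It parses the [groups] and [roles] sections with configparser and answers "may members of this group perform this action on this collection?". This is a simplified model under stated assumptions, not Sentry's evaluation logic: the helper names load_policy and is_allowed are hypothetical, actions are compared case-insensitively, and a missing action is treated as *.

```python
import configparser

POLICY = """
[groups]
engineer = engineer_role
ops = ops_role
dev_ops = engineer_role, ops_role
hbase_admin = hbase_admin_role

[roles]
engineer_role = collection = source_code->action=*
ops_role = collection = hive_logs->action=Query
dev_ops_role = collection = hbase_logs->action=Query
hbase_admin_role = collection=admin->action=*, collection=hbase_logs->action=*
"""


def load_policy(text):
    """Return ({group: [roles]}, {role: [(collection, action), ...]})."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key case as written
    parser.read_string(text)
    groups = {g: [r.strip() for r in v.split(",")] for g, v in parser.items("groups")}
    roles = {}
    for role, rules in parser.items("roles"):
        privileges = []
        for rule in rules.split(","):
            fields = {}
            for part in rule.split("->"):
                key, _, value = part.partition("=")
                fields[key.strip()] = value.strip()
            # Assumption: no explicit action means All (*).
            privileges.append((fields["collection"], fields.get("action", "*")))
        roles[role] = privileges
    return groups, roles


def is_allowed(groups, roles, group, collection, action):
    """True if any role assigned to the group grants the action on the collection."""
    for role in groups.get(group, []):
        for granted_collection, granted_action in roles.get(role, []):
            if granted_collection == collection and granted_action.lower() in ("*", action.lower()):
                return True
    return False


groups, roles = load_policy(POLICY)
print(is_allowed(groups, roles, "ops", "hive_logs", "Query"))
print(is_allowed(groups, roles, "ops", "hive_logs", "Update"))
```

Under this reading, dev_ops_role is defined in the sample but not assigned to any group, so members of dev_ops obtain their access only through engineer_role and ops_role.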
Sentry Configuration File
The following is an example of a sentry-site.xml file.
<configuration>
  <property>
    <name>sentry.provider</name>
    <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
  </property>
  <property>
    <name>sentry.solr.provider.resource</name>
    <value>/path/to/authz-provider.ini</value>
    <!-- If the HDFS configuration files (core-site.xml, hdfs-site.xml)
         pointed to by SOLR_HDFS_CONFIG in /etc/default/solr
         point to HDFS, the path will be in HDFS;
         alternatively you could specify a full path,
         e.g.: hdfs://namenode:port/path/to/authz-provider.ini -->
  </property>
</configuration>
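When editing sentry-site.xml by hand, it can help to confirm that the file is well-formed and that the policy file location is set. The following Python sketch, using only the standard library, reads the name/value pairs from a Hadoop-style configuration document. The helper name read_properties is hypothetical, and the embedded XML shows only the policy file property for brevity.

```python
import xml.etree.ElementTree as ET

SENTRY_SITE = """
<configuration>
  <property>
    <name>sentry.solr.provider.resource</name>
    <value>/path/to/authz-provider.ini</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the document."""
    root = ET.fromstring(xml_text.strip())
    properties = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties


props = read_properties(SENTRY_SITE)
print(props.get("sentry.solr.provider.resource"))
```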
Enabling Sentry in Cloudera Search for CDH 5
You can enable Sentry using Cloudera Manager or by manually modifying files. For more information on enabling Sentry using Cloudera Manager, see Configuring Sentry Policy File Authorization Using Cloudera Manager and Enabling Sentry Authorization for Solr.
- In a Cloudera Manager deployment, these properties are added automatically when you click Enable Sentry Authorization in the Solr configuration page in Cloudera Manager.
- In a deployment not managed by Cloudera Manager, you must make these changes yourself. The variable SOLR_AUTHORIZATION_SENTRY_SITE specifies the path to sentry-site.xml. The variable SOLR_AUTHORIZATION_SUPERUSER specifies the first part of SOLR_KERBEROS_PRINCIPAL. This is solr for the majority of users, as solr is the default. Settings are of the form:
SOLR_AUTHORIZATION_SENTRY_SITE=/path/to/sentry-site.xml
SOLR_AUTHORIZATION_SUPERUSER=solr
To enable Sentry collection-level authorization checking on a new collection, the instancedir for the collection must use a modified version of solrconfig.xml with Sentry integration. Each collection has a separate solrconfig.xml file, meaning you can define different behavior for each collection. The command solrctl instancedir --generate generates two versions of solrconfig.xml: the standard solrconfig.xml without Sentry integration, and the Sentry-integrated version called solrconfig.xml.secure. To use the Sentry-integrated version, replace solrconfig.xml with solrconfig.xml.secure before creating the instancedir.
You can enable Sentry on an existing collection by modifying the settings that are stored in instancedir. For example, you might have an existing collection named foo and a standard solrconfig.xml. By default, collections are stored in instancedirs that use the collection's name, which is foo in this case. In such a situation, you could enable Sentry from the command line by executing the following commands:
# generate a fresh instancedir
solrctl instancedir --generate foosecure
# download the existing instancedir from ZK into subdirectory foo
solrctl instancedir --get foo foo
# replace the existing solrconfig.xml with the sentry-enabled one
cp foosecure/conf/solrconfig.xml.secure foo/conf/solrconfig.xml
# update the instancedir in ZK
solrctl instancedir --update foo foo
# reload the collection
solrctl collection --reload foo
If you have an existing collection using a version of solrconfig.xml that you have modified, contact Support for assistance.
Providing Document-Level Security Using Sentry
For role-based access control of a collection, an administrator modifies a Sentry role so it has query, update, or administrative access, as described above.
Collection-level authorization is useful when all documents in a collection share the same access control requirements, but administrators may want to restrict access to a subset of documents in a collection. This finer-grained restriction could be achieved by defining separate collections for each subset, but this is difficult to manage, requires duplicate documents for each collection, and requires that these documents be kept synchronized.
Document-level access control solves this issue by associating authorization tokens with each document in the collection. This enables granting Sentry roles access to sets of documents in a collection.
Document-Level Security Model
Document-level security depends on a chain of relationships between users, groups, roles, and documents.
- Users are assigned to groups.
- Groups are assigned to roles.
- Roles are stored as "authorization tokens" in a specified field in the documents.
Document-level security supports restricting which documents can be viewed by which users. Access is provided by adding roles as "authorization tokens" to a specified document field. Conversely, access is implicitly denied by omitting roles from the specified field. In other words, in a document-level security enabled environment, a user might submit a query that matches a document; if the user is not part of a group that has a role that has been granted access to the document, the result is not returned.
For example, Alice might belong to the administrators group. The administrators group may belong to the doc-mgmt role. A document could be ingested and the doc-mgmt role could be added at ingest time. In such a case, if Alice submitted a query that matched the document, Search would return the document, since Alice is then allowed to see any document with the "doc-mgmt" authorization token.
Similarly, Bob might belong to the guests group. The guests group may belong to the public-browser role. If Bob tried the same query as Alice, but the document did not have the public-browser role, Search would not return the result because Bob does not belong to a group that is associated with a role that has access.
Note that collection-level authorization rules still apply, if enabled. Even if Alice is able to view a document given document-level authorization rules, if she is not allowed to query the collection, the query will fail.
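The Alice and Bob scenario can be modeled in a few lines of Python. This sketch only illustrates the user, group, role, and token chain described above; it is not how Search implements the check. The dictionaries, the function names, and the set of roles allowed to query the collection are hypothetical, and sentry_auth is used as the token field name because it is the default auth field.

```python
USER_GROUPS = {"alice": {"administrators"}, "bob": {"guests"}}
GROUP_ROLES = {"administrators": {"doc-mgmt"}, "guests": {"public-browser"}}


def roles_for(user):
    """Collect every role reachable from the user's groups."""
    roles = set()
    for group in USER_GROUPS.get(user, ()):
        roles |= GROUP_ROLES.get(group, set())
    return roles


def query(user, documents, roles_allowed_to_query, auth_field="sentry_auth"):
    """Apply the collection-level check first, then filter by document tokens."""
    roles = roles_for(user)
    if not roles & roles_allowed_to_query:
        raise PermissionError(f"{user} may not query this collection")
    return [doc["id"] for doc in documents if roles & set(doc.get(auth_field, ()))]


docs = [{"id": "doc1", "sentry_auth": ["doc-mgmt"]}]
collection_query_roles = {"doc-mgmt", "public-browser"}
print(query("alice", docs, collection_query_roles))  # ['doc1']
print(query("bob", docs, collection_query_roles))    # []
```

If Alice's role were removed from collection_query_roles, the call would raise an error even though the document carries her token, mirroring the point that collection-level rules still apply.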
Roles are typically added to documents when those documents are ingested, either via the standard Solr APIs or, if using morphlines, the setValues morphline command.
Enabling Document-Level Security
Cloudera Search supports document-level security in Search for CDH 5.1 and later. Document-level security requires collection-level security. Configuring collection-level security is described earlier in this topic.
Document-level security is disabled by default, so the first step in using document-level security is to enable the feature by modifying the solrconfig.xml.secure file. Remember to replace the solrconfig.xml with this file, as described in Enabling Sentry in Cloudera Search for CDH 5.
The default contents of the relevant section of solrconfig.xml.secure are as follows:
<searchComponent name="queryDocAuthorization">
  <!-- Set to true to enabled document-level authorization -->
  <bool name="enabled">false</bool>
  <!-- Field where the auth tokens are stored in the document -->
  <str name="sentryAuthField">sentry_auth</str>
  <!-- Auth token defined to allow any role to access the document.
       Uncomment to enable. -->
  <!--<str name="allRolesToken">*</str>-->
</searchComponent>
- The enabled Boolean determines whether document-level authorization is enabled. To enable document-level security, change this setting to true.
- The sentryAuthField string specifies the name of the field that is used for storing authorization information. You can use the default setting of sentry_auth or you can specify some other string to be used for assigning values during ingest.
- The allRolesToken string represents a special token defined to allow any role access to the document. By default, this feature is disabled. To enable this feature, uncomment the specification and specify the token. This token should be different from the name of any Sentry role to avoid collision. By default it is "*". This feature is useful when first configuring document-level security, and when granting all roles access to a document whose set of roles may change. See Best Practices for additional information.
You may want to grant every user that belongs to a role access to certain documents. One way to accomplish this is to specify all known roles in the document, but this requires updating or re-indexing the document if you add a new role. Alternatively, an allUser role, specified in the Sentry .ini file, could contain all valid groups, but this role would need to be updated every time a new group was added to the system. Instead, specifying allRolesToken allows any user that belongs to a valid role to access the document. This access requires no updating as the system evolves.
In addition, allRolesToken may be useful for transitioning a deployment to use document-level security. Instead of having to define all the roles upfront, all the documents can be specified with allRolesToken and later modified as the roles are defined.
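The effect of allRolesToken can be sketched as a small extension of the token check. The following Python function is an illustrative model only; the name can_view is hypothetical, and "*" is used because it is the default token value. It reflects the statement above that the token admits any user who belongs to a valid role, so a user with no roles is still denied.

```python
ALL_ROLES_TOKEN = "*"  # default token value; set to None to model the disabled feature


def can_view(user_roles, doc_tokens, all_roles_token=ALL_ROLES_TOKEN):
    """Decide whether a user's roles grant access to a document's tokens."""
    if not user_roles:
        return False  # a user without any valid role sees nothing
    if all_roles_token is not None and all_roles_token in doc_tokens:
        return True  # document is open to every valid role
    return bool(set(user_roles) & set(doc_tokens))


print(can_view({"public-browser"}, ["*"]))
print(can_view({"public-browser"}, ["doc-mgmt"]))
```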
Consequences of Document-Level Authorization Only Affecting Queries
Document-level security does not prevent users from modifying documents or performing other update operations on the collection. Update operations are only governed by collection-level authorization.
Document-level security can be used to prevent documents being returned in query results. If a user is not granted access to a document, that document is not returned even if the user submits a query that matches it. This does not affect attempted updates.
Consequently, it is possible for a user to not have access to a set of documents based on document-level security, but to still be able to modify the documents via their collection-level authorization update rights. This means that a user can delete all documents in the collection. Similarly, a user might modify all documents, adding their authorization token to each one. After such a modification, the user could access any document via querying. Therefore, if you are restricting access using document-level security, consider granting collection-level update rights only to those users you trust and assume they will be able to access every document in the collection.
Limitations on Query Size
By default, queries support up to 1024 Boolean clauses. As a result, queries containing more than 1024 clauses may cause errors. Because authorization information is added by Sentry as part of a query, using document-level security can increase the number of clauses. In the case where users belong to many roles, even simple queries can become quite large. If a query is too large, an error of the following form occurs:
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
The limit is controlled by the maxBooleanClauses setting in solrconfig.xml. For maxBooleanClauses to be applied as expected, apply any change to this value in all collections and then restart the service. You must make this change in all collections because this option modifies a global Lucene property, affecting all Solr cores. If different solrconfig.xml files have different values for this property, the effective value is determined per host, based on the first Solr core to be initialized.
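A back-of-the-envelope check can show when role membership alone pushes a query toward the limit. The Python sketch below assumes, purely for illustration, that document-level security adds one Boolean clause per role the user holds; the actual number of clauses Sentry adds is not stated here, so treat the result as a rough estimate. The function names are hypothetical.

```python
MAX_BOOLEAN_CLAUSES = 1024  # default limit quoted in the text


def estimated_clause_count(user_clauses, role_count):
    # Assumption: one added clause per role held by the user.
    return user_clauses + role_count


def fits_within_limit(user_clauses, role_count, limit=MAX_BOOLEAN_CLAUSES):
    """True if the estimated clause count does not exceed the limit."""
    return estimated_clause_count(user_clauses, role_count) <= limit


print(fits_within_limit(user_clauses=5, role_count=200))
print(fits_within_limit(user_clauses=5, role_count=1020))
```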
Enabling Secure Impersonation
Secure Impersonation is a feature that allows a user to make requests as another user in a secure way. For example, to allow the following impersonations:
- User hue can make requests as any user from any host.
- User foo can make requests as any member of group bar, from host1 or host2.
Configure the following properties in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr:
SOLR_SECURITY_ALLOWED_PROXYUSERS=hue,foo
SOLR_SECURITY_PROXYUSER_hue_HOSTS=*
SOLR_SECURITY_PROXYUSER_hue_GROUPS=*
SOLR_SECURITY_PROXYUSER_foo_HOSTS=host1,host2
SOLR_SECURITY_PROXYUSER_foo_GROUPS=bar
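The intent of these settings can be expressed as a small decision function. The Python sketch below is a model of the rules as described in the two impersonation examples, not the check Solr performs internally; the function names are hypothetical, and "*" is interpreted as "any host" or "any group".

```python
SETTINGS = {
    "SOLR_SECURITY_ALLOWED_PROXYUSERS": "hue,foo",
    "SOLR_SECURITY_PROXYUSER_hue_HOSTS": "*",
    "SOLR_SECURITY_PROXYUSER_hue_GROUPS": "*",
    "SOLR_SECURITY_PROXYUSER_foo_HOSTS": "host1,host2",
    "SOLR_SECURITY_PROXYUSER_foo_GROUPS": "bar",
}


def _matches(setting, candidates):
    """True if the comma-separated setting is '*' or lists any candidate."""
    allowed = {value.strip() for value in setting.split(",") if value.strip()}
    return "*" in allowed or bool(allowed & set(candidates))


def may_impersonate(settings, proxy_user, host, target_groups):
    """Check the proxy user, the requesting host, and the impersonated user's groups."""
    allowed_users = settings.get("SOLR_SECURITY_ALLOWED_PROXYUSERS", "").split(",")
    if proxy_user not in [user.strip() for user in allowed_users]:
        return False
    prefix = f"SOLR_SECURITY_PROXYUSER_{proxy_user}_"
    return (_matches(settings.get(prefix + "HOSTS", ""), [host])
            and _matches(settings.get(prefix + "GROUPS", ""), target_groups))


print(may_impersonate(SETTINGS, "hue", "anyhost", {"analysts"}))
print(may_impersonate(SETTINGS, "foo", "host3", {"bar"}))
```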
Debugging Failed Sentry Authorization Requests
- In Cloudera Manager, add log4j.logger.org.apache.sentry=DEBUG to the logging settings for your service through the corresponding Logging Safety Valve field for the Impala, Hive Server 2, or Solr Server services.
- On systems not managed by Cloudera Manager, add log4j.logger.org.apache.sentry=DEBUG to the log4j.properties file on each host in the cluster, in the appropriate configuration directory for each service.
With debug logging enabled, look for messages of the following form, which indicate each evaluation Sentry makes:
FilePermission server..., RequestPermission server...., result [true|false]
The FilePermission is from the policy file, while RequestPermission is the privilege required for the query. A RequestPermission iterates over all appropriate FilePermission settings until a match is found. If no matching privilege is found, Sentry returns false, indicating "Access Denied".
Appendix: Authorization Privilege Model for Search
The tables below refer to the request handlers defined in the generated solrconfig.xml.secure. If you are not using this configuration file, the information below may not apply.
admin is a special collection in Sentry used to represent administrative actions. A non-administrative request may only require privileges on the collection on which the request is being performed. This is called collection1 in this appendix. An administrative request may require privileges on both the admin collection and collection1. This is denoted as admin, collection1 in the tables below.
|Request Handler||Required Privilege||Collections that Require Privilege|
|Collection Action||Required Privilege||Collections that Require Privilege|
|Collection Action||Required Privilege||Collections that Require Privilege|
|Request Handler||Required Privilege||Collections that Require Privilege|
|LoggingHandler||QUERY, UPDATE (or *)||admin|