Enabling Sentry Service Authorization

Prerequisites

  • Cloudera Director 1.1.x.
  • CDH 5.1.x (or higher) managed by Cloudera Manager 5.1.x (or higher).
  • Kerberos authentication implemented on your cluster.

Setting Up the Sentry Service Using the Cloudera Director CLI

With this method, you use the bootstrap-remote command of the Cloudera Director client to send a cluster configuration file to the Cloudera Director server, which then deploys the cluster. See Submitting a Cluster Configuration File for more details. Make sure that SENTRY is included in the array of services to be launched, which is specified in the configuration file as:
services: [HDFS, YARN, ZOOKEEPER, HIVE, OOZIE, HUE, IMPALA, SENTRY]
To specify a database, use the databases setting as follows:
cluster {
  ...
  databases {
    SENTRY: {
      type: mysql
      host: sentry.db.example.com
      port: 3306
      user: <database_username>
      password: <database_password>
      name: <database_name>
    }
  }
}

If you don't include an entry for Sentry in the databases section of the configuration file, the Cloudera Director default database, PostgreSQL, will be used, rather than the Cloudera Manager default database for Sentry, which is MySQL.

The Sentry service also requires the following custom configuration for the MapReduce, YARN, HDFS, Hive, and Impala services:
  • MapReduce: Set the Minimum User ID for Job Submission property to zero (the default is 1000) for every TaskTracker role group that is associated with Hive.
    MAPREDUCE {
        TASKTRACKER {                
            taskcontroller_min_user_id: 0 
        }         
    }
  • YARN: Ensure that the Allowed System Users property includes the hive user for every NodeManager role group that is associated with Hive. The example below also allows the impala and hue users. The value is quoted because it contains commas.
    YARN {
        NODEMANAGER {
            container_executor_allowed_system_users: "hive,impala,hue"
        }
    }
  • HDFS: Enable HDFS extended ACLs.
    HDFS {             
        dfs_permissions: true   
        dfs_namenode_acls_enabled: true       
    }
    With Cloudera Manager 5.3 and CDH 5.3, you can enable synchronization of HDFS and Sentry permissions for HDFS files that are part of Hive tables. For details on enabling this feature using Cloudera Manager, see Synchronizing HDFS ACLs and Sentry Permissions.
  • Hive: Make sure Sentry policy file authorization has been disabled for Hive.
    HIVE {            
        sentry_enabled: false          
    }
  • Impala: Make sure Sentry policy file authorization has been disabled for Impala.
    IMPALA {            
        sentry_enabled: false           
    }
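
The required values above can be checked mechanically before a configuration is submitted. The following is a minimal Python sketch, not part of Cloudera Director: it assumes the settings have already been loaded into a nested dictionary shaped like the snippets above, and the names REQUIRED, lookup, and sentry_config_problems are hypothetical.

```python
# Required values come from the list above. The nested-dict layout is an
# assumption that mirrors the configuration snippets.
REQUIRED = {
    ("MAPREDUCE", "TASKTRACKER", "taskcontroller_min_user_id"): 0,
    ("HDFS", "dfs_permissions"): True,
    ("HDFS", "dfs_namenode_acls_enabled"): True,
    ("HIVE", "sentry_enabled"): False,
    ("IMPALA", "sentry_enabled"): False,
}
ALLOWED_USERS = ("YARN", "NODEMANAGER", "container_executor_allowed_system_users")


def lookup(configs, path):
    """Walk a nested dict along path; return None if any key is missing."""
    node = configs
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def sentry_config_problems(services, configs):
    """Return human-readable problems for the services being launched."""
    problems = []
    for path, expected in REQUIRED.items():
        if path[0] not in services:
            continue  # a service that is not launched needs no setting
        actual = lookup(configs, path)
        # Compare types as well as values, because 0 == False in Python.
        if type(actual) is not type(expected) or actual != expected:
            problems.append("%s should be %r, found %r"
                            % (".".join(path), expected, actual))
    if "YARN" in services:
        users = lookup(configs, ALLOWED_USERS)
        names = [u.strip() for u in users.split(",")] if isinstance(users, str) else []
        if "hive" not in names:
            problems.append("%s must include hive" % ".".join(ALLOWED_USERS))
    return problems
```

An empty list means every setting required for the listed services is present with the expected value. Settings for a service that is not in the services array, such as MapReduce on a YARN-only cluster, are skipped.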

Set Permissions on the Hive Warehouse

After setup is complete, set the following permissions on the Hive warehouse. For Sentry authorization to work correctly, the Hive warehouse directory (/user/hive/warehouse, or whichever path you specify as hive.metastore.warehouse.dir in hive-site.xml) must be owned by the hive user and the hive group.
  • Permissions on the warehouse directory must be set as follows:
    • 771 on the directory itself (for example, /user/hive/warehouse)
    • 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
    • All files and subdirectories must be owned by hive:hive
    For example:
    $ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
    $ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
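
To confirm that the commands took effect, the warehouse listing can be inspected. The sketch below is one possible way to do that in Python, under these assumptions: the hdfs client is on the PATH, sudo to the hdfs user is permitted, and the listing uses the usual eight-column `hdfs dfs -ls` layout. The function names list_warehouse and check_listing are hypothetical.

```python
import subprocess

WAREHOUSE = "/user/hive/warehouse"  # default path named in the text


def list_warehouse(path=WAREHOUSE):
    """Return the listing of the directory itself plus everything below it."""
    cmd = ["sudo", "-u", "hdfs", "hdfs", "dfs", "-ls"]
    top = subprocess.check_output(cmd + ["-d", path], universal_newlines=True)
    below = subprocess.check_output(cmd + ["-R", path], universal_newlines=True)
    return top + below


def check_listing(listing):
    """Return problems found in `hdfs dfs -ls` style output."""
    problems = []
    for line in listing.splitlines():
        # Columns: mode, replication, owner, group, size, date, time, path.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips "Found N items" headers and blank lines
        mode, _, owner, group, _, _, _, path = fields
        mode = mode.rstrip("+")  # a trailing + marks an entry that has ACLs
        # 771 on a directory is displayed as drwxrwx--x.
        if mode.startswith("d") and mode[1:] != "rwxrwx--x":
            problems.append("%s: mode is %s, expected drwxrwx--x" % (path, mode))
        if (owner, group) != ("hive", "hive"):
            problems.append("%s: owner is %s:%s, expected hive:hive"
                            % (path, owner, group))
    return problems
```

Calling check_listing(list_warehouse()) on a cluster node returns an empty list when every directory has mode 771 and every entry is owned by hive:hive. The parsing is kept separate from the command invocation so that it can be exercised on saved output.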

Setting Up the Sentry Service Using the Cloudera Director API

You can also set up Sentry through the Cloudera Director API. Define a ClusterTemplate that includes Sentry as a service and carries the same configurations described above, expressed in JSON rather than in the configuration-file format.
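
As a rough illustration of the same settings in JSON form, the Python sketch below serializes the values from the CLI section. It is not the ClusterTemplate schema: the keys "services" and "databases" simply mirror the configuration file, "configs" is a hypothetical grouping key, and the actual field names and nesting must be taken from the Cloudera Director API documentation. The MapReduce setting is omitted because the example services array uses YARN.

```python
import json

# Values are copied from the configuration-file examples above.
settings = {
    "services": ["HDFS", "YARN", "ZOOKEEPER", "HIVE", "OOZIE", "HUE",
                 "IMPALA", "SENTRY"],
    "databases": {
        "SENTRY": {
            "type": "mysql",
            "host": "sentry.db.example.com",
            "port": 3306,
            "user": "<database_username>",
            "password": "<database_password>",
            "name": "<database_name>",
        }
    },
    # "configs" is a hypothetical key used only to group the settings here.
    "configs": {
        "YARN": {
            "NODEMANAGER": {
                "container_executor_allowed_system_users": "hive,impala,hue"
            }
        },
        "HDFS": {"dfs_permissions": True, "dfs_namenode_acls_enabled": True},
        "HIVE": {"sentry_enabled": False},
        "IMPALA": {"sentry_enabled": False},
    },
}


def to_json(data):
    """Serialize the settings as indented JSON with a stable key order."""
    return json.dumps(data, indent=2, sort_keys=True)


print(to_json(settings))
```

Python's True and False are written as the JSON literals true and false, so the boolean settings carry over without manual conversion.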

Set permissions on the Hive warehouse as described above.

Related Links

For detailed instructions on adding and configuring the Sentry service, see Installing and Upgrading the Sentry Service and Configuring the Sentry Service.

For examples of using GRANT and REVOKE statements to enforce permissions with Sentry, see Hive SQL Syntax.