Open Source, Fine-Grained Access Control for Apache Hive™ and Cloudera Impala
Apache Sentry (incubating) is the next step in enterprise-grade big data security and delivers fine-grained authorization to data stored in Apache Hadoop. An independent security module that integrates with open source SQL query engines Apache Hive and Cloudera Impala, Sentry delivers advanced authorization controls to enable multi-user applications and cross-functional processes for enterprise data sets.
While Hadoop has strong security at the file system level, Sentry introduces the granularity required to secure access to data for the majority of SQL and BI tools and use cases.
- Role-Based Administration: Database administrators can unlock key role-based access control (RBAC) requirements and define what users and applications can do with data within a server, database, table, or view.
- Data Classification: Content producers and owners can intersperse sensitive data with non-sensitive data in the same data set.
- Improved Regulatory Compliance: Business teams can leverage the power of Hadoop while aligning with regulatory mandates like HIPAA, SOX, and PCI.
- Expanded User Base: Operations staff can open Hadoop data systems to a more diverse set of users, extending the power of Hadoop and making it suitable for new industries, organizations, and enterprise usage.
Sentry utilizes the existing Hive metastore and offers an extensible plug-in for HiveServer2 that expands the foundation for Hadoop security, building upon the existing capabilities of concurrency and Kerberos-based authentication.
- Gain comprehensive control of user access to subsets of data
- Simplify permissions management based on functional roles
- Delegate security management to individual administrators
- Benefit from open source innovation for Hive, Impala, and more
Make Hadoop safer, more compliant, and ready for enterprise use, in even the most highly regulated industries, with Sentry.
Key Benefits of Sentry
Precise Data Access
Ensure that each user and application has the appropriate permissions on data, subsets of data, and SQL operations in Hive and Impala.
Simplify administration by granting sets of permissions to users based on their functional roles within a Hive or Impala database.
Store sensitive data alongside non-sensitive data in the same data set within Hadoop without replication and ensure usage and data compliance for regulations and governance policies.
Open Hadoop data to new and varied users within the enterprise and alleviate security concerns by building on the foundations of concurrency, authentication, and authorization provided by Hive, Impala, and Sentry.
Build multi-user applications on top of Hive and Impala by segregating access to data sets for appropriate users and delegating the permissions management to local database administrators.
Avoid sub-optimal choices for authorization like self-regulated, “benevolent” advisory authorization or “all-or-nothing,” coarse-grained, file-based access.
Reuse and Extensibility
Build on existing systems like the Hive metastore and establish a solid, open, and extensible framework for fine-grained authorization and security beyond SQL on Hadoop.
Key Features of Sentry
- Fine-grained authorization for Hive and Impala
- Specify security for SERVER, DATABASE, TABLE, and VIEW
- SELECT privilege on VIEW and TABLE
- INSERT privilege on TABLE
- TRANSFORM privilege on SERVER
- ALL privilege on SERVER, DATABASE, TABLE, and VIEW
- ALL privilege needed to create and modify schema within scope
- Separate authorization policies per database/schema
- Supported in HiveServer2 and Impala 1.1; available with CDH 4.3
- Supports current Hive metastore architecture
- 100% Apache licensed, 100% open source
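The privilege model in the feature list above can be illustrated with a small Python sketch. It is a toy model written for this page, not Sentry's implementation or API: the `Privilege` class, the `is_allowed` function, and the group, role, server, database, and table names are all hypothetical. It assumes that roles are assigned to groups, that SELECT and INSERT are granted on a table or view, that TRANSFORM is granted on the server, and that ALL applies at any scope and covers everything beneath it. For brevity it does not distinguish tables from views.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Privilege:
    """One action granted at server, database, or table/view scope (toy model)."""
    action: str
    server: str
    database: Optional[str] = None  # None: the grant is at server scope
    table: Optional[str] = None     # None: the grant is at database scope or wider

    def __post_init__(self):
        # Enforce the scope rules from the feature list.
        if self.action not in {"SELECT", "INSERT", "TRANSFORM", "ALL"}:
            raise ValueError(f"unknown action: {self.action}")
        if self.table is not None and self.database is None:
            raise ValueError("a table or view grant must name its database")
        if self.action in {"SELECT", "INSERT"} and self.table is None:
            raise ValueError(f"{self.action} must be granted on a table or view")
        if self.action == "TRANSFORM" and self.database is not None:
            raise ValueError("TRANSFORM must be granted on the server")

    def covers(self, action, server, database=None, table=None):
        """Return True if this grant permits `action` on the requested object."""
        if self.action not in ("ALL", action):
            return False
        if self.server != server:
            return False
        # A grant at a wider scope covers every object beneath it.
        if self.database is not None and self.database != database:
            return False
        if self.table is not None and self.table != table:
            return False
        return True


# Hypothetical policy: roles bundle privileges, groups are assigned roles.
ROLE_PRIVILEGES: Dict[str, List[Privilege]] = {
    "analyst": [Privilege("SELECT", "server1", "sales", "orders")],
    "sales_admin": [Privilege("ALL", "server1", "sales")],
}
GROUP_ROLES: Dict[str, List[str]] = {
    "analysts": ["analyst"],
    "sales_dbas": ["sales_admin"],
}


def is_allowed(group, action, server, database=None, table=None):
    """Check whether any role held by `group` grants the requested action."""
    return any(
        priv.covers(action, server, database, table)
        for role in GROUP_ROLES.get(group, [])
        for priv in ROLE_PRIVILEGES.get(role, [])
    )
```

In this sketch the `analysts` group can read `sales.orders` but cannot insert into it, while `sales_dbas` holds ALL on the `sales` database and so can act on any table inside it but on nothing in other databases.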