Securing Hive and Impala Without Sentry
Audience: System Administrators
Content Summary: Immuta offers both fine- and coarse-grained protection for Hive and Impala tables for users who access data via the Immuta Query Engine or the Spark Integration. However, additional protections are required to ensure that users cannot gain unauthorized access to data by connecting to Hive or Impala directly. Cloudera recommends using the Sentry service to secure access to Hive and Impala. As an alternative, this guide details steps that CDH cluster administrators can take to lock down Hive and Impala access without running the Sentry service.
Info
Each section in this guide is a required step to ensure that access to Hive and Impala is secured.
Restricting Access to Hive
After installing Immuta on your cluster, users will still be able to connect to Hive via the hive shell, beeline, or JDBC/ODBC connections. To prevent users from circumventing Immuta and gaining unauthorized access to data, you can leverage HDFS access control lists (ACLs) without running Sentry.
Enable HDFS Access Control Lists in Cloudera Manager
See the official Cloudera Documentation to complete this step.
Enable Hive Impersonation in Cloudera Manager
In order to leverage ACLs to secure Hive, Hive impersonation must be enabled.
To enable Hive impersonation in Cloudera Manager, set hive.server2.enable.impersonation, hive.server2.enable.doAs to true in the Hive service configuration.
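Cloudera Manager writes the ACL and impersonation settings into the Hadoop-style XML files it deploys (hdfs-site.xml and hive-site.xml). The Python sketch below is one way to spot-check a deployed file: it reports any required property that is missing or not set to true. It assumes that HDFS ACLs are controlled by the dfs.namenode.acls.enabled property and checks only the hive.server2.enable.doAs name for impersonation. The script name, the helper names, and which copy of each file you point it at are hypothetical and depend on your deployment.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Optional

# Properties this guide needs set to "true", keyed by config file name.
# The dfs.namenode.acls.enabled name for HDFS ACLs is an assumption (see lead-in).
REQUIRED_TRUE = {
    "hdfs-site.xml": ["dfs.namenode.acls.enabled"],
    "hive-site.xml": ["hive.server2.enable.doAs"],
}


def read_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in a Hadoop-style config, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def missing_settings(xml_text: str, required: List[str]) -> List[str]:
    """List required properties that are absent or not set to "true"."""
    return [
        name
        for name in required
        if (read_property(xml_text, name) or "").lower() != "true"
    ]


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_conf.py <hdfs-site.xml> <hive-site.xml>
    for path in sys.argv[1:]:
        filename = path.rsplit("/", 1)[-1]
        with open(path) as handle:
            problems = missing_settings(handle.read(), REQUIRED_TRUE.get(filename, []))
        print(f"{path}: " + ("OK" if not problems else "not enabled: " + ", ".join(problems)))
```

A property that is absent from the file is reported as not enabled. This errs on the side of flagging a problem even in cases where a built-in default might already apply, so a flagged result is a prompt to check Cloudera Manager rather than proof of a misconfiguration.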
Configure Access Control Lists
Tip
Group in this context refers to Linux groups, not Sentry groups.
You must configure ACLs on each HDFS location where Hive data is stored, restricting access to the hive and impala users and to data owners who belong to a particular group. You can accomplish this by running the commands below.
hadoop fs -setfacl -m other::--- /user/hive/warehouse
hadoop fs -setfacl -m user::rwx /user/hive/warehouse
hadoop fs -setfacl -m group::rwx /user/hive/warehouse
hadoop fs -setfacl -m group:hive:rwx /user/hive/warehouse
hadoop fs -setfacl -m group:examplegroup:rwx /user/hive/warehouse
In this example, members of the hive and examplegroup groups can select from and insert into Hive tables.
Note that the hive group contains only the hive and impala users, while examplegroup contains the privileged users who would be considered potential data owners in Immuta.
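To confirm the entries are in place, you can inspect the output of hadoop fs -getfacl /user/hive/warehouse. The Python sketch below compares that output against the five entries set by the setfacl commands in this section. It assumes the usual getfacl text layout (header lines beginning with #, then one entry per line, optionally followed by a #effective: note); the script and function names are hypothetical.

```python
import sys
from typing import Iterable, Set

# Entries created by the setfacl commands in this guide (examplegroup is the example group).
EXPECTED_ENTRIES = {
    "user::rwx",
    "group::rwx",
    "group:hive:rwx",
    "group:examplegroup:rwx",
    "other::---",
}


def parse_getfacl(output: str) -> Set[str]:
    """Collect ACL entries from `hadoop fs -getfacl` output."""
    entries = set()
    for line in output.splitlines():
        # Drop "# file:"/"# owner:" header lines and any trailing "#effective:" note.
        entry = line.split("#", 1)[0].strip()
        if entry:
            entries.add(entry)
    return entries


def acl_gaps(output: str, expected: Iterable[str] = EXPECTED_ENTRIES) -> Set[str]:
    """Return expected entries missing from the output; extra entries such as mask:: are ignored."""
    return set(expected) - parse_getfacl(output)


if __name__ == "__main__":
    # Usage (hypothetical): hadoop fs -getfacl /user/hive/warehouse > acl.txt
    #                       python check_acl.py acl.txt
    for path in sys.argv[1:]:
        with open(path) as handle:
            gaps = acl_gaps(handle.read())
        print(f"{path}: " + ("OK" if not gaps else "missing " + ", ".join(sorted(gaps))))
```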
By default, Hive stores data in HDFS under /user/hive/warehouse. If your cluster stores Hive data in a different location, substitute that directory in the example above.
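Because the same five commands must be repeated for every HDFS location that holds Hive data, it can help to generate them rather than type them by hand. The Python sketch below prints the commands for a list of locations so they can be reviewed before anyone runs them. The second location in the example, /data/hive/external, and the function name are hypothetical.

```python
import shlex
from typing import List, Sequence

# ACL entries from this guide; "examplegroup" stands in for your own data-owner group.
ACL_SPECS = [
    "other::---",
    "user::rwx",
    "group::rwx",
    "group:hive:rwx",
    "group:examplegroup:rwx",
]


def setfacl_commands(locations: Sequence[str], specs: Sequence[str] = ACL_SPECS) -> List[str]:
    """Build one `hadoop fs -setfacl -m <spec> <path>` command per location and spec."""
    return [
        shlex.join(["hadoop", "fs", "-setfacl", "-m", spec, path])
        for path in locations
        for spec in specs
    ]


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    # /data/hive/external is a hypothetical second Hive storage location.
    for command in setfacl_commands(["/user/hive/warehouse", "/data/hive/external"]):
        print(command)
```

Printing instead of executing is a deliberate dry-run choice: a mistake such as applying the entries to the wrong path is easy to miss, so reviewing the generated list first is the safer default.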
Restricting Access to Impala
After installing Immuta on your cluster, users will still be able to connect to Impala via impala-shell or JDBC/ODBC connections. To prevent users from circumventing Immuta and gaining unauthorized access to data, you can leverage policy configuration files for Impala without running Sentry.
Create Policy Configuration File
Tip
Group in this context refers to Linux groups, not Sentry groups.
The policy configuration file that drives Impala's security must be in .ini format. The example below grants users in the group examplegroup the ability to read and write data in the default database. You can add additional groups and roles that correspond to different databases or tables.
[groups]
examplegroup = example_insert_role, example_select_role
[roles]
example_insert_role = server=server1->db=default->table=*->action=insert
example_select_role = server=server1->db=default->table=*->action=select
This policy configuration file assigns the group examplegroup to the roles example_insert_role and example_select_role, which grant insert and select (write and read) privileges, respectively, on all tables in the default database.
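Before copying the file into HDFS, it can be useful to confirm that each group resolves to the privileges you intended. The Python sketch below parses the example policy with the standard-library configparser module and expands each group into its roles and then into individual privilege strings. It assumes that multiple roles, and multiple privileges within a role, are comma-separated as in the example; the function names are hypothetical.

```python
import configparser
from collections import defaultdict
from typing import Dict, List

# The example policy from this guide.
POLICY = """\
[groups]
examplegroup = example_insert_role, example_select_role
[roles]
example_insert_role = server=server1->db=default->table=*->action=insert
example_select_role = server=server1->db=default->table=*->action=select
"""


def group_privileges(policy_text: str) -> Dict[str, List[str]]:
    """Expand each group into the privilege strings granted through its roles."""
    # Split each line on its first "=" only, and disable interpolation so a "%"
    # in a value is not treated as a substitution marker.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep group and role names case-sensitive
    parser.read_string(policy_text)
    privileges: Dict[str, List[str]] = defaultdict(list)
    for group, roles in parser["groups"].items():
        for role in filter(None, (r.strip() for r in roles.split(","))):
            # Report a role missing from [roles] instead of silently ignoring it.
            if role not in parser["roles"]:
                raise KeyError(f"group {group!r} references undefined role {role!r}")
            for privilege in parser["roles"][role].split(","):
                if privilege.strip():
                    privileges[group].append(privilege.strip())
    return dict(privileges)


def parse_privilege(privilege: str) -> Dict[str, str]:
    """Split 'server=s->db=d->table=t->action=a' into {'server': 's', 'db': 'd', ...}."""
    return dict(part.split("=", 1) for part in privilege.split("->"))


if __name__ == "__main__":
    for group, privileges in group_privileges(POLICY).items():
        for privilege in privileges:
            print(group, parse_privilege(privilege))
```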
See the official Impala documentation for a detailed guide on policy configuration files. Note that while the guide mentions Sentry, running the Sentry service is not required to leverage policy configuration files.
Next, place the policy configuration file (we will call it policy.ini) in HDFS. The policy file should be owned by the impala user and group, and all access for other users should be removed. See below for an example.
hadoop fs -copyFromLocal /tmp/policy.ini /user/impala/
hadoop fs -chown impala:impala /user/impala/policy.ini
hadoop fs -chmod o-rwx /user/impala/policy.ini
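To double-check the result, you can list the file with hadoop fs -ls /user/impala/policy.ini and confirm the owner, group, and permission bits. The Python sketch below parses one line of that listing, assuming the usual column order (permissions, replication, owner, group, size, date, time, path). A trailing + that marks extended ACLs is stripped, but the ACL entries themselves are not inspected. The function and script names are hypothetical.

```python
import sys
from typing import List


def check_policy_listing(ls_line: str, expected_owner: str = "impala") -> List[str]:
    """Return problems found in one `hadoop fs -ls` line describing the policy file."""
    fields = ls_line.split()
    # Assumed column order: permissions, replication, owner, group, size, date, time, path.
    permissions, owner, group = fields[0].rstrip("+"), fields[2], fields[3]
    problems = []
    if (owner, group) != (expected_owner, expected_owner):
        problems.append(f"owned by {owner}:{group}, expected {expected_owner}:{expected_owner}")
    if permissions[7:10] != "---":  # the last three characters are the "other" bits
        problems.append(f"other users still have access ({permissions})")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical): python check_policy.py "$(hadoop fs -ls /user/impala/policy.ini | tail -n 1)"
    for line in sys.argv[1:]:
        print(check_policy_listing(line) or "OK")
```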
Configure Impala to Use Policy Configuration File
You can configure Impala to use your new policy file by navigating to Impala's configuration in Cloudera Manager and adding the snippet below to Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve).
-server_name=server1
-authorization_policy_file=/user/impala/policy.ini
You must restart the Impala service in Cloudera Manager for the policy changes to take effect.
Note that the server_name value must match the server named in your policy roles (server1 in this example). Also note that each flag should be placed on its own line in the configuration snippet.
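If server_name and the server used in the policy roles disagree, the Impala daemon would likely not grant the privileges you intended. The Python sketch below collects every server= value from the policy text and renders the two flags only when they agree. It treats a policy that references more than one server as an error, which may be stricter than Impala itself requires; the function names are hypothetical.

```python
import re
from typing import Set

# Matches the value of each "server=" component, e.g. "server1" in "server=server1->db=default".
SERVER_PATTERN = re.compile(r"\bserver=(.+?)(?:->|,|\s|$)", re.MULTILINE)


def policy_servers(policy_text: str) -> Set[str]:
    """Collect every server name referenced by privileges in the policy file."""
    return set(SERVER_PATTERN.findall(policy_text))


def safety_valve_snippet(server_name: str, policy_path: str, policy_text: str) -> str:
    """Render the impalad flags, refusing a server_name the policy roles do not use."""
    servers = policy_servers(policy_text)
    if servers != {server_name}:
        raise ValueError(
            f"server_name {server_name!r} does not match policy servers {sorted(servers)}"
        )
    # One flag per line, matching how the snippet is entered in Cloudera Manager.
    return f"-server_name={server_name}\n-authorization_policy_file={policy_path}"


if __name__ == "__main__":
    example_policy = (
        "[roles]\n"
        "example_select_role = server=server1->db=default->table=*->action=select\n"
    )
    print(safety_valve_snippet("server1", "/user/impala/policy.ini", example_policy))
```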