Connecting to Hive
This section contains details for Hive connections.
General information
| Field | Description |
|---|---|
| Data source | Hive |
| Supported versions | 2.6.19.1022 |
| Connection string | jdbc:hive2:// |
| Packaged? | |
| Certified? | |
| Supported features | Estimate job, Analyze data, Schedule |
| Processing capabilities | Pushdown, Spark agent, Yarn agent, Parallel JDBC |
| Java Platform version compatibility | JDK 8, JDK 11 |
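The connection string prefix above (`jdbc:hive2://`) is extended with a host, port, database, and, for Kerberos connections, a service principal parameter. A minimal sketch of assembling such a URL; the host and principal values are hypothetical examples, and 10000 is assumed here as the common HiveServer2 default port:

```python
def build_hive_url(host, port=10000, database="default", principal=None):
    """Assemble a jdbc:hive2:// connection URL.

    When `principal` is given (hypothetical example value), the Kerberos
    service principal is appended as a URL parameter, as supported by the
    Hive JDBC driver.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        url += f";principal={principal}"
    return url

# Hypothetical values for illustration only.
print(build_hive_url("hive-server.example.com"))
print(build_hive_url("hive-server.example.com",
                     principal="hive/hive-server.example.com@EXAMPLE.COM"))
```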
Minimum user permissions
To bring your Hive data into Collibra Data Quality & Observability, you need the following permissions.
- Read permissions on Hive tables for the Kerberos user.
- The ROLE_ADMIN role assigned to your user in Collibra DQ.
Recommended and required connection properties
| Required | Connection Property | Type | Value |
|---|---|---|---|
| | Name | Text | The unique name of your connection. Ensure that there are no spaces in your connection name. |
| | Is Hive | Option | Uses the Hive server engine for distributed speed and scale. This option sets |
| | Connection URL | String | The connection string path of your Hive connection. When referring to the example below, replace the Example |
| | Driver Name | String | The driver class name of your connection. |
| | Port | Integer | The port number to establish a connection to the data source. The default port is |
| | Source Name | String | N/A |
| | Target Agent | Option | The Agent that submits your Spark job for processing. |
| | Auth Type | Option | The method to authenticate your connection. **Note** The configuration requirements differ depending on the Auth Type you select. See Authentication for details on the available authentication types. |
| | Properties | String | The configurable driver properties for your connection. Multiple properties must be comma delimited, for example `abc=123,test=true`. Optionally add the following driver property to remove the enforcement of unique column names: Optionally add the following driver properties to set the Spark conf `spark.hadoop.hive.metastore.uris` and allow Spark to read through a warehouse catalog, such as Hive: |
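Driver properties are comma delimited, as in the `abc=123,test=true` example above. A small sketch (not Collibra code) of how such a string splits into key/value pairs:

```python
def parse_driver_properties(props: str) -> dict:
    """Split a comma-delimited property string (e.g. "abc=123,test=true")
    into a {key: value} dict. An empty input yields an empty dict."""
    result = {}
    for pair in filter(None, props.split(",")):
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result

print(parse_driver_properties("abc=123,test=true"))
# {'abc': '123', 'test': 'true'}
```

Note that this sketch implies property values themselves cannot contain commas, which is why each property must be a simple `key=value` pair.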
Authentication
Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.
| Required | Field | Description |
|---|---|---|
| | Principal | The Kerberos entity to authenticate and grant access to your connection. |
| | Keytab | The file path of the keytab file that contains the encrypted key for a Kerberos principal. Example: `/tmp/keytab/hive_user.keytab` |
| | Password | The secret credential associated with your Kerberos principal. |
| | Script | The file path of the script file used to interact with and authenticate a Kerberos user. Example: `/tmp/keytab/hive_pwd_mgr.sh` |
| | Param $1 | Optional. Additional Kerberos parameter. |
| | Param $2 | Optional. Additional Kerberos parameter. |
| | Param $3 | Optional. Additional Kerberos parameter. |
| | TGT Cache | The ticket-granting ticket cache that stores the TGT to authenticate your connection. |
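With keytab authentication, a TGT is typically obtained via `kinit -kt <keytab> <principal>`. A sketch that builds (but does not run) that invocation; the keytab path reuses the example from the table above, and the principal value is a hypothetical example:

```python
def kinit_command(principal: str, keytab: str) -> list:
    """Build, but do not execute, the kinit invocation that obtains a
    ticket-granting ticket from a keytab: kinit -kt <keytab> <principal>."""
    return ["kinit", "-kt", keytab, principal]

# Keytab path from the table above; the principal is a hypothetical example.
cmd = kinit_command("hive_user@EXAMPLE.COM", "/tmp/keytab/hive_user.keytab")
print(" ".join(cmd))
# kinit -kt /tmp/keytab/hive_user.keytab hive_user@EXAMPLE.COM
```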