Connecting to Hive
This section contains details for Hive connections.
General information
| Field | Description |
|---|---|
| Data source | Hive |
| Supported versions | 2.6.19.1022 |
| Connection string | jdbc:hive2:// |
| Packaged? | |
| Certified? | |
| Supported features | Estimate job, Analyze data, Schedule |
| Processing capabilities | Pushdown, Spark agent, Yarn agent, Parallel JDBC |
| Java Platform version compatibility | JDK 8, JDK 11 |
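The connection string prefix above (`jdbc:hive2://`) is extended with a host, port, database, and, for Kerberos connections, a service principal parameter. A minimal sketch of assembling such a URL; the host and principal values are hypothetical examples, and 10000 is assumed here as the common HiveServer2 default port:

```python
def build_hive_url(host, port=10000, database="default", principal=None):
    """Assemble a jdbc:hive2:// connection URL.

    When `principal` is given (hypothetical example value), the Kerberos
    service principal is appended as a URL parameter, as supported by the
    Hive JDBC driver.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        url += f";principal={principal}"
    return url

# Hypothetical values for illustration only.
print(build_hive_url("hive-server.example.com"))
print(build_hive_url("hive-server.example.com",
                     principal="hive/hive-server.example.com@EXAMPLE.COM"))
```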
Minimum user permissions
To bring your Hive data into Collibra Data Quality & Observability, you need the following permissions.
- Read permissions on Hive tables for the Kerberos user.
- The ROLE_ADMIN role assigned to your user in Collibra DQ.
Recommended and required connection properties
| Required | Connection Property | Type | Value |
|---|---|---|---|
| | Name | Text | The unique name of your connection. Ensure that there are no spaces in your connection name. |
| | Is Hive | Option | Uses the Hive server engine for distributed speed and scale. This option sets |
| | Connection URL | String | The connection string path of your Hive connection. When referring to the example below, replace the Example |
| | Driver Name | String | The driver class name of your connection. |
| | Port | Integer | The port number to establish a connection to the data source. The default port is |
| | Source Name | String | N/A |
| | Target Agent | Option | The Agent that submits your Spark job for processing. |
| | Auth Type | Option | The method to authenticate your connection. **Note** The configuration requirements differ depending on the Auth Type you select. See Authentication for details on the available authentication types. |
| | Properties | String | The configurable driver properties for your connection. Multiple properties must be comma delimited, for example `abc=123,test=true`. Optionally add the following driver property to remove the enforcement of unique column names: Optionally add the following driver properties to set the Spark conf `spark.hadoop.hive.metastore.uris` and allow Spark to read through a warehouse catalog, such as Hive: |
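Driver properties are comma delimited, as in the `abc=123,test=true` example above. A small sketch (not Collibra code) of how such a string splits into key/value pairs:

```python
def parse_driver_properties(props: str) -> dict:
    """Split a comma-delimited property string (e.g. "abc=123,test=true")
    into a {key: value} dict. An empty input yields an empty dict."""
    result = {}
    for pair in filter(None, props.split(",")):
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result

print(parse_driver_properties("abc=123,test=true"))
# {'abc': '123', 'test': 'true'}
```

Note that this sketch implies property values themselves cannot contain commas, which is why each property must be a simple `key=value` pair.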
Authentication
Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.
| Required | Field | Description |
|---|---|---|
| | Principal | The Kerberos entity to authenticate and grant access to your connection. |
| | Keytab | The file path of the keytab file that contains the encrypted key for a Kerberos principal. Example: `/tmp/keytab/hive_user.keytab` |
| | Password | The secret credential associated with your Kerberos principal. |
| | Script | The file path of the script file used to interact with and authenticate a Kerberos user. Example: `/tmp/keytab/hive_pwd_mgr.sh` |
| | Param $1 | Optional. Additional Kerberos parameter. |
| | Param $2 | Optional. Additional Kerberos parameter. |
| | Param $3 | Optional. Additional Kerberos parameter. |
| | TGT Cache | The ticket-granting ticket cache that stores the TGT to authenticate your connection. |
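With keytab authentication, a TGT is typically obtained via `kinit -kt <keytab> <principal>`. A sketch that builds (but does not run) that invocation; the keytab path reuses the example from the table above, and the principal value is a hypothetical example:

```python
def kinit_command(principal: str, keytab: str) -> list:
    """Build, but do not execute, the kinit invocation that obtains a
    ticket-granting ticket from a keytab: kinit -kt <keytab> <principal>."""
    return ["kinit", "-kt", keytab, principal]

# Keytab path from the table above; the principal is a hypothetical example.
cmd = kinit_command("hive_user@EXAMPLE.COM", "/tmp/keytab/hive_user.keytab")
print(" ".join(cmd))
# kinit -kt /tmp/keytab/hive_user.keytab hive_user@EXAMPLE.COM
```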