Connecting to Hive

This section contains details for Hive connections.

General information

  Field               Description
  Data source         Hive
  Supported versions  2.6.19.1022
  Connection string   jdbc:hive2://
  Packaged?           No
  Certified?          Yes

Supported features
  Feature       Supported
  Estimate job  Yes
  Analyze data  Yes
  Schedule      Yes

Processing capabilities
  Capability     Supported
  Pushdown       No
  Spark agent    Yes
  Yarn agent     Yes
  Parallel JDBC  Yes

Java Platform version compatibility
  Version  Supported
  JDK 8    Yes
  JDK 11   Yes

Minimum user permissions

To bring your Hive data into Collibra Data Quality & Observability, you need the following permissions:

  • The Kerberos user has read permissions on the Hive tables.
  • The ROLE_ADMIN role is assigned to your user in Collibra DQ.

Recommended and required connection properties

Name (Text, required)

    The unique name of your connection. Ensure that there are no spaces in your connection name.

Is Hive (Option, optional)

    Uses the Hive server engine for distributed speed and scale. This option sets hiveNative options for Spark jobs.

Connection URL (String, required)

    The connection string path of your Hive connection.

    When referring to the example below, replace the ${value} sections of the connection URL with your actual values.

    Example: jdbc:hive2://${host}:10000/default;AuthMech=1;KrbHostFQDN=_HOST;KrbServiceName=hive;SSL=1;AllowSelfSignedCerts=1

Driver Name (String, required)

    The driver class name of your connection: com.cloudera.hive.jdbc41.HS2Driver

Port (Integer, required)

    The port number used to establish a connection to the data source. The default port is 10000.

Source Name (String, optional)

    N/A

Target Agent (Option, optional)

    The Agent that submits your Spark job for processing.

Auth Type (Option, required)

    The method used to authenticate your connection.

    Note: The configuration requirements differ depending on the Auth Type you select. See Authentication for more details on the available authentication types.

Properties (String, optional)

    The configurable driver properties for your connection. Multiple properties must be comma-delimited, for example abc=123,test=true.

    Optionally, add the following driver property to remove the enforcement of unique column names:

        hive.resultset.use.unique.column.names=false

    Optionally, add the following driver properties to set the Spark configuration spark.hadoop.hive.metastore.uris and allow Spark to read through a warehouse catalog, such as Hive:

        dq.storage.cxn=${connection-name},dq.metastore.host=${hostname},dq.metastore.port=9083
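As a quick illustration of the ${value} substitution described for the Connection URL field, the following Python sketch fills in the ${host} placeholder from the example URL. The hostname hive.example.com is a hypothetical value; substitute your own.

```python
from string import Template

# Connection URL template from the example above; ${host} is the placeholder.
URL_TEMPLATE = Template(
    "jdbc:hive2://${host}:10000/default;AuthMech=1;KrbHostFQDN=_HOST;"
    "KrbServiceName=hive;SSL=1;AllowSelfSignedCerts=1"
)

# hive.example.com is a hypothetical hostname.
url = URL_TEMPLATE.substitute(host="hive.example.com")
print(url)
```

Paste the resulting string into the Connection URL field of the connection form.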
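The comma-delimited key=value format expected by the Properties field can be sketched as follows; this is a minimal Python illustration of the format, not part of the product.

```python
def parse_driver_properties(props: str) -> dict:
    """Split a comma-delimited key=value properties string into a dict."""
    result = {}
    for pair in props.split(","):
        if not pair:
            continue  # tolerate trailing commas
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result

# The example pairs from the Properties description above.
print(parse_driver_properties("abc=123,test=true"))
# {'abc': '123', 'test': 'true'}
```

Each pair must contain exactly one property; whitespace around keys and values is ignored here, but keeping the string free of spaces is the safer habit.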

Authentication

Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.

Principal (required)

    The Kerberos entity to authenticate and grant access to your connection.

Keytab (required)

    The file path of the keytab file that contains the encrypted key for a Kerberos principal.

    Example: /tmp/keytab/hive_user.keytab

Password (required)

    The secret credential associated with your Kerberos principal.

Script (required)

    The file path of the script file used to interact with and authenticate a Kerberos user.

    Example: /tmp/keytab/hive_pwd_mgr.sh

Param $1 (optional)

    Additional Kerberos parameter.

Param $2 (optional)

    Additional Kerberos parameter.

Param $3 (optional)

    Additional Kerberos parameter.

TGT Cache (required)

    The ticket-granting ticket cache that stores the TGT used to authenticate your connection.
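How Param $1 through $3 reach the authentication script is easiest to see with a small sketch. The Python example below writes a stand-in for a script such as /tmp/keytab/hive_pwd_mgr.sh to a temporary file and invokes it with three positional arguments. Both the assumption that the parameters are passed positionally and the parameter values themselves are illustrative, not taken from the product.

```python
import os
import stat
import subprocess
import tempfile

# Stand-in for a password manager script such as /tmp/keytab/hive_pwd_mgr.sh.
# A real script would retrieve the Kerberos credential; this one just echoes
# its positional arguments so the parameter wiring is visible.
SCRIPT_BODY = '#!/bin/sh\necho "params: $1 $2 $3"\n'

with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(SCRIPT_BODY)
    script_path = f.name
os.chmod(script_path, stat.S_IRWXU)

# Param $1..$3 become positional shell arguments (illustrative values).
out = subprocess.run(
    [script_path, "realm=EXAMPLE.COM", "user=hive", "debug=true"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(out)  # params: realm=EXAMPLE.COM user=hive debug=true
os.remove(script_path)
```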