Connecting to Apache Impala
This section contains details for Impala connections.
Select an option from the dropdown menu to display information for a particular driver class.
General information
Field | Description |
---|---|
Data source | Apache Impala |
Supported versions | |
Connection string | jdbc:impala:// |
Packaged? | Yes |
Certified? | Yes |
Supported features | |
Estimate job | |
Analyze data | |
Schedule | |
Processing capabilities | |
Pushdown | No |
Spark agent | Yes |
Yarn agent | Yes |
Parallel JDBC | Yes |
Java Platform version compatibility | |
JDK 8 | Yes |
JDK 11 | Yes |
Minimum user permissions
To bring your Impala data into Collibra Data Quality & Observability, you need the following permissions:
- Read permissions on Impala tables for the Kerberos user.
- ROLE_ADMIN assigned to your user in Collibra DQ.
Recommended and required connection properties
Required | Connection Property | Type | Value |
---|---|---|---|
Yes | Name | Text | The unique name of your connection. Ensure that there are no spaces in your connection name. |
No | Hive Direct Eligible | Option | Uses the Hive server engine for distributed speed and scale. |
Yes | Connection URL | String | The connection string path of your Impala connection. When referring to the example below, replace the Example |
Yes | Driver Name | String | The driver class name of your connection. |
Yes | Port | Integer | The port number to establish a connection to the datasource. The default port is |
No | Source Name | String | N/A |
No | Target Agent | Option | The Agent that submits your Spark job for processing. |
Yes | Auth Type | Option | The method to authenticate your connection. Note: The configuration requirements are different depending on the Auth Type you select. See Authentication for more details on available authentication types. |
No | Driver Properties | String | The configurable driver properties for your connection. Multiple properties must be comma delimited. For example, abc=123,test=true. Optionally add the following driver property to remove the enforcement of unique column names: |
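The two format rules in the table above (no spaces in the connection name, comma-delimited `key=value` driver properties) can be checked before the values are entered in the form. The following is a minimal sketch only: the helper names `is_valid_connection_name` and `parse_driver_properties` are hypothetical, and how Collibra DQ itself validates or parses these fields is not described here.

```python
def is_valid_connection_name(name):
    """Return True if the name is non-empty and contains no whitespace."""
    return bool(name) and not any(ch.isspace() for ch in name)


def parse_driver_properties(raw):
    """Split a comma-delimited 'key=value' string into a dict.

    Assumption: keys and values contain no commas themselves.
    """
    props = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty segments such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        props[key.strip()] = value.strip()
    return props


print(parse_driver_properties("abc=123,test=true"))
# {'abc': '123', 'test': 'true'}
print(is_valid_connection_name("impala prod"))
# False
```

A malformed entry such as `abc` without an `=` raises a `ValueError`, which makes typos visible before the connection is saved.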
Authentication
Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.
Configuring Access Token Manager authentication
Prerequisites
You have added a script file that contains or can retrieve an access token to a folder accessible to Collibra Data Quality & Observability.
Steps
- Update the Connection URL to the following format:
jdbc:impala://${host}:${port}/${catalog}?SSL=true&source=jdbc:impala&accessToken=${accessToken}
- Select Access Token Manager from the Authentication Type dropdown menu.
- Enter the User ID of the IdP account in the Username input field.
- Enter the file path of the access token script file in the Script input field.
For example, /opt/owl/config/get_impala.sh
- Click Submit.
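The Connection URL format in the first step can be filled in programmatically. The sketch below is illustrative only: the function name `build_connection_url`, the host `impala.example.com`, the port `21050`, and the catalog `default` are placeholder values, not values taken from this page. It also assumes that `${accessToken}` should stay in the URL as a literal placeholder for the Access Token Manager to resolve; if your deployment expects otherwise, adjust accordingly.

```python
from string import Template

# URL format from the steps above; ${...} placeholders match Template syntax.
URL_FORMAT = Template(
    "jdbc:impala://${host}:${port}/${catalog}"
    "?SSL=true&source=jdbc:impala&accessToken=${accessToken}"
)


def build_connection_url(host, port, catalog):
    """Fill in host, port, and catalog; leave ${accessToken} untouched."""
    # safe_substitute keeps any placeholder that is not supplied as-is.
    return URL_FORMAT.safe_substitute(host=host, port=port, catalog=catalog)


print(build_connection_url("impala.example.com", 21050, "default"))
# jdbc:impala://impala.example.com:21050/default?SSL=true&source=jdbc:impala&accessToken=${accessToken}
```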
Required | Field | Description |
---|---|---|
Yes | Principal | The Kerberos entity to authenticate and grant access to your connection. |
Yes | Keytab | The file path of the keytab file that contains the encrypted key for a Kerberos principal. Example: /tmp/keytab/impala_user.keytab |
Yes | Password | The secret credential associated with your Kerberos principal. |
No | Script | The file path that contains the script file used to interact with and authenticate a Kerberos user. Example: /tmp/keytab/impala_pwd_mgr.sh |
No | Param $1 | Optional. Additional Kerberos parameter. |
No | Param $2 | Optional. Additional Kerberos parameter. |
No | Param $3 | Optional. Additional Kerberos parameter. |
Yes | TGT Cache | The ticket-granting ticket cache that stores the TGT to authenticate your connection. |

Required | Field | Description |
---|---|---|
Yes | Username | The User ID of the IdP account. |
Yes | Script | The file path of the access token script file. Example: /opt/owl/config/get_impala.sh |
No | Param $1 | Optional. Additional parameter. |
No | Param $2 | Optional. Additional parameter. |
No | Param $3 | Optional. Additional parameter. |
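
Several of the fields above take file paths (Keytab, Script), and a wrong or unreadable path is an easy mistake to make. The following is a rough pre-flight check, not part of Collibra DQ: the helper name `check_auth_file` is hypothetical, and the check is only meaningful when run on the same host and as the same operating-system user as the process that reads these files, which is an assumption about your deployment.

```python
import os


def check_auth_file(path, must_be_executable=False):
    """Return a list of problems found for a keytab or script path.

    An empty list means no problem was detected for the current user.
    """
    if not os.path.isfile(path):
        return [f"{path}: file not found"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path}: not readable by the current user")
    if must_be_executable and not os.access(path, os.X_OK):
        problems.append(f"{path}: not executable by the current user")
    return problems
```

For example, `check_auth_file("/tmp/keytab/impala_user.keytab")` checks the keytab path, and `check_auth_file("/opt/owl/config/get_impala.sh", must_be_executable=True)` additionally checks that the script can be executed.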