Connecting to Databricks

This section contains details for Databricks connections.

Select an option from the dropdown menu to display information for a particular driver class.

General information

Field | Databricks JDBC driver | CData JDBC driver
Data source | Databricks | Databricks
Supported versions | 2.6.36 | 21.0.8137.0
Connection string | jdbc:databricks:// | jdbc:cdata:Databricks:
Packaged? | No | Yes
Certified? | Yes | Yes

Supported features

Feature | Supported
Estimate job | Yes
Analyze data | Yes
Schedule | No

Processing capabilities

Capability | Supported
Pushdown | Yes
Spark agent | Yes
Yarn agent | Yes
Parallel JDBC | No (Databricks driver), Yes (CData driver)

Java Platform version compatibility

JDK version | Supported
JDK 8 | Yes
JDK 11 | Yes

Minimum user permissions

To bring your Databricks data into Collibra Data Quality & Observability, you need the following permissions.

  • Read access on your Unity Catalog.
  • Access to the cluster endpoint that you use.
  • ROLE_ADMIN assigned to your user in Collibra DQ.

Recommended and required connection properties

Use the dropdown menu to select a driver class.

Each connection property below is listed with its type and whether it is required.

Name (Text, required)

The unique name of your connection. Ensure that there are no spaces in your connection name.

Connection URL (String, required)

The connection string path of your Databricks connection.

Example (Databricks driver)
jdbc:databricks://${host}:${port}/default;transportMode=http;ssl=1;HttpPath=${clusterHttpPath};AuthMech=3;UID=${username};PWD=${token}

Example (CData driver)
jdbc:cdata:Databricks:Server=https://${server}.cloud.databricks.com;httpPath=${path};User=token;Token=${token};

Use the following format when connecting to Databricks SQL Warehouse:

jdbc:databricks://[Host]:[Port]/[Schema];[Property1]=[Value]; [Property2]=[Value];...

In the example below, replace the ${value} placeholders in the connection URL with your actual values.

Example 
jdbc:databricks://<example-account>.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/xxx;UID=token;PWD=<your-token-here>
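The URL format above can also be assembled programmatically. The following is a minimal sketch; the helper name, its parameters, and the default port and schema values are assumptions for illustration, not part of Collibra DQ.

```python
# Hypothetical helper: assemble a Databricks SQL Warehouse JDBC URL from its
# parts, following the format shown above.

def build_warehouse_url(host, http_path, token, port=443, schema="default"):
    """Build a jdbc:databricks:// URL for a SQL Warehouse connection."""
    props = {
        "transportMode": "http",
        "ssl": "1",
        "AuthMech": "3",       # username/password (token) authentication
        "httpPath": http_path, # e.g. /sql/1.0/warehouses/xxx
        "UID": "token",        # literal username "token"
        "PWD": token,          # personal access token
    }
    prop_str = ";".join(f"{k}={v}" for k, v in props.items())
    return f"jdbc:databricks://{host}:{port}/{schema};{prop_str}"

url = build_warehouse_url(
    "example-account.cloud.databricks.com",
    "/sql/1.0/warehouses/xxx",
    "dapiXXXX",
)
print(url)
```

Keeping the properties in a dictionary makes it easy to add or override driver properties (for example, ConnCatalog) without string surgery on the URL.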

Use the following format when connecting to Databricks Unity Catalog:

jdbc:databricks://[Host]:[Port]/[Schema];[Property1]=[Value]; [Property2]=[Value];...ConnCatalog=catalog_name

In the example below, replace the ${value} placeholders in the connection URL with your actual values.

Example 
jdbc:databricks://${host}:${port}/${Schema};transportMode=http;ssl=1;AuthMech=3;HttpPath=${clusterHttpPath};UID=token;PWD=${access_token};ConnCatalog=${catalog}

Specify ${access_token} on the Variables tab, where Variable Name is set to access_token and Value is the password. This ensures that Collibra DQ treats the password as a sensitive property and does not display its actual value in the connection URL.


Note 
If you do not specify ConnCatalog=${catalog} in the JDBC connection URL, a default catalog is not set. Instead, you must include the catalog name in the query, as shown in the following example.
select * from catalog.schema.table

When you specify a catalog in the connection URL, you do not need to explicitly reference it when you query schema and data in the catalog, as shown in the following example.
select * from schema.table
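The qualification rule above can be summarized in a short sketch. The function and argument names here are hypothetical, purely to illustrate how the table reference changes depending on whether ConnCatalog is set in the connection URL.

```python
# Illustrative sketch of the catalog-qualification rule described above.
# qualify() returns the table reference a query must use, depending on
# whether ConnCatalog was set in the connection URL.

def qualify(table, schema, catalog=None, conn_catalog_set=False):
    if conn_catalog_set:
        # Catalog fixed in the URL: schema.table is enough.
        return f"{schema}.{table}"
    # No default catalog: the query must name it explicitly.
    return f"{catalog}.{schema}.{table}"

print(qualify("orders", "sales", catalog="prod"))         # prod.sales.orders
print(qualify("orders", "sales", conn_catalog_set=True))  # sales.orders
```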

Driver Name (String, required)

The driver class name of your Databricks connection.

Databricks driver: com.databricks.client.jdbc.Driver
CData driver: cdata.jdbc.databricks.DatabricksDriver

Port (Integer, required)

The port number to establish a connection to the data source.

The default port is 0.

Source Name (String, optional)

N/A

Target Agent (Option, optional)

The Agent that submits your Spark job for processing.

Auth Type (Option, required)

The method to authenticate your connection.

Note The configuration requirements differ depending on the Auth Type you select. See Authentication for more details on available authentication types.

Properties (String, optional)

The configurable driver properties for your connection. Multiple properties must be comma-delimited. For example, abc=123,test=true
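The comma-delimited format above is straightforward to work with programmatically. The following is a minimal, illustrative parser, not Collibra code; the function name is an assumption.

```python
# Minimal sketch: parse a comma-delimited property string such as
# "abc=123,test=true" into key/value pairs.

def parse_properties(raw):
    props = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate trailing commas and blank segments
        key, _, value = pair.partition("=")
        props[key.strip()] = value.strip()
    return props

print(parse_properties("abc=123,test=true"))  # {'abc': '123', 'test': 'true'}
```

Note that property values themselves must not contain commas in this format, since the comma is the delimiter.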

Variables (String, optional)

When you specify ${access_token} in the Connection URL, set the following on the Variables tab:

  • Variable Name is set to access_token.
  • Value is the password.
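The substitution mechanism described above can be sketched with Python's standard `string.Template`, which uses the same ${name} placeholder syntax as the connection URL. The mechanics shown here are an assumption for illustration; Collibra DQ performs the substitution internally.

```python
# Sketch of the ${access_token} substitution described above: the token is
# stored as a variable (access_token) and substituted into the connection URL
# at run time, so the raw value never appears in the stored URL.
from string import Template

url_template = Template(
    "jdbc:databricks://${host}:443/default;transportMode=http;ssl=1;"
    "AuthMech=3;HttpPath=${clusterHttpPath};UID=token;PWD=${access_token}"
)

url = url_template.substitute(
    host="example-account.cloud.databricks.com",
    clusterHttpPath="/sql/1.0/warehouses/xxx",
    access_token="dapiXXXX",  # supplied from the Variables tab, kept secret
)
assert "dapiXXXX" in url and "${access_token}" not in url
```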

Authentication

Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.

Each field below is listed with whether it is required.

Username (required)

The username of your Databricks service account.

Set Username to token.

Password (required)

The password of your Databricks service account.

Enter the token value that you entered in the Connection URL.

Note To successfully establish a Databricks connection, verify that your token is valid; tokens generally expire after 90 days.

Script (required)

The path to the script file that the password manager uses to interact with and authenticate a user account.

Example /tmp/keytab/databricks_pwd_mgr.sh

Param $1 (optional)

An additional parameter to authenticate your Databricks connection.

Param $2 (optional)

An additional parameter to authenticate your Databricks connection.

Param $3 (optional)

An additional parameter to authenticate your Databricks connection.

Tenant ID (required)

The tenant ID of your Microsoft account.

Client ID (required)

The client ID of your Microsoft account.

Client Secret (required)

The client secret of your Microsoft account.

JDBC Driver Jar

If necessary, you can download the Databricks JDBC zip file from one of the following links.

Note Databricks JDBC driver version 2.6.27 is packaged as part of both standalone and Kubernetes download packages.

Notebook (Supported)

Databricks no longer supports Runtime 6.5 or 10.3. Therefore, Collibra DQ Profile 2.45 is not runnable on Databricks.

https://docs.databricks.com/release-notes/runtime/10.3ml.html

The following table shows the latest supported versions of Collibra DQ Profiles and their matching Databricks Runtimes.

Spark Submit (Not Supported)

Note While Spark Submit is not officially supported, a reference architecture and implementation pattern for Databricks job submission is available.

Limitations

  • When Archive Break Records is enabled for Azure Databricks Pushdown connections authenticated over EntraID, the data preview does not display column names correctly and shows 0 columns in the metadata bar. Therefore, Archive Break Records is not supported for Azure Databricks Pushdown connections that use EntraID authentication.