Registering a Databricks file system via Databricks JDBC connector and Edge

You can register a Databricks data source using the Databricks JDBC connector to ingest your metadata into Collibra. This process creates assets that represent your Databricks tables and columns, giving you a clear view of your data landscape. You can also configure the connection to retrieve sample data, profile your data, and set up data classification.
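For orientation, a JDBC connection to Databricks is identified by a connection URL. The following Python sketch assembles such a URL in the form used by the Databricks JDBC driver for personal access token authentication (`AuthMech=3` with `UID=token`). The function name, the example host, and the example HTTP path are hypothetical, and how these values map to the fields of your Edge or Collibra Cloud site connection is not covered here, so treat this as an illustration rather than a Collibra procedure.

```python
def build_databricks_jdbc_url(host: str, http_path: str, port: int = 443) -> str:
    """Assemble a Databricks JDBC URL for token authentication.

    Hypothetical helper for illustration. The token itself is deliberately
    left out: the driver expects it in the PWD property, which should be
    supplied through a secure credential store, not hard-coded in a URL.
    """
    # Reject values that look like full URLs instead of bare hostnames.
    if not host or "/" in host:
        raise ValueError("host must be a bare hostname, without scheme or path")
    if not http_path:
        raise ValueError("http_path must not be empty")
    # AuthMech=3 with UID=token selects personal access token authentication.
    return (
        f"jdbc:databricks://{host}:{port};"
        f"httpPath={http_path};AuthMech=3;UID=token"
    )


# Example with placeholder values (assumptions, not real endpoints).
example_url = build_databricks_jdbc_url(
    "example.cloud.databricks.com", "/sql/1.0/warehouses/abc123"
)
print(example_url)
```

Keeping the token out of the assembled string is a deliberate choice in this sketch, so that the URL can be logged or shared without exposing credentials.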

Prerequisites

Steps

 

Preparation

Step 1
What: Add a Databricks JDBC connection to your Edge or Collibra Cloud site.
Description: Adds a Databricks JDBC connection to your Edge or Collibra Cloud site.

Step 2
What: Add the required capabilities.
Description: Adds the required capabilities to the Databricks connection.

Setup

Step 3
What: Register the data source.
Description: Registering a data source creates the metadata structure in Collibra.
Results:
  • A Physical Data Dictionary domain containing a Database asset is created.
  • A list of available schemas is created on the Configuration tab of the Database asset.

Step 4
What: Configure the synchronization of your data source.
Description: Select the schemas and tables that you want to ingest.
Results: The information on the Configuration tab of the Database asset is filled in.

Registration

Step 5
What: Synchronize the schema of the registered data source.
Description: Synchronizing the schema makes the metadata available in Collibra.
Results: Schema, Table, Column, and Foreign Key assets are created in the specified domain, and registration data becomes available.

Step 6
What: Optionally, profile the synchronized data.
Description: Data Profiling creates a summary of a data source in Data Catalog and determines the data type of columns in the data source. The summary mainly contains statistics and graphics that give the user an idea of what the data is about. Data Profiling is available for registered JDBC data sources and for Databricks Unity Catalog and Dataplex Catalog data sources integrated via Edge.
Results: The Table and Column assets contain profiling information.

Step 7
What: Optionally, classify the synchronized data.
Description: Creates data classification suggestions for the Column assets.
Results: The Column assets are classified.
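To make the optional profiling step more concrete, the following Python sketch shows the kind of per-column summary a profiler can compute: row, null, and distinct counts, plus a data type inferred from sample values. It is an illustration under simple assumptions (values arrive as strings or `None`), not Collibra's implementation, and the function names are hypothetical.

```python
from typing import Callable, Optional, Sequence


def infer_type(values: Sequence[str]) -> str:
    """Guess a column type from string samples: integer, decimal, or text."""

    def all_parse(cast: Callable[[str], object]) -> bool:
        # True only if every sample can be converted by the given cast.
        for value in values:
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if not values:
        return "unknown"  # nothing to infer from
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "text"


def profile_column(values: Sequence[Optional[str]]) -> dict:
    """Compute basic statistics and an inferred type for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "inferred_type": infer_type(non_null),
    }


# Example with made-up sample values.
print(profile_column(["1", "2", None, "2"]))
```

A real profiler typically works on a sample of rows and adds further statistics, such as minimum, maximum, and value distributions, which is where the graphics in the profiling summary come from.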

Helpful resources

For general information on working with Databricks in Collibra, go to Ways to work with Databricks.

For more information about Databricks, go to Databricks documentation.