Registering a Databricks data source via the Databricks JDBC connector and Edge

If you register a specific Databricks data source via the Databricks JDBC connector, the resulting assets represent the columns and the tables in the Databricks database.
You can also retrieve sample data and profile and classify the data.
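A JDBC connector typically reaches Databricks through a JDBC URL. The following is a minimal sketch that assembles such a URL. It assumes the URL format documented for the Databricks JDBC driver with personal access token authentication (`AuthMech=3`, `UID=token`, `PWD=<token>`). The function name and the example host, HTTP path, and token are hypothetical. The exact fields that the Edge connection form asks for, and how it stores credentials, may differ, so check the connection form and your driver version's documentation.

```python
def build_databricks_jdbc_url(
    server_hostname: str, http_path: str, access_token: str, port: int = 443
) -> str:
    """Assemble a Databricks JDBC URL that authenticates with a personal access token.

    Hypothetical helper for illustration; it assumes the Databricks JDBC driver URL format.
    """
    # Connection properties follow host:port as semicolon-separated key=value pairs.
    properties = {
        "httpPath": http_path,  # HTTP path of the cluster or SQL warehouse
        "AuthMech": "3",        # 3 = user name and password
        "UID": "token",         # literal user name "token" when using a personal access token
        "PWD": access_token,    # the personal access token itself
    }
    props = ";".join(f"{key}={value}" for key, value in properties.items())
    return f"jdbc:databricks://{server_hostname}:{port};{props}"


# Placeholder values only; never hard-code a real token.
print(build_databricks_jdbc_url(
    "adb-1234567890123456.7.azuredatabricks.net",
    "/sql/1.0/warehouses/abc123",
    "dapi-example",
))
```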

Before you begin

Steps

Preparation

1. Add a Databricks JDBC connection to your Edge site.
   Result: A Databricks JDBC connection is added to your Edge site.

2. Add the following capabilities:
   Result: The required capabilities are added to the Databricks connection.

Setup

3. Register the data source.
   Description: Registering a data source creates the structure for the metadata in Collibra.
   Results:
     • A Physical Data Dictionary domain containing a Database asset is created.
     • A list of available schemas is created on the Configuration tab page of the Database asset.

4. Configure the synchronization of your data source.
   Description: Select the schemas and tables that you want to ingest.
   Result: The information on the Configuration tab page of the Database asset is filled in.

Registration

5. Synchronize the schema of the registered data source.
   Description: Synchronizing the schema of a registered data source makes the metadata available in Collibra.
   Result: Schema, Table, Column, and Foreign Key assets are created in the specified domain, and registration data becomes available.

6. If needed, profile the synchronized data.
   Description: Data profiling creates a summary of a data source that is registered with Data Catalog and determines the data type of the columns in the data source. The summary mainly contains statistics and graphics that give users an idea of what the registered data is about.
   Result: The Table and Column assets contain profiling information.
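To illustrate the kind of summary that profiling in step 6 produces, here is a minimal, generic sketch of column profiling. It is not Collibra's profiling implementation. The function names, the treatment of None and empty strings as nulls, and the type-inference order (integer, then decimal, then text) are assumptions chosen for illustration.

```python
from typing import Optional


def infer_type(values: list[str]) -> str:
    """Guess a column data type from its non-null string values (illustrative rules only)."""
    def parses(cast) -> bool:
        # True if every value can be converted with the given cast.
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if parses(int):
        return "integer"
    if parses(float):
        return "decimal"
    return "text"


def profile_column(values: list[Optional[str]]) -> dict:
    """Summarize one column: row, null, and distinct counts, an inferred type, and min/max for numbers."""
    non_null = [v for v in values if v is not None and v != ""]
    summary = {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "type": infer_type(non_null),
    }
    if summary["type"] in ("integer", "decimal"):
        numbers = [float(v) for v in non_null]
        summary["min"] = min(numbers)
        summary["max"] = max(numbers)
    return summary


print(profile_column(["3", "7", None, "7"]))
# → {'rows': 4, 'nulls': 1, 'distinct': 2, 'type': 'integer', 'min': 3.0, 'max': 7.0}
```

A real profiler works on much larger samples and adds distributions and graphics. The sketch only shows how per-column statistics and a data type can be derived from the values themselves.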

For general information on working with Databricks, go to Ways to work with Databricks.