Steps: Integrate Databricks Unity Catalog via Edge

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Platform! You can learn more about this latest UI in the UI overview.


The steps differ depending on whether you want to profile and classify column data after you integrate Databricks Unity Catalog.

Steps to integrate metadata and allow for sampling, profiling, and classification (beta)

# Step Description
1 Give the Edge Site user the required permissions. Ensures the Edge Site user can integrate the metadata.
2 Create the required connections.
2a Create a Databricks connection to your Edge site. Creates a Databricks connection in an Edge site, which is used during metadata synchronization.
2b Create a Databricks JDBC connection to your Edge site. Creates a Databricks JDBC connection in an Edge site, which is used during profiling and classification.
3 Add the required capabilities.
3a Add the Databricks Unity Catalog synchronization capability to the Edge site. Adds the Databricks Unity Catalog capability to the Edge connections you created for Databricks. The capability allows you to retrieve metadata from Databricks Unity Catalog and links the Databricks connection and the Databricks JDBC connection to each other.

In the latest UI, we support the integration of Databricks AI models via Edge. If you want to integrate Databricks AI models, make sure to enable AI Governance. If you don't enable AI Governance, the AI integration functionality is limited.
3b Add the JDBC Catalog Ingestion capability to the Edge site. Adds the JDBC Catalog Ingestion capability to the Databricks JDBC connection. The capability allows you to retrieve the available databases and schemas in Databricks Unity Catalog during profiling and classification.
4 Synchronize Databricks Unity Catalog.

You can manually synchronize Databricks Unity Catalog or add a synchronization schedule.

Once the synchronization is completed, the metadata is integrated.

5 Set up and configure data profiling. Covers the required permissions and the steps to prepare Edge and Collibra to profile columns in Databricks Unity Catalog.
6 Enable and set up Unified Data Classification. Covers the required permissions and the steps to prepare Edge and Collibra to classify columns in Databricks Unity Catalog with the Unified Data Classification method.
7 Set up and configure the use of sample data. Covers the required permissions and the steps to prepare Edge and Collibra to show sample data for columns in Databricks Unity Catalog.
Result

Users with the correct permissions can now re-synchronize the metadata, configure the profiling options and profile the data, automatically classify the data, or request sample data.
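
As a rough illustration of the Databricks JDBC connection in step 2b, the following sketch assembles a connection string in the URL format used by the Databricks JDBC driver with personal access token authentication. The function name, host name, and HTTP path are hypothetical placeholders, and the exact properties that the Edge connection form expects are an assumption; check the connection documentation before relying on them.

```python
def build_databricks_jdbc_url(host: str, http_path: str, port: int = 443) -> str:
    """Assemble a JDBC URL in the shape used by the Databricks JDBC driver.

    Hypothetical helper for illustration only; it is not part of Collibra
    or Databricks tooling.
    """
    if not host or not http_path:
        raise ValueError("host and http_path are required")
    # AuthMech=3 with UID=token selects personal access token authentication.
    # The token itself (the PWD property) is deliberately left out so that the
    # secret is supplied separately and never embedded in the URL.
    return (
        f"jdbc:databricks://{host}:{port};"
        f"httpPath={http_path};AuthMech=3;UID=token"
    )


if __name__ == "__main__":
    # Placeholder workspace host and SQL warehouse path.
    print(build_databricks_jdbc_url(
        "adb-1234567890123456.7.azuredatabricks.net",
        "/sql/1.0/warehouses/abcdef1234567890",
    ))
```

Keeping the token out of the string is a design choice for this sketch: the secret can then be entered in a dedicated credential field instead of being stored with the URL.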

 

Steps to only integrate the metadata

# Step Description
1 Give the Edge Site user the required permissions. Ensures the Edge Site user can integrate the metadata.
2 Create a Databricks connection to your Edge site. Creates a connection to Databricks in an Edge site.

3 Add the Databricks Unity Catalog capability to the Edge site. Adds the Databricks Unity Catalog capability to the Edge connection. The capability allows you to retrieve metadata from Databricks Unity Catalog.
4 Synchronize Databricks Unity Catalog.

You can manually synchronize Databricks Unity Catalog or add a synchronization schedule.

Once the synchronization is completed, the metadata is integrated.
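
The difference between the two paths in this topic can be summarized in a small lookup. This is only a planning sketch: the step numbers and titles are copied from the tables in this topic, while the names METADATA_ONLY_STEPS, FULL_STEPS, and steps_for are hypothetical and do not correspond to any Collibra API.

```python
# Step numbers and titles as listed in this topic.
METADATA_ONLY_STEPS = [
    ("1", "Give the Edge Site user the required permissions"),
    ("2", "Create a Databricks connection to your Edge site"),
    ("3", "Add the Databricks Unity Catalog capability to the Edge site"),
    ("4", "Synchronize Databricks Unity Catalog"),
]

FULL_STEPS = [
    ("1", "Give the Edge Site user the required permissions"),
    ("2a", "Create a Databricks connection to your Edge site"),
    ("2b", "Create a Databricks JDBC connection to your Edge site"),
    ("3a", "Add the Databricks Unity Catalog synchronization capability to the Edge site"),
    ("3b", "Add the JDBC Catalog Ingestion capability to the Edge site"),
    ("4", "Synchronize Databricks Unity Catalog"),
    ("5", "Set up and configure data profiling"),
    ("6", "Enable and set up Unified Data Classification"),
    ("7", "Set up and configure the use of sample data"),
]


def steps_for(profile_and_classify: bool) -> list[tuple[str, str]]:
    """Return the (number, title) pairs for the chosen integration path."""
    source = FULL_STEPS if profile_and_classify else METADATA_ONLY_STEPS
    return list(source)  # copy so callers cannot mutate the module-level lists


if __name__ == "__main__":
    # Print the checklist for a metadata-only integration.
    for number, title in steps_for(profile_and_classify=False):
        print(f"{number}. {title}")
```

Calling steps_for(True) returns the longer path, which adds the JDBC connection, the JDBC Catalog Ingestion capability, and the profiling, classification, and sample data setup.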

 
