Steps: Integrate Databricks Unity Catalog via Edge
Use the following steps to integrate Databricks Unity Catalog. You can choose to set up sampling, profiling, and classification as needed. This feature is in preview.
If you previously used both the Databricks Unity Catalog integration and the Databricks JDBC synchronization for some databases, and you want to use only the Databricks Unity Catalog integration, complete the steps in Switching to working exclusively with the Databricks Unity Catalog integration (in preview).
Steps to integrate metadata
# | Step | Description |
---|---|---|
1 | Create the required connections | |
1a | Create a Databricks connection to your Edge or Collibra Cloud site. | Creates a Databricks connection to Databricks in an Edge or Collibra Cloud site, which will be used during the metadata synchronization. |
1b | Optionally, create a Databricks JDBC connection to your Edge or Collibra Cloud site. | Creates a JDBC Databricks connection to Databricks in an Edge or Collibra Cloud site. Create a Databricks JDBC connection only if you want to profile and classify the integrated data. If you created a Databricks JDBC connection previously, you can use that JDBC connection. |
2 | Add the Databricks Unity Catalog synchronization capability to the Edge or Collibra Cloud site. | Adds the Databricks Unity Catalog capability to the Edge connections you created for Databricks. If you want to profile and classify the integrated data, and request sample data, select the Databricks JDBC connection on the Databricks Unity Catalog synchronization capability. |
3 | Synchronize Databricks Unity Catalog. | You can manually synchronize Databricks Unity Catalog or add a synchronization schedule. If you selected a JDBC connection in the previous step, the synchronization process automatically creates the Catalog JDBC ingestion, JDBC profiling, and Catalog Data Classification capabilities if they do not already exist. When the synchronization is completed, the metadata is integrated and the Profiling tab is available on the Database asset page. |
4 | Optionally, set up and configure data profiling | Goes through the required permissions and steps to prepare Edge and Collibra to profile columns in Databricks Unity Catalog. |
5 | Optionally, enable and set up Unified Data Classification | Goes through the required permissions and steps to prepare Edge and Collibra to classify columns in Databricks Unity Catalog via the Unified Data Classification method. |
6 | Optionally, set up and configure the use of sample data | Goes through the required permissions and steps to prepare Edge and Collibra to show sample data for columns in Databricks Unity Catalog. |
Result | Users with the correct permissions can now configure the profiling options and profile the data, automatically classify the data, or request sample data. |
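The effect of selecting a JDBC connection on what synchronization creates (step 3) can be modeled in a few lines of Python. This is an illustrative sketch only, not a Collibra API: the function name `capabilities_after_sync` and the set-based representation of a site are assumptions made for the example; only the capability names are taken from the table above.

```python
# Illustrative model only; not a Collibra API. All names are hypothetical.

# Capabilities that synchronization creates automatically when a JDBC
# connection was selected on the synchronization capability (step 3).
AUTO_CREATED = (
    "Catalog JDBC ingestion",
    "JDBC profiling",
    "Catalog Data Classification",
)


def capabilities_after_sync(existing, jdbc_selected):
    """Return the set of capabilities present after a synchronization."""
    result = set(existing)
    if jdbc_selected:
        # Only missing capabilities are added; existing ones are untouched.
        result.update(AUTO_CREATED)
    return result


# Example: a site that already has the JDBC profiling capability.
site = {"Databricks Unity Catalog synchronization", "JDBC profiling"}
for name in sorted(capabilities_after_sync(site, jdbc_selected=True)):
    print(name)
```

Without a JDBC connection, the model returns the existing capabilities unchanged, which mirrors a metadata-only integration.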
Integration workflow
The following graphic shows the process of integrating Databricks Unity Catalog, profiling and classifying the data, and requesting sample data (in preview).
# | Step | Description |
---|---|---|
1 | Create a Databricks connection to your Edge or Collibra Cloud site. | Creates a connection to Databricks in an Edge or Collibra Cloud site. |
2 | Add the Databricks Unity Catalog capability to the Edge or Collibra Cloud site. | Adds the Databricks Unity Catalog capability to the Edge connection. The capability allows you to retrieve data from Databricks Unity Catalog. |
3 | Synchronize Databricks Unity Catalog. | You can manually synchronize Databricks Unity Catalog or add a synchronization schedule. Once the synchronization is completed, the metadata is integrated. |
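As a planning aid, the relationship between the three-step metadata workflow and the optional profiling, classification, and sample data steps can be sketched as a small Python helper. This is a hypothetical checklist generator, not part of Collibra: the function name `required_steps` and its parameters are assumptions, and the step labels paraphrase the tables on this page.

```python
# Illustrative planning helper; not a Collibra API. All names are hypothetical.

def required_steps(profiling=False, classification=False, sampling=False):
    """List the steps from this page needed for the requested features."""
    # Profiling, classification, and sample data all rely on the JDBC connection.
    needs_jdbc = profiling or classification or sampling

    steps = ["Create a Databricks connection"]
    if needs_jdbc:
        steps.append("Create or reuse a Databricks JDBC connection")
    steps.append("Add the Databricks Unity Catalog synchronization capability")
    if needs_jdbc:
        steps.append("Select the JDBC connection on the capability")
    steps.append("Synchronize Databricks Unity Catalog")
    if profiling:
        steps.append("Set up and configure data profiling")
    if classification:
        steps.append("Enable and set up Unified Data Classification")
    if sampling:
        steps.append("Set up and configure the use of sample data")
    return steps


# Example: metadata plus profiling.
for number, step in enumerate(required_steps(profiling=True), start=1):
    print(number, step)
```

Calling `required_steps()` with no arguments yields only the three metadata steps shown in the workflow table.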