Ways to work with Databricks
In Collibra Data Intelligence Platform, you can work with Databricks in two ways.
You can register individual Databricks databases via the Databricks JDBC driver, or you can integrate the metadata of all databases from Databricks Unity Catalog.
It is important to understand the difference between the two, because each produces a different result in Collibra.
Way to work with Databricks | Result in Collibra |
---|---|
Integrating Databricks Unity Catalog | If you integrate Databricks Unity Catalog, you integrate the metadata of all databases in the Databricks Unity Catalog metastore into Collibra Data Intelligence Platform. The resulting assets represent the Databricks databases, schemas, tables, and columns. Important: You can integrate Databricks Unity Catalog only via Edge, not via Jobserver. Use the Databricks Unity Catalog connector if you want to integrate many databases at once and in a short amount of time, or if Databricks Unity Catalog is activated in your organization. With JDBC, you need to register the data database by database. |
Registering a Databricks data source via the Databricks JDBC connector | If you register a specific Databricks data source via the Databricks JDBC connector, the resulting assets represent the tables and columns in the Databricks database. You can retrieve sample data, and you can profile and classify the data. Use the JDBC driver for Databricks if you want to profile, classify, and request sample data for the data source. |
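The guidance in the table can be summarized as a small decision helper. This is only an illustrative sketch: the `Needs` class, its field names, and the `recommend` function are hypothetical and are not part of any Collibra or Databricks API.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical description of what you want from Databricks in Collibra."""
    many_databases_quickly: bool = False   # integrate many databases in a short time
    unity_catalog_enabled: bool = False    # Unity Catalog is activated in your organization
    sample_profile_classify: bool = False  # sample data, profiling, or classification needed


def recommend(needs: Needs) -> list[str]:
    """Return the way(s) of working that the table suggests for these needs."""
    ways = []
    if needs.many_databases_quickly or needs.unity_catalog_enabled:
        ways.append("Integrate Databricks Unity Catalog (via Edge)")
    if needs.sample_profile_classify:
        ways.append("Register the data source via the Databricks JDBC connector")
    return ways


# Both conditions apply, so both ways of working are suggested.
print(recommend(Needs(unity_catalog_enabled=True, sample_profile_classify=True)))
```

The helper returns a list rather than a single value because the two ways of working can be used together.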
Combining the two ways of working with Databricks
The two ways of working are not mutually exclusive. You can combine them to show the information you want in Collibra Data Intelligence Platform. Integrate Databricks Unity Catalog to quickly get an overview of all your Databricks databases in Collibra Data Intelligence Platform. Once you know which databases are important, register them individually via the JDBC driver.
Example
Your Databricks Unity Catalog contains three databases: A, B, and C.
Important: Use the same System asset for the integration and the registration. Otherwise, assets are duplicated.
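Why the System asset matters can be illustrated with a toy model. The sketch below is not Collibra's actual matching logic. It assumes, purely for illustration, that a Database asset is identified by the System asset it sits under plus the database name. The `ingest` function and the System asset names are hypothetical.

```python
def ingest(catalog: dict, system: str, database: str, source: str) -> None:
    """Record that `source` produced a Database asset for `database` under `system`."""
    key = (system, database)  # assumed identity: System asset + database name
    catalog.setdefault(key, set()).add(source)


# Same System asset for both steps: database A stays a single asset.
same = {}
for db in ["A", "B", "C"]:
    ingest(same, "Databricks", db, "unity-catalog-integration")
ingest(same, "Databricks", "A", "jdbc-registration")

# Different System assets: database A now exists twice.
split = {}
for db in ["A", "B", "C"]:
    ingest(split, "Databricks UC", db, "unity-catalog-integration")
ingest(split, "Databricks JDBC", "A", "jdbc-registration")

print(len(same))   # 3
print(len(split))  # 4
```

In this model, registering database A under the same System asset enriches the existing asset, while using a second System asset creates a parallel copy.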
For more information about Databricks, go to the Databricks documentation.