Synchronize Databricks Unity Catalog lineage
You can synchronize your technical lineage manually or automatically by adding a synchronization schedule.
To synchronize technical lineage by using the Collibra Catalog Cloud Ingestions API, call the /genericIntegration/{ingestibleId}/run endpoint, where {ingestibleId} is the capability ID.
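As a rough illustration of the API route, the following Python sketch builds the request for the run endpoint without sending it. Only the /genericIntegration/{ingestibleId}/run path comes from this page. The base URL, the POST method, and the Authorization header are assumptions made for the example; check the Collibra Catalog Cloud Ingestions API reference for the actual values.

```python
import urllib.request
from urllib.parse import quote


def build_run_url(base_url: str, ingestible_id: str) -> str:
    """Return the run endpoint URL for one capability ID."""
    # base_url is a placeholder for wherever the Ingestions API is served.
    return f"{base_url.rstrip('/')}/genericIntegration/{quote(ingestible_id, safe='')}/run"


def build_run_request(base_url: str, ingestible_id: str, auth_header: str) -> urllib.request.Request:
    """Prepare (but do not send) the request that starts a synchronization."""
    return urllib.request.Request(
        build_run_url(base_url, ingestible_id),
        method="POST",  # assumption: the HTTP method is not stated on this page
        headers={"Authorization": auth_header},  # assumption: auth scheme may differ
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    request = build_run_request("https://collibra.example.invalid/api", "capability-id", "Bearer <token>")
    print(request.get_method(), request.full_url)
    # To actually trigger the job, pass the request to urllib.request.urlopen().
```

Keeping the URL construction separate from the network call makes it easy to inspect the request before it is sent to a real environment.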
If you include multiple databases in your technical lineage capability template, you don't have to synchronize each database separately. One synchronization job will harvest the metadata from all relevant databases.
Steps
- On the main toolbar, click → Catalog.
  The Catalog homepage opens.
- In the tab bar, click Integrations.
  The Integrations page opens.
- Click the Integration Configuration tab.
- Locate the Databricks connection that you used when you added the technical lineage capability, and click the link in the Capabilities column. If multiple capabilities exist for the Databricks connection, expand them to locate your technical lineage capability. The technical lineage capability configuration page opens.
- In the Configuration Section section, click Add Configuration.
- Complete the fields as needed.
Field: Action

System: Select the System asset in which the Databricks assets were ingested. Collibra Data Lineage stitches the ingested data objects to the selected assets when synchronization begins.

Catalog Name: If you want to create technical lineage using views in a custom catalog instead of system tables, enter the custom catalog name in this field. For more information about the custom catalog, see the Requirements and permissions section.

Include Filter: To include specific workspaces, catalogs, or schemas in technical lineage, click Add Include pattern under Include Filter. If you do not specify an include filter, technical lineage includes all lineage from Databricks Unity Catalog. The following rules apply when you enter the include pattern:
- Enter the plain string format, for example, WorkspaceId > CatalogName > SchemaName.
- You can use the ? and * wildcards in the workspace IDs, catalog names, and schema names.
- If a workspace, catalog, or schema matches multiple lines, the most detailed match is taken into account.
- Lineage is collected only when both the source and the target are in the schema, catalog, or workspace specified in the include filter.
Examples:
- * > HR: Includes all schemas under the HR catalog.
- * > Orders > fk*: Includes schemas starting with fk under the Orders catalog.
- * > * > profiling: Includes all schemas named profiling under any catalog.

Exclude Filter: To exclude certain workspaces, catalogs, or schemas from technical lineage, click Add Exclude pattern under Exclude Filter. If you do not specify an exclude filter, technical lineage includes all lineage from Databricks Unity Catalog. The following rules apply when you enter the exclude pattern:
- Enter the plain string format, for example, WorkspaceId > CatalogName > SchemaName.
- The exclude filter takes precedence over the include filter.
- You can use the ? and * wildcards in the workspace IDs, catalog names, and schema names.
- If a workspace, catalog, or schema matches multiple lines, the most detailed match is taken into account.
Examples:
- * > testDB: Excludes all schemas under the testDB catalog.

SQL Sources Limit: Specify the limit on the number of SQL statements included for each relation in the technical lineage graph. The default value is 5.

Include SQL transformations: Select this option to enable Collibra Data Lineage to extract transformation logic from notebooks, jobs, SQL queries, and dashboards, and include it in the technical lineage viewer. You can view the transformation logic in the Source code pane of the technical lineage viewer. Clear the checkbox if you do not want Collibra Data Lineage to ingest transformation logic.
Note: If the personal access token, OAuth client, or Entra ID principal used for the Databricks connection does not have SELECT permission on the system.query.history table, you must clear this checkbox. Otherwise, connection errors might occur.

Include external locations: Select this option for Collibra Data Lineage to collect lineage information from the external locations to create end-to-end lineage. Clear the checkbox to exclude metadata from external locations.

Ingest Volumes (In preview): Select this option to collect lineage information from volumes to create end-to-end lineage. Only lineage relationships are collected; volume assets are not created in Data Catalog. Clear the checkbox to exclude volume lineage metadata.

Ingest Notebooks (In preview): Select this option to collect lineage information for transformations from different entities, such as notebooks. Clear the checkbox to exclude lineage information for transformations from different entities.
- Click Save.
- In the Configuration Section section, click Synchronize now.
  A notification indicates synchronization has started. The synchronization job is started. Collibra Data Lineage ingests the metadata from Databricks Unity Catalog and processes the metadata to create technical lineage.
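To make the Include Filter and Exclude Filter rules easier to reason about before you save a configuration, here is a minimal Python sketch of how such patterns could be evaluated. It is not Collibra's implementation. It assumes case-sensitive matching, treats a pattern with fewer than three segments as matching everything below it, and does not model the "most detailed match" rule, because this page does not define how ties are resolved. All function names are hypothetical.

```python
import re


def _segment_regex(segment: str) -> re.Pattern:
    """Translate one pattern segment: * matches any run, ? matches one character."""
    parts = []
    for ch in segment:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern: str, workspace: str, catalog: str, schema: str) -> bool:
    """Check 'WorkspaceId > CatalogName > SchemaName' against one schema path."""
    segments = [s.strip() for s in pattern.split(">")]
    values = (workspace, catalog, schema)
    # zip stops at the shorter input, so '* > HR' only constrains workspace and catalog.
    return all(_segment_regex(seg).fullmatch(val) for seg, val in zip(segments, values))


def is_in_scope(path, include, exclude) -> bool:
    """Exclude patterns win; an empty include list means everything is included."""
    if any(matches(p, *path) for p in exclude):
        return False
    return not include or any(matches(p, *path) for p in include)


def lineage_collected(source, target, include, exclude) -> bool:
    """Lineage is kept only when both the source and the target are in scope."""
    return is_in_scope(source, include, exclude) and is_in_scope(target, include, exclude)


if __name__ == "__main__":
    include = ["* > HR", "* > Orders > fk*"]
    exclude = ["* > testDB"]
    print(is_in_scope(("ws1", "Orders", "fk_keys"), include, exclude))
    print(lineage_collected(("ws1", "HR", "a"), ("ws1", "Sales", "b"), include, exclude))
```

A quick dry run like this can help confirm that a pattern set covers the schemas you expect, but the synchronization results in Collibra remain the authoritative check.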
To synchronize automatically instead, complete the preceding steps up to and including Click Save, and then add a synchronization schedule:
- On the Synchronization Schedule tab pane, click Add Schedule.
- Enter the required information and click Save:
Field: Description

Repeat: The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.

Cron: The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select Cron expression in the Repeat field.

Every: The day on which you want to synchronize, for example, Sunday. This field is only visible if you select Weekly in the Repeat field.

Every first: The day of the month on which you want to synchronize, for example, Tuesday. This field is only visible if you select Monthly in the Repeat field.

At: The time at which you want to synchronize automatically, for example, 14:00.
- You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
- This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.

Time zone: The time zone for the schedule.
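If you choose Cron expression in the Repeat field, the following Python sketch shows how on-the-hour Quartz expressions for daily and weekly schedules could be assembled. It relies only on the standard Quartz field order (seconds, minutes, hours, day of month, month, day of week). The helper names are hypothetical, and whether Collibra applies any extra restrictions to Cron expressions is not stated on this page, so treat the output as a starting point to verify in the UI.

```python
WEEKDAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def _check_hour(hour: int) -> None:
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")


def quartz_daily(hour: int) -> str:
    """Quartz expression that fires every day at hour:00:00."""
    _check_hour(hour)
    # '?' means "no specific value" and is required in one of the two day fields.
    return f"0 0 {hour} * * ?"


def quartz_weekly(hour: int, weekday: str) -> str:
    """Quartz expression that fires once a week at hour:00:00 on the given weekday."""
    _check_hour(hour)
    day = weekday.strip().upper()[:3]
    if day not in WEEKDAYS:
        raise ValueError(f"unknown weekday: {weekday!r}")
    return f"0 0 {hour} ? * {day}"


if __name__ == "__main__":
    print(quartz_daily(14))             # every day at 14:00
    print(quartz_weekly(8, "Sunday"))   # every Sunday at 8:00
```

Both helpers fix the seconds and minutes fields to 0, which mirrors the on-the-hour constraint described for the At field.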