Synchronize Databricks Unity Catalog lineage

You can synchronize your technical lineage manually or automatically by adding a synchronization schedule.

If you want to synchronize technical lineage by using the Collibra Catalog Cloud Ingestions API, use the /genericIntegration/{ingestibleId}/run API, where {ingestibleId} is the capability ID.
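As a minimal sketch of the API route, the following Python snippet builds a request for the /genericIntegration/{ingestibleId}/run endpoint. The base URL (including the API root path), the POST method, and the bearer-token header are assumptions made for illustration; check the Catalog Cloud Ingestions API reference for your environment before relying on them. The function build_run_request is a hypothetical helper, not part of any Collibra client library.

```python
import urllib.parse
import urllib.request


def build_run_request(api_root, ingestible_id, token):
    """Build (but do not send) a request that starts a synchronization.

    api_root      -- assumed root URL of the Catalog Cloud Ingestions API
    ingestible_id -- the capability ID, as described above
    token         -- assumed bearer token; your environment may use other auth
    """
    safe_id = urllib.parse.quote(ingestible_id, safe="")
    url = f"{api_root.rstrip('/')}/genericIntegration/{safe_id}/run"
    return urllib.request.Request(
        url,
        method="POST",  # assumption: the run endpoint is triggered with POST
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values only; replace them with your own environment details.
request = build_run_request(
    "https://example.collibra.com/rest/catalog/1.0",
    "capability-id-placeholder",
    "token-placeholder",
)
# To actually send it: urllib.request.urlopen(request)
```

The request is only constructed here, so you can inspect the URL and headers before sending anything.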

If you include multiple databases in your technical lineage capability template, you don't have to synchronize each database separately. One synchronization job will harvest the metadata from all relevant databases.

Warning: During the ingestion process, relations of the type "Data Element targets / sources Data Element" are automatically created between certain assets. Any relations of this type that you manually create between assets will be deleted during the synchronization process. If you want to manually create such relations and ensure that they are maintained, you can create a custom technical lineage.

Steps

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog homepage opens.
  2. In the tab bar, click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. Locate the Databricks connection that you used when you added the technical lineage capability, and click the link in the Capabilities column. If multiple capabilities exist for the Databricks connection, expand them to locate your technical lineage capability.
    The technical lineage capability configuration page opens.
  5. In the Configuration Section section, click Add Configuration.
  6. Complete the fields as needed.
    Field / Action

    System

    Select the System asset in which the Databricks assets were ingested. Collibra Data Lineage stitches the ingested data objects to the selected assets when synchronization begins.

    Catalog Name

    If you want to create technical lineage using views in a custom catalog instead of system tables, enter the custom catalog name in this field.
    For more information about the custom catalog, see the Requirements and permissions section.

    Include Filter

    To include specific workspaces, catalogs, or schemas in technical lineage, click Add Include pattern under Include Filter. If you do not specify an include filter, technical lineage includes all lineage from Databricks Unity Catalog.

    The following rules apply when you enter the include pattern:

    • Enter the plain string format, for example, WorkspaceId > CatalogName > SchemaName.
    • You can use the ? and * wildcards in the workspace IDs, catalog names, and schema names.
    • If a workspace, catalog, or schema matches multiple lines, the most detailed match is taken into account.
    • Lineage is collected only when both the source and the target are in the schema, catalog, or workspace specified in the include filter.

    Exclude Filter

    To exclude certain workspaces, catalogs, or schemas from technical lineage, click Add Exclude pattern under Exclude Filter. If you do not specify an exclude filter, no lineage is excluded.

    The following rules apply when you enter the exclude pattern:

    • Enter the plain string format, for example, WorkspaceId > CatalogName > SchemaName.
    • The exclude filter takes precedence over the include filter.
    • You can use the ? and * wildcards in the workspace IDs, catalog names, and schema names.
    • If a workspace, catalog, or schema matches multiple lines, the most detailed match is taken into account.
    SQL Sources Limit

    Specify the limit on the number of SQL statements included for each relation in the technical lineage graph. The default value is 5.

    Include SQL transformations

    Select this option to enable Collibra Data Lineage to extract transformation logic from notebooks, jobs, SQL queries, and dashboards, and include it in the technical lineage viewer. You can view the transformation logic in the Source code pane of the technical lineage viewer.

    Clear the checkbox if you do not want Collibra Data Lineage to ingest transformation logic.

    Note: If the personal access token, OAuth client, or Entra ID principal used for the Databricks connection does not have SELECT permission on the system.query.history table, you must clear this checkbox. Otherwise, connection errors might occur.

    Include external locations

    Select this option to have Collibra Data Lineage collect lineage information from external locations to create end-to-end lineage.

    Clear the checkbox to exclude metadata from external locations.

    Ingest Volumes (In preview)

    Select this option to collect lineage information from volumes to create end-to-end lineage. Only lineage relationships are collected; volume assets are not created in Data Catalog.

    Clear the checkbox to exclude volume lineage.

    Ingest Notebooks (In preview)

    Select this option to collect lineage information for transformations from entities such as notebooks.

    Clear the checkbox to exclude lineage information for transformations from these entities.

  7. Click Save.
  8. In the Configuration Section section, click Synchronize now.
    A notification indicates that the synchronization job has started. Collibra Data Lineage ingests the metadata from Databricks Unity Catalog and processes it to create technical lineage.
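
The include and exclude filter rules in step 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the matching logic Collibra Data Lineage actually uses: it assumes that a pattern with fewer than three parts matches everything beneath it, that matching is case-sensitive, and it does not model the "most detailed match" rule. The function names are hypothetical. Python's fnmatch also treats square brackets as special characters, which the rules above do not mention.

```python
from fnmatch import fnmatchcase


def split_pattern(pattern):
    """Split 'WorkspaceId > CatalogName > SchemaName' into trimmed parts."""
    return [part.strip() for part in pattern.split(">")]


def matches(pattern, path):
    """Return True if each pattern part matches the corresponding path part.

    path is a (workspace_id, catalog_name, schema_name) tuple.
    The ? and * wildcards are handled by fnmatchcase.
    """
    parts = split_pattern(pattern)
    if len(parts) > len(path):
        return False
    return all(fnmatchcase(value, part) for part, value in zip(parts, path))


def is_selected(path, includes, excludes):
    """Apply the filters: exclude takes precedence over include."""
    if any(matches(p, path) for p in excludes):
        return False
    # With no include filter, everything that is not excluded is selected.
    return not includes or any(matches(p, path) for p in includes)


def lineage_collected(source, target, includes, excludes):
    """Lineage is collected only when both source and target are selected."""
    return (is_selected(source, includes, excludes)
            and is_selected(target, includes, excludes))


includes = ["1234 > sales_* > *"]
excludes = ["1234 > sales_eu > tmp?"]
print(is_selected(("1234", "sales_eu", "core"), includes, excludes))
print(is_selected(("1234", "sales_eu", "tmp1"), includes, excludes))
```

In this sketch, the first call is selected by the include pattern, while the second is rejected because the exclude pattern also matches it.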

To synchronize automatically on a schedule instead, complete steps 1 through 7 above, and then continue with the following steps instead of clicking Synchronize now:

  8. On the Synchronization Schedule tab pane, click Add Schedule.
  9. Enter the required information and click Save:
    Field / Description

    Repeat

    The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.

    Cron

    The Quartz Cron expression that determines when the synchronization takes place.

    This field is only visible if you select Cron expression in the Repeat field.

    Every

    The day of the week on which you want to synchronize, for example, Sunday.

    This field is only visible if you select Weekly in the Repeat field.

    Every first

    The day of the week on which you want to synchronize each month, for example, Tuesday to synchronize on the first Tuesday of every month.

    This field is only visible if you select Monthly in the Repeat field.

    At

    The time at which you want to synchronize automatically, for example, 14:00.

    • You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
    • This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.

    Time zone

    The time zone for the schedule.
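
For the Cron option, the following Python sketch shows Quartz Cron expressions that roughly correspond to the Daily, Weekly, and Monthly choices described above. Quartz expressions use the field order seconds, minutes, hours, day of month, month, day of week. Whether the preset options map to exactly these expressions internally is an assumption, and quartz_expression is a hypothetical helper for illustration only.

```python
DAY_CODES = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def quartz_expression(repeat, hour, day=None):
    """Return a Quartz Cron expression for an on-the-hour schedule.

    repeat -- "Daily", "Weekly", or "Monthly"
    hour   -- hour of the day, 0 to 23 (schedules run on the hour)
    day    -- day name such as "Sunday"; required for Weekly and Monthly
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    base = f"0 0 {hour}"  # second 0, minute 0, at the given hour
    if repeat == "Daily":
        return f"{base} * * ?"
    if day is None or day[:3].upper() not in DAY_CODES:
        raise ValueError("a valid day name is required")
    code = day[:3].upper()
    if repeat == "Weekly":
        return f"{base} ? * {code}"
    if repeat == "Monthly":
        # '#1' selects the first occurrence of that weekday in the month.
        return f"{base} ? * {code}#1"
    raise ValueError(f"unsupported repeat value: {repeat}")


print(quartz_expression("Daily", 14))
print(quartz_expression("Weekly", 14, "Sunday"))
print(quartz_expression("Monthly", 8, "Tuesday"))
```

A custom Cron expression is mainly useful when you need something the presets cannot express, such as a different weekday pattern.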