Databricks Unity Catalog: Supported transformation details

Collibra Data Lineage retrieves lineage information from Databricks Unity Catalog system tables to provide column-level visibility into data transformations and lineage across assets in the Databricks workspace.

Function scope

The integration captures lineage for the following Databricks assets:

  • Catalog hierarchy
    Databases, Schemas, Tables, and Columns.
  • Delta Live Tables (DLT)
    Technical lineage for Streaming Tables and Materialized Views at both table and column levels.
  • External delta tables
    Lineage is generated for external Delta tables referenced by external paths when direct queries are used.
    Example: If the following SQL statement is run in Databricks Unity Catalog, lineage is created in Collibra.
    CREATE OR REPLACE TABLE table_from_direct_delta_query AS (SELECT * FROM delta.`s3://kktesting/testfolder`)

  • Volumes (in preview)
    Collects lineage information from volumes. Only lineage relationships are ingested; volume assets are not created in Data Catalog. Automatic stitching for volumes is planned for a future release.
  • Notebooks (in preview)
    Captures lineage relationships for notebooks, including identifiers and direct URLs to link to sources in Databricks. Notebook content and full metadata are not ingested into Data Catalog.

Lineage extraction mechanism

Collibra Data Lineage uses a system-driven approach to lineage extraction rather than manual code parsing:

  • Lineage is extracted from the system.access.column_lineage table.
    Because Collibra relies on these internal Databricks records instead of parsing code, lineage is captured for every language that Unity Catalog supports, including SQL, Python, R, and Scala.
    For examples of how Unity Catalog captures and presents data lineage, go to Capture and view data lineage with Unity Catalog in the Databricks documentation.
  • Transformation details and source code are extracted and displayed in the technical lineage viewer for notebooks, jobs, SQL queries, and dashboards.
    To extract SQL source code, the system.query.history table must be enabled in the Databricks environment.
  • Collibra Data Lineage ingests cumulative lineage.
    Because the system.access.column_lineage table records lineage over time, the lineage graph reflects transformations that occur over a defined time window, not just the most recent execution.
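
To illustrate what cumulative lineage means in practice, here is a minimal Python sketch. It is not Collibra's implementation. It assumes the column names source_table_full_name, source_column_name, target_table_full_name, target_column_name, and event_time from the Databricks reference for system.access.column_lineage. The function names, the 30-day window, and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta


def build_lineage_query(window_days: int) -> str:
    """Return a SQL string selecting column lineage events in a time window.

    Assumption: column names follow the Databricks system table reference.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return (
        "SELECT source_table_full_name, source_column_name, "
        "target_table_full_name, target_column_name, event_time "
        "FROM system.access.column_lineage "
        f"WHERE event_time >= current_timestamp() - INTERVAL {window_days} DAYS"
    )


def cumulative_edges(rows, window_start):
    """Collapse lineage events into distinct column-to-column edges.

    Every event at or after window_start contributes, so an edge written
    only by an earlier run in the window is kept even when the most
    recent run did not produce it.
    """
    edges = set()
    for row in rows:
        if row["event_time"] < window_start:
            continue  # outside the time window
        source = (row["source_table_full_name"], row["source_column_name"])
        target = (row["target_table_full_name"], row["target_column_name"])
        if None in source or None in target:
            continue  # incomplete event, no column-level edge to record
        edges.add((source, target))
    return sorted(edges)


def _event(src_table, src_col, tgt_table, tgt_col, when):
    # Hypothetical helper that builds one sample lineage row.
    return {
        "source_table_full_name": src_table,
        "source_column_name": src_col,
        "target_table_full_name": tgt_table,
        "target_column_name": tgt_col,
        "event_time": when,
    }


# Hypothetical sample data: two runs inside the window, one run before it.
now = datetime(2024, 6, 30)
rows = [
    _event("main.sales.orders", "id", "main.sales.daily", "order_id", datetime(2024, 6, 10)),
    _event("main.sales.orders", "amount", "main.sales.daily", "total", datetime(2024, 6, 29)),
    _event("main.sales.orders", "amount", "main.sales.daily", "total", datetime(2024, 6, 30)),
    _event("main.sales.legacy", "amount", "main.sales.daily", "total", datetime(2024, 4, 1)),
]
edges = cumulative_edges(rows, now - timedelta(days=30))
```

In this sample, the edge from orders.id is kept even though the latest run did not produce it, the repeated orders.amount edge appears once, and the April event falls outside the window.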

Synchronization configuration

To ensure a complete lineage representation, verify that the following settings are configured.

Feature | Requirement
SQL transformation code | The system.query.history table must be enabled in Databricks, and the Include SQL transformations option must be selected on the synchronization page when you synchronize your technical lineage.
Notebooks | Select the Ingest Notebooks (In preview) option on the synchronization page.
Volumes | Select the Ingest Volumes (In preview) option on the synchronization page.
Stitching | Yes
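
The requirements above can be read as a simple checklist. The following Python sketch is only an illustration of that logic: SyncOptions, missing_requirements, and the feature keys "sql", "notebooks", and "volumes" are hypothetical names, not part of any Collibra or Databricks API. It shows that SQL transformation code depends on two conditions at once, while notebooks and volumes each depend on one option.

```python
from dataclasses import dataclass


@dataclass
class SyncOptions:
    # Hypothetical mirror of the options on the synchronization page.
    include_sql_transformations: bool = False
    ingest_notebooks: bool = False
    ingest_volumes: bool = False


def missing_requirements(wanted, options, query_history_enabled):
    """List the settings still missing for the features in `wanted`.

    wanted: set of hypothetical feature keys ("sql", "notebooks", "volumes").
    query_history_enabled: whether system.query.history is enabled in Databricks.
    """
    problems = []
    if "sql" in wanted:
        # SQL transformation code needs both the system table and the option.
        if not query_history_enabled:
            problems.append("Enable system.query.history in Databricks")
        if not options.include_sql_transformations:
            problems.append("Select Include SQL transformations")
    if "notebooks" in wanted and not options.ingest_notebooks:
        problems.append("Select Ingest Notebooks (In preview)")
    if "volumes" in wanted and not options.ingest_volumes:
        problems.append("Select Ingest Volumes (In preview)")
    return problems


# Usage: SQL option selected, but the system table is off and notebooks are not selected.
result = missing_requirements(
    {"sql", "notebooks"},
    SyncOptions(include_sql_transformations=True),
    query_history_enabled=False,
)
```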