Databricks Unity Catalog: Supported transformation details

Collibra Data Lineage retrieves lineage metadata from Databricks Unity Catalog system tables to provide column-level visibility into data transformations and lineage across assets in the Databricks workspace.

Functional scope

The integration captures metadata for the following Databricks assets:

  • Catalog hierarchy
    Databases, Schemas, Tables, and Columns.
  • Delta Live Tables (DLT)
    Technical lineage for Streaming Tables and Materialized Views at both table and column levels.
  • External Delta tables
    Lineage is generated for external Delta tables referenced by external paths when direct queries are used.
    Example
    If the following SQL statement is run in Databricks Unity Catalog, lineage is created in Collibra:
    CREATE OR REPLACE TABLE table_from_direct_delta_query AS (SELECT * FROM delta.`s3://kktesting/testfolder`)

  • Volumes (in preview)
    Ingests volume metadata. Automatic stitching for volumes is planned for a future release.
  • Notebooks (in preview)
    Ingests notebook identifiers and direct URLs. Collibra Data Lineage links to the notebook source in Databricks rather than ingesting raw notebook content.

Lineage extraction mechanism

Collibra Data Lineage uses a system-driven approach to lineage extraction rather than manual code parsing:

  • Lineage is extracted from the system.access.column_lineage table.
    Because Collibra Data Lineage relies on these internal Databricks records instead of parsing source code, lineage is captured for all languages supported by Unity Catalog, including SQL, Python, R, and Scala.
    For examples of how Unity Catalog captures and presents data lineage, go to Capture and view data lineage with Unity Catalog in the Databricks documentation.
  • Transformation details and source code are extracted and displayed in the technical lineage viewer for notebooks, jobs, SQL queries, and dashboards.
    To extract SQL source code, the system.query.history table must be enabled in the Databricks environment.
  • Collibra Data Lineage ingests cumulative lineage.
    Because the system.access.column_lineage table records lineage events over time, the lineage graph reflects transformations that occurred over a defined time window, not just the most recent execution.
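
The difference between cumulative lineage and lineage from only the most recent execution can be illustrated with a minimal sketch. It is not Collibra's implementation. The sample rows are invented, only a few columns of system.access.column_lineage are shown, and the function names and the window_days parameter are hypothetical.

```python
from datetime import datetime, timedelta

# Invented sample rows shaped like records in system.access.column_lineage.
# Only a subset of columns is shown; all values are made up for illustration.
ROWS = [
    {"event_time": datetime(2024, 5, 1, 8, 0),
     "source_table_full_name": "main.raw.orders", "source_column_name": "amount",
     "target_table_full_name": "main.gold.sales", "target_column_name": "total"},
    {"event_time": datetime(2024, 5, 2, 8, 0),
     "source_table_full_name": "main.raw.refunds", "source_column_name": "amount",
     "target_table_full_name": "main.gold.sales", "target_column_name": "total"},
    {"event_time": datetime(2024, 5, 3, 8, 0),
     "source_table_full_name": "main.raw.orders", "source_column_name": "amount",
     "target_table_full_name": "main.gold.sales", "target_column_name": "total"},
    {"event_time": datetime(2024, 3, 1, 8, 0),
     "source_table_full_name": "main.raw.legacy", "source_column_name": "amount",
     "target_table_full_name": "main.gold.sales", "target_column_name": "total"},
]


def to_edge(row):
    """Turn one lineage record into a (source column, target column) pair."""
    source = f"{row['source_table_full_name']}.{row['source_column_name']}"
    target = f"{row['target_table_full_name']}.{row['target_column_name']}"
    return (source, target)


def cumulative_edges(rows, window_end, window_days):
    """Distinct column-level edges recorded inside the time window."""
    window_start = window_end - timedelta(days=window_days)
    edges = {to_edge(r) for r in rows
             if window_start <= r["event_time"] <= window_end}
    return sorted(edges)


def latest_execution_edges(rows):
    """Edges recorded by the most recent event only, for comparison."""
    latest = max(r["event_time"] for r in rows)
    return sorted({to_edge(r) for r in rows if r["event_time"] == latest})


window_end = datetime(2024, 5, 3, 23, 59)
print(cumulative_edges(ROWS, window_end, window_days=7))
print(latest_execution_edges(ROWS))
```

In this sample, the seven-day window yields two distinct edges (from the orders and refunds tables), while the latest execution alone yields one. The record from March falls outside the window and is not included.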

Synchronization configuration

To ensure a complete lineage representation, verify that the following settings are configured.

Feature                  Requirement
SQL transformation code  The system.query.history table must be enabled in Databricks, and the Include SQL transformations option must be selected on the synchronization page when you synchronize your technical lineage.
Notebooks                Select the Ingest Notebooks (In preview) option on the synchronization page.
Volumes                  Select the Ingest Volumes (In preview) option on the synchronization page.
Stitching                Yes
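
The requirements above can be treated as a checklist before you synchronize. The following sketch shows one way to express that check. The SyncSettings class, its field names, and the function are hypothetical and are not part of any Collibra or Databricks API. They only mirror the settings listed in this section.

```python
from dataclasses import dataclass


@dataclass
class SyncSettings:
    # Hypothetical flags that mirror the requirements in this section.
    query_history_enabled: bool        # system.query.history enabled in Databricks
    include_sql_transformations: bool  # "Include SQL transformations" option selected
    ingest_notebooks: bool             # "Ingest Notebooks (In preview)" option selected
    ingest_volumes: bool               # "Ingest Volumes (In preview)" option selected


def missing_requirements(settings):
    """Return the settings that still need attention for complete lineage."""
    gaps = []
    if not settings.query_history_enabled:
        gaps.append("Enable the system.query.history table in Databricks.")
    if not settings.include_sql_transformations:
        gaps.append("Select the Include SQL transformations option.")
    if not settings.ingest_notebooks:
        gaps.append("Select the Ingest Notebooks (In preview) option.")
    if not settings.ingest_volumes:
        gaps.append("Select the Ingest Volumes (In preview) option.")
    return gaps


# Example: SQL source code is covered, but notebooks and volumes are not selected.
settings = SyncSettings(
    query_history_enabled=True,
    include_sql_transformations=True,
    ingest_notebooks=False,
    ingest_volumes=False,
)
for gap in missing_requirements(settings):
    print(gap)
```

An empty result means that every requirement in this section is met. Each returned message names one setting to review.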