Databricks Unity Catalog: Supported transformation details
Collibra Data Lineage retrieves lineage metadata from Databricks Unity Catalog system tables to provide column-level visibility into data transformations and lineage across assets in the Databricks workspace.
Function scope
The integration captures metadata for the following Databricks assets:
- Catalog hierarchy
  Databases, Schemas, Tables, and Columns.
- Delta Live Tables (DLT)
  Technical lineage for Streaming Tables and Materialized Views at both table and column levels.
- External Delta tables
  Lineage is generated for external Delta tables referenced by external paths when direct queries are used.
  Example: If the following SQL is used in Databricks Unity Catalog, lineage will be created in Collibra.
      CREATE OR REPLACE TABLE table_from_direct_delta_query AS (SELECT * FROM delta.`s3://kktesting/testfolder`)
- Volumes (in preview)
  Ingests volume metadata. Automatic stitching for volumes is planned for a future release.
- Notebooks (in preview)
  Ingests notebook identifiers and direct URLs. Collibra Data Lineage links to the notebook source in Databricks rather than ingesting raw notebook content.
Lineage extraction mechanism
Collibra Data Lineage uses a system-driven approach to lineage extraction rather than manual code parsing:
- Lineage is extracted from the system.access.column_lineage table.
  Because Collibra relies on these internal Databricks records, all languages supported by Unity Catalog (including SQL, Python, R, and Scala) are supported.
  For examples of how Unity Catalog captures and presents data lineage, go to Capture and view data lineage with Unity Catalog in the Databricks documentation.
- Transformation details and source code are extracted and displayed in the technical lineage viewer for notebooks, jobs, SQL queries, and dashboards.
  To extract SQL source code, the system.query.history table must be enabled in the Databricks environment.
- Collibra Data Lineage ingests cumulative lineage.
  The system.access.column_lineage table records lineage over time, so the lineage graph reflects transformations that occur over a defined time window, not just the most recent execution.
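The idea of cumulative lineage can be illustrated with a minimal Python sketch. It is not Collibra's implementation: the `LineageEvent` record, the `cumulative_edges` function, and the sample column names are all hypothetical, and the record is a simplified stand-in for a row of system.access.column_lineage rather than its real schema. The sketch only shows how every distinct source-to-target edge seen inside a time window ends up in the result, regardless of which execution produced it.

```python
from __future__ import annotations

from datetime import datetime
from typing import Iterable, NamedTuple


class LineageEvent(NamedTuple):
    """Simplified, hypothetical lineage record (not the real table schema)."""
    source_column: str   # fully qualified source column
    target_column: str   # fully qualified target column
    event_time: datetime  # when the transformation was recorded


def cumulative_edges(
    events: Iterable[LineageEvent],
    window_start: datetime,
    window_end: datetime,
) -> set[tuple[str, str]]:
    """Return every distinct source -> target edge recorded in the window.

    The window is half-open: window_start <= event_time < window_end.
    Repeated executions of the same transformation collapse into one edge.
    """
    return {
        (e.source_column, e.target_column)
        for e in events
        if window_start <= e.event_time < window_end
    }


# Hypothetical sample data: two runs of one transformation, one run of
# another, and one record that falls outside the window.
events = [
    LineageEvent("main.sales.orders.amount", "main.reports.daily.total", datetime(2024, 1, 1)),
    LineageEvent("main.sales.refunds.amount", "main.reports.daily.total", datetime(2024, 1, 5)),
    LineageEvent("main.sales.orders.amount", "main.reports.daily.total", datetime(2024, 1, 9)),
    LineageEvent("main.legacy.orders.amount", "main.reports.daily.total", datetime(2023, 6, 1)),
]

edges = cumulative_edges(events, datetime(2024, 1, 1), datetime(2024, 2, 1))
for source, target in sorted(edges):
    print(f"{source} -> {target}")
```

In this sketch, the edge from the refunds table is retained even though a later execution read only from the orders table, while the record dated before the window is excluded.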
Synchronization configuration
To ensure a complete lineage representation, verify that the following settings are configured.
| Feature | Requirement |
|---|---|
| SQL transformation code | The system.query.history table must be enabled in Databricks and the Include SQL transformations option must be selected on the synchronization page when you synchronize your technical lineage. |
| Notebooks | Select the Ingest Notebooks (In preview) option on the synchronization page. |
| Volumes | Select the Ingest Volumes (In preview) option on the synchronization page. |
| Stitching | Yes |