Supported transformation details

Updated: August 7, 2025

Collibra Data Lineage supports the most commonly used transformations in the following sources:

Apache Airflow (via OpenLineage), AWS Glue (via OpenLineage), and OpenLineage
Azure Data Factory
Databricks Unity Catalog
dbt
Google Dataplex
IBM DataStage
Informatica PowerCenter
Informatica Intelligent Cloud Services
Snowflake
SQL Server Integration Services

OpenLineage, Apache Airflow (via OpenLineage), and AWS Glue (via OpenLineage)

You can create technical lineage for OpenLineage on Edge. Collibra Data Lineage creates technical lineage for Airflow by using the OpenLineage Airflow integration and AWS Glue by using the OpenLineage Spark integration.

Collibra Data Lineage supports table-level lineage for jobs, which shows the inputs and outputs for each job.

Tip To view table-level lineage for jobs, switch to the Objects view. This information is not available in the Attributes view.

Collibra Data Lineage also supports column-level lineage, as described in Column Level Lineage Dataset Facet in the OpenLineage documentation. The level of support varies across integrations. Additionally, Collibra Data Lineage parses and analyzes the SQL statements as part of the SQL Job Facet.

Apache Airflow: Supports column-level lineage for specific classes. For details, see Supported classes in the Airflow documentation.
AWS Glue: Supports column-level lineage for Spark SQL DataFrames only, because the OpenLineage Spark plugin cannot extract data lineage from AWS Glue Spark Jobs that use AWS Glue DynamicFrames. For details, see Data lineage in Amazon DataZone in the AWS documentation or Quickstart with AWS Glue in the OpenLineage documentation.
OpenLineage: Support depends on how the lineage files are created.

When OpenLineage files contain SQL statements that need to be analyzed for lineage extraction, Collibra Data Lineage parses and analyzes the SQL statements instead of using the OpenLineage SQL Parser. This is because Collibra Data Lineage supports more SQL dialects and advanced SQL features.
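
As a rough illustration of the table-level and column-level lineage described in this section, the following Python sketch reads a hand-written OpenLineage run event and lists the edges it declares. The facet layout (outputs, facets, columnLineage, fields, inputFields) follows the Column Level Lineage Dataset Facet in the OpenLineage specification. The sample event, the dataset names, and the table_edges and column_edges helpers are hypothetical, required facet metadata such as _producer and _schemaURL is left out for brevity, and the sketch does not represent how Collibra Data Lineage processes events internally.

```python
from typing import Dict, Iterator, List, Tuple

# Hand-written sample event; job and dataset names are hypothetical.
SAMPLE_EVENT: Dict = {
    "eventType": "COMPLETE",
    "job": {"namespace": "example", "name": "daily_orders"},
    "inputs": [{"namespace": "warehouse", "name": "raw.orders"}],
    "outputs": [
        {
            "namespace": "warehouse",
            "name": "mart.orders",
            "facets": {
                "columnLineage": {
                    "fields": {
                        "order_total": {
                            "inputFields": [
                                {"namespace": "warehouse", "name": "raw.orders", "field": "price"},
                                {"namespace": "warehouse", "name": "raw.orders", "field": "quantity"},
                            ]
                        }
                    }
                }
            },
        }
    ],
}


def table_edges(event: Dict) -> List[Tuple[str, str]]:
    """Table-level lineage: every input of the job feeds every output."""
    return [
        (src["name"], dst["name"])
        for src in event.get("inputs", [])
        for dst in event.get("outputs", [])
    ]


def column_edges(event: Dict) -> Iterator[Tuple[str, str]]:
    """Column-level lineage read from the columnLineage facet of each output."""
    for output in event.get("outputs", []):
        facet = output.get("facets", {}).get("columnLineage", {})
        for target_column, spec in facet.get("fields", {}).items():
            for source in spec.get("inputFields", []):
                yield (
                    f"{source['name']}.{source['field']}",
                    f"{output['name']}.{target_column}",
                )


if __name__ == "__main__":
    print(table_edges(SAMPLE_EVENT))
    print(list(column_edges(SAMPLE_EVENT)))
```

An event without the columnLineage facet yields table-level edges only, which is consistent with the statement that the level of column support varies across integrations.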

Azure Data Factory

Collibra Data Lineage supports the most commonly used transformations and data sources in Azure Data Factory.

Pipelines

Technical lineage for Azure Data Factory retrieves and processes pipeline metadata and includes it in the technical lineage graph as follows:

In DataStage, you can have multiple jobs, and jobs can call other jobs. Collibra Data Lineage extracts lineage based on whether jobs are specified in the source configuration. Additionally, parameter resolution depends on whether parameter names and values are specified for the jobs in the source configuration.

If a pipeline definition does not contain parameters, Collibra Data Lineage processes the definition and includes the associated assets under the pipeline definition folder in the technical lineage tree in the technical lineage viewer:
- Pipeline Definition
  - Assets
If a pipeline definition contains parameters, Collibra Data Lineage processes pipeline runs and pipeline triggers to resolve parameter values. The assets are then included in the following structure in the technical lineage tree:
- Pipeline Definition
  - Pipeline Runs
    - Date
      - Pipeline run ID
        - Assets
  - Pipelines Triggers
    - Pipeline name and run count
      - Assets

Note that only the definitions of pipeline triggers are processed and included in the technical lineage graph.

Collibra Data Lineage processes only parameter values that are explicitly provided during pipeline execution.
To exclude pipeline run and pipeline trigger metadata from technical lineage, set the Pipeline Runs Days To Look Back field to 0 in the technical lineage for ADF capability. In this case, only pipeline definitions are processed and included in the technical lineage graph. You can also use this field to specify how many days of pipeline run metadata to collect and process. For more information, go to Create a technical lineage via Edge for Azure Data Factory.
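
The effect of the Pipeline Runs Days To Look Back field can be pictured with a small Python sketch. This is an illustration under stated assumptions, not Collibra's implementation: the PipelineRun type, the runs_in_scope function, and the exact window boundary (runs that started within the last N days, inclusive) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class PipelineRun:
    run_id: str        # hypothetical identifier of a pipeline run
    started: datetime  # when the run started


def runs_in_scope(
    runs: List[PipelineRun], days_to_look_back: int, now: datetime
) -> List[PipelineRun]:
    """Return the runs that would be considered for parameter resolution.

    A value of 0 means that no run metadata is considered, so only the
    pipeline definitions remain. The inclusive cutoff is an assumption.
    """
    if days_to_look_back <= 0:
        return []
    cutoff = now - timedelta(days=days_to_look_back)
    return [run for run in runs if run.started >= cutoff]


if __name__ == "__main__":
    now = datetime(2025, 8, 7)
    runs = [
        PipelineRun("run-a", datetime(2025, 8, 6)),
        PipelineRun("run-b", datetime(2025, 7, 1)),
    ]
    print([r.run_id for r in runs_in_scope(runs, 0, now)])
    print([r.run_id for r in runs_in_scope(runs, 7, now)])
```

A larger look-back value brings more runs into scope, which can resolve more parameter values but also means more metadata to process.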

Supported transformations

The following table shows a non-exhaustive list of supported and unsupported transformations.

Supported transformations

Aggregate¹
Alter Row
Assert
Derived Column¹
Exists
External Call²
Filter
Flatten¹
Join
Lookup
Parse¹
Pivot³
Rank
Select¹
Sink⁴
Sort
Source
Split
Stringify
Surrogate Key
Union
Unpivot
Window¹

Unsupported transformations

Some reserved variable names, for example {@context}
Flowlets

Limitations

Transformations that contain column patterns or rule-based mappings can only be partially analyzed because they generate column names on the fly during the actual data flow run. If technical lineage is detected from a dynamically generated column, it is given the placeholder Dynamic Column in the technical lineage viewer.
In the Mapping section of the editor, column patterns are not supported and not displayed in the technical lineage graph. Note that Auto mapping uses column patterns behind the scenes and is therefore not supported either.
Pivoted columns can only be inferred when explicit values are provided in the Pivot Key tab. When columns cannot be inferred, a placeholder Pivoted Columns is added.
The SQL scripts and rule-based mappings in the transformation are not supported.

Supported data sources

The following table shows a non-exhaustive list of supported sources with the corresponding dataset and linked service types.

Collibra Data Lineage supports all data format types that are supported in Azure Data Factory, including Binary, Excel, Delimited text, JSON, Parquet, and so on.

Data sources	Dataset type	Linked service type
Amazon Redshift	AmazonRedshiftTable	AmazonRedshift
Azure Blob storage	AzureBlob	AzureBlobStorage
Azure Data Lake Storage Gen2	AzureBlobFSFile	AzureBlobFS
Azure Data Lake Store	AzureDataLakeStoreFile	AzureDataLakeStore
Azure Databricks Delta Lake	AzureDatabricksDeltaLake	AzureDatabricksDeltaLake
Azure SQL Managed Instance	AzureSqlMITable	AzureSqlMI
Azure SQL Server database	AzureSqlTable	AzureSqlDatabase
Azure Synapse Analytics	AzureSqlDWTable	AzureSqlDW
DB2 data source	Db2Table	Db2
Google Cloud Storage	GoogleCloudStorageLocation	GoogleCloudStorage
Microsoft Access	MicrosoftAccessTable	MicrosoftAccess
Microsoft Azure Cosmos Database	CosmosDbSqlApiCollection	CosmosDb
Open Database Connectivity (ODBC)	OdbcTable	Odbc
On-premises Oracle database	OracleTable	Oracle
REST	RestResource	RestService
Salesforce	SalesforceObject	Salesforce
Salesforce Marketing Cloud	SalesforceMarketingCloudObject	SalesforceMarketingCloud
Salesforce Service Cloud	SalesforceServiceCloudObject	SalesforceServiceCloud
SAP Business Warehouse (open hub)	SapOpenHubTable	SapBW
SFTP server	SftpLocation	Sftp
Snowflake	SnowflakeTable	Snowflake
SQL Server	SqlServerTable	SqlServer

Supported activity types

A Data Factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. There are three groupings of activities: data movement activities, data transformation activities, and control activities. For a complete list of Azure Data Factory activity types and descriptions, see Microsoft's documentation on pipelines and activities.

Collibra Data Lineage currently supports the following activity types:

Activity type	Activity group
Append Variable	Control flow
Copy	Data movement
Data Flow	Data transformation
Execute Pipeline	Control flow
For Each	Control flow
Get Metadata	Control flow
If Condition	Control flow
Lookup	Control flow
Set Variable	Control flow
Switch	Control flow
Until	Control flow
Web	Control flow

Databricks Unity Catalog

Collibra Data Lineage retrieves lineage information from the lineage system tables that build on the Unity Catalog's data lineage feature, and visualizes lineage down to the column level. Specifically, Collibra Data Lineage ingests lineage for Databases, Schemas, Tables, and Columns, but does not ingest any other assets such as Notebooks or Workflows. So, while Collibra Data Lineage retrieves lineage information from notebooks, Collibra Data Lineage does not ingest or include the notebook assets in the technical lineage.

Note Collibra Data Lineage retrieves lineage information from DLT (Delta Live Tables) and captures lineage for Databricks Streaming Tables and Materialized Views at both table and column levels.
Collibra Data Lineage retrieves lineage information from the lineage system tables and does not parse the language used to develop notebooks and jobs in Databricks to generate technical lineage. Therefore, you can use any supported language in Databricks. For examples of how Unity Catalog captures and presents data lineage, go to Capture and view data lineage with Unity Catalog in the Databricks documentation.
Collibra Data Lineage extracts column lineage from the system.access.column_lineage table in Databricks Unity Catalog. Since the system.access.column_lineage table records lineage over time, Collibra Data Lineage ingests cumulative lineage for a given time frame rather than just the latest version.
Collibra Data Lineage for Databricks Unity Catalog extracts SQL source code from Databricks Unity Catalog and includes the source code in the technical lineage viewer. To extract source code, ensure that the system.query.history system table is enabled. SQL source code is captured and becomes accessible only once the system.query.history table is enabled.
Collibra Data Lineage for Databricks Unity Catalog supports external delta tables referenced by external paths.

Example
If the following SQL is used in Databricks Unity Catalog, lineage will be created in Collibra.
CREATE OR REPLACE TABLE table_from_direct_delta_query AS (SELECT * FROM delta.`s3://kktesting/testfolder`)
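
To illustrate what cumulative lineage for a time frame means compared with only the latest version, here is a small Python sketch over rows shaped like the system.access.column_lineage table. The column names (source_table_full_name, source_column_name, target_table_full_name, target_column_name, event_time) come from the Databricks system table. The sample rows and the helper functions are hypothetical and do not represent Collibra's internal logic.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def _row(source_column: str, event_time: datetime) -> Dict:
    """Build a sample row; table names are hypothetical."""
    return {
        "source_table_full_name": "main.raw.orders",
        "source_column_name": source_column,
        "target_table_full_name": "main.mart.orders",
        "target_column_name": "total",
        "event_time": event_time,
    }


ROWS: List[Dict] = [
    _row("legacy_price", datetime(2025, 5, 1)),  # outside the July window
    _row("price", datetime(2025, 7, 1)),
    _row("amount", datetime(2025, 7, 20)),       # most recent definition
]


def edge(row: Dict) -> Edge:
    """Turn one row into a (source column, target column) pair."""
    return (
        f"{row['source_table_full_name']}.{row['source_column_name']}",
        f"{row['target_table_full_name']}.{row['target_column_name']}",
    )


def cumulative_edges(rows: List[Dict], start: datetime, end: datetime) -> List[Edge]:
    """Every distinct edge recorded between start and end, inclusive."""
    return sorted({edge(r) for r in rows if start <= r["event_time"] <= end})


def latest_edges(rows: List[Dict], start: datetime, end: datetime) -> List[Edge]:
    """Only the edges from the most recent event per target column."""
    in_window = [r for r in rows if start <= r["event_time"] <= end]
    latest: Dict[Tuple[str, str], datetime] = {}
    for r in in_window:
        key = (r["target_table_full_name"], r["target_column_name"])
        if key not in latest or r["event_time"] > latest[key]:
            latest[key] = r["event_time"]
    return sorted(
        {
            edge(r)
            for r in in_window
            if r["event_time"]
            == latest[(r["target_table_full_name"], r["target_column_name"])]
        }
    )


if __name__ == "__main__":
    july = (datetime(2025, 7, 1), datetime(2025, 7, 31))
    print(cumulative_edges(ROWS, *july))
    print(latest_edges(ROWS, *july))
```

In this sample, the cumulative result for July contains both the price and the amount edges, while a latest-only view would keep just the amount edge.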

dbt

Collibra Data Lineage supports the following adapters in dbt:

Azure Synapse
Databricks
Google BigQuery
Greenplum
Hive
IBM Db2
Microsoft SQL Server
MySQL
Oracle
Postgres
Redshift
Snowflake
Spark
Teradata

dbt Cloud

Collibra Data Lineage supports materialization, and tables and views are treated like tables by default. You can customize the setting in one of the following ways so that the tables and views are treated like views:

If you use technical lineage via Edge, specify the materializedMapping property in the Source Configuration field in the Technical Lineage for dbt Cloud capability.
If you use the lineage harvester (deprecated), specify the materializedMapping property in the <source ID> configuration file.

Google Dataplex

Collibra Data Lineage visualizes lineage for Google Dataplex down to table level. To view the technical lineage for Google Dataplex, ensure that you select Objects in the toolbar of your technical lineage graph.
Collibra Data Lineage ingests lineage from BigQuery and other Google Cloud services supported by the data lineage feature in Dataplex. However, only the lineage for Column, Table, and File assets is processed and included in the technical lineage for Dataplex.
Technical lineage for Google Dataplex can start from GCS or BigQuery and end in BigQuery.
You can choose to create table-level lineage or column-level lineage for Google Dataplex when you synchronize the Technical Lineage for Google Dataplex capability. Stitching works for the column-level lineage, regardless of whether you integrated Google Dataplex Catalog or registered Google BigQuery databases by using the BigQuery JDBC connector.
Transformations are ingested by calling the GCP Process and subsequently the GCP Jobs. Therefore, the Service Account user that is defined in the Edge connection requires, at a minimum, the bigquery.jobs.get permission, and optionally the bigquery.admin role, which lets the capability ingest the details of all the jobs in the project.

Differences between technical lineage for Google Dataplex and Google BigQuery

You can create technical lineage for Google BigQuery by using a JDBC connection or for Google Dataplex by using a Google Cloud Platform (GCP) connection. Consider the following differences to determine which data source and connection type to use.

Feature	Support in technical lineage for Google Dataplex	Support in technical lineage for Google BigQuery
SQL transformation code	Yes when creating column-level lineage	Yes
Executed SQL in stored procedures	Yes	No
Ingest lineage from...	BigQuery and other Google Cloud services supported by the data lineage feature in Dataplex	BigQuery

IBM DataStage

IBM DataStage uses jobs with stages instead of transformations. IBM DataStage has three job types: parallel jobs, sequence jobs, and server jobs. For a list of all job stages per job type in IBM DataStage, read the IBM documentation.

Technical lineage for DataStage supports the following parameters and expressions:

Runtime parameters in parameter set files.
To include the runtime parameters, ensure that you export DataStage files with executables. For more information about exporting DataStage files, go to Prepare an external directory folder for the lineage harvester (deprecated) if you use the lineage harvester (deprecated), or Create a technical lineage via Edge for DataStage.
Parameter sets.
To include parameters, export the parameter sets as part of your environment file. For more information about exporting DataStage files, go to Prepare an external directory folder for the lineage harvester (deprecated) if you use the lineage harvester (deprecated), or Create a technical lineage via Edge for DataStage.
Expression format.
The analysis result displays the DATASTAGE_EXPRESSION message when a complex format with advanced functions is parsed.

For details about how Collibra Data Lineage extracts lineage and resolves parameters from DataStage, see Transformation logic and common errors for DataStage.

Informatica PowerCenter transformations

The following table shows a non-exhaustive list of supported and unsupported transformations in Informatica PowerCenter.

Supported transformations

Aggregator
Expression¹
Filter
Input
Joiner
Lookup
Mapplet²
Normalizer
Output
Pre- and post-session SQL commands
Rank
Router
Sorter
Source
SQL in the translate_db_type function
Target
Transaction Control
Union
Update Strategy

Unsupported transformations

Data Masking
Java
Sequence Generator
Stored Procedure³
Web Services
XML

Note

The transformation is shown if the column (expression) is using at least one column from another connected transformation.
Collibra Data Lineage supports input transformations in mapplets but does not support source definitions in mapplets.
The stored procedures are stored and run in the databases that Informatica PowerCenter connects to. Collibra Data Lineage does not access the Informatica PowerCenter data sources, so Collibra Data Lineage collects the stored procedure names but does not support the Stored Procedure transformation.

Informatica Intelligent Cloud Services

The following table shows a non-exhaustive list of supported taskflows and unsupported tasks in Informatica Intelligent Cloud Services.

Supported taskflows

Taskflow
Linear Taskflow

Unsupported tasks

Parallel tasks
Parallel tasks with decision
Sequential tasks
Sequential tasks with decision
Single task

The following table shows a non-exhaustive list of supported and unsupported transformations and constructions in Informatica Intelligent Cloud Services, specifically in the Cloud Data Integration service.

Supported transformations

Data-driven conditions
Expression, including custom expressions in the supported transformations
Filter
Joiner, including join conditions
Lookup
Mapplet
Pre SQL and post SQL commands
Router
Sequence Generator
Source
Stored Procedure
Target
Union

Unsupported transformations, functions and constructions

Aggregator
Cleanse
Data Masking
Deduplicate
Hierarchy Builder
Hierarchy Parser
Hierarchy Processor
Input
Java
Labeler
Machine Learning
Normalizer
NEXTVAL
Parse
Python
Rank
Rule Specification
Structure Parser
Transaction Control
Velocity
Verifier
Web Services

Snowflake

You can create technical lineage for Snowflake by using SQL Snowflake ingestion mode or SQL-API Snowflake ingestion mode. Collibra Data Lineage supports different queries and transformations for each ingestion method. For more information about the ingestion methods, go to Technical lineage for Snowflake ingestion methods.

SQL Snowflake ingestion mode

With the SQL Snowflake ingestion mode, Collibra Data Lineage does not support the following non-exhaustive list of transformations:

Snowpark

SQL-API Snowflake ingestion mode

With the SQL-API Snowflake ingestion mode, Collibra Data Lineage supports the Data Manipulation Language (DML) statements from the following sources. The table also shows a non-exhaustive list of unsupported queries and transformations.

Supported transformations

Using a driver
Data Definition Language (DDL) queries
Direct login
Stored procedures
The COPY INTO <table> command¹
Streams²

Unsupported queries and transformations

Queries or query paths that are not executed³
Sequences, including generating new values
Snowflake Scripting⁴
Snowpark
Snowpipes

Note

Snowflake logs the COPY INTO <table> query only when the table is specified as the source in a FROM clause. For more information, go to Read query notes in the ACCESS_HISTORY view topic in the Snowflake documentation.
If you create technical lineage for Snowflake by using the JDBC connection type, only queries or query paths that are executed are supported. For example, if a SQL query contains a CASE statement, the technical lineage will only show lineage from the WHEN path that was executed. However, if you use the folder connection type to ingest Snowflake, SQL queries that include all paths of a CASE statement will be parsed and reflected in the technical lineage.
Collibra Data Lineage supports lineage that uses streams as a source and lineage on tables that have streams. Collibra Data Lineage does not support lineage on a CREATE STREAM statement.
Snowflake in SQL-API mode doesn't parse Snowflake Scripting; however, queries that run from Snowflake Scripting and are recorded in ACCESS_HISTORY still contribute to lineage.
If the data sharing consumer moves data from the shared view to a table, Collibra Data Lineage does not support lineage from the table.
Technical lineage for Snowflake in SQL-API mode supports lineage from batches that drop and replace tables frequently. For details, go to Snowflake SQL-API lineage missing due to frequent table replacement in the Collibra Support Portal.
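
The note about executed query paths can be pictured with a small Python sketch. It compares lineage taken only from the path that actually ran with lineage taken from parsing every path of a CASE statement. The branch labels, table names, and the two functions are hypothetical, and the sketch is only a set-based illustration of the difference, not a description of how either connection type is implemented.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (source table, target table)

# Hypothetical paths of one CASE statement and the lineage each path
# would produce if it ran.
CASE_PATHS: Dict[str, List[Edge]] = {
    "WHEN region = 'EU'": [("sales_eu", "sales_report")],
    "WHEN region = 'US'": [("sales_us", "sales_report")],
    "ELSE": [("sales_other", "sales_report")],
}


def executed_lineage(paths: Dict[str, List[Edge]], executed_path: str) -> Set[Edge]:
    """Lineage from the single path that was executed."""
    return set(paths[executed_path])


def parsed_lineage(paths: Dict[str, List[Edge]]) -> Set[Edge]:
    """Lineage from parsing the full statement, covering every path."""
    return {edge for edges in paths.values() for edge in edges}


if __name__ == "__main__":
    print(sorted(executed_lineage(CASE_PATHS, "WHEN region = 'EU'")))
    print(sorted(parsed_lineage(CASE_PATHS)))
```

In this sample, only the sales_eu edge appears when lineage is based on the executed path, whereas parsing the whole statement yields all three edges.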

SQL Server Integration Services (SSIS)

Collibra Data Lineage supports the following non-exhaustive list of transformations and component types in SQL Server Integration Services:

Supported transformations

Aggregate
Cache Transform
Conditional Split
Data Conversion
Derived Column
Fuzzy Grouping
Lookup
Merge Join
Multicast
OLE DB Command
Row Count
Script Component
Slowly Changing Dimension
Sort
Union All

Supported component types

Microsoft.ADONETDestination
Microsoft.DataReaderSourceAdapter
Microsoft.ManagedComponentHost
Microsoft.ScriptComponentHost
Microsoft.XmlSourceAdapter
PragmaticWorks.TaskFactory.HashTransform
PragmaticWorks.TaskFactory.UpsertDestination

Important

Collibra Data Lineage supports SQL, but cannot parse other languages or scripts, for example SHELL and BAT scripts.
SQL statements from Excel are not supported.
Collibra Data Lineage does not create lineage for disabled executables.