Supported transformation details

Collibra Data Lineage supports the most commonly used transformations in the following sources:

Azure Data Factory

Collibra Data Lineage supports the most commonly used transformations and data sources in Azure Data Factory.

Supported transformations

The following tables shows a non-exhaustive list of supported and unsupported transformations.

Supported transformations

Unsupported transformations

  • Aggregate1
  • Alter Row
  • Assert
  • Derived Column1
  • Exists
  • External Call2
  • Filter
  • Flatten1
  • Join
  • Lookup
  • Parse1
  • Pivot3
  • Rank
  • Select1
  • Sink4
  • Sort
  • Source
  • Split
  • Stringify
  • Surrogate Key
  • Union
  • Unpivot
  • Window1
  • Some reserved variables names, for example {@context}
  • Flowlets

Limitations

  1. Transformations that contain column patterns or rule-based mappings can only be partially analyzed because they generate column names on the fly during the actual data flow run. If technical lineage is detected from a dynamically generated column, it is given the placeholder Dynamic Column in the technical lineage viewer.
  2. In the Mapping section of the editor, column patterns are not supported and not displayed in the technical lineage graph. Note that Auto mapping uses column patterns behind the scenes and is therefore not supported either.
  3. Pivoted columns can only be inferred when explicit values are provided in the Pivot Key tab. When columns cannot be inferred, a placeholder Pivoted Columns is added.
  4. The SQL scripts and rule-based mappings in the transformation are not supported.

Supported data sources

The following table shows a non-exhaustive list of supported sources with the corresponding dataset and linked service types.

CollibraData Lineage supports all data format types that are supported in Azure Data Factory, including binary, Excel file, Delimited text, JSON, Parquet, and so on.

Data sources

Dataset type

Linked service type
Amazon Redshift AmazonRedshiftTable AmazonRedshift
Azure Blob storage AzureBlob AzureBlobStorage
Azure Data Lake Storage Gen2 AzureBlobFSFile AzureBlobFS
Azure Data Lake Store AzureDataLakeStoreFile AzureDataLakeStore
Azure Databricks Delta Lake AzureDatabricksDeltaLake AzureDatabricksDeltaLake
Azure SQL Managed Instance AzureSqlMITable AzureSqlMI
Azure SQL Server database AzureSqlTable AzureSqlDatabase
Azure Synapse Analytics AzureSqlDWTable AzureSqlDW
DB2 data source Db2Table Db2
Google Cloud Storage GoogleCloudStorageLocation GoogleCloudStorage
Microsoft Access MicrosoftAccessTable MicrosoftAccess
Microsoft Azure Cosmos Database CosmosDbSqlApiCollection CosmosDb
Open Database Connectivity (ODBC) OdbcTable Odbc
On-premises Oracle database OracleTable Oracle
REST RestResource RestService
Salesforce SalesforceObject Salesforce
Salesforce Marketing Cloud SalesforceMarketingCloudObject SalesforceMarketingCloud
Salesforce Service Cloud SalesforceServiceCloudObject SalesforceServiceCloud
SAP Business Warehouse (open hub) SapOpenHubTable SapBW
SFTP server SftpLocation Sftp
Snowflake SnowflakeTable Snowflake
SQL Server SqlServerTable SqlServer

Supported activity types

A Data Factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. There are three groupings of activities: data movement activities, data transformation activities, and control activities. For a complete list of Azure Data Factory activity types and descriptions, see Microsoft's documentation on pipelines and activities.

Collibra Data Lineage currently supports the following activity types:

Activity type Activity group
Copy Data movement
Data Flow Data transformation
Execute Pipeline Control
For Each Control
Get Metadata Control
If Condition Control
Lookup Control
Set Variable Control
Web Activity Control

Databricks Unity Catalog

  • Collibra Data Lineage retrieves lineage information from the lineage system tables that build on the Unity Catalog's data lineage feature, and visualizes lineage down to the column level. Specifically, Collibra Data Lineage ingests lineage for Databases, Schemas, Tables, and Columns, but does not ingest any other assets such as Notebooks or Workflows. So, while Collibra Data Lineage retrieves lineage information from notebooks, CollibraData Lineage does not ingest or include the notebook assets in the technical lineage.

    Note Currently, Databricks system tables don't include DLT (Delta Live Tables) column lineage.

  • Collibra Data Lineage retrieves lineage information from the lineage system tables and does not parse the language used to develop notebooks and jobs in Databricks to generate technical lineage. Therefore, you can use any supported language in Databricks. For examples of how Unity Catalog captures and presents data lineage, go to Capture and view data lineage with Unity Catalog in the Databricks documentation.

  • Collibra Data Lineage for Databricks Unity Catalog supports external delta tables referenced by external paths.

    Example 

    If the following SQL is used in Databircks Unity Catalog, lineage will be created in Collibra.
    CREATE OR REPLACE TABLE table_from_direct_delta_query AS (SELECT * FROM delta.`s3://kktesting/testfolder`)

Important 
  • Databricks Unity Catalog does not provide source code for each transformation. Therefore, the source code pane in the technical lineage graph generated by CollibraData Lineage will not display any source code.
  • Technical lineage for Databricks Unity Catalog supports only column lineage. Table lineage is not supported.

dbt

Collibra Data Lineage supports the following adapters in dbt:

  • Azure Synapse
  • Databricks
  • Google BigQuery
  • Greenplum
  • Hive
  • IBM Db2
  • Microsoft SQL Server
  • MySQL
  • Oracle
  • Postgres
  • Redshift
  • Snowflake
  • Spark
  • Teradata

dbt Cloud

Collibra Data Lineage supports materialization, and tables and views are treated like tables by default. You can customize the setting in one of the following ways so that the tables and views are treated like views:

Google Dataplex

  • Collibra Data Lineage visualizes lineage for Google Dataplex down to table level. To view the technical lineage for Google Dataplex, ensure that you select Objects in the toolbar of your technical lineage graph.
  • Google Dataplex Data Lineage can start from GCS or BigQuery and end in BigQuery.
  • Currently, stitching is not supported for the table level lineage. This support will be added in a future release with the addition of column level lineage support. When this support is available, stitching will work, regardless of whether you integrated Google Dataplex or registered Google BigQuery databases by using the BigQuery JDBC connector.
  • Transformations are ingested by calling the GCP Process and subsequently the GCP Jobs. Therefore, the Service Account user that is defined in the Edge connection requires, at a minimum, the bigquery.jobs.get permission, and optionally the bigquery.admin role, which lets the capability ingest the details of all the jobs in the project.
  • Interactive queries that are run in GCP are not shown in the source code pane in the Technical lineage viewer.

IBM DataStage

IBM DataStage uses jobs with stages instead of transformations. IBM Datastage has three job types: parallel jobs, sequence jobs and server jobs. For a list of all job stages per job type in IBM DataStage, read the IBM documentation.

Technical lineage for DataStage supports the following parameters and expressions:

Informatica PowerCenter transformations

The following table shows a non-exhaustive list of supported and unsupported transformations in Informatica PowerCenter.

Supported transformations

Unsupported transformations

  • Aggregator
  • Expression1
  • Filter
  • Input
  • Joiner
  • Lookup
  • Mapplet
  • Normalizer
  • Output
  • Pre- and post-session SQL commands
  • Rank
  • Router
  • Sorter
  • Source
  • SQL in the translate_db_type function
  • Target
  • Transaction Control
  • Union
  • Update Strategy
  • Data Masking
  • Java
  • Sequence Generator
  • Stored Procedure2
  • Web Services
  • XML
    Note 
  1. The transformation is shown if the column (expression) is using at least one column from another connected transformation.
  2. The stored procedures are stored and run in the databases that Informatica PowerCenter connects to. Collibra Data Lineage does not access the Informatica PowerCenter data sources, so Collibra Data Lineage collects the stored procedure names but does not support the Stored Procedure transformation.

Informatica Intelligent Cloud Services

The following table shows a non-exhausitive list of supported taskflows and unsupported tasks in Informatica Intelligent Cloud Services.

Supported taskflows

Unsupported tasks

  • Taskflow
  • Linear Taskflow
  • Parallel tasks
  • Parallel tasks with decision
  • Sequential tasks
  • Sequential tasks with decision
  • Single task

The following table shows a non-exhaustive list of supported and unsupported transformations and constructions in Informatica Intelligent Cloud Services. Specifically, transformations and constructions in the Cloud Data Integration service.

Supported transformations

Unsupported transformations, functions and constructions

  • Data-driven conditions
  • Expression, including custom expressions in the supported transformations
  • Filter
  • Joiner, including join conditions
  • Lookup
  • Mapplet
  • Pre SQL and post SQL commands
  • Router
  • Sequence Generator
  • Source
  • Stored Procedure
  • Target
  • Union
  • Aggregator
  • Cleanse
  • Data Masking
  • Deduplicate
  • Hierarchy Builder
  • Hierarchy Parser
  • Hierarchy Processor
  • Input
  • Java
  • Labeler
  • Machine Learning
  • Normalizer
  • NEXTVAL
  • Parse
  • Python
  • Rank
  • Rule Specification
  • Structure Parser
  • Transaction Control
  • Velocity
  • Verifier
  • Web Services

Snowflake

You can create technical lineage for Snowflake by using SQL Snowflake ingestion mode or SQL-API Snowflake ingestion mode. Collibra Data Lineage supports different queries and transformations for each ingestion method. For more information about the ingestion methods, go to Technical lineage for Snowflake ingestion methods.

SQL Snowflake ingestion mode

With the SQL Snowflake ingestion mode, Collibra Data Lineage does not support the following non-exhaustive list of transformations:

  • Snowpark

SQL-API Snowflake ingestion mode

With the SQL-API Snowflake ingestion mode, Collibra Data Lineage supports the Data Manipulation Language (DML) statements from the following sources. The table also shows a non-exhaustive list of unsupported queries and transformations.

Supported transformations

Unsupported queries and transformations

  • Using a driver
  • Data Definition Language (DDL) queries
  • Direct login
  • Stored procedures
  • The COPY INTO DML command
  • Streams 2
  • Queries or query paths that are not executed 1
  • Sequences, including generating new values
  • Snowflake Scripting 3
  • Snowpark
  • Snowpipes
Note 
  1. If you create technical lineage for Snowflake by using the JDBC connection type, only queries or query paths that are executed are supported. For example, if a SQL query contains a CASE statement, the technical lineage will only show lineage from the WHEN path that was executed. However, if you use the folder connection type to ingest Snowflake, SQL queries that include all paths of a CASE statement will be parsed and reflected in the technical lineage.
  2. Collibra Data Lineage supports lineage that uses streams as a source and lineage on tables that has streams. Collibra Data Lineage does not support lineage on a CREATE STREAM statement.
  3. Snowflake in SQL-API mode doesn't parse Snowflow Scripting, however, queries that run from Snowflake Scripting that are put in ACCESS_HISTORY still contribute to lineage.
  4. If the data sharing consumer moves data from the shared view to a table, Collibra Data Lineage does not support lineage from the table.
  5. Technical lineage for Snowflake in SQL-API mode supports lineage from batches that drop and replace tables frequently. For details, go to Snowflake SQL-API lineage missing due to frequent table replacement in Collibra Support Portal.

SQL Server Integration Services (SSIS)

Collibra Data Lineage supports the following non-exhaustive list of transformations and component types in SQL Server Integration Services:

Supported transformations

Supported component types

  • Aggregate
  • Cache Transform
  • Conditional Split
  • Data Conversion
  • Derived Column
  • Fuzzy Grouping
  • Lookup
  • Merge Join
  • Multicast
  • OLE DB Command
  • Row Count
  • Script Component
  • Slowly Changing Dimension
  • Sort
  • Union All
  • Microsoft.ADONETDestination
  • Microsoft.DataReaderSourceAdapter
  • Microsoft.ManagedComponentHost
  • Microsoft.ScriptComponentHost
  • Microsoft.XmlSourceAdapter
  • PragmaticWorks.TaskFactory.HashTransform
  • PragmaticWorks.TaskFactory.UpsertDestination


Important 
  • Collibra Data Lineage supports SQL, but cannot parse other languages or scripts, for example SHELL and BAT scripts.
  • SQL statements from Excel are not supported.
  • Collibra Data Lineage does not create lineage for disabled executables.