Azure Data Factory: Supported transformation details

Collibra Data Lineage supports the most commonly used transformations and data sources in Azure Data Factory.

Pipelines

Technical lineage for Azure Data Factory retrieves and processes pipeline metadata and includes it in the technical lineage graph as follows:

Supported transformations

The following lists show a non-exhaustive selection of supported and unsupported transformations. A number in parentheses after a transformation refers to the item with that number under Limitations.

Supported transformations:

  • Aggregate (1)
  • Alter Row
  • Assert
  • Derived Column (1)
  • Exists
  • External Call (2)
  • Filter
  • Flatten (1)
  • Join
  • Lookup
  • Parse (1)
  • Pivot (3)
  • Rank
  • Select (1)
  • Sink (4)
  • Sort
  • Source
  • Split
  • Stringify
  • Surrogate Key
  • Union
  • Unpivot
  • Window (1)

Unsupported transformations:

  • Some reserved variable names, for example {@context}
  • Flowlets

Limitations

  1. Transformations that contain column patterns or rule-based mappings can only be partially analyzed because they generate column names on the fly during the actual data flow run. If technical lineage is detected from a dynamically generated column, it is given the placeholder Dynamic Column in the technical lineage viewer.
  2. In the Mapping section of the editor, column patterns are not supported and not displayed in the technical lineage graph. Note that Auto mapping uses column patterns behind the scenes and is therefore not supported either.
  3. Pivoted columns can only be inferred when explicit values are provided in the Pivot Key tab. When columns cannot be inferred, a placeholder Pivoted Columns is added.
  4. The SQL scripts and rule-based mappings in the transformation are not supported.

Supported data sources

The following table shows a non-exhaustive list of supported sources with the corresponding dataset and linked service types.

Collibra Data Lineage supports all data format types that are supported in Azure Data Factory, for example binary, Excel, delimited text, JSON, and Parquet.

Data source | Dataset type | Linked service type
Amazon Redshift | AmazonRedshiftTable | AmazonRedshift
Azure Blob storage | AzureBlob | AzureBlobStorage
Azure Data Lake Storage Gen2 | AzureBlobFSFile | AzureBlobFS
Azure Data Lake Store | AzureDataLakeStoreFile | AzureDataLakeStore
Azure Databricks Delta Lake | AzureDatabricksDeltaLake | AzureDatabricksDeltaLake
Azure SQL Managed Instance | AzureSqlMITable | AzureSqlMI
Azure SQL Server database | AzureSqlTable | AzureSqlDatabase
Azure Synapse Analytics | AzureSqlDWTable | AzureSqlDW
DB2 data source | Db2Table | Db2
Google Cloud Storage | GoogleCloudStorageLocation | GoogleCloudStorage
Microsoft Access | MicrosoftAccessTable | MicrosoftAccess
Microsoft Azure Cosmos Database | CosmosDbSqlApiCollection | CosmosDb
Open Database Connectivity (ODBC) | OdbcTable | Odbc
On-premises Oracle database | OracleTable | Oracle
REST | RestResource | RestService
Salesforce | SalesforceObject | Salesforce
Salesforce Marketing Cloud | SalesforceMarketingCloudObject | SalesforceMarketingCloud
Salesforce Service Cloud | SalesforceServiceCloudObject | SalesforceServiceCloud
SAP Business Warehouse (open hub) | SapOpenHubTable | SapBW
SFTP server | SftpLocation | Sftp
Snowflake | SnowflakeTable | Snowflake
SQL Server | SqlServerTable | SqlServer
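
To check an exported dataset definition against the table above, a small lookup can help. The following is a minimal sketch, not part of Collibra Data Lineage: the function names are hypothetical, and it assumes the dataset and linked service definitions are available as exported Azure Data Factory JSON, with the dataset type under properties.type and the linked service reference under properties.linkedServiceName.referenceName. Because the table is non-exhaustive, a pair that is not listed is not necessarily unsupported.

```python
# (dataset type, linked service type) pairs copied from the table above.
LISTED_PAIRS = {
    ("AmazonRedshiftTable", "AmazonRedshift"),
    ("AzureBlob", "AzureBlobStorage"),
    ("AzureBlobFSFile", "AzureBlobFS"),
    ("AzureDataLakeStoreFile", "AzureDataLakeStore"),
    ("AzureDatabricksDeltaLake", "AzureDatabricksDeltaLake"),
    ("AzureSqlMITable", "AzureSqlMI"),
    ("AzureSqlTable", "AzureSqlDatabase"),
    ("AzureSqlDWTable", "AzureSqlDW"),
    ("Db2Table", "Db2"),
    ("GoogleCloudStorageLocation", "GoogleCloudStorage"),
    ("MicrosoftAccessTable", "MicrosoftAccess"),
    ("CosmosDbSqlApiCollection", "CosmosDb"),
    ("OdbcTable", "Odbc"),
    ("OracleTable", "Oracle"),
    ("RestResource", "RestService"),
    ("SalesforceObject", "Salesforce"),
    ("SalesforceMarketingCloudObject", "SalesforceMarketingCloud"),
    ("SalesforceServiceCloudObject", "SalesforceServiceCloud"),
    ("SapOpenHubTable", "SapBW"),
    ("SftpLocation", "Sftp"),
    ("SnowflakeTable", "Snowflake"),
    ("SqlServerTable", "SqlServer"),
}


def is_listed(dataset_type: str, linked_service_type: str) -> bool:
    """Return True if the pair appears in the (non-exhaustive) table."""
    return (dataset_type, linked_service_type) in LISTED_PAIRS


def is_listed_dataset(dataset: dict, linked_services: dict) -> bool:
    """Check one dataset definition (hypothetical helper).

    dataset: a dataset definition as exported JSON (assumed layout).
    linked_services: maps linked service name -> its exported JSON definition.
    """
    props = dataset["properties"]
    ls_name = props["linkedServiceName"]["referenceName"]
    ls_type = linked_services[ls_name]["properties"]["type"]
    return is_listed(props["type"], ls_type)
```

A dataset of type AzureSqlTable that references a linked service of type AzureSqlDatabase would be reported as listed, while the same dataset type paired with a Snowflake linked service would not.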

Supported activity types

A Data Factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. There are three groupings of activities: data movement activities, data transformation activities, and control activities. For a complete list of Azure Data Factory activity types and descriptions, see Microsoft's documentation on pipelines and activities.

Collibra Data Lineage currently supports the following activity types:

Activity type | Activity group
Append Variable | Control flow
Copy | Data movement
Data Flow | Data transformation
Execute Pipeline | Control flow
For Each | Control flow
Get Metadata | Control flow
If Condition | Control flow
Lookup | Control flow
Set Variable | Control flow
Switch | Control flow
Until | Control flow
Web | Control flow
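
Before running lineage on a pipeline, it can be useful to see which of its activities fall outside the list above. The following is a minimal sketch under stated assumptions, not Collibra's implementation: the function names are hypothetical, the JSON "type" identifiers (for example ExecuteDataFlow for Data Flow and WebActivity for Web) are assumed to be how the listed activity types appear in an exported pipeline definition, and nested activities are assumed to sit under the typeProperties keys named in the comments. Verify both against your own pipeline exports.

```python
# Assumed JSON "type" identifiers for the activity types listed above,
# mapped to their activity group.
LISTED_ACTIVITY_TYPES = {
    "AppendVariable": "Control flow",
    "Copy": "Data movement",
    "ExecuteDataFlow": "Data transformation",
    "ExecutePipeline": "Control flow",
    "ForEach": "Control flow",
    "GetMetadata": "Control flow",
    "IfCondition": "Control flow",
    "Lookup": "Control flow",
    "SetVariable": "Control flow",
    "Switch": "Control flow",
    "Until": "Control flow",
    "WebActivity": "Control flow",
}


def iter_activities(activities):
    """Yield every activity, including those nested in container activities."""
    for activity in activities:
        yield activity
        tp = activity.get("typeProperties", {})
        # ForEach and Until: nested activities under "activities" (assumed).
        yield from iter_activities(tp.get("activities", []))
        # IfCondition: one list per branch (assumed key names).
        yield from iter_activities(tp.get("ifTrueActivities", []))
        yield from iter_activities(tp.get("ifFalseActivities", []))
        # Switch: one list per case plus a default list (assumed key names).
        for case in tp.get("cases", []):
            yield from iter_activities(case.get("activities", []))
        yield from iter_activities(tp.get("defaultActivities", []))


def unlisted_activities(pipeline: dict) -> list:
    """Return (name, type) for each activity whose type is not listed."""
    top_level = pipeline.get("properties", {}).get("activities", [])
    return [
        (a.get("name"), a.get("type"))
        for a in iter_activities(top_level)
        if a.get("type") not in LISTED_ACTIVITY_TYPES
    ]
```

For a pipeline whose For Each activity contains a Copy activity and a Wait activity, only the Wait activity would be returned, because Wait does not appear in the list of supported activity types.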