Airflow: Supported transformation details

Collibra Data Lineage uses the OpenLineage standard to create technical lineage for OpenLineage, Apache Airflow, and AWS Glue.

Although Apache Airflow and AWS Glue appear as separate options in the interface, they rely on the same underlying OpenLineage framework. OpenLineage defines a standardized format for emitting metadata events that describe job execution and dataset interactions. Collibra Data Lineage processes these events to create technical lineage.

Apache Airflow and AWS Glue are listed separately because their setup steps have been validated and documented with dedicated setup instructions.

Collibra Data Lineage supports data sources listed in OpenLineage Naming Specification (Version 1.41.0) in the OpenLineage documentation. To ensure proper parsing and asset stitching, your OpenLineage events must follow the namespace format defined in that specification.

Function scope

Collibra Data Lineage provides visibility into how data flows at both table and column levels:

Lineage extraction mechanism

Collibra Data Lineage creates lineage from OpenLineage events captured during job execution.