Dataplex: Supported transformation details
Collibra Data Lineage visualizes lineage for Google Dataplex down to the column level. To view the technical lineage for Google Dataplex, ensure that you select Objects in the toolbar of your technical lineage graph.
Function scope
Collibra Data Lineage captures lineage for the following Google Cloud assets. Currently, only Column, Table, and File assets are processed and included in the technical lineage.
- BigQuery.
- Other Google Cloud services (GCS), only when they contribute lineage for BigQuery assets. Collibra Data Lineage does not collect metadata directly from other GCS. However, if these services generate lineage for BigQuery assets, that lineage is captured by Dataplex and included in the exported lineage file. Collibra Data Lineage then ingests this exported lineage, so any indirect lineage created by these services is reflected in the technical lineage for BigQuery assets.

Note: The column-level lineage generated in Collibra Data Lineage is subject to the limitations of the data lineage feature in Dataplex. For details, go to Limitations in the About data lineage topic of the Dataplex Universal Catalog documentation.
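The scope rule can be illustrated with a small filter over lineage links: a link is kept only when its target is a BigQuery asset, so a source from another Google Cloud service survives only because it feeds BigQuery. This Python sketch is not how Collibra implements ingestion. It assumes that each link is a (source, target) pair of Dataplex fully qualified names and that BigQuery names start with `bigquery:`; the `LineageLink` type, the helper functions, and the example `gcs:` names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: Dataplex fully qualified names (FQNs) for BigQuery assets
# start with this prefix, for example "bigquery:project.dataset.table".
BIGQUERY_PREFIX = "bigquery:"


@dataclass(frozen=True)
class LineageLink:
    """Hypothetical representation of one edge in exported Dataplex lineage."""
    source: str  # FQN of the upstream asset
    target: str  # FQN of the downstream asset


def is_bigquery(fqn: str) -> bool:
    """Return True if the FQN uses the (assumed) BigQuery prefix."""
    return fqn.startswith(BIGQUERY_PREFIX)


def keep_bigquery_lineage(links: list[LineageLink]) -> list[LineageLink]:
    """Keep only links that end in a BigQuery asset.

    Sources from other Google Cloud services are retained only because they
    feed a BigQuery target; links that end outside BigQuery are dropped.
    """
    return [link for link in links if is_bigquery(link.target)]


if __name__ == "__main__":
    # Illustrative links; the "gcs:" names are hypothetical.
    links = [
        LineageLink("gcs:example-bucket/orders.csv", "bigquery:proj.sales.orders"),
        LineageLink("bigquery:proj.sales.orders", "bigquery:proj.sales.daily"),
        LineageLink("bigquery:proj.sales.daily", "gcs:example-bucket/export.csv"),
    ]
    for link in keep_bigquery_lineage(links):
        print(link.source, "->", link.target)
```

In the example, the last link is dropped because its target is not a BigQuery asset, while the first link is kept because the non-BigQuery source contributes lineage to a BigQuery table.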
Lineage extraction mechanism
Collibra Data Lineage retrieves lineage metadata via the Google Data Lineage API to provide visibility into BigQuery and GCS data flows:
- Technical lineage for Google Dataplex can start from GCS or BigQuery and end in BigQuery.
- You can choose to create table-level lineage or column-level lineage for Google Dataplex when you synchronize the Technical Lineage for Google Dataplex capability.
- Stitching works for column-level lineage, regardless of whether you integrated Dataplex Universal Catalog or registered your Google BigQuery databases by using the BigQuery JDBC connector.
- Transformations are ingested by first calling the GCP processes and then the GCP jobs. Therefore, to ingest transformation details, the service account user defined in the Edge connection requires:
  - At minimum, the `bigquery.jobs.get` permission.
  - Optionally, the `bigquery.admin` role, which lets the capability ingest the details of all the jobs in the project.
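The two-step ingestion of transformation details can be pictured as: list the lineage processes for a project and location, then look up the BigQuery job behind each process. The job lookup is the step that depends on the `bigquery.jobs.get` permission. This Python sketch only builds REST request URLs and makes no network calls. The URL paths follow Google's public REST references for the Data Lineage `processes.list` method and the BigQuery `jobs.get` method; the attribute name `bigquery_job_id` used to find a process's job, the helper names, and the sample data are assumptions for illustration.

```python
from urllib.parse import quote, urlencode

LINEAGE_API = "https://datalineage.googleapis.com/v1"
BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"

# Assumption: the process attribute that holds the BigQuery job ID.
JOB_ID_ATTRIBUTE = "bigquery_job_id"


def processes_url(project: str, location: str) -> str:
    """URL for the Data Lineage API processes.list call (step 1)."""
    return f"{LINEAGE_API}/projects/{quote(project)}/locations/{quote(location)}/processes"


def job_url(project: str, job_id: str, location: str) -> str:
    """URL for the BigQuery jobs.get call (step 2), which needs bigquery.jobs.get."""
    query = urlencode({"location": location})
    return f"{BIGQUERY_API}/projects/{quote(project)}/jobs/{quote(job_id)}?{query}"


def job_urls_for_processes(processes: list[dict], project: str, location: str) -> list[str]:
    """Map each process that references a BigQuery job to its jobs.get URL.

    Processes without the (assumed) job ID attribute are skipped.
    """
    urls = []
    for process in processes:
        job_id = process.get("attributes", {}).get(JOB_ID_ATTRIBUTE)
        if job_id:
            urls.append(job_url(project, job_id, location))
    return urls


if __name__ == "__main__":
    # Hypothetical items from a processes.list response.
    sample = [
        {"name": "projects/my-project/locations/us/processes/a",
         "attributes": {JOB_ID_ATTRIBUTE: "job_123"}},
        {"name": "projects/my-project/locations/us/processes/b", "attributes": {}},
    ]
    print(processes_url("my-project", "us"))
    print(job_urls_for_processes(sample, "my-project", "us"))
```

Keeping URL construction separate from the HTTP calls makes the ordering explicit: a job can only be requested after the process that references it has been read.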
Differences between technical lineage for Google Dataplex and Google BigQuery
You can create technical lineage for Google BigQuery by using a JDBC connection or for Google Dataplex by using a Google Cloud Platform (GCP) connection. Consider the following differences to determine which data source and connection type to use.
| Feature | Support in technical lineage for Google Dataplex (column-level lineage) | Support in technical lineage for Google Dataplex (table-level lineage) | Support in technical lineage for Google BigQuery |
|---|---|---|---|
| SQL transformation code | Yes | No | Yes |
| Executed SQL in stored procedures | No (table-level only) | Yes | No |
| Ingest lineage from | BigQuery and other Google Cloud services supported by the data lineage feature in Dataplex | BigQuery and other Google Cloud services supported by the data lineage feature in Dataplex | BigQuery |
| BigQuery external tables | Yes | Yes | Yes |
| Stitching | Yes | No | Yes |
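The Yes/No rows of the comparison table can also be read as a lookup: given the features you need, which options support all of them? This Python sketch copies those support values from the table and filters the three options. The option and feature keys and the `options_supporting` function are hypothetical labels; the "Ingest lineage from" row is omitted because it is not a Yes/No value.

```python
# Support values copied from the comparison table (Yes = True, No = False).
# The option and feature keys are hypothetical labels for this sketch.
SUPPORT = {
    "dataplex_column_level": {
        "sql_transformation_code": True,
        "executed_sql_in_stored_procedures": False,
        "bigquery_external_tables": True,
        "stitching": True,
    },
    "dataplex_table_level": {
        "sql_transformation_code": False,
        "executed_sql_in_stored_procedures": True,
        "bigquery_external_tables": True,
        "stitching": False,
    },
    "bigquery_jdbc": {
        "sql_transformation_code": True,
        "executed_sql_in_stored_procedures": False,
        "bigquery_external_tables": True,
        "stitching": True,
    },
}


def options_supporting(required: set[str]) -> list[str]:
    """Return the options whose table entry is Yes for every required feature.

    An unknown feature name counts as unsupported.
    """
    return [
        option
        for option, features in SUPPORT.items()
        if all(features.get(feature, False) for feature in required)
    ]


if __name__ == "__main__":
    print(options_supporting({"stitching", "sql_transformation_code"}))
    print(options_supporting({"executed_sql_in_stored_procedures"}))
```

For example, requiring stitching and SQL transformation code leaves column-level Dataplex lineage and the BigQuery JDBC option, while requiring executed SQL in stored procedures leaves only table-level Dataplex lineage.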