Google Cloud Platform (GCP): Supported transformation details

Collibra Data Lineage visualizes lineage for GCP down to the column level. To view the technical lineage for GCP, ensure that you select Objects in the toolbar of your technical lineage graph.

Function scope

Collibra Data Lineage captures lineage for the following Google Cloud assets. Currently, only Column, Table, and File assets are processed and included in the technical lineage.

  • BigQuery.
  • Other Google Cloud services (GCS), only when they contribute lineage for BigQuery assets. Collibra Data Lineage does not collect metadata directly from other GCS. However, if these services generate lineage for BigQuery assets, that lineage is captured by GCP and included in the exported lineage file. Collibra Data Lineage then ingests this exported lineage, so any indirect lineage created by these services is reflected in the technical lineage for BigQuery assets.
    Note: The column-level lineage generated in Collibra Data Lineage is subject to the limitations of the data lineage feature in GCP. For details, go to Limitations in the About data lineage topic of the Google Cloud documentation.

Lineage extraction mechanism

Collibra Data Lineage retrieves lineage metadata via the Google Data Lineage API to provide visibility into BigQuery and GCS data flows:

  • Technical lineage for GCP can start from GCS or BigQuery and end in BigQuery.
  • You can choose to create table-level lineage or column-level lineage for GCP when you synchronize the Technical Lineage for GCP capability.
  • Stitching works for the column-level lineage, regardless of whether you integrated Knowledge Catalog (formerly known as Google Dataplex Catalog) or registered Google BigQuery databases by using the BigQuery JDBC connector.
  • Transformations are ingested by calling the GCP Process and subsequently the GCP Jobs. Therefore, to ingest transformation details, the Service Account user defined in the Edge connection requires,
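
For orientation, the following is a minimal sketch of what a query against the Google Data Lineage API for one BigQuery table could look like. It only builds the URL and JSON body of a searchLinks request and does not send it. It is not Collibra's implementation; the function name and the project, location, dataset, and table values are hypothetical placeholders, and authentication is omitted.

```python
# Illustrative only: builds, but does not send, a Data Lineage API
# searchLinks request. All identifiers passed in are placeholders.
LINEAGE_API_ROOT = "https://datalineage.googleapis.com/v1"


def build_search_links_request(project_id, location, dataset_id, table_id, page_size=50):
    """Return (url, body) for a searchLinks call that targets one BigQuery table."""
    parent = f"projects/{project_id}/locations/{location}"
    url = f"{LINEAGE_API_ROOT}/{parent}:searchLinks"
    body = {
        # Ask for links whose target is this table, that is, its upstream lineage.
        "target": {
            "fullyQualifiedName": f"bigquery:{project_id}.{dataset_id}.{table_id}"
        },
        "pageSize": page_size,
    }
    return url, body


url, body = build_search_links_request("my-project", "us", "sales", "orders")
print(url)
print(body)
```

Sending the request would additionally require an OAuth 2.0 access token for a principal that is allowed to read lineage in the project.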

Project ingestion and stitching behavior

Collibra Data Lineage handles projects based on their pre-existing state in Data Catalog to prevent duplicate assets and ensure lineage continuity.

  • If a project exists as a Database asset, for example, from a Knowledge Catalog integration or JDBC synchronization, Collibra Data Lineage preserves the full path: (System) > Database > Schema > Table > Column.
  • If a project already exists as a GCP Project asset from a Dataplex integration, Collibra Data Lineage preserves the full path: (Domain Name) > GCP Project > Schema > Table > Column.
  • If a project does not already exist in Data Catalog, Collibra Data Lineage ingests it as a GCP Project asset with the full path: (Domain Name) > GCP Project > Schema > Table > Column.
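
The three cases above can be read as a small decision rule. The sketch below models it in Python purely as an illustration; the enum, the function name, and the return shape are assumptions made for this example and are not part of any Collibra API.

```python
from enum import Enum


class ProjectState(Enum):
    """Hypothetical labels for how a project already exists in Data Catalog."""
    DATABASE_ASSET = "database"        # for example, from Knowledge Catalog or JDBC
    GCP_PROJECT_ASSET = "gcp_project"  # from a Dataplex integration
    NOT_IN_CATALOG = "absent"


def resolve_project_path(state):
    """Return (full_path, creates_new_asset) for a project in the given state."""
    if state is ProjectState.DATABASE_ASSET:
        # An existing Database asset keeps its path.
        return ["(System)", "Database", "Schema", "Table", "Column"], False
    path = ["(Domain Name)", "GCP Project", "Schema", "Table", "Column"]
    # Only a project that is missing from Data Catalog is ingested as a new asset.
    return path, state is ProjectState.NOT_IN_CATALOG


for state in ProjectState:
    path, created = resolve_project_path(state)
    print(state.name, " > ".join(path), "new asset" if created else "preserved")
```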

Data objects in the technical lineage graph are automatically stitched to corresponding Data Catalog assets when the full name matches.

Stitching is supported for both Database and GCP Project assets when the full name matches the existing integration.
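
As a rough illustration of stitching on full name, the sketch below pairs lineage data objects with catalog assets by exact string equality. The full-name format, the asset IDs, and the exact-match rule are assumptions made for this example; the actual matching logic in Collibra Data Lineage may differ, for instance in how it treats letter case.

```python
def stitch_by_full_name(lineage_full_names, catalog_assets):
    """Split lineage objects into stitched and unstitched by exact full-name match.

    lineage_full_names: iterable of full names from the technical lineage graph.
    catalog_assets: dict mapping a full name to a (hypothetical) asset ID.
    """
    stitched = {}
    unstitched = []
    for full_name in lineage_full_names:
        asset_id = catalog_assets.get(full_name)
        if asset_id is None:
            unstitched.append(full_name)
        else:
            stitched[full_name] = asset_id
    return stitched, unstitched


# Hypothetical full names and asset IDs.
catalog = {
    "my-project > sales > orders > order_id": "asset-001",
    "my-project > sales > orders > amount": "asset-002",
}
lineage = [
    "my-project > sales > orders > order_id",
    "my-project > sales > refunds > refund_id",
]
stitched, unstitched = stitch_by_full_name(lineage, catalog)
print(stitched)
print(unstitched)
```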

Differences between technical lineage for GCP and Google BigQuery

You can create technical lineage for Google BigQuery by using a JDBC connection or for GCP by using a GCP connection. Consider the following differences to determine which data source and connection type to use.

| Feature | Support in technical lineage for GCP (column-level lineage) | Support in technical lineage for GCP (table-level lineage) | Support in technical lineage for Google BigQuery |
| --- | --- | --- | --- |
| SQL transformation code | Yes | No | Yes |
| Executed SQL in stored procedures | No (table-level only) | Yes | No |
| Ingest lineage from... | BigQuery and other Google Cloud services supported by the data lineage feature in GCP | BigQuery and other Google Cloud services supported by the data lineage feature in GCP | BigQuery |
| BigQuery external tables | Yes | Yes | Yes |
| Stitching | Yes | No | Yes |
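
One way to apply this comparison is to encode the Yes/No rows as data and filter by the features you need. The sketch below does that; the dictionary keys and the function name are made up for this example, and the values simply restate the comparison above.

```python
# Yes/No rows of the comparison, restated as booleans (True = Yes).
SUPPORT = {
    "GCP (column-level lineage)": {
        "sql_transformation_code": True,
        "executed_sql_in_stored_procedures": False,
        "bigquery_external_tables": True,
        "stitching": True,
    },
    "GCP (table-level lineage)": {
        "sql_transformation_code": False,
        "executed_sql_in_stored_procedures": True,
        "bigquery_external_tables": True,
        "stitching": False,
    },
    "Google BigQuery (JDBC)": {
        "sql_transformation_code": True,
        "executed_sql_in_stored_procedures": False,
        "bigquery_external_tables": True,
        "stitching": True,
    },
}


def options_supporting(required_features):
    """Return the lineage options that support every feature in required_features."""
    return [
        option
        for option, features in SUPPORT.items()
        if all(features[name] for name in required_features)
    ]


print(options_supporting(["stitching", "sql_transformation_code"]))
print(options_supporting(["executed_sql_in_stored_procedures"]))
```

The "Ingest lineage from..." row is not a Yes/No value and is left out of this lookup: both GCP options cover BigQuery and other Google Cloud services supported by the data lineage feature in GCP, while the Google BigQuery option covers BigQuery only.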