Google Cloud Platform (GCP): Supported transformation details
Collibra Data Lineage visualizes lineage for GCP down to the column level. To view the technical lineage for GCP, ensure that you select Objects in the toolbar of your technical lineage graph.
Function scope
Collibra Data Lineage captures lineage for the following Google Cloud assets. Currently, only Column, Table, and File assets are processed and included in the technical lineage.
- BigQuery.
- Other Google Cloud services (GCS), only when they contribute lineage for BigQuery assets. Collibra Data Lineage does not collect metadata directly from other GCS. However, if these services generate lineage for BigQuery assets, that lineage is captured by GCP and included in the exported lineage file. Collibra Data Lineage then ingests this exported lineage, so any indirect lineage created by these services is reflected in the technical lineage for BigQuery assets.

Note: The column-level lineage generated in Collibra Data Lineage is subject to the limitations of the data lineage feature in GCP. For details, go to Limitations in the About data lineage topic of the Google Cloud documentation.
Lineage extraction mechanism
Collibra Data Lineage retrieves lineage metadata via the Google Data Lineage API to provide visibility into BigQuery and GCS data flows:
- Technical lineage for GCP can start from GCS or BigQuery and end in BigQuery.
- You can choose to create table-level lineage or column-level lineage for GCP when you synchronize the Technical Lineage for GCP capability.
- Stitching works for the column-level lineage, regardless of whether you integrated Knowledge Catalog (formerly known as Google Dataplex Catalog) or registered Google BigQuery databases by using the BigQuery JDBC connector.
- Transformations are ingested by calling the GCP Process and subsequently the GCP Jobs. Therefore, to ingest transformation details, the Service Account user defined in the Edge connection requires:
  - At minimum, the bigquery.jobs.get permission.
  - Optionally, the bigquery.admin role, which lets the capability ingest the details of all the jobs in the project.
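The permission requirement above can be expressed as a small pre-flight check. The following is a minimal sketch, not part of Collibra Data Lineage: the function name `transformation_access` and its return labels are hypothetical, and it assumes that you have already collected the permissions and roles granted to the Service Account by other means. `roles/bigquery.admin` is the Google Cloud role ID of the bigquery.admin role.

```python
# Hypothetical pre-flight check; names and return labels are illustrative only.
REQUIRED_PERMISSION = "bigquery.jobs.get"   # minimum requirement from the list above
ADMIN_ROLE = "roles/bigquery.admin"         # optional role covering all jobs in the project


def transformation_access(permissions, roles):
    """Classify what transformation details a Service Account could ingest.

    permissions: set of IAM permission strings granted to the account
    roles: set of IAM role IDs granted to the account
    Returns "all-jobs", "minimum", or "none".
    """
    if ADMIN_ROLE in roles:
        return "all-jobs"
    if REQUIRED_PERMISSION in permissions:
        return "minimum"
    return "none"
```

A result of "none" would suggest that transformation details cannot be ingested until at least the bigquery.jobs.get permission is granted.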
Project ingestion and stitching behavior
Collibra Data Lineage handles projects based on their pre-existing state in Data Catalog to prevent duplicate assets and ensure lineage continuity.
- If a project exists as a Database asset, for example, from a Knowledge Catalog integration or JDBC synchronization, Collibra Data Lineage preserves the full path: (System) > Database > Schema > Table > Column.
- If a project already exists as a GCP Project asset from a Dataplex integration, Collibra Data Lineage preserves the full path: (Domain Name) > GCP Project > Schema > Table > Column.
- If a project does not already exist in Data Catalog, Collibra Data Lineage ingests it as a GCP Project asset with the full path: (Domain Name) > GCP Project > Schema > Table > Column.
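The three cases above amount to a simple decision rule. The sketch below is an illustration only, assuming that the pre-existing state of a project can be summarized as "Database", "GCP Project", or not present. The function `plan_project_ingestion` and the dictionary keys are hypothetical names, not Collibra APIs.

```python
# Path templates copied from the three cases described above.
PATH_TEMPLATES = {
    "Database": "(System) > Database > Schema > Table > Column",
    "GCP Project": "(Domain Name) > GCP Project > Schema > Table > Column",
}


def plan_project_ingestion(existing_asset_type=None):
    """Return how a project would be handled, given its state in Data Catalog.

    existing_asset_type: "Database", "GCP Project", or None if the project
    does not yet exist in Data Catalog.
    """
    if existing_asset_type is None:
        # Not yet in Data Catalog: ingested as a new GCP Project asset.
        return {
            "asset_type": "GCP Project",
            "full_path": PATH_TEMPLATES["GCP Project"],
            "creates_asset": True,
        }
    if existing_asset_type not in PATH_TEMPLATES:
        raise ValueError(f"Unexpected asset type: {existing_asset_type!r}")
    # Already in Data Catalog: the existing full path is preserved.
    return {
        "asset_type": existing_asset_type,
        "full_path": PATH_TEMPLATES[existing_asset_type],
        "creates_asset": False,
    }
```

For example, `plan_project_ingestion("Database")` reports that no new asset is created and that the (System) > Database path is kept.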
Data objects in the technical lineage graph are automatically stitched to corresponding Data Catalog assets when the full name matches.
Stitching is supported for both Database and GCP Project assets when the full name matches the existing integration.
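Conceptually, stitching by full name is a lookup of each lineage object in the set of existing asset names. The following sketch assumes exact string equality between full names; the actual matching rules (for example, case handling) are not described here, and the function `stitch`, the asset IDs, and the sample names are hypothetical.

```python
def stitch(lineage_full_names, catalog_assets):
    """Pair lineage objects with catalog assets that share a full name.

    lineage_full_names: iterable of full-name strings from the lineage graph
    catalog_assets: dict mapping full name -> asset identifier
    Returns (stitched, unstitched): a dict of matched names to asset
    identifiers, and a list of names with no matching asset.
    """
    stitched = {}
    unstitched = []
    for full_name in lineage_full_names:
        asset_id = catalog_assets.get(full_name)
        if asset_id is None:
            unstitched.append(full_name)
        else:
            stitched[full_name] = asset_id
    return stitched, unstitched


# Illustrative data only; these names and IDs are made up.
catalog = {"my-domain > my-project > sales > orders > order_id": "asset-001"}
graph = [
    "my-domain > my-project > sales > orders > order_id",
    "my-domain > my-project > sales > orders > amount",
]
matched, unmatched = stitch(graph, catalog)
```

In this example, only the `order_id` column is stitched, because the `amount` column has no asset with the same full name.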
Differences between technical lineage for GCP and Google BigQuery
You can create technical lineage for Google BigQuery by using a JDBC connection or for GCP by using a GCP connection. Consider the following differences to determine which data source and connection type to use.
| Feature | Support in technical lineage for GCP (column-level lineage) | Support in technical lineage for GCP (table-level lineage) | Support in technical lineage for Google BigQuery |
|---|---|---|---|
| SQL transformation code | Yes | No | Yes |
| Executed SQL in stored procedures | No (table-level only) | Yes | No |
| Ingest lineage from... | BigQuery and other Google Cloud services supported by the data lineage feature in GCP | BigQuery and other Google Cloud services supported by the data lineage feature in GCP | BigQuery |
| BigQuery external tables | Yes | Yes | Yes |
| Stitching | Yes | No | Yes |
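To compare the options against a list of required features, the Yes/No rows of the table can be encoded as data. This is a minimal sketch for illustration: the feature keys and the function `options_supporting` are hypothetical names, and "No (table-level only)" is treated as not supported.

```python
# Yes/No rows of the comparison table, encoded as booleans.
SUPPORT = {
    "GCP (column-level lineage)": {
        "sql_transformation_code": True,
        "stored_procedure_sql": False,  # "No (table-level only)" in the table
        "external_tables": True,
        "stitching": True,
    },
    "GCP (table-level lineage)": {
        "sql_transformation_code": False,
        "stored_procedure_sql": True,
        "external_tables": True,
        "stitching": False,
    },
    "Google BigQuery (JDBC)": {
        "sql_transformation_code": True,
        "stored_procedure_sql": False,
        "external_tables": True,
        "stitching": True,
    },
}


def options_supporting(*features):
    """Return the options that support every requested feature."""
    return [
        name
        for name, support in SUPPORT.items()
        if all(support[feature] for feature in features)
    ]
```

For example, `options_supporting("stitching", "sql_transformation_code")` returns the column-level GCP option and the Google BigQuery option, while requiring both stored procedure SQL and stitching returns no option.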