Synchronize GCP lineage
You can synchronize your technical lineage manually or automatically by adding a synchronization schedule.
If you want to synchronize technical lineage by using the Collibra Catalog Cloud Ingestions API, use the
/genericIntegration/{ingestibleId}/run API, where {ingestibleId} is the capability ID.
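As a rough illustration of the API route, the following Python sketch builds a request for the run endpoint without sending it. Only the path /genericIntegration/{ingestibleId}/run comes from this page. The base URL, the POST method, and the bearer-token header are assumptions made for the example; check the Collibra Catalog Cloud Ingestions API reference for the actual method and authentication scheme of your environment.

```python
import urllib.parse
import urllib.request


def build_run_request(base_url: str, ingestible_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that triggers a synchronization run.

    base_url and token are hypothetical placeholders; ingestible_id is the
    capability ID described above.
    """
    # Encode the capability ID so that it is safe to place in the URL path.
    safe_id = urllib.parse.quote(ingestible_id, safe="")
    url = f"{base_url.rstrip('/')}/genericIntegration/{safe_id}/run"
    return urllib.request.Request(
        url,
        method="POST",  # assumption: the run endpoint is invoked with POST
        headers={
            "Authorization": f"Bearer {token}",  # assumption: bearer-token auth
            "Accept": "application/json",
        },
    )


# Hypothetical values, for illustration only.
request = build_run_request(
    "https://your-collibra.example.com/rest/catalog/1.0",
    "0000-capability-id",
    "my-token",
)
print(request.get_method(), request.full_url)
```

To actually send the request, you could pass it to urllib.request.urlopen once the placeholders are replaced with real values.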
Steps
To synchronize manually:
- On the main toolbar, click → Catalog.
The Catalog homepage opens.
- In the tab bar, click Integrations.
The Integrations page opens.
- Click the Integration Configuration tab.
- Locate the GCP connection that you used when you added the technical lineage for GCP capability, and click the link in the capability column. If multiple capabilities exist for the GCP connection, expand them to locate your technical lineage for GCP capability. The synchronization configuration page opens.
- In the Synchronization Configuration section, click Add Configuration.
- Complete the fields as needed.
Field: System
Action: Select the System asset in which the GCP assets were ingested. Collibra Data Lineage stitches the ingested data objects to the selected assets when synchronization begins.
Field: Project IDs
Action: To add a GCP Project ID from which you want to harvest lineage, click Add Project Id. You can add multiple Project IDs, and the capability harvests lineage data from all specified projects.
Important: If you chose Workload Identity Federation (WIF) using GKE as the connection type when creating the GCP connection, this field is required.
Field: GCP Locations
Action: To add a location, click Add GCP Location.
If a new location is added in GCP after you created the technical lineage, you can use this field to add the location. When you synchronize the technical lineage after adding the location, Collibra Data Lineage collects data sources only from the specified location.
For more information, go to Knowledge Catalog locations in the Google Cloud documentation.
Field: Type of lineage
Action: Select the type of lineage you want to create:
- Table lineage: Create table-level lineage.
- Column lineage: Create column-level lineage.
Field: GCS Bucket
Action: If you selected Column lineage in the Type of lineage field, enter the path to the GCS bucket you created in GCP to store the exported lineage, for example, gs://lineage-export-bucket.
Field: Skip ingesting SQL queries from interactive BigQuery jobs
Action: Use this option to control whether Collibra Data Lineage ingests SQL queries from interactive BigQuery jobs. By default, this option is not selected, and Collibra Data Lineage ingests the SQL queries and includes them in the transformation and source code on the Sources tab page.
If you select this option, Collibra Data Lineage does not ingest the SQL queries and excludes them from the transformation and source code.
Field: SQL code extraction method
Action: Choose how you want to retrieve transformation code from BigQuery jobs. Select one of the following options:
- API: Retrieve transformation code by using GET API calls.
- BigQuery Table: Retrieve transformation code by building batch queries against INFORMATION_SCHEMA.
Selecting BigQuery Table improves performance by retrieving transformation code in bulk. If you choose this method, ensure the service account has the additional permissions listed in GCP lineage integration preflight checks.
Field: Include filter
Action: To include specific projects, locations, datasets, or tables in technical lineage, click Add include pattern under Include filter. Enter one pattern per entry.
If you do not specify an include filter, technical lineage includes all lineage from GCP.
The following rules apply when you enter an include pattern:
- Enter the pattern in the format project > location > @bigquery > dataset > table.
- You can use the ? and * wildcards in the project, location, dataset, and table segments.
- For a lineage link to be ingested, both the source and the target must match at least one include pattern.
- The exclude filter takes precedence over the include filter.
Example:
- my-project > us > @bigquery > sales_* > *_raw: Includes tables ending in _raw in datasets starting with sales_, in the us location of my-project.
- * > * > @bigquery > hr > *: Includes all tables in the hr dataset across any project and location.
Note: This field is only available when Type of lineage is set to Table lineage.
Field: Exclude filter
Action: To exclude specific projects, locations, datasets, or tables from technical lineage, click Add exclude pattern under Exclude filter. Enter one pattern per entry.
If you do not specify an exclude filter, no lineage is suppressed.
The following rules apply when you enter an exclude pattern:
- Enter the pattern in the format project > location > @bigquery > dataset > table.
- You can use the ? and * wildcards in the project, location, dataset, and table segments.
- If either the source or the target matches an exclude pattern, the link is suppressed.
- The exclude filter takes precedence over the include filter.
Example:
- * > * > @bigquery > script_* > *: Excludes all tables in datasets starting with script_ across any project and location.
Note: This field is only available when Type of lineage is set to Table lineage.
- Click Save.
- Click Synchronize.
A notification indicates the synchronization has started.
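The include and exclude filter rules described in the field table can be sketched in Python as follows. This is an illustration of the documented rules only, not Collibra's implementation: the function names are hypothetical, and details such as case sensitivity and whitespace handling around the > separator are assumptions.

```python
import re
from typing import Iterable


def _segment_to_regex(segment: str) -> str:
    """Translate one pattern segment: * matches any run of characters, ? matches one."""
    parts = []
    for ch in segment:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def matches(pattern: str, path: str) -> bool:
    """Check a 'project > location > @bigquery > dataset > table' path against a pattern."""
    pattern_segments = [s.strip() for s in pattern.split(">")]
    path_segments = [s.strip() for s in path.split(">")]
    if len(pattern_segments) != len(path_segments):
        return False
    return all(
        re.fullmatch(_segment_to_regex(p), t) is not None
        for p, t in zip(pattern_segments, path_segments)
    )


def link_is_ingested(source: str, target: str,
                     include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Apply the documented rules to a single lineage link."""
    include, exclude = list(include), list(exclude)
    # Exclude takes precedence: one matching end is enough to suppress the link.
    if any(matches(p, end) for p in exclude for end in (source, target)):
        return False
    # Without an include filter, everything that is not excluded is kept.
    if not include:
        return True
    # With an include filter, both ends must match at least one pattern.
    return all(any(matches(p, end) for p in include) for end in (source, target))


include = ["my-project > us > @bigquery > sales_* > *_raw"]
exclude = ["* > * > @bigquery > script_* > *"]
source = "my-project > us > @bigquery > sales_eu > orders_raw"
target = "my-project > us > @bigquery > sales_eu > customers_raw"
print(link_is_ingested(source, target, include, exclude))
```

In this sketch, a link whose target is sales_eu > customers_clean would be dropped because only the source matches the include pattern, and any link touching a script_* dataset would be dropped regardless of the include filter.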
To synchronize automatically by adding a synchronization schedule:
- On the main toolbar, click → Catalog.
The Catalog homepage opens.
- In the tab bar, click Integrations.
The Integrations page opens.
- Click the Integration Configuration tab.
- Locate the GCP connection that you used when you added the technical lineage for GCP capability, and click the link in the capability column. If multiple capabilities exist for the GCP connection, expand them to locate your technical lineage for GCP capability. The synchronization configuration page opens.
- In the Synchronization Configuration section, click Add Configuration.
- Complete the fields as needed.
Field: System
Action: Select the System asset in which the GCP assets were ingested. Collibra Data Lineage stitches the ingested data objects to the selected assets when synchronization begins.
Field: Project IDs
Action: To add a GCP Project ID from which you want to harvest lineage, click Add Project Id. You can add multiple Project IDs, and the capability harvests lineage data from all specified projects.
Important: If you chose Workload Identity Federation (WIF) using GKE as the connection type when creating the GCP connection, this field is required.
Field: GCP Locations
Action: To add a location, click Add GCP Location.
If a new location is added in GCP after you created the technical lineage, you can use this field to add the location. When you synchronize the technical lineage after adding the location, Collibra Data Lineage collects data sources only from the specified location.
For more information, go to Knowledge Catalog locations in the Google Cloud documentation.
Field: Type of lineage
Action: Select the type of lineage you want to create:
- Table lineage: Create table-level lineage.
- Column lineage: Create column-level lineage.
Field: GCS Bucket
Action: If you selected Column lineage in the Type of lineage field, enter the path to the GCS bucket you created in GCP to store the exported lineage, for example, gs://lineage-export-bucket.
Field: Skip ingesting SQL queries from interactive BigQuery jobs
Action: Use this option to control whether Collibra Data Lineage ingests SQL queries from interactive BigQuery jobs. By default, this option is not selected, and Collibra Data Lineage ingests the SQL queries and includes them in the transformation and source code on the Sources tab page.
If you select this option, Collibra Data Lineage does not ingest the SQL queries and excludes them from the transformation and source code.
Field: SQL code extraction method
Action: Choose how you want to retrieve transformation code from BigQuery jobs. Select one of the following options:
- API: Retrieve transformation code by using GET API calls.
- BigQuery Table: Retrieve transformation code by building batch queries against INFORMATION_SCHEMA.
Selecting BigQuery Table improves performance by retrieving transformation code in bulk. If you choose this method, ensure the service account has the additional permissions listed in GCP lineage integration preflight checks.
Field: Include filter
Action: To include specific projects, locations, datasets, or tables in technical lineage, click Add include pattern under Include filter. Enter one pattern per entry.
If you do not specify an include filter, technical lineage includes all lineage from GCP.
The following rules apply when you enter an include pattern:
- Enter the pattern in the format project > location > @bigquery > dataset > table.
- You can use the ? and * wildcards in the project, location, dataset, and table segments.
- For a lineage link to be ingested, both the source and the target must match at least one include pattern.
- The exclude filter takes precedence over the include filter.
Example:
- my-project > us > @bigquery > sales_* > *_raw: Includes tables ending in _raw in datasets starting with sales_, in the us location of my-project.
- * > * > @bigquery > hr > *: Includes all tables in the hr dataset across any project and location.
Note: This field is only available when Type of lineage is set to Table lineage.
Field: Exclude filter
Action: To exclude specific projects, locations, datasets, or tables from technical lineage, click Add exclude pattern under Exclude filter. Enter one pattern per entry.
If you do not specify an exclude filter, no lineage is suppressed.
The following rules apply when you enter an exclude pattern:
- Enter the pattern in the format project > location > @bigquery > dataset > table.
- You can use the ? and * wildcards in the project, location, dataset, and table segments.
- If either the source or the target matches an exclude pattern, the link is suppressed.
- The exclude filter takes precedence over the include filter.
Example:
- * > * > @bigquery > script_* > *: Excludes all tables in datasets starting with script_ across any project and location.
Note: This field is only available when Type of lineage is set to Table lineage.
- Click Save.
- On the Synchronization Schedule tab pane, click Add Schedule.
- Enter the required information and click Save:
Field: Repeat
Description: The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.
Field: Cron
Description: The Quartz Cron expression that determines when the synchronization takes place.
This field is only visible if you select Cron expression in the Repeat field.
Field: Every
Description: The day on which you want to synchronize, for example, Sunday.
This field is only visible if you select Weekly in the Repeat field.
Field: Every first
Description: The day of the month on which you want to synchronize, for example, Tuesday.
This field is only visible if you select Monthly in the Repeat field.
Field: At
Description: The time at which you want to synchronize automatically, for example, 14:00.
- You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
- This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
Field: Time zone
Description: The time zone for the schedule.
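If you select Cron expression in the Repeat field, it can help to see how the other Repeat options would look in Quartz syntax. The sketch below uses the standard Quartz field order (seconds, minutes, hours, day of month, month, day of week). The function name and the mapping from the Daily, Weekly, and Monthly options to expressions are assumptions made for illustration; this page does not state how Collibra translates those options internally or which Quartz features it accepts.

```python
from typing import Optional

# Quartz day-of-week abbreviations.
DAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def quartz_cron(repeat: str, hour: int, day: Optional[str] = None) -> str:
    """Return a Quartz Cron expression that fires on the hour.

    repeat: "daily", "weekly", or "monthly" (monthly means the first given weekday).
    hour:   0-23, in the time zone selected for the schedule.
    day:    a Quartz weekday abbreviation, required for weekly and monthly.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if repeat == "daily":
        # Every day: day of month is *, day of week is ? (no specific value).
        return f"0 0 {hour} * * ?"
    if day not in DAYS:
        raise ValueError(f"day must be one of {DAYS}")
    if repeat == "weekly":
        return f"0 0 {hour} ? * {day}"
    if repeat == "monthly":
        # DAY#1 means the first occurrence of that weekday in the month.
        return f"0 0 {hour} ? * {day}#1"
    raise ValueError("repeat must be 'daily', 'weekly', or 'monthly'")


print(quartz_cron("daily", 14))
print(quartz_cron("weekly", 14, "SUN"))
print(quartz_cron("monthly", 8, "TUE"))
```

The seconds and minutes fields are fixed at 0 here to mirror the on-the-hour behavior of the At field; whether a custom Cron expression is held to the same restriction is not stated on this page.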