Synchronize GCP lineage
You can synchronize your technical lineage manually or automatically by adding a synchronization schedule.
To synchronize technical lineage by using the Collibra Catalog Cloud Ingestions API, use the /genericIntegration/{ingestibleId}/run endpoint, where {ingestibleId} is the ID of the capability.
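The following Python sketch shows what a scripted call to this endpoint could look like, for example from a scheduler outside Collibra. It uses only the standard library. Several details are assumptions rather than facts from this page: you supply the API base URL (including any path prefix), which you should take from the Catalog Cloud Ingestions API reference; the HTTP method is assumed to be POST; and the Authorization header value is assumed to be prepared by the caller. Check all three against the API reference before relying on it.

```python
import urllib.parse
import urllib.request


def build_run_request(api_base_url: str, ingestible_id: str, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a request to /genericIntegration/{ingestibleId}/run.

    Assumptions, not confirmed by the Collibra documentation:
    - api_base_url is the Catalog Cloud Ingestions API base URL for your
      environment (look up the exact prefix in the API reference).
    - The endpoint is triggered with POST and an empty body.
    - auth_header is a ready-made Authorization header value.
    """
    # Percent-encode the capability ID so it is safe as a single URL path segment.
    encoded_id = urllib.parse.quote(ingestible_id, safe="")
    url = f"{api_base_url.rstrip('/')}/genericIntegration/{encoded_id}/run"
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": auth_header, "Accept": "application/json"},
    )


def trigger_sync(api_base_url: str, ingestible_id: str, auth_header: str, timeout: float = 30.0) -> int:
    """Send the run request and return the HTTP status code."""
    request = build_run_request(api_base_url, ingestible_id, auth_header)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `trigger_sync("https://<your-collibra-host>/<api-prefix>", "<capability-id>", "Basic <credentials>")` would send the request and return the HTTP status code; the angle-bracket values are placeholders.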
Steps
Synchronize manually
- On the main toolbar, click → Catalog.
  The Catalog homepage opens.
- In the tab bar, click Integrations.
  The Integrations page opens.
- Click the Integration Configuration tab.
- Locate the GCP connection that you used when you added the technical lineage for GCP capability, and click the link in the capability column. If multiple capabilities exist for the GCP connection, expand them to locate your technical lineage for GCP capability.
  The synchronization configuration page opens.
- In the Synchronization Configuration section, click Add Configuration.
- Complete the fields as needed.
  - System: Select the System asset into which the GCP assets were ingested. When synchronization begins, Collibra Data Lineage stitches the ingested data objects to the selected assets.
  - Project IDs: To add a GCP Project ID from which you want to harvest lineage, click Add Project Id. You can add multiple Project IDs; the capability harvests lineage data from all specified projects.
    Important: If you chose Workload Identity Federation (WIF) using GKE as the connection type when you created the GCP connection, this field is required.
  - GCP Locations: To add a location, click Add GCP Location.
    If a new location is added in GCP after you created the technical lineage, use this field to add that location. When you synchronize the technical lineage after adding the location, Collibra Data Lineage collects data sources only from the specified locations.
    For more information, go to Knowledge Catalog locations in the Google Cloud documentation.
  - Type of lineage: Select the type of lineage that you want to create:
    - Table lineage: Create table-level lineage.
    - Column lineage: Create column-level lineage.
  - GCS Bucket: If you selected Column lineage in the Type of lineage field, enter the path to the GCS bucket that you created in GCP to store the exported lineage, for example, gs://lineage-export-bucket.
  - Skip ingesting SQL queries from interactive BigQuery jobs: Use this option to control whether Collibra Data Lineage ingests SQL queries from interactive BigQuery jobs.
    By default, this option is not selected: Collibra Data Lineage ingests the SQL queries and includes them in the transformation and source code on the Sources tab page.
    If you select this option, Collibra Data Lineage does not ingest the SQL queries and excludes them from the transformation and source code.
  - SQL code extraction method: Choose how to retrieve transformation code from BigQuery jobs. Select one of the following options:
    - API: Retrieve transformation code by using GET API calls.
    - BigQuery Table: Retrieve transformation code by building batch queries against INFORMATION_SCHEMA.
    Selecting BigQuery Table improves performance because transformation code is retrieved in bulk. If you choose this method, ensure that the service account has the additional permissions listed in GCP lineage integration preflight checks.
- Click Save.
- Click Synchronize.
A notification indicates the synchronization has started.
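To see why the BigQuery Table option of the SQL code extraction method can be faster, compare the number of round trips each method needs. The sketch below is a hypothetical illustration, not Collibra's implementation: for the API method it lists one BigQuery `jobs.get` request path per job, and for the BigQuery Table method it groups job IDs into parameterized queries against the regional `INFORMATION_SCHEMA.JOBS` view. The function names, the default batch size of 500, and the selected columns are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical query; INFORMATION_SCHEMA.JOBS is regional, hence the region qualifier.
JOBS_QUERY = (
    "SELECT job_id, query "
    "FROM `region-{region}`.INFORMATION_SCHEMA.JOBS "
    "WHERE job_id IN UNNEST(@job_ids)"
)


def api_request_paths(project_id: str, job_ids: List[str]) -> List[str]:
    """API method (illustrative): one BigQuery jobs.get call per job."""
    return [f"/bigquery/v2/projects/{project_id}/jobs/{job_id}" for job_id in job_ids]


def batched_queries(
    region: str, job_ids: List[str], batch_size: int = 500
) -> List[Tuple[str, Dict[str, List[str]]]]:
    """BigQuery Table method (illustrative): one parameterized query per batch of job IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    sql = JOBS_QUERY.format(region=region)
    # Each tuple holds the SQL text and the value to bind to the @job_ids parameter.
    return [
        (sql, {"job_ids": job_ids[start:start + batch_size]})
        for start in range(0, len(job_ids), batch_size)
    ]
```

With 1,200 jobs, `api_request_paths` yields 1,200 requests, while `batched_queries("us", job_ids)` yields 3 queries. In practice, each query would be run with the `@job_ids` array bound as a query parameter (for example, with `ArrayQueryParameter` in the google-cloud-bigquery client), which avoids splicing job IDs into the SQL text.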
Synchronize automatically
- Complete the steps of the manual synchronization procedure up to and including clicking Save, so that the technical lineage for GCP capability has a saved synchronization configuration.
- On the Synchronization Schedule tab pane, click Add Schedule.
- Enter the required information and click Save:
  - Repeat: The interval at which you want to synchronize automatically. The possible values are Daily, Weekly, Monthly, and Cron expression.
  - Cron: The Quartz cron expression that determines when the synchronization takes place. This field is only visible if you select Cron expression in the Repeat field.
  - Every: The day on which you want to synchronize, for example, Sunday. This field is only visible if you select Weekly in the Repeat field.
  - Every first: The day of the week whose first occurrence in each month triggers the synchronization; for example, Tuesday synchronizes on the first Tuesday of every month. This field is only visible if you select Monthly in the Repeat field.
  - At: The time at which you want to synchronize automatically, for example, 14:00.
    - You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
    - This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
  - Time zone: The time zone for the schedule.
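If you prefer the Cron expression option, it can help to see how the Daily, Weekly, and Monthly choices could be written in Quartz syntax, whose fields are seconds, minutes, hours, day of month, month, day of week, and an optional year. The helper below is a hypothetical sketch: the mapping from the Repeat options to these expressions is an assumption, and it sets seconds and minutes to 0 to mirror the on-the-hour rule for the At field. The time zone is not part of the expression; it is assumed to come from the Time zone field.

```python
from typing import Optional

# Hypothetical mapping from UI weekday names to Quartz day-of-week abbreviations.
QUARTZ_DAYS = {
    "Monday": "MON",
    "Tuesday": "TUE",
    "Wednesday": "WED",
    "Thursday": "THU",
    "Friday": "FRI",
    "Saturday": "SAT",
    "Sunday": "SUN",
}


def to_quartz_cron(repeat: str, hour: int, day: Optional[str] = None) -> str:
    """Return a Quartz cron expression: sec min hour day-of-month month day-of-week.

    Assumption: the Daily, Weekly, and Monthly options behave like these expressions.
    """
    if repeat not in ("Daily", "Weekly", "Monthly"):
        raise ValueError(f"unsupported repeat value: {repeat!r}")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if repeat == "Daily":
        # Every day at the given hour; Quartz requires '?' in one of the two day fields.
        return f"0 0 {hour} * * ?"
    if day not in QUARTZ_DAYS:
        raise ValueError("Weekly and Monthly schedules need a weekday, for example 'Sunday'")
    if repeat == "Weekly":
        return f"0 0 {hour} ? * {QUARTZ_DAYS[day]}"
    # Monthly: '#1' selects the first occurrence of that weekday in the month.
    return f"0 0 {hour} ? * {QUARTZ_DAYS[day]}#1"
```

For example, `to_quartz_cron("Weekly", 14, "Sunday")` returns `0 0 14 ? * SUN`, and `to_quartz_cron("Monthly", 14, "Tuesday")` returns `0 0 14 ? * TUE#1`, which fires on the first Tuesday of every month.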