Synchronize Dataplex lineage
You can synchronize your technical lineage manually, or automatically by adding a synchronization schedule.
To synchronize technical lineage through the Collibra Catalog Cloud Ingestions API, call the /genericIntegration/{ingestibleId}/run endpoint, where {ingestibleId} is the capability ID.
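As a rough illustration of the API route, the following Python sketch builds a request to the run endpoint without sending it. Only the /genericIntegration/{ingestibleId}/run path comes from this page. The base path /rest/catalog/1.0, the POST method, the Authorization header, and the host name are assumptions, so check the API documentation of your Collibra environment before relying on them.

```python
import urllib.parse
import urllib.request

# Assumption: base path of the Catalog Cloud Ingestions API.
# Verify it in the API documentation of your Collibra environment.
ASSUMED_API_BASE = "/rest/catalog/1.0"


def build_run_request(base_url: str, capability_id: str, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a request to the run endpoint of a capability."""
    safe_id = urllib.parse.quote(capability_id, safe="")
    url = f"{base_url.rstrip('/')}{ASSUMED_API_BASE}/genericIntegration/{safe_id}/run"
    return urllib.request.Request(
        url,
        method="POST",  # assumption: the run endpoint is triggered with POST
        headers={"Authorization": auth_header, "Accept": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical host, capability ID, and credentials, for illustration only.
    req = build_run_request("https://example.collibra.com", "my-capability-id", "Basic <credentials>")
    print(req.get_method(), req.full_url)
    # To actually start the synchronization, send the request:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Keeping request construction separate from sending makes it easy to inspect the URL before any synchronization is triggered.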
Steps
To synchronize manually:
- On the main toolbar, click → Catalog.
  The Catalog homepage opens.
- In the tab bar, click Integrations.
  The Integrations page opens.
- Click the Integration Configuration tab.
- Locate the GCP connection that you used when you added the technical lineage for Google Dataplex capability, and click the link in the capability column. If multiple capabilities exist for the GCP connection, expand them to locate your technical lineage for Google Dataplex capability. The synchronization configuration page opens.
- In the Synchronization Configuration section, click Add Configuration.
- Complete the fields as needed.
  Field: Action
  - System: Select the System asset in which the Dataplex assets were ingested. Collibra Data Lineage stitches the ingested data objects to the selected assets when synchronization begins.
  - Project IDs: To add a Project ID where Dataplex is enabled, click Add Project Id. You can add multiple Project IDs. The capability searches these projects.
    Important: If you chose Workload Identity Federation (WIF) using GKE as the connection type when you created the GCP connection, this field is required.
  - Dataplex Locations: To add a Dataplex location, click Add Dataplex Location.
    If a new location is added in Dataplex after you created the technical lineage, you can use this field to add the location. When you synchronize the technical lineage after adding the location, Collibra Data Lineage collects data sources only from the specified location.
    For more information, go to Dataplex locations in the Google Cloud documentation.
  - Type of lineage: Select the type of lineage you want to create:
    - Table lineage: Create table-level lineage.
    - Column lineage: Create column-level lineage.
  - GCS Bucket: If you selected Column lineage in the Type of lineage field, enter the path to the GCS bucket you created in Dataplex to store the exported lineage, for example, gs://lineage-export-bucket.
  - Skip ingesting SQL queries from interactive BigQuery jobs: Use this option to control whether Collibra Data Lineage ingests SQL queries from interactive BigQuery jobs. By default, this option is not selected, and Collibra Data Lineage ingests the SQL queries and includes them in the transformation and source code on the Sources tab page.
    If you select this option, Collibra Data Lineage does not ingest the SQL queries and excludes them from the transformation and source code.
- Click Save.
- Click Synchronize.
A notification indicates the synchronization has started.
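The dependencies between the configuration fields can be summarized in a short Python sketch. This is not a Collibra API: the SyncConfig class, its attribute names, and the validate function are hypothetical and only restate the rules described in the field table. The check that the bucket path starts with gs:// is an assumption based on the example path.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyncConfig:
    """Hypothetical model of the Synchronization Configuration fields."""
    system: str
    project_ids: List[str] = field(default_factory=list)
    dataplex_locations: List[str] = field(default_factory=list)
    lineage_type: str = "table"          # "table" or "column"
    gcs_bucket: str = ""
    skip_interactive_sql: bool = False   # not selected by default
    wif_gke_connection: bool = False     # True if the GCP connection uses WIF using GKE


def validate(cfg: SyncConfig) -> List[str]:
    """Return a list of problems; an empty list means the rules are satisfied."""
    problems = []
    if cfg.lineage_type not in ("table", "column"):
        problems.append("Type of lineage must be table or column")
    # Project IDs are required when the connection type is WIF using GKE.
    if cfg.wif_gke_connection and not cfg.project_ids:
        problems.append("Project IDs are required for WIF using GKE connections")
    # Column lineage needs the GCS bucket that stores the exported lineage.
    # Assumption: the path uses the gs:// scheme, as in the example.
    if cfg.lineage_type == "column" and not cfg.gcs_bucket.startswith("gs://"):
        problems.append("Column lineage needs a GCS bucket path such as gs://lineage-export-bucket")
    return problems


if __name__ == "__main__":
    cfg = SyncConfig(system="My System", lineage_type="column")
    for problem in validate(cfg):
        print(problem)
```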
To synchronize automatically on a schedule:
- On the main toolbar, click → Catalog.
  The Catalog homepage opens.
- In the tab bar, click Integrations.
  The Integrations page opens.
- Click the Integration Configuration tab.
- Locate the GCP connection that you used when you added the technical lineage for Google Dataplex capability, and click the link in the capability column. If multiple capabilities exist for the GCP connection, expand them to locate your technical lineage for Google Dataplex capability. The synchronization configuration page opens.
- In the Synchronization Configuration section, click Add Configuration.
- Complete the fields as needed.
  Field: Action
  - System: Select the System asset in which the Dataplex assets were ingested. Collibra Data Lineage stitches the ingested data objects to the selected assets when synchronization begins.
  - Project IDs: To add a Project ID where Dataplex is enabled, click Add Project Id. You can add multiple Project IDs. The capability searches these projects.
    Important: If you chose Workload Identity Federation (WIF) using GKE as the connection type when you created the GCP connection, this field is required.
  - Dataplex Locations: To add a Dataplex location, click Add Dataplex Location.
    If a new location is added in Dataplex after you created the technical lineage, you can use this field to add the location. When you synchronize the technical lineage after adding the location, Collibra Data Lineage collects data sources only from the specified location.
    For more information, go to Dataplex locations in the Google Cloud documentation.
  - Type of lineage: Select the type of lineage you want to create:
    - Table lineage: Create table-level lineage.
    - Column lineage: Create column-level lineage.
  - GCS Bucket: If you selected Column lineage in the Type of lineage field, enter the path to the GCS bucket you created in Dataplex to store the exported lineage, for example, gs://lineage-export-bucket.
  - Skip ingesting SQL queries from interactive BigQuery jobs: Use this option to control whether Collibra Data Lineage ingests SQL queries from interactive BigQuery jobs. By default, this option is not selected, and Collibra Data Lineage ingests the SQL queries and includes them in the transformation and source code on the Sources tab page.
    If you select this option, Collibra Data Lineage does not ingest the SQL queries and excludes them from the transformation and source code.
- Click Save.
- On the Synchronization Schedule tab pane, click Add Schedule.
- Enter the required information and click Save:
  Field: Description
  - Repeat: The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.
  - Cron: The Quartz Cron expression that determines when the synchronization takes place.
    This field is only visible if you select Cron expression in the Repeat field.
  - Every: The day on which you want to synchronize, for example, Sunday.
    This field is only visible if you select Weekly in the Repeat field.
  - Every first: The day of the month on which you want to synchronize, for example, Tuesday.
    This field is only visible if you select Monthly in the Repeat field.
  - At: The time at which you want to synchronize automatically, for example, 14:00.
    - You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
    - This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
  - Time zone: The time zone for the schedule.
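If you select Cron expression in the Repeat field, it can help to see what Quartz expressions look like. The following Python sketch builds Quartz expressions that roughly correspond to the Daily, Weekly, and Monthly options at a whole hour. The quartz_cron helper is hypothetical, and the mapping to the Repeat options is an assumption for illustration, not Collibra's own translation. Quartz expressions use the field order seconds, minutes, hours, day of month, month, day of week, and the time zone is set separately in the Time zone field.

```python
# Day names accepted in the Quartz day-of-week field.
DAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def quartz_cron(repeat: str, hour: int, day: str = "") -> str:
    """Build a Quartz cron expression that fires on the hour.

    repeat: "daily", "weekly", or "monthly" (first given weekday of the month).
    hour:   0-23, the whole hour at which to run.
    day:    three-letter day name, needed for "weekly" and "monthly".
    """
    if repeat not in ("daily", "weekly", "monthly"):
        raise ValueError(f"unsupported repeat value: {repeat}")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if repeat == "daily":
        # Every day: day-of-month is "*", day-of-week is "?" (no specific value).
        return f"0 0 {hour} * * ?"
    if day not in DAYS:
        raise ValueError(f"day must be one of {DAYS}")
    if repeat == "weekly":
        return f"0 0 {hour} ? * {day}"
    # Monthly: "#1" selects the first occurrence of the weekday in the month.
    return f"0 0 {hour} ? * {day}#1"


if __name__ == "__main__":
    print(quartz_cron("daily", 14))            # 0 0 14 * * ?
    print(quartz_cron("weekly", 14, "SUN"))    # 0 0 14 ? * SUN
    print(quartz_cron("monthly", 14, "TUE"))   # 0 0 14 ? * TUE#1
```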