Synchronize via the Google Dataplex Catalog ingestion

Important: This feature is available only in the latest UI.

Synchronizing via the Google Dataplex Catalog ingestion integrates metadata from your Google Dataplex projects and makes it available in Collibra Platform.

You can synchronize manually, or you can automate it by adding a synchronization schedule.

Prerequisites

You have enabled the Cloud Resource Manager API in GCP.
You have created a GCP connection.
You have added the Google Dataplex Catalog synchronization capability to the GCP connection.
You have created a System asset in which you want to add the Google Dataplex assets, for example BigQuery.
You are using the latest UI, because the Dataplex Catalog ingestion is available only in the latest UI.
You have a resource role with the Configure external system resource permission, for example, Owner.
You have a global role with the Catalog global permission, for example, Catalog Author.
You have a global role with the View Edge connections and capabilities global permission, for example, Edge integration engineer.

Steps

Manually synchronize Google Dataplex Catalog
Add a synchronization schedule

On the main toolbar, click → Catalog.
The Catalog homepage opens.
On the main toolbar, click the icon that opens the Create dialog box.
The Create dialog box appears.
In the Register with Edge section of the Create dialog box, click Integration Configuration.
The Integration Configuration tab page opens.
In the Connection Name column, locate the GCP connection that you used when you added the Dataplex capability and click the capability link in the Capabilities column.
The Dataplex capability configuration page opens.
In the Synchronization Configuration section, click the Edit icon.
In Ingestion Type, select Dataplex Catalog ingestion.
This option integrates the Dataplex Catalog entries and aspects.
If you want to integrate the metadata from the projects, lakes, zones, tables, and columns, go to Dataplex ingestion.

Complete the fields as follows:

Field	Mandatory / Optional	Action
System	Mandatory	In System, select the System asset in which you want to add the Google Dataplex assets.
Updated: <timestamp>	Optional	Click Updated: <timestamp> next to Synchronization Configuration, where `timestamp` indicates the last time the data was loaded from Google Dataplex. The Project IDs are loaded into the drop-down list of the Project Id field. This can take some time.
Project ID	Optional	To add a Project ID where Dataplex is enabled, click Add Project Id. You can add multiple Project IDs. The capability searches in these projects. The following rules apply: If you do not add Project IDs here but entered a value in the Project IDs (Deprecated) field in the Google Dataplex Catalog synchronization capability, the capability searches in the projects that you entered in the capability. If you do not add Project IDs here and left the Project IDs (Deprecated) field empty in the Google Dataplex Catalog synchronization capability, the capability searches in the projects that you entered in the Service Account / Workload Identity Federation (WIF) field in the GCP connection. This applies only when the connection type is set to `Service Account`. Do not add Project IDs here if you also entered a value in the Project IDs (Deprecated) field in the Google Dataplex Catalog synchronization capability; otherwise, the synchronization ends with an error.
Dataplex location	Optional	Select the Dataplex locations you want to integrate. If you select locations, the integration ingests Dataplex assets only from the specified locations. If the location is added in Dataplex but is not visible in the list, you can use this field to add the location for integration. Type the name of the location and press Enter. The Dataplex Catalog ingestion allows for both single-region and multi-region locations. For more information, go to Dataplex locations in Google Cloud documentation.
Domain Include Mappings	Optional	In Domain Include Mappings, specify which entries in Google Dataplex you want to integrate and the Collibra domains in which they need to be added. Note: If no include mappings are defined, all assets are ingested into the same domain as the System asset. If there is no explicit domain mapping for a schema, the domain specified for the database is used. A match with a database has priority over a match with a schema. To add a domain include mapping: Click Add Domain Include Mappings. In Path, add the path to the entries in Google Dataplex for which you want to integrate the metadata. Tip: Use the following pattern: `project name > location name > entryGroup name > parentEntry name > childEntry name`. In the context of BigQuery, the parentEntry is a BigQuery dataset name and the childEntry is a BigQuery table name. You can use the ? and * wildcards. If an entry matches multiple lines, the most detailed match is taken into account. Examples: `* > * > * > datasetX > tableY`, `projectA > europe-west1 > * > datasetA`, `projectB > * > @bigquery`. In Domain, select the Collibra domain in which you want to integrate the metadata.
Domain Exclude Mappings	Optional	In Domain Exclude Mappings, specify the path to entries in Google Dataplex that you don't want to integrate. Note: The exclude mapping has priority over the include mapping. To add a domain exclude mapping: Click Add Domain Exclude Mappings. In the field, add the path to the entries that you want to exclude. Tip: You can use the ? and * wildcards. For example: `projectA > * > @bigquery`.
Columns ingestion mode	Mandatory	In Columns ingestion mode, define how the ingestion must handle nested fields. The available options are: Ingest only parent columns: Only the highest-level fields are ingested as assets in Collibra. The hierarchy is shown via the View Array and View Struct links in the Technical Data Type column of these assets. Ingest parent and nested columns: Column assets are created for all fields. The parent assets also show the hierarchy via the View Array and View Struct links in the Technical Data Type column of these assets. Flatten columns structure: Only the lowest-level fields are ingested as assets.
Aspect Mappings	Optional	Aspects in Google Dataplex that refer to columns are integrated as Column assets in Collibra during a Dataplex Catalog ingestion. Optionally, in this field, you can specify additional aspects in Google Dataplex that you want to integrate. Aspect mapping is supported for Schema, Table, Database View, and Column assets, including partition columns. To map an aspect, enter the Google Dataplex aspect in the Aspect Field and the corresponding Collibra attribute in the Attribute field. Important: If you use this feature, make sure to add all required characteristics to the asset type assignments. Steps: Click Add Another Mapping. In Aspect Field, add the reference to the aspect field you want to integrate. Use the following pattern: `location.aspectName>fieldPath`. For example: `europe-west4.aieh-custom-aspect>custom_field1`. In Attribute, select the attribute in which you want to see the value. Example of partition column mapping: If you have a table in Google Dataplex that is partitioned by date, you can create a custom attribute named Partition interval and specify `bigquery-table > partitioning.interval` → `Partition interval`. After synchronization, the partition interval values such as `Day`, `Hour`, and so on are added to the Partition interval attribute of the corresponding Column asset in Collibra.
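
The include and exclude mapping rules in the table above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the actual implementation. The function names are hypothetical, and the sketch makes three assumptions that the table does not confirm: "most detailed match" means the pattern with the most segments and then the most literal characters, a pattern that is shorter than the entry path matches as a prefix, and an entry that matches none of the defined include mappings is skipped.

```python
from fnmatch import fnmatchcase


def split_path(path):
    """Split 'a > b > c' into trimmed segments."""
    return [part.strip() for part in path.split(">")]


def matches(pattern, entry_path):
    """True if each pattern segment matches the entry segment at the same position.

    Assumption: a pattern with fewer segments than the entry matches as a prefix.
    Note: fnmatchcase also treats [seq] as a wildcard, which the table does not mention.
    """
    pattern_segments = split_path(pattern)
    entry_segments = split_path(entry_path)
    if len(pattern_segments) > len(entry_segments):
        return False
    return all(
        fnmatchcase(segment, wildcard)
        for wildcard, segment in zip(pattern_segments, entry_segments)
    )


def specificity(pattern):
    """Hypothetical ranking for 'most detailed': segment count, then literal characters."""
    segments = split_path(pattern)
    literal_chars = sum(1 for seg in segments for ch in seg if ch not in "*?")
    return (len(segments), literal_chars)


def resolve_domain(entry_path, include, exclude, system_domain):
    """Return the target domain for an entry, or None if the entry is not ingested."""
    # The exclude mapping has priority over the include mapping.
    if any(matches(pattern, entry_path) for pattern in exclude):
        return None
    # Without include mappings, everything goes to the domain of the System asset.
    if not include:
        return system_domain
    candidates = [
        (specificity(pattern), domain)
        for pattern, domain in include.items()
        if matches(pattern, entry_path)
    ]
    if not candidates:
        return None  # Assumption: unmatched entries are skipped.
    return max(candidates, key=lambda candidate: candidate[0])[1]


include = {
    "* > * > * > datasetX > tableY": "Domain A",
    "projectA > europe-west1 > * > datasetA": "Domain B",
    "projectB > * > @bigquery": "Domain C",
}
exclude = ["projectC > * > @bigquery"]

print(resolve_domain("projectB > us > @bigquery > datasetX > tableY",
                     include, exclude, "System domain"))
```

In this example the entry matches both the first and the third include mapping. Under the ranking assumed here, the five-segment pattern is the more detailed one, so the entry lands in Domain A.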

Click Save.
Click Synchronize.
A notification indicates the synchronization has started.
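
The Project ID rules described for the Project ID field amount to a precedence check, which the following Python sketch summarizes. The function and parameter names are hypothetical and only mirror the three places where Project IDs can be entered. What happens when no Project IDs are set and the connection type is not `Service Account` is not stated above, so the sketch returns an empty list in that case as a placeholder.

```python
def resolve_project_ids(sync_config_ids, capability_ids, connection_ids, connection_type):
    """Return the Project IDs that the capability would search, per the documented rules.

    sync_config_ids: IDs added in the Project ID field of the synchronization configuration.
    capability_ids:  IDs in the Project IDs (Deprecated) field of the capability.
    connection_ids:  IDs from the Service Account / WIF field of the GCP connection.
    """
    # Setting both fields is documented to end the synchronization with an error.
    if sync_config_ids and capability_ids:
        raise ValueError(
            "Project IDs are set in both the synchronization configuration "
            "and the deprecated capability field"
        )
    if sync_config_ids:
        return list(sync_config_ids)
    if capability_ids:
        return list(capability_ids)
    # The connection fallback is documented only for the Service Account type.
    if connection_type == "Service Account":
        return list(connection_ids)
    return []  # Placeholder: behavior for other connection types is not documented here.


print(resolve_project_ids([], ["legacy-project"], ["sa-project"], "Service Account"))
```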

On the main toolbar, click → Catalog.
The Catalog homepage opens.
On the main toolbar, click the icon that opens the Create dialog box.
The Create dialog box appears.
In the Register with Edge section of the Create dialog box, click Integration Configuration.

The Integration Configuration tab page opens.
In the Connection Name column, locate the GCP connection that you used when you added the Dataplex capability and click the capability link in the Capabilities column.
The Dataplex capability configuration page opens.
In the Synchronization Configuration section, click the Edit icon.

Complete the fields as follows:

Field	Mandatory / Optional	Action
System	Mandatory	In System, select the System asset in which you want to add the Google Dataplex assets.
Updated: <timestamp>	Optional	Click Updated: <timestamp> next to Synchronization Configuration, where `timestamp` indicates the last time the data was loaded from Google Dataplex. The Project IDs are loaded into the drop-down list of the Project Id field. This can take some time.
Project ID	Optional	To add a Project ID where Dataplex is enabled, click Add Project Id. You can add multiple Project IDs. The capability searches in these projects. The following rules apply: If you do not add Project IDs here but entered a value in the Project IDs (Deprecated) field in the Google Dataplex Catalog synchronization capability, the capability searches in the projects that you entered in the capability. If you do not add Project IDs here and left the Project IDs (Deprecated) field empty in the Google Dataplex Catalog synchronization capability, the capability searches in the projects that you entered in the Service Account / Workload Identity Federation (WIF) field in the GCP connection. This applies only when the connection type is set to `Service Account`. Do not add Project IDs here if you also entered a value in the Project IDs (Deprecated) field in the Google Dataplex Catalog synchronization capability; otherwise, the synchronization ends with an error.
Dataplex location	Optional	Select the Dataplex locations you want to integrate. If you select locations, the integration ingests Dataplex assets only from the specified locations. If the location is added in Dataplex but is not visible in the list, you can use this field to add the location for integration. Type the name of the location and press Enter. The Dataplex Catalog ingestion allows for both single-region and multi-region locations. For more information, go to Dataplex locations in Google Cloud documentation.
Domain Include Mappings	Optional	In Domain Include Mappings, specify which entries in Google Dataplex you want to integrate and the Collibra domains in which they need to be added. Note: If no include mappings are defined, all assets are ingested into the same domain as the System asset. If there is no explicit domain mapping for a schema, the domain specified for the database is used. A match with a database has priority over a match with a schema. To add a domain include mapping: Click Add Domain Include Mappings. In Path, add the path to the entries in Google Dataplex for which you want to integrate the metadata. Tip: Use the following pattern: `project name > location name > entryGroup name > parentEntry name > childEntry name`. In the context of BigQuery, the parentEntry is a BigQuery dataset name and the childEntry is a BigQuery table name. You can use the ? and * wildcards. If an entry matches multiple lines, the most detailed match is taken into account. Examples: `* > * > * > datasetX > tableY`, `projectA > europe-west1 > * > datasetA`, `projectB > * > @bigquery`. In Domain, select the Collibra domain in which you want to integrate the metadata.
Domain Exclude Mappings	Optional	In Domain Exclude Mappings, specify the path to entries in Google Dataplex that you don't want to integrate. Note: The exclude mapping has priority over the include mapping. To add a domain exclude mapping: Click Add Domain Exclude Mappings. In the field, add the path to the entries that you want to exclude. Tip: You can use the ? and * wildcards. For example: `projectA > * > @bigquery`.
Columns ingestion mode	Mandatory	In Columns ingestion mode, define how the ingestion must handle nested fields. The available options are: Ingest only parent columns: Only the highest-level fields are ingested as assets in Collibra. The hierarchy is shown via the View Array and View Struct links in the Technical Data Type column of these assets. Ingest parent and nested columns: Column assets are created for all fields. The parent assets also show the hierarchy via the View Array and View Struct links in the Technical Data Type column of these assets. Flatten columns structure: Only the lowest-level fields are ingested as assets.
Aspect Mappings	Optional	Aspects in Google Dataplex that refer to columns are integrated as Column assets in Collibra during a Dataplex Catalog ingestion. Optionally, in this field, you can specify additional aspects in Google Dataplex that you want to integrate. Aspect mapping is supported for Schema, Table, Database View, and Column assets, including partition columns. To map an aspect, enter the Google Dataplex aspect in the Aspect Field and the corresponding Collibra attribute in the Attribute field. Important: If you use this feature, make sure to add all required characteristics to the asset type assignments. Steps: Click Add Another Mapping. In Aspect Field, add the reference to the aspect field you want to integrate. Use the following pattern: `location.aspectName>fieldPath`. For example: `europe-west4.aieh-custom-aspect>custom_field1`. In Attribute, select the attribute in which you want to see the value. Example of partition column mapping: If you have a table in Google Dataplex that is partitioned by date, you can create a custom attribute named Partition interval and specify `bigquery-table > partitioning.interval` → `Partition interval`. After synchronization, the partition interval values such as `Day`, `Hour`, and so on are added to the Partition interval attribute of the corresponding Column asset in Collibra.
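
To make the three Columns ingestion mode options more concrete, the following Python sketch lists which fields of a nested table schema would become Column assets in each mode. The schema layout, the function names, and the dot-separated naming of nested fields are assumptions made for this illustration. The actual asset names that the ingestion creates are not specified here.

```python
# Hypothetical nested schema: "address" is a struct that contains another struct, "geo".
SCHEMA = [
    {"name": "id"},
    {"name": "address", "fields": [
        {"name": "street"},
        {"name": "geo", "fields": [{"name": "lat"}, {"name": "lon"}]},
    ]},
]


def parent_only(fields):
    """Ingest only parent columns: the highest-level fields only."""
    return [field["name"] for field in fields]


def parent_and_nested(fields, prefix=""):
    """Ingest parent and nested columns: every field at every level."""
    names = []
    for field in fields:
        full_name = prefix + field["name"]
        names.append(full_name)
        names.extend(parent_and_nested(field.get("fields", []), full_name + "."))
    return names


def flattened(fields, prefix=""):
    """Flatten columns structure: only the lowest-level (leaf) fields."""
    names = []
    for field in fields:
        full_name = prefix + field["name"]
        children = field.get("fields", [])
        if children:
            names.extend(flattened(children, full_name + "."))
        else:
            names.append(full_name)
    return names


print(parent_only(SCHEMA))        # ['id', 'address']
print(parent_and_nested(SCHEMA))  # ['id', 'address', 'address.street', 'address.geo', 'address.geo.lat', 'address.geo.lon']
print(flattened(SCHEMA))          # ['id', 'address.street', 'address.geo.lat', 'address.geo.lon']
```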

Click Save.
Click the Add synchronization schedule icon.

Enter the required information and click Save:

Field	Description
Repeat	The interval when you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.
Cron	The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select `Cron expression` in the Repeat field.
Every	The day on which you want to synchronize, for example, Sunday. This field is only visible if you select `Weekly` in the Repeat field.
Every first	The day of the week on which you want to synchronize each month, for example, Tuesday for the first Tuesday of the month. This field is only visible if you select `Monthly` in the Repeat field.
At	The time at which you want to synchronize automatically, for example, 14:00. You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45. This field is only visible if you select `Daily`, `Weekly`, or `Monthly` in the Repeat field.
Time zone	The time zone for the schedule.
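
If you choose `Cron expression` in the Repeat field, it can help to see how the Daily, Weekly, and Monthly options would look as Quartz Cron expressions. The following Python sketch is a hypothetical helper, not part of the product. It assumes the standard Quartz field order (seconds, minutes, hours, day of month, month, day of week) and reads Every first as the first occurrence of the selected weekday in the month.

```python
# Quartz accepts three-letter day names in the day-of-week field.
DAY_ABBREVIATIONS = {
    "Sunday": "SUN", "Monday": "MON", "Tuesday": "TUE", "Wednesday": "WED",
    "Thursday": "THU", "Friday": "FRI", "Saturday": "SAT",
}


def quartz_cron(repeat, hour, day=None):
    """Build a Quartz Cron expression for an on-the-hour schedule.

    repeat: "Daily", "Weekly", or "Monthly" (the values of the Repeat field).
    hour:   0-23, because schedules can only be set on the hour.
    day:    weekday name, required for "Weekly" and "Monthly".
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if repeat == "Daily":
        return f"0 0 {hour} * * ?"
    if day not in DAY_ABBREVIATIONS:
        raise ValueError("a weekday name is required for Weekly and Monthly")
    abbreviation = DAY_ABBREVIATIONS[day]
    if repeat == "Weekly":
        # '?' means "no specific value" for day of month.
        return f"0 0 {hour} ? * {abbreviation}"
    if repeat == "Monthly":
        # '#1' selects the first occurrence of the weekday in the month.
        return f"0 0 {hour} ? * {abbreviation}#1"
    raise ValueError(f"unsupported repeat value: {repeat}")


print(quartz_cron("Daily", 14))               # 0 0 14 * * ?
print(quartz_cron("Weekly", 8, "Sunday"))     # 0 0 8 ? * SUN
print(quartz_cron("Monthly", 14, "Tuesday"))  # 0 0 14 ? * TUE#1
```

A Cron expression is also the only option listed here that is not tied to the on-the-hour restriction of the At field, because the minutes field can be set explicitly.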

What's next?

The synchronization job synchronizes the Google Dataplex data.
After the synchronization:

You can view a summary of the results from the Activities list.
For information on the integrated data, go to Synchronized data via Google Dataplex Catalog ingestion.