Add the Google Dataplex Catalog synchronization capability
After you have created a connection to the Google Cloud Platform (GCP) in your Edge site, you have to add the "Google Dataplex Catalog synchronization" capability to the connection.
Before you start
- You have created and installed an Edge site.
- You have given the Edge user the required permissions.
- You have created a connection to the Google Cloud Platform (GCP) in your Edge site.
- Make sure you are on the latest UI, because the Dataplex Catalog ingestion is available only in the latest UI.
Required permissions
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
Steps
- Open an Edge site:
  - On the main toolbar, click [icon], and then click Settings.
    The Collibra settings page opens.
  - In the tab pane, click Edge.
    The Sites tab opens and shows a table with an overview of the Edge sites.
  - In the table, click the name of the Edge site whose status is Healthy.
    The Edge site page opens.
- In the Capabilities section, click Add capability.
  The Create capability page appears.
- Select Google Dataplex Catalog synchronization.
- Enter the required information.
  Capability: This section contains general information about the capability.
    - Name (required): The name of the Edge capability.
    - Description (optional): The description of the Edge capability.
  GCP service account: This section contains information on how to connect to Google Cloud Platform (GCP).
    - GCP Connection (required): The GCP connection to use.
  Configuration: This section contains information on the configuration of the crawlers.
    - Project IDs (Deprecated) (optional): A comma-separated list of the IDs of the projects where Dataplex is enabled.
      This field is deprecated in the latest user interface and is replaced by the Project IDs field on the Synchronize Metadata page, where you can add the project IDs when you synchronize Google Dataplex Catalog.
      The following rules apply to project IDs:
        - If you enter a value in this field and do not add project IDs on the Synchronize Metadata page, the capability searches the projects in this field when you synchronize it.
        - If you leave this field empty and do not add project IDs on the Synchronize Metadata page, the capability searches the projects that you entered in the GCP Service Account field of the GCP connection.
        - Do not both enter a value in this field and add project IDs on the Synchronize Metadata page; if you do, the synchronization ends with an error.
    - Save input metadata (optional): Select this checkbox to save the input metadata extracted from the data source in ZIP files, which can be useful for troubleshooting. Select this option only at the request of Collibra Support. After the synchronization, Collibra Support can tell you where the saved ZIP files are stored.
      This checkbox is not selected by default.
    - (Deprecated) Filters and Domain Mapping (optional): Important: This field is deprecated. Define any mappings in the integration configuration.
    - Extensible Properties Mapping (optional): Important: This field does not apply if you use the Google Dataplex Catalog ingestion. Define any mappings in the integration configuration.
  Advanced Configuration: These configuration options help when you investigate issues with the capability.
  Important: Only complete the fields Save Input Metadata, Logging configuration, Memory (MiB), and JVM arguments at the request of or together with Collibra Support.
    - Debug (optional): This field is ignored when you integrate metadata from the Google Dataplex Catalog.
      An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false.
      Note: We strongly recommend that you send Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.
    - Log level (optional): This field is ignored when you integrate metadata from the Google Dataplex Catalog.
      An option to determine the verbosity level of the Catalog connector log files. By default, this option is set to No logging.
- Click Create.
  The capability is added to the Edge site, and its fields become read-only.
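The precedence rules for the deprecated Project IDs field are easier to check when restated as a small function. The following Python sketch is an illustration only, not Collibra's implementation: the names `parse_project_ids`, `resolve_project_ids`, and `ProjectIdConflictError` are hypothetical. It assumes that whitespace around each comma-separated ID is ignored, and that project IDs entered on the Synchronize Metadata page are used when the capability field is empty (implied by the field being replaced by that page, but not spelled out as a rule).

```python
from typing import List, Optional


class ProjectIdConflictError(Exception):
    """Hypothetical error: project IDs set both on the capability and on the Synchronize Metadata page."""


def parse_project_ids(value: Optional[str]) -> List[str]:
    """Split a comma-separated Project IDs value into a list.

    Assumption: whitespace around each ID is ignored and empty entries are dropped.
    """
    if not value:
        return []
    return [pid.strip() for pid in value.split(",") if pid.strip()]


def resolve_project_ids(
    capability_field: Optional[str],
    sync_page_ids: List[str],
    connection_ids: List[str],
) -> List[str]:
    """Return the projects a synchronization would search, following the documented rules.

    connection_ids stands for the projects entered in the GCP Service Account
    field of the GCP connection.
    """
    capability_ids = parse_project_ids(capability_field)
    if capability_ids and sync_page_ids:
        # Both places filled in: the synchronization ends with an error.
        raise ProjectIdConflictError("Set project IDs in only one place.")
    if capability_ids:
        # Deprecated capability field used, Synchronize Metadata page left empty.
        return capability_ids
    if sync_page_ids:
        # Assumption: IDs from the Synchronize Metadata page are used directly.
        return sync_page_ids
    # Neither set: fall back to the projects configured in the GCP connection.
    return connection_ids


print(resolve_project_ids("proj-a, proj-b", [], ["conn-proj"]))  # ['proj-a', 'proj-b']
print(resolve_project_ids("", [], ["conn-proj"]))  # ['conn-proj']
```

Raising an error when both sources are set mirrors the documented behavior that the synchronization fails, rather than silently picking one list.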
What's next?
You can now synchronize Google Dataplex Catalog. Go to Synchronize via the Google Dataplex Catalog ingestion.