Synchronize via Google Dataplex ingestion

Synchronizing via Google Dataplex ingestion integrates metadata from your Google Dataplex projects and makes it available in Collibra Data Intelligence Platform.

You can synchronize manually, or you can automate it by adding a synchronization schedule.

Before you begin

  • You have enabled the Cloud Resource Manager API in GCP.
  • You have created a GCP connection.
  • You have added the Google Dataplex Catalog synchronization capability to the GCP connection.
  • You know in which System asset you want to add the Google Dataplex assets.
    • If you have registered Google databases before via the JDBC driver, use the same System asset.
    • If you never registered Google databases before, create a new System asset manually and use that one.

Requirements and permissions

Steps

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog Home opens.
  2. On the main toolbar, click the Create icon.
    The Create dialog box appears.
  3. In the Register with Edge section of the Create dialog box, click Register a data source or Integration Configuration, depending on which option your environment shows.
    The Register content or Integration Configuration tab page opens.
  4. In the Connection Name column, locate the GCP connection that you used when you added the Dataplex capability, and click the capability link in the Data sources/Capabilities or Capabilities column.
    The Dataplex capability configuration page opens.
  5. In the Synchronization Configuration section, click Add Configuration.
  6. In the Configuration Section section, click Add Configuration.
  7. In Ingestion Type, select Dataplex ingestion.
    This will integrate the metadata from the projects, lakes, zones, tables, and columns.
    If you want to integrate the Dataplex Catalog Entries and Aspects, go to Dataplex Catalog ingestion.
  8. Complete the fields as follows:
    Field (Mandatory / Optional): Action

    System (Mandatory):
    In System, select the System asset in which you want to add the Google Dataplex assets.

    Updated: <timestamp> (Optional):
    Click Updated: <timestamp> next to Synchronization Configuration, where <timestamp> indicates the last time the data was loaded from Google Dataplex.
    The Project IDs are loaded into the drop-down list of the Project Id fields that you can use in the following field. This can take some time.

    Project ID (Optional):
    To add a Project ID where Dataplex is enabled, click Add Project Id. You can add multiple Project IDs. The capability will search in these projects.
    The following rules apply when you add Project IDs:
    • If you do not add Project IDs here but entered a value in the Project IDs (Deprecated) field in the Google Dataplex Catalog synchronization capability, the capability will search in the projects that you entered in the capability.
    • If you do not add Project IDs here and left the Project IDs (Deprecated) field empty in the Google Dataplex Catalog synchronization capability, the capability will search in the projects that you entered in the GCP Service Account field in the GCP connection.
    • Do not add Project IDs here if you also entered a value in the Project IDs (Deprecated) field in the Google Dataplex Catalog synchronization capability. If both are filled in, the synchronization ends with an error.

    Dataplex location (Optional):
    To add a Dataplex location, click Add Dataplex Location. If a new location is added in Dataplex and not yet supported by the integration, you can use this field to add the location for integration. When you add a location in this field, the integration ingests Dataplex assets only from the specified location.
    For more information, go to Dataplex locations in Google Cloud documentation.

    Domain Include Mappings (Optional):
    In Domain Include Mappings, specify which entries in Google Dataplex you want to integrate and the Collibra domains where they need to be added.
    Note
    • If no include mappings are defined, we ingest all assets into the same domain as the System asset.
    • If there is no explicit domain mapping for a schema, we use the domain specified for the database.
    • A match with a database has priority over a match with a schema.

    Domain Exclude Mappings (Optional):
    In Domain Exclude Mappings, specify the path to entries in Google Dataplex that you don't want to integrate.
    Note The exclude mapping has priority over the include mapping.

    Custom Label Mappings (Optional):
    To ingest and map labels from Google Dataplex to asset attributes in Data Catalog, complete the following steps. Labels can be mapped to any out-of-the-box (OOTB) attributes or custom attributes that you added for the Dataplex Lake, Dataplex Zone, and Schema asset types.
      1. Click Custom Label Mappings.
      2. In the Label field, enter the label key from Google Dataplex.
      3. In the Attribute field, select an OOTB attribute or a custom attribute for the Dataplex Lake, Dataplex Zone, and Schema asset types.
  9. In System, select the System asset in which you want to add the Google Dataplex assets.
  10. (No longer available from 2023.09) In Project IDs, add a comma-separated list of the Project IDs where Dataplex is enabled.
    The capability will search in these projects. If the Project IDs field is empty, the integration will search in the project included in the provided GCP Service Account.
  11. Click Save.
  12. Click Save Configuration.
  13. Click Synchronize.
    A notification indicates the synchronization has started.
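
The Project ID rules in step 8 amount to a small precedence check. The following Python sketch is only an illustration of those rules as they are written on this page, not Collibra's implementation. The function name `resolve_project_ids` and its three parameters are hypothetical: they stand for the Project ID fields in the synchronization configuration, the Project IDs (Deprecated) field in the capability, and the project that comes with the GCP Service Account.

```python
from typing import Sequence


def resolve_project_ids(
    config_ids: Sequence[str],
    deprecated_ids: Sequence[str],
    service_account_project: str,
) -> list[str]:
    """Return the projects that would be searched (hypothetical helper).

    config_ids: Project IDs added in the synchronization configuration.
    deprecated_ids: values from the Project IDs (Deprecated) capability field.
    service_account_project: project tied to the GCP Service Account.
    """
    if config_ids and deprecated_ids:
        # Filling in both places is documented to end in an error.
        raise ValueError(
            "Project IDs are set in both the synchronization configuration "
            "and the Project IDs (Deprecated) field"
        )
    if config_ids:
        # Project IDs added in the configuration are searched.
        return list(config_ids)
    if deprecated_ids:
        # Nothing added here: fall back to the deprecated capability field.
        return list(deprecated_ids)
    # Both empty: fall back to the project of the GCP Service Account.
    return [service_account_project]


print(resolve_project_ids([], ["legacy-project"], "sa-project"))
```

Modeling the "both filled in" case as an exception mirrors the documented behavior that the synchronization ends with an error, so a misconfiguration is visible before you click Synchronize.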

To synchronize on a schedule instead of manually, complete steps 1 through 12 above, and then continue with the following steps:

  13. In the Synchronization Schedule section, click Add schedule.
  14. Enter the required information and click Save:
    Field: Description

    Repeat:
    The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.

    Cron:
    The Quartz Cron expression that determines when the synchronization takes place.
    This field is only visible if you select Cron expression in the Repeat field.

    Every:
    The day on which you want to synchronize, for example, Sunday.
    This field is only visible if you select Weekly in the Repeat field.

    Every first:
    The day of the month on which you want to synchronize, for example, Tuesday.
    This field is only visible if you select Monthly in the Repeat field.

    At:
    The time at which you want to synchronize automatically, for example, 14:00.
    • You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
    • This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.

    Time zone:
    The time zone for the schedule.
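
If you choose Cron expression in the Repeat field, you need a Quartz Cron expression, which has the fields seconds, minutes, hours, day of month, month, and day of week. The Python sketch below shows what roughly equivalent expressions for the Daily, Weekly, and Monthly options could look like, and enforces the on-the-hour rule of the At field. The helper names are hypothetical, and the mapping from the Repeat options to these expressions is an assumption for illustration; this page does not state how Collibra translates them internally.

```python
# Quartz day-of-week abbreviations, keyed by full day name.
DAYS = {
    "Sunday": "SUN", "Monday": "MON", "Tuesday": "TUE", "Wednesday": "WED",
    "Thursday": "THU", "Friday": "FRI", "Saturday": "SAT",
}


def on_the_hour(at: str) -> int:
    """Parse a time such as '14:00' and return the hour.

    Raises ValueError for times that are not on the hour, such as '8:45'.
    """
    hour_text, minute_text = at.split(":")
    hour, minute = int(hour_text), int(minute_text)
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {at}")
    if minute != 0:
        raise ValueError(f"schedules must be on the hour: {at}")
    return hour


def daily_cron(at: str) -> str:
    # Every day at the given hour.
    return f"0 0 {on_the_hour(at)} * * ?"


def weekly_cron(day: str, at: str) -> str:
    # Every week on the given day at the given hour.
    return f"0 0 {on_the_hour(at)} ? * {DAYS[day]}"


def monthly_first_cron(day: str, at: str) -> str:
    # The first occurrence of the given weekday in each month.
    return f"0 0 {on_the_hour(at)} ? * {DAYS[day]}#1"


print(daily_cron("14:00"))                     # 0 0 14 * * ?
print(weekly_cron("Sunday", "08:00"))          # 0 0 8 ? * SUN
print(monthly_first_cron("Tuesday", "14:00"))  # 0 0 14 ? * TUE#1
```

A custom Cron expression is not necessarily bound by the on-the-hour rule, which this page states only for the At field; verify the behavior in your environment before relying on minute-level schedules.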

What's next?

The synchronization job synchronizes the Google Dataplex data.
After the synchronization: