Configure the synchronization of a data source

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.


After registering your data source via Edge, you need to configure its synchronization. For each schema, you can set up synchronization rules that determine which tables are ingested and how.
You can add or edit synchronization rules for a schema and copy rules to other schemas.
Once configured, you can synchronize individual schemas or all schemas with defined rules.

Before you begin

  • You have registered your database via Edge.
  • Make sure the Database asset has an assigned owner.
    To verify this, go to the Database asset → Responsibilities. The owner is assigned during the registration of the data source and doesn't have to be the user who triggers the synchronization.

Requirements and permissions

Add or edit synchronization rules

Steps

  1. Open a Database asset page.
  2. In the tab bar, click Configuration.
  3. In the Metadata Synchronization tab page, select a schema.
    Tip 
    • You can search for a schema in the drop-down list or use the filter to see only specific schemas. For example, you can filter to see only schemas with synchronization rules.
      To see only schemas with the Missing from source status, select the Removed From Source option.
    • You can refresh the schema list by clicking the Refresh List icon.
  4. If required, create, edit, or delete synchronization rules:
    1. Do one of the following:
      • To create a new rule, click Add Rule.
      • To edit an existing rule, click Edit in the upper-right corner.
      • To delete a rule, click Delete Rule.
    2. If you add or edit a rule, enter the required information.
      Rule field | Description
      Include Tables

      A comma-separated list of the names of the tables you want to synchronize.

      • In the list, add a space after each comma. For example, CUSTOMERS, ORDER, SKU.
      • Don’t add new lines to the list. Only add comma-separated values.
      • You can use * as a wildcard. For example, SKU*.
      • The default value is *, which means all tables are taken into account.
      • If a table name contains one of the special characters . + * \ ? ^ $ ( ) [ ] { } |, add a \ before that character so that it is evaluated correctly. For example, *SKU\+*.
      • The Include Tables field is processed before the Exclude Tables field.
      Example 
      • Out of all tables in a schema, you only want to synchronize the table with name "CUSTOMERS" and the tables with a name that starts with "ORDER".
        To do this:
        In the Include Tables field, enter: CUSTOMERS, ORDER*.
      • Out of all tables in a schema, you only want to synchronize the tables with a name that contains "SKU".
        To do this:
        In the Include Tables field, enter: *SKU*.
      • Out of all tables in a schema, you only want to include the tables with a name that contains "SKU+".
        To do this:
        In the Include Tables field, enter: *SKU\+*.
      Exclude Tables

      A comma-separated list of the names of the tables you do not want to synchronize.

      • In the list, add a space after each comma. For example, CUSTOMERS, ORDER, SKU.
      • Don’t add new lines to the list. Only add comma-separated values.
      • You can use * as a wildcard.
      • If a table name contains one of the special characters . + * \ ? ^ $ ( ) [ ] { } |, add a \ before that character so that it is evaluated correctly. For example, *SKU\+*.
      • The Include Tables field is processed before the Exclude Tables field.

      You can use the Exclude Tables field to do the following:

      • Synchronize all tables in a schema except the ones defined in the Exclude Tables field.
      • Synchronize only the tables defined in the Include Tables field, except for tables that are also listed in the Exclude Tables field.
      Example 
      • Out of all tables in a schema, you do not want to synchronize a table with the name "ADDRESS" and tables with a name that ends with "PHONE".
        To do this:
        In the Include Tables field, enter: * and in the Exclude Tables field, enter: ADDRESS, *PHONE.
      • Out of all tables in a schema, you only want to exclude the table with name "example$table".
        To do this:
        In the Include Tables field, enter: * and in the Exclude Tables field, enter: example\$table.
      • Out of all tables in a schema, you want to synchronize the tables with a name that starts with "SKU", but exclude the tables with a name that contains "bkp".
        To do this:
        In the Include Tables field, enter: SKU* and in the Exclude Tables field, enter: *bkp*.
        From the following list, only "SKU_1" and "SKU_2" will be synchronized.
        SKU_1, SKU_2, SKU_bkp_1, SKU_bkp_2, New, bkp, bkp_SKU
      Target Domain

      The Physical Data Dictionary domain in which the schema is synchronized.

      The default value is Schema domain (Automatically Created for Schema). This means the metadata is placed in a domain located in the same community as the domain of your Database asset. If that domain doesn't exist yet, Data Catalog creates it using the following naming convention: [edge_connection_name] > [database_name] > [schema_name], for example, Snowflake Connection > CERTIFICATION > CUSTOMERS.

      You can select any other Physical Data Dictionary domain for which you have a resource role with the Configure External System resource permission. However, we recommend using a separate domain for each schema.

      Options

      Additional options to specify which type of tables you want to synchronize.

      Exclude Database Views

      A checkbox to exclude database views from the synchronization process. If selected, no assets of the type Database view are created.

      Tip You can also use Include Tables and Exclude Tables to include or exclude specific database views.

      Include Source Tags

      If you select this option, the tags defined on the assets in the data source are registered and available from the Schema, Table, Database View, and Column assets in the Source Tags attribute.

      Note Currently, you can synchronize source tags only from Snowflake.

      Important 
      • You can add up to 10 synchronization rules. The Schema asset and the Foreign Key assets are always ingested in the domain defined in the first rule.
        Check out the examples below to understand how the order of the rules impacts the synchronization.
      • The Exclude Database Views and Include Source Tags options apply only to the tables and views covered by the rule in which they are set.
  5. Do one of the following:
    1. To save the updates for the selected schema only, click Save.
      A table icon appears next to the schema name in the schema list.
    2. To save the updates and also copy the defined synchronization rules to other schemas:
      1. Click Save & Clone.

        The Copy Rules From dialog box appears. The dialog box shows all available schemas on the left and the synchronization rules you'll copy on the right.

      2. In the Available Schemas list, select all the schemas to which you want to copy the rules.
        Important 

        If you copy rules to a schema that already has synchronization rules, we'll remove all defined rules and replace them with the rules you copied.

      3. If the This action will replace existing rules for checkbox appears, you have selected one or more schemas that already have synchronization rules.
        If you're sure that you want to replace the existing rules for those schemas, select the checkbox.
        If you want to keep the existing rules, deselect those schemas in the Available Schemas list.
      4. Click Copy Rules.
        A table icon appears next to the schema you updated and next to the schemas you copied the rules to.
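
The include and exclude behavior described in step 4 can be modeled with the following minimal Python sketch. It is a hypothetical illustration, not Collibra's implementation: select_tables and the other function names are invented, and the sketch assumes that an unescaped * matches any sequence of characters, that \ makes the next character literal, and that all other characters match literally and case-sensitively. The real evaluation may treat unescaped special characters differently, which is why escaping them matters. The usage line reproduces the SKU/bkp example from the Exclude Tables description.

```python
import re

# Hypothetical model of the Include Tables / Exclude Tables fields.
# Assumptions (not Collibra's code): unescaped * = any sequence of characters,
# \x = literal x, every other character is matched literally and case-sensitively.


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one wildcard pattern, such as 'SKU*' or '*SKU\\+*', into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
            continue
        parts.append(".*" if ch == "*" else re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def parse_field(value: str):
    """Split a field value such as 'CUSTOMERS, ORDER*' on commas and compile each entry."""
    return [pattern_to_regex(p.strip()) for p in value.split(",") if p.strip()]


def select_tables(tables, include="*", exclude=""):
    """Apply Include Tables first, then drop anything matched by Exclude Tables."""
    inc = parse_field(include)
    exc = parse_field(exclude)
    included = [t for t in tables if any(r.fullmatch(t) for r in inc)]
    return [t for t in included if not any(r.fullmatch(t) for r in exc)]


tables = ["SKU_1", "SKU_2", "SKU_bkp_1", "SKU_bkp_2", "New", "bkp", "bkp_SKU"]
print(select_tables(tables, include="SKU*", exclude="*bkp*"))
# ['SKU_1', 'SKU_2']
```

Converting each entry to a full-match regular expression keeps the include-before-exclude order explicit: a table must match at least one Include Tables entry and no Exclude Tables entry.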

Note Only schemas with a synchronization rule can be synchronized.

Examples

Example 
  • Rule 1: Include Tables: *, Include Views: false, Target Domain: A
  • Rule 2: Include Tables: *, Include Views: true, Target Domain: B

Result: The Schema asset, and the Table, Column, and Foreign Key assets are added in Domain A. All Database View assets are added in Domain B.

Example 
  • Rule 1: Include Tables: A, B, C, Include Views: true, Source Tags: false, Target Domain: A
  • Rule 2: Include Tables: D, Include Views: true, Source Tags: true, Target Domain: A

Result: All assets are created in Domain A. Only Table D has synchronized Source Tags.
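
The two examples above are consistent with a simple model in which each table or view is placed by the first rule that accepts it. The following Python sketch implements that model under stated assumptions: first-match-wins is inferred from the examples rather than documented, Rule and plan_sync are hypothetical names, Column assets are assumed to follow their table, and wildcard matching uses the standard fnmatch.fnmatchcase, which does not model the backslash escaping described for Include Tables and Exclude Tables. The documented parts are the 10-rule limit, the Schema and Foreign Key assets going to the first rule's domain, and the naming convention of the automatically created domain.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional

MAX_RULES = 10  # documented limit on synchronization rules per schema


@dataclass
class Rule:
    """Hypothetical model of one synchronization rule (field names are illustrative)."""
    include: str = "*"
    exclude: str = ""
    include_views: bool = True
    source_tags: bool = False
    target_domain: Optional[str] = None  # None = default, automatically created domain


def _matches(name, field):
    # Comma-separated wildcard list; fnmatchcase does not model backslash escaping.
    return any(fnmatchcase(name, p.strip()) for p in field.split(",") if p.strip())


def plan_sync(rules, assets, connection, database, schema):
    """Return (schema_domain, placement); placement maps asset name -> (domain, source_tags).

    assets is a list of (name, kind) pairs with kind "table" or "view". Column assets
    are assumed to follow their table, so they are not modeled separately.
    """
    if not 1 <= len(rules) <= MAX_RULES:
        raise ValueError(f"expected 1 to {MAX_RULES} rules, got {len(rules)}")

    def domain(rule):
        # Naming convention of the automatically created domain (see Target Domain).
        return rule.target_domain or f"{connection} > {database} > {schema}"

    # Documented: Schema and Foreign Key assets always go to the first rule's domain.
    schema_domain = domain(rules[0])
    placement = {}
    for name, kind in assets:
        # Assumption inferred from the examples: the first rule that accepts an asset wins.
        for rule in rules:
            if kind == "view" and not rule.include_views:
                continue
            if _matches(name, rule.include) and not _matches(name, rule.exclude):
                placement[name] = (domain(rule), rule.source_tags)
                break
    return schema_domain, placement


# Example 1 from this page.
rules = [
    Rule(include="*", include_views=False, target_domain="A"),
    Rule(include="*", include_views=True, target_domain="B"),
]
print(*plan_sync(rules, [("CUSTOMERS", "table"), ("V_SALES", "view")], "conn", "db", "sch"))
# A {'CUSTOMERS': ('A', False), 'V_SALES': ('B', False)}
```

Under this model, a view skipped by a rule that excludes views falls through to the next rule, which is how Example 1 ends up with tables in Domain A and views in Domain B.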

Copy synchronization rules to other schemas

  1. Open a Database asset page.
  2. In the tab bar, click Configuration.
  3. In the Metadata Synchronization tab page, select the schema that contains the rules you want to copy.
    Tip 
    • You can search for a schema in the drop-down list or use the filter to show only schemas with a synchronization rule.
    • You can refresh the schema list by clicking the Refresh List icon.
  4. Click Copy Rules.
    The Copy Rules From dialog box appears. The dialog box shows all available schemas on the left and the synchronization rules you'll copy on the right.
  5. In the Available Schemas list, select all the schemas to which you want to copy the rules.
    Important 

    If you copy rules to a schema that already has synchronization rules, we'll remove all defined rules and replace them with the rules you copied.

  6. If the This action will replace existing rules for checkbox appears, you have selected one or more schemas that already have synchronization rules.
    If you're sure that you want to replace the existing rules for those schemas, select the checkbox.
    If you want to keep the existing rules, deselect those schemas in the Available Schemas list.
  7. Click Copy Rules.
    A table icon appears next to the schemas you copied the rules to.
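
The replace-not-merge behavior of Copy Rules, together with the note that only schemas with a synchronization rule can be synchronized, can be sketched in Python as follows. This is a hypothetical model, not a Collibra API: the data layout (a dictionary from schema name to a list of rule dictionaries) and the names copy_rules and synchronizable are invented for illustration, and confirm_replace stands in for the This action will replace existing rules for checkbox.

```python
import copy

# Hypothetical model of Copy Rules; not a Collibra API.


def copy_rules(rules_by_schema, source, targets, confirm_replace=False):
    """Return a new mapping in which each target schema gets a copy of the source rules.

    Existing rules on a target are replaced, never merged. confirm_replace plays the
    role of the "This action will replace existing rules for" checkbox.
    """
    rules = rules_by_schema.get(source)
    if not rules:
        raise ValueError(f"schema {source!r} has no rules to copy")
    targets = [t for t in targets if t != source]
    conflicts = [t for t in targets if rules_by_schema.get(t)]
    if conflicts and not confirm_replace:
        raise ValueError(f"these schemas already have rules: {conflicts}")
    updated = dict(rules_by_schema)
    for target in targets:
        updated[target] = copy.deepcopy(rules)  # independent copy replaces old rules
    return updated


def synchronizable(rules_by_schema):
    """Only schemas with at least one rule can be synchronized."""
    return sorted(name for name, rules in rules_by_schema.items() if rules)


rules = {
    "SALES": [{"include": "SKU*", "exclude": "*bkp*"}],
    "HR": [{"include": "*", "exclude": ""}],
    "STAGING": [],
}
print(synchronizable(copy_rules(rules, "SALES", ["STAGING"])))
# ['HR', 'SALES', 'STAGING']
```

Refusing to overwrite a schema that already has rules unless confirm_replace is set mirrors the checkbox: without it, the only safe choice is to deselect those schemas.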

What's next?

You can now synchronize the schemas to ingest the metadata into Collibra.