Synchronize Dataplex lineage

You can synchronize your technical lineage manually, or automatically by adding a synchronization schedule.

If you want to synchronize technical lineage by using the Collibra Catalog Cloud Ingestions API, use the /genericIntegration/{ingestibleId}/run API, where {ingestibleId} is the capability ID.
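As a rough sketch of the API route, the following Python snippet builds the run URL and prepares, but does not send, a request using only the standard library. The environment URL, the `/rest/catalog/1.0` prefix, the POST method, and basic authentication are assumptions for illustration and are not confirmed by this page; check the Catalog Cloud Ingestions API reference for your environment before relying on them.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_run_url(base_url: str, ingestible_id: str,
                  api_prefix: str = "/rest/catalog/1.0") -> str:
    """Return the URL of /genericIntegration/{ingestibleId}/run.

    api_prefix is an assumed REST prefix; adjust it to your environment.
    """
    capability_id = quote(ingestible_id, safe="")
    return f"{base_url.rstrip('/')}{api_prefix}/genericIntegration/{capability_id}/run"


def prepare_run_request(base_url: str, ingestible_id: str,
                        username: str, password: str) -> urllib.request.Request:
    """Prepare a request that would start a synchronization (it is not sent here).

    The POST method and basic authentication are assumptions.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        build_run_url(base_url, ingestible_id),
        method="POST",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical environment URL and capability ID.
    request = prepare_run_request(
        "https://example.collibra.com",
        "00000000-0000-0000-0000-000000000000",
        "user",
        "secret",
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping URL construction separate from sending makes it easy to confirm the target endpoint before triggering a synchronization.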

Steps

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog homepage opens.
  2. In the tab bar, click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. Locate the GCP connection that you used when you added the technical lineage for Google Dataplex capability, and click the link in the capability column. If multiple capabilities exist for the GCP connection, expand them to locate your technical lineage for Google Dataplex capability.
    The synchronization configuration page opens.
  5. In the Synchronization Configuration section, click Add Configuration.
  6. Complete the fields as needed.
    Field: Action

    System: Select the System asset in which the Dataplex assets were ingested. Collibra Data Lineage stitches the ingested data objects to the selected assets when synchronization begins.

    Project IDs: To add a Project ID where Dataplex is enabled, click Add Project Id. You can add multiple Project IDs. The capability searches in these projects.
    Important: If you chose Workload Identity Federation (WIF) using GKE as the connection type when you created the GCP connection, this field is required.

    Dataplex Locations: To add a Dataplex location, click Add Dataplex Location.
    If a new location was added in Dataplex after you created the technical lineage, you can use this field to add the location. When you synchronize the technical lineage after adding the location, Collibra Data Lineage collects data sources only from the specified location.
    For more information, go to Dataplex locations in Google Cloud documentation.

    Type of lineage: Select the type of lineage you want to create:
    • Table lineage: Create table-level lineage.
    • Column lineage: Create column-level lineage.

    GCS Bucket: If you selected Column lineage in the Type of lineage field, enter the path to the GCS bucket you created in Dataplex to store the exported lineage, for example, gs://lineage-export-bucket.

    Skip ingesting SQL queries from interactive BigQuery jobs: Use this option to control whether Collibra Data Lineage ingests SQL queries from interactive BigQuery jobs.
    By default, this option is not selected. Collibra Data Lineage ingests the SQL queries and includes them in the transformation and source code on the Sources tab page.
    If you select this option, Collibra Data Lineage does not ingest the SQL queries and excludes them from the transformation and source code.

  7. Click Save.
  8. Click Synchronize.
    A notification indicates the synchronization has started.
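
The dependencies between the fields in step 6 can be summarized in a small validation sketch. The class and attribute names below are hypothetical and do not correspond to any Collibra API; they only mirror two rules stated in the field descriptions: column lineage needs a GCS bucket path, and Project IDs are required when the GCP connection uses WIF with GKE.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataplexSyncConfig:
    """Hypothetical mirror of the Synchronization Configuration fields."""
    system: str
    project_ids: List[str] = field(default_factory=list)
    dataplex_locations: List[str] = field(default_factory=list)
    lineage_type: str = "table"          # "table" or "column"
    gcs_bucket: str = ""                 # for example, gs://lineage-export-bucket
    skip_interactive_sql: bool = False   # not selected by default
    uses_wif_gke: bool = False           # connection type of the GCP connection

    def problems(self) -> List[str]:
        """Return rule violations; an empty list means no rule is broken."""
        issues = []
        if self.lineage_type not in ("table", "column"):
            issues.append("Type of lineage must be table or column")
        if self.lineage_type == "column" and not self.gcs_bucket.startswith("gs://"):
            issues.append("Column lineage needs a gs:// bucket path")
        if self.uses_wif_gke and not self.project_ids:
            issues.append("Project IDs are required for WIF using GKE")
        return issues


# Example with a hypothetical System asset name.
config = DataplexSyncConfig(
    system="Dataplex System",
    lineage_type="column",
    gcs_bucket="gs://lineage-export-bucket",
)
print(config.problems())
```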

To synchronize automatically on a schedule, complete steps 1 through 7 above and then, instead of clicking Synchronize, continue with the following steps:

  8. On the Synchronization Schedule tab pane, click Add Schedule.
  9. Enter the required information and click Save:
    Field: Description

    Repeat: The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.

    Cron: The Quartz Cron expression that determines when the synchronization takes place.
    This field is only visible if you select Cron expression in the Repeat field.

    Every: The day of the week on which you want to synchronize, for example, Sunday.
    This field is only visible if you select Weekly in the Repeat field.

    Every first: The day on which you want to synchronize each month, for example, Tuesday (that is, every first Tuesday of the month).
    This field is only visible if you select Monthly in the Repeat field.

    At: The time at which you want to synchronize automatically, for example, 14:00.
    • You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
    • This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.

    Time zone: The time zone for the schedule.
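
For the Cron field, it can help to recall that a Quartz cron expression has six required fields (seconds, minutes, hours, day of month, month, day of week) and an optional seventh field for the year. The sketch below only labels the parts of an expression and does not validate their values; the helper names are hypothetical. A second helper illustrates the on-the-hour rule of the At field.

```python
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day_of_month",
                 "month", "day_of_week", "year"]


def label_quartz_fields(expression: str) -> dict:
    """Map each part of a Quartz cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("A Quartz cron expression has 6 fields plus an optional year")
    return dict(zip(QUARTZ_FIELDS, parts))


def is_on_the_hour(time_text: str) -> bool:
    """Return True for times such as 8:00 or 14:00, and False for 8:45."""
    hours, _, minutes = time_text.partition(":")
    return hours.isdigit() and 0 <= int(hours) <= 23 and minutes == "00"


# Example: 02:00 every Sunday ("?" means no specific day of month).
print(label_quartz_fields("0 0 2 ? * SUN"))
print(is_on_the_hour("8:00"), is_on_the_hour("8:45"))
```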