Dataplex lineage integration preflight checks

To ensure successful metadata ingestion and lineage generation, complete the following preflight checks.

In your Dataplex environment

Be sure to review the supported transformation details to determine whether to create technical lineage for Google Dataplex or Google BigQuery. You can also use this information to understand the lineage data that Collibra Data Lineage ingests from Google Dataplex.

  • Use your service account when you create a GCP connection. You can choose to create table-level lineage or column-level lineage for Dataplex when you synchronize the capability.
    • To create table-level lineage: 
      • Enable the Data Lineage API in Dataplex for the projects that you want to harvest lineage from.
        For more information, go to Data Lineage API in the Google Cloud documentation.
      • Ensure you have the following roles and permissions:
        • The Data Lineage Viewer role.
        • The BigQuery Admin role, if you want Collibra Data Lineage to collect lineage from stored procedures that you created and those created by other Dataplex users.
        • The bigquery.jobs.get permission.
          For more information, go to IAM basic and predefined roles reference in the Google Cloud documentation.
    • To create column-level lineage:
      • Enable the exportMetadata API, which is in Preview. To enable the API, contact Google by following the steps in the article How to Request Access to Google Dataplex Column Level Lineage (preview) export API.
      • Create a GCS bucket to store the exported lineage metadata. Ensure you have the following permissions on the GCS bucket:
        • The storage.objects.create permission in the Storage Object Creator (roles/storage.objectCreator) role.
        • The storage.objects.list and storage.objects.get permissions in the Storage Object Viewer (roles/storage.objectViewer) role.
      • Ensure you have the following permissions for lineage export: 
        • The datacatalog.entries.exportAll permission in the Data Catalog Admin (roles/datacatalog.admin) role.
          Note: Google currently classifies this permission as Testing. Permissions at this level may change or be deprecated.
        • Alternatively, grant the roles/datacatalog.metadataExporter role to your service account or user.
        • The bigquery.jobs.get permission to get SQL transformation code.
          For more information, go to IAM basic and predefined roles reference in the Google Cloud documentation.
    • When you synchronize technical lineage for Google Dataplex, you can add the Project IDs that you want to harvest lineage from. To have Project IDs available for selection, ensure that the service account has the resourcemanager.projects.get permission on the GCP projects where Dataplex is enabled. If the service account does not have this permission, you can enter the Project IDs manually on the Synchronization configuration page.
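
The permission requirements above can be kept as a small checklist and compared against what the service account actually holds. The following is a minimal sketch, not part of Collibra or Google tooling: the names REQUIRED_PERMISSIONS, OPTIONAL_PERMISSIONS, missing_permissions, and can_list_projects are hypothetical. The set of granted permissions is assumed to come from your own IAM review. Only permissions named in this section are listed. Roles such as Data Lineage Viewer or roles/datacatalog.metadataExporter are not expanded into their permissions, so verify them separately.

```python
# Permissions named in this section, grouped by lineage level.
# Assumption: roles (for example, Data Lineage Viewer) are checked separately.
REQUIRED_PERMISSIONS = {
    "table": {"bigquery.jobs.get"},
    "column": {
        "storage.objects.create",
        "storage.objects.list",
        "storage.objects.get",
        "datacatalog.entries.exportAll",
        "bigquery.jobs.get",
    },
}

# Needed only if Project IDs should be selectable during synchronization.
OPTIONAL_PERMISSIONS = {"resourcemanager.projects.get"}


def missing_permissions(level, granted):
    """Return the sorted permissions required for `level` that are absent from `granted`."""
    if level not in REQUIRED_PERMISSIONS:
        raise ValueError(f"unknown lineage level: {level!r}")
    return sorted(REQUIRED_PERMISSIONS[level] - set(granted))


def can_list_projects(granted):
    """Return True if the optional project-listing permission is present."""
    return OPTIONAL_PERMISSIONS <= set(granted)


if __name__ == "__main__":
    # Hypothetical sample of permissions held by the service account.
    granted = {"bigquery.jobs.get", "storage.objects.get"}
    print("table-level gaps:", missing_permissions("table", granted))
    print("column-level gaps:", missing_permissions("column", granted))
    print("can list projects:", can_list_projects(granted))
```

An empty result means that none of the listed permissions is missing for that lineage level. If can_list_projects returns False, enter the Project IDs manually as described above.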
In your Collibra environment

    Lineage enablement

    • Technical lineage via Edge is enabled in your Collibra environment.
    • You are using Collibra Platform 2024.07 or newer.
    • Your Collibra environment is configured with the latest Collibra UI.

    Edge

    • You either created and installed an Edge site or were granted a Collibra Cloud site.
      Important: If you're using a Collibra Cloud site, go to the Collibra Cloud site documentation to check if your data source is supported.
    • The Edge site status must be Healthy.
    • You've registered the data source via Edge.
    • You've integrated Dataplex Universal Catalog or registered Google BigQuery databases by using the BigQuery JDBC connector. For details, go to About working with Google Cloud Platform (GCP).

    Network and proxy configuration

    • Edge can connect to all Collibra Data Lineage service instances in your geographic location.
    • Optionally, you've connected to a proxy server.
    • Optionally, you've set up a custom certificate to allow the Edge capability to connect to your data source. In this case, you've saved the certificate as "ca.pem" in the same directory as the Edge site installer. If you saved the certificate in another directory, use the --ca argument in the Edge site installation command.
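
The certificate rule above can be sketched as a small helper that decides whether the --ca argument is needed. This is an illustration under stated assumptions, not the Edge installer's own logic: the function name ca_arguments is hypothetical, and the sketch assumes that --ca takes the certificate path as the value that follows it. Check the Edge site installation documentation for the exact syntax.

```python
from pathlib import Path

DEFAULT_CERT_NAME = "ca.pem"  # file name given in this section


def ca_arguments(installer_dir, cert_path):
    """Return the extra installer arguments needed for a custom certificate.

    Returns an empty list when the certificate is saved as ca.pem next to the
    installer. Otherwise returns ["--ca", <path>], assuming the argument takes
    the certificate path as its value.
    """
    installer_dir = Path(installer_dir).resolve()
    cert_path = Path(cert_path).resolve()
    if cert_path == installer_dir / DEFAULT_CERT_NAME:
        return []
    return ["--ca", str(cert_path)]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(ca_arguments("/opt/edge", "/opt/edge/ca.pem"))
    print(ca_arguments("/opt/edge", "/etc/ssl/corp-ca.pem"))
```

Append the returned arguments to your Edge site installation command. An empty list means that no extra argument is needed.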

    Collibra permissions

    You can connect to Collibra Data Lineage by using the basic or OAuth authentication method. The following permissions are required only if you use the basic authentication method. 

    To connect to Collibra Data Lineage service instances via OAuth authentication:

    To add an Edge capability:

    To synchronize technical lineage: