Dataplex lineage integration preflight checks
To ensure successful metadata ingestion and lineage generation, complete the following preflight checks.
In your Dataplex environment
Be sure to review the supported transformation details to determine whether to create technical lineage for Google Dataplex or Google BigQuery. You can also use this information to understand the lineage data that Collibra Data Lineage ingests from Google Dataplex.
- To create table-level lineage:
  - Enable the Data Lineage API in Dataplex for the projects that you want to harvest lineage from. For more information, go to Data Lineage API in the Google Cloud documentation.
  - The Data Lineage Viewer role.
  - The BigQuery Admin role if you want Collibra Data Lineage to collect lineage from stored procedures that you created and those created by other Dataplex users.
  - The bigquery.jobs.get permission. For more information, go to IAM basic and predefined roles reference in the Google Cloud documentation.
- To create column-level lineage:
  - Enable the exportMetadata API, which is in Preview. To enable the API, contact Google by following the steps in the article How to Request Access to Google Dataplex Column Level Lineage (preview) export API.
  - Create a GCS bucket to store the exported lineage metadata. Ensure you have the following permissions on the GCS bucket:
    - The storage.objects.create permission in the Storage Object Creator (roles/storage.objectCreator) role.
    - The storage.objects.list and storage.objects.get permissions in the Storage Object Viewer (roles/storage.objectViewer) role.
  - Ensure you have the following permissions for lineage export:
    - The datacatalog.entries.exportAll permission in the Data Catalog Admin (roles/datacatalog.admin) role.
      Note: Google currently classifies this specific permission as Testing. Permissions at this level may be subject to change or deprecation.
    - Alternatively, grant roles/datacatalog.metadataExporter to your Service Account or the User.
    - The bigquery.jobs.get permission to get SQL transformation code. For more information, go to IAM basic and predefined roles reference in the Google Cloud documentation.
- When you synchronize technical lineage for Google Dataplex, you can add Project IDs that you want to harvest lineage from. If you want to have Project IDs available for selection when you add Project IDs, ensure that the service account has the resourcemanager.projects.get permission on the GCP projects where Dataplex is enabled. If the service account does not have this permission, you can enter the Project IDs manually on the Synchronization configuration page.
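The permission checks above can be sketched as a small preflight helper. This is a minimal illustration, not part of Collibra or Google tooling: the names REQUIRED_PERMISSIONS and missing_permissions are hypothetical, and the list of granted permissions is assumed to have been collected separately (for example, from an IAM testIamPermissions call). Role assignments and API enablement from the checklist are not covered.

```python
# Permissions named in the checklist above, grouped by lineage level.
# Roles (such as Data Lineage Viewer) and API enablement are not covered here.
REQUIRED_PERMISSIONS = {
    "table": {"bigquery.jobs.get"},
    "column": {
        "storage.objects.create",         # on the GCS bucket
        "storage.objects.list",           # on the GCS bucket
        "storage.objects.get",            # on the GCS bucket
        "datacatalog.entries.exportAll",  # lineage export
        "bigquery.jobs.get",              # SQL transformation code
    },
}


def missing_permissions(level, granted):
    """Return the checklist permissions for `level` that are not in `granted`."""
    return sorted(REQUIRED_PERMISSIONS[level] - set(granted))


# Hypothetical input: permissions the service account is known to hold.
granted = ["bigquery.jobs.get", "storage.objects.get", "storage.objects.list"]
print(missing_permissions("column", granted))
# prints ['datacatalog.entries.exportAll', 'storage.objects.create']
```

An empty result means that every permission listed for that lineage level is present in the supplied list.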
In your CPSH environment
Lineage enablement
- Technical lineage via Edge is enabled in your Collibra environment.
- You are using Collibra Platform Self-Hosted 2024.07 or newer.
- Your CPSH environment is configured with the latest Collibra UI.
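The minimum-version requirement above can be checked with a numeric comparison. The following is a minimal sketch that assumes the version is available as a YEAR.MONTH string, as written in the checklist; the function names are hypothetical, and how you obtain the version string from your environment is not covered.

```python
def parse_release(version):
    """Split a 'YEAR.MONTH' string such as '2024.07' into (2024, 7)."""
    year, month = version.strip().split(".")[:2]
    return int(year), int(month)


def meets_minimum(version, minimum="2024.07"):
    """Return True if `version` is the same as or newer than `minimum`."""
    return parse_release(version) >= parse_release(minimum)


print(meets_minimum("2024.05"))  # prints False
print(meets_minimum("2025.01"))  # prints True
```

Comparing the parsed integers rather than the raw strings keeps the check correct even if the month is written without a leading zero.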
Edge
- You created and installed an Edge site.
  Important: If you're using a Collibra Cloud site, go to the Collibra Cloud site documentation to check if your data source is supported.
- The Edge site status must be Healthy.
- You've registered the data source via Edge.
- You've integrated Dataplex Universal Catalog or registered Google BigQuery databases by using the BigQuery JDBC connector. For details, go to About working with Google Cloud Platform (GCP).
Network and proxy configuration
- Edge can connect to all Collibra Data Lineage service instances in your geographic location.
- Optionally, you've connected to a proxy server.
- Optionally, use a custom certificate to allow the Edge capability to connect to your data source. In this case, you've saved the certificate as "ca.pem" in the same directory as the Edge site installer. If you've saved the certificate in another directory, use the --ca argument in the Edge site installation command.
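Before you run the installer, you can confirm that the custom certificate is where the installer expects it. The sketch below is an illustration only: the function name is hypothetical, and it checks only that a file named ca.pem exists in the given directory and contains PEM certificate markers. It does not validate the certificate itself.

```python
from pathlib import Path
from typing import Optional

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def find_default_ca(installer_dir: str) -> Optional[Path]:
    """Return the path to ca.pem in installer_dir if it looks like a PEM file."""
    candidate = Path(installer_dir) / "ca.pem"
    if not candidate.is_file():
        return None
    text = candidate.read_text(encoding="ascii", errors="replace")
    # Only the PEM markers are checked; the certificate content is not verified.
    if PEM_HEADER in text and PEM_FOOTER in text:
        return candidate
    return None
```

If the function returns None for the installer directory, the certificate is either saved elsewhere or not in PEM format. In the first case, pass its location with the --ca argument as described above.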
CPSH permissions
You can connect to Collibra Data Lineage by using the basic or OAuth authentication method. The following permissions are required only if you use the basic authentication method.
- A global role with the following global permissions:
- Data Stewardship Manager
- Manage all resources
- System administration
- Technical lineage
- A resource role with the following resource permissions on the community level in which you created the domain:
- Asset > Add
- Attribute > Add
- Domain > Add
- Attachment > Add
To connect to Collibra Data Lineage service instances via OAuth authentication:
- You have a global role with the Product Rights > System administration global permission.
- You have a global role that has the Manage Edge sites global permission.
- You have a global role that has the Manage connections and capabilities global permission.
To add an Edge capability:
- You have a global role with the Product Rights > System administration global permission.
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
To synchronize technical lineage:
- A global role that has the following global permissions:
- Catalog, for example Catalog Author
- View Edge connections and capabilities
- A resource role with Configure external system resource permission, for example Owner.
- Data source-specific permissions.