Google BigQuery integration preflight checks
To ensure successful metadata ingestion and lineage generation, complete the following preflight checks.
In your Google BigQuery environment
The following permissions are required:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.list
- bigquery.jobs.create
- bigquery.routines.get
- bigquery.routines.list
- resourcemanager.projects.get
- bigquery.readsessions.create
- bigquery.readsessions.getData
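One way to verify the list above is to compare it against the permissions actually granted to the identity used for the integration. The following is a minimal sketch, not part of the product: `missing_permissions` is a hypothetical helper, and the granted permissions are assumed to have been collected elsewhere, for example with the Resource Manager `projects.testIamPermissions` method.

```python
# Permissions listed in this section for the Google BigQuery environment.
REQUIRED_BIGQUERY_PERMISSIONS = frozenset({
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.list",
    "bigquery.jobs.create",
    "bigquery.routines.get",
    "bigquery.routines.list",
    "resourcemanager.projects.get",
    "bigquery.readsessions.create",
    "bigquery.readsessions.getData",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(REQUIRED_BIGQUERY_PERMISSIONS - set(granted))


# Hypothetical example: an identity that lacks the two read-session permissions.
sample_granted = REQUIRED_BIGQUERY_PERMISSIONS - {
    "bigquery.readsessions.create",
    "bigquery.readsessions.getData",
}
for permission in missing_permissions(sample_granted):
    print(f"Missing permission: {permission}")
```

An empty result means every permission in the list is covered by the granted set.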
In your CPSH environment
For best technical lineage results, use the JDBC connection to ingest JDBC sources when possible. Shared Storage connections are not supported for Collibra Cloud sites.
Lineage enablement
- Technical lineage via Edge is enabled in your Collibra environment.
Edge
- You created and installed an Edge site.
  Important: If you're using a Collibra Cloud site, go to the Collibra Cloud site documentation to check if your data source is supported.
- The Edge site status must be Healthy.
- You've registered the data source via Edge.
Network and proxy configuration
- Edge can connect to all Collibra Data Lineage service instances in your geographic location.
- Optionally, you've connected to a proxy server.
- Optionally, use a custom certificate to allow the Edge capability to connect to your data source. In this case, you've saved the certificate as "ca.pem" in the same directory as the Edge site installer. If you've saved the certificate in another directory, use the --ca argument in the Edge site installation command.
CPSH permissions
You can connect to Collibra Data Lineage by using the basic or OAuth authentication method. The following permissions are required only if you use the basic authentication method.
- A global role with the following global permissions:
  - Data Stewardship Manager
  - Manage all resources
  - System administration
  - Technical lineage
- A resource role with the following resource permissions on the community level in which you created the domain:
  - Asset > Add
  - Attribute > Add
  - Domain > Add
  - Attachment > Add
To create a JDBC connection:
- You have a global role with the Product Rights > System administration global permission.
- You have a global role that has the Manage connections and capabilities global permission.
- You created and installed an Edge site.
- You have added a vault to your Edge site.
- If your data source connection requires a file from your vault, the file must be encoded into Base64 and stored as a regular secret in your vault.
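The Base64 requirement above can be met with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and it assumes that standard Base64 without line breaks is acceptable for the secret value, which you should confirm for your vault.

```python
import base64
from pathlib import Path


def encode_bytes_for_vault(data):
    """Return `data` as a standard Base64 string without line breaks."""
    return base64.b64encode(data).decode("ascii")


def encode_file_for_vault(path):
    """Read the file at `path` and return its content as a Base64 string."""
    return encode_bytes_for_vault(Path(path).read_bytes())
```

For example, `encode_file_for_vault("key-file.json")` (a hypothetical file name) returns the string that you would then store as a regular secret in your vault.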
To create a Shared Storage connection:
- Consider the retention limits of the Shared Storage connection. For more information, go to Shared Storage and Cloud Storage connections.
- You have a global role that has the Manage Edge sites global permission.
- You have a global role that has the Manage connections and capabilities global permission.
To connect to Collibra Data Lineage service instances via OAuth authentication:
- You have a global role with the Product Rights > System administration global permission.
- You have a global role that has the Manage Edge sites global permission.
- You have a global role that has the Manage connections and capabilities global permission.
To add an Edge capability:
- You have a global role with the Product Rights > System administration global permission.
- You have a global role that has the Manage connections and capabilities global permission.
To synchronize technical lineage:
- A global role that has the following global permissions:
  - Catalog, for example Catalog Author
  - View Edge connections and capabilities
- A resource role with the Configure external system resource permission, for example Owner.
- Data source-specific permissions.
Cloud Storage connection
The following requirements apply only if you will store your SQL files in a cloud-based storage system. In that case, you need to create a Cloud Storage connection to your Edge site.
In your CPSH environment
- You have a global role with the Product Rights > System administration global permission.
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
In your Azure environment
- To integrate ADLS folders, you need an Azure Service Principal user that is defined in Azure and that has permissions to list the files which need to be integrated into Collibra. The Azure Service Principal user must have the "Reader" and "Storage Blob Data Reader" roles for the storage locations of your data. For information, go to the Azure documentation.
- If you use Microsoft Purview:
  - The Azure Service Principal user must have the "Data reader" role to fetch entities/assets from the Microsoft Purview REST API. For information, go to the Microsoft Purview documentation.
- If your ADLS storage is private, ensure that the Allow Azure services on the trusted services list to access this storage account checkbox in Networking → Firewalls and virtual networks is selected.
In your CPSH environment
- You created and installed an Edge site. If you have defined an outbound (forward) proxy on your Edge site, the integration considers that configuration when connecting to the data source.
- You have added a vault to your Edge site.
- If your data source connection requires a file from your vault, the file must be encoded into Base64 and stored as a regular secret in your vault.
- If you have configured a forward proxy for your Edge site and want the integration API calls to bypass this proxy, update the Edge nonProxy property:
  - Adding login.microsoftonline.com allows the API calls that get access tokens to bypass the proxy. If you are using a government cloud host, add login.microsoftonline.us instead.
  - Adding dfs.core.windows.net or blob.core.windows.net allows the ADLS API calls to bypass the proxy.
  - Adding purview.azure.com allows the Purview APIs to bypass the proxy.
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
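The proxy bypass hosts described above depend on your setup, so it can help to derive them in one place. The sketch below is illustrative only: `non_proxy_hosts` and its parameters are hypothetical, and it returns a plain list because the exact format of the Edge nonProxy property value is not specified here.

```python
def non_proxy_hosts(government_cloud=False, use_blob_endpoint=False, use_purview=False):
    """Collect the hosts whose API calls should bypass the forward proxy."""
    hosts = []
    # Host used by the API calls that get access tokens.
    if government_cloud:
        hosts.append("login.microsoftonline.us")
    else:
        hosts.append("login.microsoftonline.com")
    # Host used by the ADLS API calls (dfs or blob endpoint).
    if use_blob_endpoint:
        hosts.append("blob.core.windows.net")
    else:
        hosts.append("dfs.core.windows.net")
    # Host used by the Purview APIs, only relevant if you use Microsoft Purview.
    if use_purview:
        hosts.append("purview.azure.com")
    return hosts


# Hypothetical example: commercial cloud, dfs endpoint, Microsoft Purview in use.
print(non_proxy_hosts(use_purview=True))
```

Keeping the choice between the commercial and government login hosts in a single function avoids adding both by accident.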
In your Google Cloud environment
You need a Google Cloud Platform Service Account that can read the Google Cloud Storage (GCS) file system that you want to integrate. This means that the Service Account must have the permissions to list buckets (storage.buckets.list) and objects in a bucket (storage.objects.list). For information on GCP, go to the Google documentation.
If you use Dataplex, the Service Account must be able to detect file schemas in GCS resources from Dataplex. This means that the Service Account must have the dataplex.*.get and dataplex.*.list permissions, for example, via the Dataplex Viewer role. For information on GCP service accounts and on Dataplex roles, go to the Google documentation.
In your CPSH environment
- If you have defined an outbound (forward) proxy on your Edge site, the integration considers that configuration when connecting to the data source. The following proxies are supported for GCS:
  - Pass through (No authentication)
  - Pass through (Basic authentication)
  - MITM (No authentication)
  - MITM (Basic authentication)
  - No proxy for noProxy hosts defined by Edge
- You have added a vault to your Edge site.
- If your data source connection requires a file from your vault, the file must be encoded into Base64 and stored as a regular secret in your vault.
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.