Configuring Workload Identity for Google BigQuery

Workload Identity

Workload Identity is the recommended method for authenticating BigQuery connections in GKE-based deployments of Collibra DQ because it provides improved security and manageability of your service account credentials JSON file. To authenticate with Workload Identity, ensure that you fulfill the requirements and select Workload Identity from the Authentication Type dropdown menu when you set up your Google BigQuery connection.

Workload Identity permissions

To use Workload Identity to authenticate your connection between Collibra DQ and Google BigQuery, ensure that you satisfy the following requirements:

  • The GKE API is enabled.
  • The IAM Service Account Credentials API is enabled.
  • You have the IAM role roles/container.admin.
  • You have the IAM role roles/iam.serviceAccountAdmin.
  • You have Workload Identity enabled and meet all additional service account requirements.
  • You have ROLE_ADMIN assigned to your user in Collibra DQ.

Note These roles and permissions are only required when you use the Workload Identity option to authenticate your connection.
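
The API and IAM requirements above can be checked from a shell. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The function names are hypothetical helpers, not part of Collibra DQ, and the project ID and user email you pass in are your own values. The role check only sees roles bound directly at the project level, so roles inherited from a folder, organization, or group do not appear.

```shell
# Sketch only: helper names are illustrative assumptions, not Collibra DQ tooling.

# IAM roles listed in the requirements above.
REQUIRED_ROLES="roles/container.admin roles/iam.serviceAccountAdmin"

# Enable the GKE API and the IAM Service Account Credentials API for a project.
enable_required_apis() {
  project_id=$1
  gcloud services enable container.googleapis.com iamcredentials.googleapis.com \
    --project "$project_id"
}

# Print the roles bound directly to a user in a project, one per line.
list_user_roles() {
  project_id=$1
  user_email=$2
  gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --filter="bindings.members:user:$user_email" \
    --format="value(bindings.role)"
}

# Read granted roles on stdin and print each required role that is missing.
missing_roles() {
  granted=$(cat)
  for role in $REQUIRED_ROLES; do
    printf '%s\n' "$granted" | grep -qxF "$role" || printf '%s\n' "$role"
  done
}
```

For example, `list_user_roles my-project me@example.com | missing_roles` prints nothing when both roles are bound to the user at the project level.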

Prerequisites

  • Assign your service account to your Collibra DQ GKE cluster or VM.
  • Add your service account to the GCP project where Google BigQuery resources reside.
  • Grant service account role(s) for your desired Google BigQuery access.

Important Workload Identity is a supported authentication type for Collibra DQ deployments on GCP with either GKE or Google Compute Engine (GCE). If you use GKE, ensure you have Workload Identity Federation enabled against the GKE cluster and have BigQuery access enabled. If you use GCE, ensure the attached Service Account has BigQuery access enabled.
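
For a GKE deployment, the prerequisites above typically come down to a few gcloud and kubectl calls. The following is a minimal sketch, assuming Workload Identity Federation is already enabled on the cluster. Every name passed to these hypothetical helpers (project IDs, namespace, Kubernetes service account, Google service account email, and the BigQuery role) is a placeholder for your own values.

```shell
# Sketch only: helper names and all arguments are placeholders, not values
# defined by Collibra DQ.

# Build the IAM member string that identifies a Kubernetes service account
# under Workload Identity Federation for GKE.
wi_member() {
  cluster_project_id=$1
  namespace=$2
  ksa=$3
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$cluster_project_id" "$namespace" "$ksa"
}

# Allow the Kubernetes service account to act as the Google service account,
# then annotate the Kubernetes service account with the Google one.
bind_workload_identity() {
  cluster_project_id=$1
  namespace=$2
  ksa=$3
  gsa_email=$4
  gcloud iam service-accounts add-iam-policy-binding "$gsa_email" \
    --role roles/iam.workloadIdentityUser \
    --member "$(wi_member "$cluster_project_id" "$namespace" "$ksa")"
  kubectl annotate serviceaccount "$ksa" --namespace "$namespace" \
    "iam.gke.io/gcp-service-account=$gsa_email"
}

# Grant the Google service account a BigQuery role in the project that
# holds the BigQuery resources.
grant_bigquery_role() {
  bq_project_id=$1
  gsa_email=$2
  role=$3
  gcloud projects add-iam-policy-binding "$bq_project_id" \
    --member "serviceAccount:$gsa_email" \
    --role "$role"
}
```

Which BigQuery roles to grant depends on the access you need. For example, roles/bigquery.dataViewer together with roles/bigquery.jobUser is a common read-only combination.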

1. Authenticate to API

  1. Use the following POST call to authenticate, replacing the placeholder variables in the ${} with your actual values.
    curl --location 'https://${dq-server-url.example.com}/v3/auth/signin' \
    --header 'Content-Type: application/json' \
    --header 'Accept: */*' \
    --data '{
      "username": "${ExampleUsername}",
      "password": "${ExamplePassword123}",
      "iss": "${ExampleTenantName}"
    }'
  2. Copy the token from the response.
    "username": "ExampleUser",
    "token": "${token}"

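If you script the sign-in instead of copying the token by hand, the token can be pulled out of the response directly. The following is a minimal sketch. The function names are hypothetical, and DQ_URL, DQ_USER, DQ_PASS, and DQ_TENANT are assumed environment variables holding your actual values. It also assumes that none of those values contains a double quote.

```shell
# Sketch only: function names and the DQ_* environment variables are assumptions.

# Extract the "token" value from the sign-in response JSON read on stdin.
# Assumes the token itself contains no double quotes.
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sign in with the same POST call as above and print only the token.
dq_signin() {
  curl --silent --location "$DQ_URL/v3/auth/signin" \
    --header 'Content-Type: application/json' \
    --header 'Accept: */*' \
    --data "{\"username\": \"$DQ_USER\", \"password\": \"$DQ_PASS\", \"iss\": \"$DQ_TENANT\"}" \
    | extract_token
}
```

With the variables set, `TOKEN=$(dq_signin)` stores the token for the calls in the following steps.
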
2. Create a connection

  1. Copy the Agent UUID and ID number from the Collibra DQ UI, or retrieve them with the following GET call, replacing the placeholder variables in the ${} with your actual values.
    Note If you are using multi-tenancy, the token for the multi-tenant admin user is required.

    curl --location 'https://${dq-server-url.example.com}/v2/getagents' \
    --header 'Accept: application/json' \
    --header 'Authorization: Bearer ${token}'
  2. Use the following POST call to create a connection, replacing the placeholder variables in the ${} with your actual values.
    curl --location 'https://${dq-server-url.example.com}/v2/addconnection?Alias=${BigQuery_WI_Connection}&Host=jdbc%3Abigquery%3A%2F%2Fhttps%3A%2F%2Fwww.googleapis.com%2Fbigquery%2Fv2%3A443%3BProjectId%3D${project_id}%3BTimeOut%3D3600&Port=443&driver=com.simba.googlebigquery.jdbc42.Driver&username=x&password=x&driverlocation=%2Fopt%2Fowl%2Fdrivers%2Fbigquery&driverprops=&isHive=0&usepwdmgr=0&iskerb=0&keytab=&principal=&isglobal=1&conntype=jdbc&authtype=workload-identity&isPushdown=0&dbBrandName=BIGQUERY&agentId=${ID}&agentUUID=${agent-UUID}&archiveBreaks=false' \
    --header 'Accept: application/json' \
    --header 'Authorization: Bearer ${token}' \
    --form 'agentIds="[{\"id\":${ID},\"uuid\":\"${agent-UUID}\"}]"'
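
The Host query parameter in the POST call is the percent-encoded form of the JDBC URL jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<project_id>;TimeOut=3600. The following is a minimal sketch of how that value could be generated rather than encoded by hand. The helper names are hypothetical, and the encoder handles ASCII input only.

```shell
# Sketch only: helper names are hypothetical; handles ASCII input only.

# Percent-encode every character except the unreserved set (A-Z a-z 0-9 . ~ _ -).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character
    case $c in
      [A-Za-z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Build the encoded Host value for a given GCP project ID.
bigquery_host_param() {
  urlencode "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=$1;TimeOut=3600"
}
```

For example, `bigquery_host_param my-project` yields a value in which `ProjectId=my-project` appears as `ProjectId%3Dmy-project`, ready to paste after `Host=` in the request URL.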

3. Verify the connection

  1. Use the following GET call to verify the connection was successfully created, replacing the placeholder variables in the ${} with your actual values.
    curl --location 'https://${dq-server-url.example.com}/v2/getconnectionsByDbBrand?dbBrandName=BIGQUERY' \
    --header 'Accept: application/json' \
    --header 'Authorization: Bearer ${token}'
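
In a script, the verification can be reduced to a check for the connection alias in the response. The following is a minimal sketch. The layout of the response is not described here, so the check simply looks for the alias as a quoted string anywhere in the JSON. The function names are hypothetical, and DQ_URL and TOKEN are assumed environment variables holding your server URL and the token from step 1.

```shell
# Sketch only: the response layout is an assumption, and function names and
# the DQ_URL and TOKEN environment variables are hypothetical.

# Succeed if the given alias appears as a quoted string in the JSON on stdin.
has_connection() {
  grep -qF "\"$1\""
}

# Fetch the BigQuery connections with the same GET call as above.
list_bigquery_connections() {
  curl --silent --location "$DQ_URL/v2/getconnectionsByDbBrand?dbBrandName=BIGQUERY" \
    --header 'Accept: application/json' \
    --header "Authorization: Bearer $TOKEN"
}
```

For example, `list_bigquery_connections | has_connection BigQuery_WI_Connection && echo "connection found"` prints a confirmation only when the alias is present.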

4. Use Workload Identity to authenticate your connection

  1. From the Connections page in the Admin Console, add a BigQuery connection and, under Authentication Type, click the dropdown menu.
  2. Select Workload Identity from the list.
  3. Finish setting up your BigQuery connection and click Submit to save it.