Configuring Workload Identity for Google BigQuery

Workload Identity

Workload Identity is the recommended method for authenticating BigQuery connections in GKE-based deployments of Collibra DQ because it provides improved security and manageability of your service account credentials JSON file. To authenticate with Workload Identity, ensure that you fulfill the requirements and select Workload Identity from the Authentication Type dropdown menu when you set up your Google BigQuery connection.

Workload Identity permissions

To use Workload Identity to authenticate your connection between Collibra DQ and Google BigQuery, ensure that you satisfy the following requirements:

Note These roles and permissions are only required when you use the Workload Identity option to authenticate your connection.

Prerequisites

Important Workload Identity is a supported authentication type for Collibra DQ deployments on GCP with either GKE or Google Compute Engine (GCE). If you use GKE, ensure that Workload Identity Federation is enabled on the GKE cluster and that BigQuery access is enabled. If you use GCE, ensure that the attached service account has BigQuery access enabled.

1. Authenticate to the API

  1. Use the following POST call to authenticate, replacing the placeholder variables in ${} with your actual values.
    curl --location 'https://${dq-server-url.example.com}/v3/auth/signin' \
    --header 'Content-Type: application/json' \
    --header 'Accept: */*' \
    --data '{
      "username": "${ExampleUsername}",
      "password": "${ExamplePassword123}",
      "iss": "${ExampleTenantName}"
    }'
  2. Copy the token from the response.
    "username": "ExampleUser",
    "token": "${token}"

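To reuse the token in the calls that follow, it can help to capture it in a shell variable. The following is a minimal sketch, not part of the Collibra DQ API: extract_token is a hypothetical helper, the sample response is made up to match the response shape shown above, and the sed-based parse assumes that the token contains no escaped quotes.

```shell
# Hypothetical helper: pull the "token" value out of a sign-in response body.
# This is a simple sed match, not a full JSON parser; it assumes the token
# value contains no escaped double quotes.
extract_token() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Made-up response body, shaped like the example response above.
sample_response='{"username": "ExampleUser", "token": "abc.def.ghi"}'

TOKEN=$(extract_token "$sample_response")
echo "$TOKEN"
```

With a real deployment, you would store the output of the sign-in curl call (run with --silent) in a variable and pass that variable to extract_token instead of the sample string.
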
2. Create a connection

  1. Get the Agent UUID and ID number. You can copy them from the Collibra DQ UI or list your agents with the following call, replacing the placeholder variables in ${} with your actual values.

    Note If you are using multi-tenancy, the token for the multi-tenant admin user is required.

    curl --location 'https://${dq-server-url.example.com}/v2/getagents' \
    --header 'Accept: application/json' \
    --header 'Authorization: Bearer ${token}'
  2. Use the following POST call to create a connection, replacing the placeholder variables in ${} with your actual values.
    curl --location 'https://${dq-server-url.example.com}/v2/addconnection?Alias=${BigQuery_WI_Connection}&Host=jdbc%3Abigquery%3A%2F%2Fhttps%3A%2F%2Fwww.googleapis.com%2Fbigquery%2Fv2%3A443%3BProjectId%3D${project_id}%3BTimeOut%3D3600&Port=443&driver=com.simba.googlebigquery.jdbc42.Driver&username=x&password=x&driverlocation=%2Fopt%2Fowl%2Fdrivers%2Fbigquery&driverprops=&isHive=0&usepwdmgr=0&iskerb=0&keytab=&principal=&isglobal=1&conntype=jdbc&authtype=workload-identity&isPushdown=0&dbBrandName=BIGQUERY&agentId=${ID}&agentUUID=${agent-UUID}&archiveBreaks=false' \
    --header 'Accept: application/json' \
    --header 'Authorization: Bearer ${token}' \
    --form 'agentIds="[{\"id\":${ID},\"uuid\":\"${agent-UUID}\"}]"'
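
The Host query parameter in the call above is a percent-encoded JDBC URL, which is easy to mistype by hand. The following is a minimal sketch of producing it in shell, assuming the decoded form is jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<project>;TimeOut=3600. The urlencode helper and the my-project value are illustrative, not part of Collibra DQ, and the helper handles ASCII input only.

```shell
# Hypothetical helper: percent-encode every character except the RFC 3986
# unreserved set (letters, digits, ".", "_", "~", "-"). ASCII input only.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Assumed placeholder; replace with your own GCP project ID.
PROJECT_ID='my-project'

JDBC_URL="jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=${PROJECT_ID};TimeOut=3600"
ENCODED_HOST=$(urlencode "$JDBC_URL")
echo "$ENCODED_HOST"
```

The encoded value can then be passed as the Host parameter of the addconnection call in place of the hand-written string.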

3. Verify the connection

  1. Use the following GET call to verify that the connection was created, replacing the placeholder variables in ${} with your actual values.
    curl --location 'https://${dq-server-url.example.com}/v2/getconnectionsByDbBrand?dbBrandName=BIGQUERY' \
    --header 'Accept: application/json' \
    --header 'Authorization: Bearer ${token}'
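
If you script this check, one option is to search the response body for the alias you used when creating the connection. The following is a rough sketch only: connection_exists is a hypothetical helper, and the sample response, including its field names, is invented for illustration because the actual response format is not shown here. The helper performs a plain text search for the quoted alias rather than parsing JSON.

```shell
# Hypothetical helper: succeed if the quoted alias appears anywhere in the
# response body. This is a fixed-string text search, not a JSON parse.
connection_exists() {
  printf '%s' "$1" | grep -F -q -- "\"$2\""
}

# Invented response body; the real field names and structure may differ.
sample_response='[{"aliasname":"BigQuery_WI_Connection","dbBrandName":"BIGQUERY"}]'

if connection_exists "$sample_response" "BigQuery_WI_Connection"; then
  echo "connection found"
else
  echo "connection not found"
fi
```

With a real deployment, you would pass the body returned by the GET call above instead of the sample string.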

4. Use Workload Identity to authenticate your connection

  1. From the Connections page in the Admin Console, add a BigQuery connection and, under Authentication Type, click the dropdown menu.
  2. Select Workload Identity from the list.
  3. Finish setting up your BigQuery connection and click Submit to save it.