Connecting to Google Cloud Storage (GCS)

This section contains an overview of Google Cloud Storage (GCS).

General information

Data source: Google Cloud Storage (GCS)
Supported versions: N/A
Connection string: gcs://
Packaged: Yes
Certified: Yes

Supported features

Analyze data: Yes
Archive breaking records: Yes
Estimate job: Yes
Pushdown: No

Processing capabilities

Spark agent: Yes
Yarn agent: Yes

Minimum user permissions

For Collibra DQ to access your Cloud Storage bucket, you need the following permissions:

  • Viewer permissions on your Cloud Storage bucket.
  • When using the Archive Break Records feature, Editor and Viewer permissions on the Cloud Storage bucket location where break records are sent.

Recommended and required connection properties

Name (required, Text)
The unique name of your connection. Do not use spaces in your connection name, and use only valid characters.

Connection URL (required, String)
The connection string path of your GCS connection. The path must start with gcs:// and point to the root bucket, not a sub-folder.
Example: gcs://<bucket-name>
You can optionally add a key after the bucket name.
Example: gcs://<bucket-name>/key

Target Agent (required, Option)
The Agent used to submit your DQ Jobs.

Auth Type (required, Option)
The method used to authenticate your connection.
Note: The configuration requirements differ depending on the Auth Type you select. See Authentication for details on the available authentication types.

Save Credentials (required, Option)
Select this option after you enter your connection details.

Driver Properties (optional, String)
The configurable driver properties for your connection. Separate multiple properties with commas, for example: abc=123,test=true
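The Connection URL and Driver Properties values follow simple string formats. As an illustration only, the following Python sketch splits a gcs:// URL into its bucket and optional key, and parses a comma-delimited driver properties string into name/value pairs. The helper names parse_gcs_url and parse_driver_properties are hypothetical; this is not how Collibra DQ parses these values internally.

```python
from urllib.parse import urlparse


def parse_gcs_url(url: str) -> tuple[str, str]:
    """Split a gcs:// connection URL into (bucket, key); key may be empty."""
    parsed = urlparse(url)
    if parsed.scheme != "gcs":
        raise ValueError(f"URL must start with gcs://, got {url!r}")
    if not parsed.netloc:
        raise ValueError("URL must include a bucket name after gcs://")
    # Everything after the bucket name is treated as the optional key.
    return parsed.netloc, parsed.path.lstrip("/")


def parse_driver_properties(text: str) -> dict[str, str]:
    """Turn a comma-delimited 'name=value' string into a dict."""
    props: dict[str, str] = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        name, sep, value = pair.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"Expected name=value, got {pair!r}")
        props[name.strip()] = value.strip()
    return props


print(parse_gcs_url("gcs://my-bucket"))              # ('my-bucket', '')
print(parse_gcs_url("gcs://my-bucket/key"))          # ('my-bucket', 'key')
print(parse_driver_properties("abc=123,test=true"))  # {'abc': '123', 'test': 'true'}
```

This sketch assumes property values never contain commas themselves, since a plain comma split cannot tell a separator from part of a value.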

Authentication

GCS
Select this option to upload a local JSON file that contains your GCS service account credentials.

Authorization (JSON)
The JSON file that contains your service account credentials. Upload a local JSON file that contains the following information:

{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "encodedstring",
  "private_key": "-----BEGIN PRIVATE KEY-----\nencodedstring\n-----END PRIVATE KEY-----\n",
  "client_email": "<your-service-account>@developer.gserviceaccount.com",
  "client_id": "your-service-account-id",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/<your-service-account>@developer.gserviceaccount.com"
}

Tip: You can typically download this JSON key file from the Google Cloud console when you create a key for a service account under IAM & Admin > Service Accounts.
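Before uploading the file, you may want to confirm that it is well-formed. The following Python sketch uses only the standard library json module to check that a service account file parses as JSON and contains the fields shown in the example above. The function name check_service_account_key and the rule that every listed field must be present and non-empty are assumptions for illustration; Collibra DQ may validate the file differently.

```python
import json

# Fields shown in the example file above. Requiring every one of them to be
# present and non-empty is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "auth_uri",
    "token_uri",
    "auth_provider_x509_cert_url",
    "client_x509_cert_url",
)


def check_service_account_key(text: str) -> list[str]:
    """Return a list of problems found in a service account JSON document."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = [
        f"missing or empty field: {name}"
        for name in REQUIRED_FIELDS
        if not data.get(name)
    ]
    if data.get("type") and data["type"] != "service_account":
        problems.append("type must be 'service_account'")
    key = data.get("private_key")
    if isinstance(key, str) and key and not key.startswith("-----BEGIN PRIVATE KEY-----\n"):
        problems.append("private_key does not start with a PEM header line")
    return problems


# Hypothetical usage: check the file contents before uploading them.
sample = json.dumps({"type": "service_account", "project_id": "your-project-id"})
for problem in check_service_account_key(sample):
    print(problem)
```

An empty list means the file passed these basic checks; it does not confirm that the key is valid or that the service account has the permissions described in Minimum user permissions.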