About the Google Cloud Storage file system integration via Edge

Important 

In Collibra 2024.05, we've launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.


The Google Cloud Storage file system integration lets you register Google Cloud Storage (GCS), a service of the Google Cloud Platform (GCP), as a data source in Collibra and synchronize its metadata.
After synchronization, the files and directories of the GCS file system are represented in Collibra as assets of specific asset types, which keep their original names.
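
A standard (flat-namespace) GCS bucket has no real directories: it holds objects with flat names, and directories are only implied by `/` separators in those names. The minimal Python sketch below illustrates, under that assumption, how a flat object listing can be turned into a directory and file hierarchy whose assets keep the original names. The `Asset` class and the `"Directory"`/`"File"` labels are hypothetical placeholders, not Collibra's API or its exact asset type names; in practice, the object names would come from a bucket listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a Collibra asset: a type label plus a path."""

    asset_type: str  # illustrative labels only: "Directory" or "File"
    path: str  # full path inside the bucket, exactly as it appears in GCS

    @property
    def name(self) -> str:
        # The asset keeps its original name: the last segment of the path.
        return self.path.rstrip("/").rsplit("/", 1)[-1]


def assets_from_object_names(object_names: list[str]) -> list[Asset]:
    """Infer directories and files from a flat list of GCS object names."""
    directories: set[str] = set()
    files: set[str] = set()
    for object_name in object_names:
        parts = object_name.split("/")
        # Every proper prefix of the name, ending in "/", is an implied directory.
        for i in range(1, len(parts)):
            directories.add("/".join(parts[:i]) + "/")
        # Names ending in "/" are folder placeholders, not files.
        if not object_name.endswith("/"):
            files.add(object_name)
    assets = [Asset("Directory", d) for d in directories]
    assets += [Asset("File", f) for f in files]
    return sorted(assets, key=lambda a: (a.path, a.asset_type))
```

For example, the object names `sales/2024/jan.csv` and `readme.txt` yield the directories `sales/` and `sales/2024/` plus two files, named `jan.csv` and `readme.txt`.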

Important 
  • You cannot profile or classify the integrated tables and columns.
  • You can only integrate a Google Cloud Storage file system via Edge, not via Jobserver.

For more information about these Google products, go to the Google Cloud Storage documentation and the Google Dataplex documentation.

About Google Dataplex

The GCS integration supports Google Dataplex, a service used for schema discovery. With Dataplex, you can integrate the schemas, tables, and columns found in the files, and create a single File Group asset in Collibra instead of multiple File assets.
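
Conceptually, Dataplex discovery changes the granularity of what is registered: instead of one File asset per object, the files that Dataplex recognizes as one table can be represented by a single File Group asset. The Python sketch below only illustrates that grouping idea. It assumes, hypothetically, that each discovered table corresponds to an object-name prefix; the `group_files` function and its `table_prefixes` input are illustrative and are not part of Collibra or the Dataplex API.

```python
from collections import defaultdict


def group_files(
    object_names: list[str], table_prefixes: dict[str, str]
) -> tuple[dict[str, list[str]], list[str]]:
    """Split objects into hypothetical file groups and standalone files.

    table_prefixes maps a Dataplex-discovered table name to the object-name
    prefix that holds its data files (an assumption made for illustration).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    standalone: list[str] = []
    for name in object_names:
        matches = [t for t, prefix in table_prefixes.items() if name.startswith(prefix)]
        if matches:
            # Prefer the longest matching prefix, so a nested table wins over its parent.
            table = max(matches, key=lambda t: len(table_prefixes[t]))
            groups[table].append(name)  # one group, later one File Group asset
        else:
            standalone.append(name)  # no table found: stays an individual File asset
    return dict(groups), standalone
```

With `table_prefixes={"orders": "orders/"}`, the objects `orders/part-0.parquet` and `orders/part-1.parquet` fall into one `orders` group, while an unrelated `notes.txt` stays a standalone file.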

Important 
  • The Dataplex zone in which the GCS buckets are registered must be in the same project as the GCP service account.
  • For Dataplex integrations with multi-region or dual-region GCS buckets, we query all Dataplex lakes and zones that are located in the bucket's regions and in which a Dataplex service is available. Both the regions that make up each multi-region and dual-region and the regions where a Dataplex service is available are hard-coded. If new regions are added, or if a Dataplex service becomes available in new regions, Dataplex information from those regions is not registered until a new version of the GCS integration is released.
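
The two constraints above can be illustrated with a small pre-flight sketch in Python. It assumes a Dataplex zone resource name of the form `projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}` and a user-managed service account email of the form `NAME@PROJECT_ID.iam.gserviceaccount.com`; if a zone name uses a project number instead of a project ID, the comparison would need a lookup that this sketch does not do. The region tables are hypothetical: the dual-region entries follow Google's predefined dual-region codes, while the multi-region and Dataplex availability sets are made-up subsets. All names here are illustrative, not Collibra's implementation.

```python
# Hypothetical, hard-coded lookup tables (illustration only, not Collibra's data).
DUAL_REGIONS: dict[str, frozenset[str]] = {
    "NAM4": frozenset({"us-central1", "us-east1"}),
    "EUR4": frozenset({"europe-north1", "europe-west4"}),
    "ASIA1": frozenset({"asia-northeast1", "asia-northeast2"}),
}
# Made-up multi-region compositions for this example.
MULTI_REGIONS: dict[str, frozenset[str]] = {
    "US": frozenset({"us-central1", "us-east1", "us-east4", "us-west1"}),
    "EU": frozenset({"europe-north1", "europe-west1", "europe-west4"}),
}
# Regions assumed to offer Dataplex: a made-up subset for this example.
DATAPLEX_REGIONS: frozenset[str] = frozenset(
    {"us-central1", "us-east1", "us-west1", "europe-west1", "europe-west4", "asia-northeast1"}
)


def regions_to_query(bucket_location: str) -> set[str]:
    """Return the regions whose Dataplex lakes and zones would be queried."""
    location = bucket_location.upper()  # GCS reports locations such as "NAM4" or "US-CENTRAL1"
    if location in DUAL_REGIONS:
        regions = DUAL_REGIONS[location]
    elif location in MULTI_REGIONS:
        regions = MULTI_REGIONS[location]
    else:
        regions = frozenset({bucket_location.lower()})  # a single region
    # Regions missing from the hard-coded availability set are silently skipped,
    # so a newly added region is ignored until the tables above are updated.
    return set(regions & DATAPLEX_REGIONS)


def zone_project(zone_name: str) -> str:
    """Extract the project from projects/{p}/locations/{l}/lakes/{lake}/zones/{z}."""
    parts = zone_name.split("/")
    if len(parts) != 8 or parts[0::2] != ["projects", "locations", "lakes", "zones"]:
        raise ValueError(f"not a Dataplex zone resource name: {zone_name}")
    return parts[1]


def service_account_project(email: str) -> str:
    """Extract the project ID from a user-managed service account email."""
    suffix = ".iam.gserviceaccount.com"
    _, _, domain = email.partition("@")
    if not domain.endswith(suffix):
        raise ValueError(f"not a user-managed service account email: {email}")
    return domain[: -len(suffix)]


def same_project(zone_name: str, service_account_email: str) -> bool:
    # Plain string comparison: project numbers versus project IDs are not reconciled.
    return zone_project(zone_name) == service_account_project(service_account_email)
```

With these tables, a bucket in `NAM4` leads to queries in `us-central1` and `us-east1`, whereas a bucket in `EUR4` only leads to a query in `europe-west4`, because `europe-north1` is missing from the assumed availability set.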

For information on how to add a GCS asset to a Dataplex zone so that the GCS integration can discover it, go to the Google Dataplex documentation. For information on the supported data types, go to the Google documentation on data types.

Note When you add a bucket to Dataplex and Dataplex identifies schemas (tables and columns) for files in the bucket, these tables and columns are also added automatically to BigQuery by Dataplex.