Steps overview: Integrate a Google Cloud Storage file system via Edge

You can configure Collibra to register and synchronize a Google Cloud Storage (GCS) file system via Edge.

Tip If you use schemas with table files and want to integrate them as File Group assets with tables and columns instead of as File assets, you can use Google Dataplex. The Dataplex zone in which the GCS buckets are registered must be in the same project as the GCP service account. For information on how to add a GCS asset to a Dataplex zone so that the GCS integration can discover it, go to the Google Dataplex documentation.

# Step Description

1 Enable the Google Cloud Storage file system registration and synchronization via Edge and give the Edge Site user the required permissions. Define that you want to integrate GCS via Edge.

Note 

If you have defined an outbound (forward) proxy on your Edge site, the integration will take that configuration into account when connecting to the data source. The following proxies are supported for GCS:

  • Pass-through (No authentication)
  • Pass-through (Basic authentication)
  • MITM (No authentication)
  • MITM (Basic authentication)
  • No proxy for noProxy hosts defined by Edge
2 Create a GCP connection to your Edge site. Create a connection to the Google Cloud Platform (GCP) in an Edge site.

3 Add a GCS synchronization capability to your Edge site. Add the GCS synchronization capability to the GCP Edge connection. The capability enables the retrieval of data from the GCS file system.
4 Register a GCS file system. Create the initial structure of a Storage Catalog domain and GCS File System asset in the selected parent community.
5 Connect the GCS file system asset to the Edge capability. Link the registered GCS file system to the Edge capability.
6 Create crawlers. Create crawlers to define the folders that you want to synchronize.
7 Synchronize GCS. You can synchronize GCS manually or add a synchronization schedule to synchronize it automatically.
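
Step 6 says that crawlers define the folders to synchronize. Before creating them, it can help to preview which object paths a folder selection would cover. The following is a minimal, illustrative sketch only: the function name `select_paths`, the sample paths, and the include/exclude glob patterns are all hypothetical, and the matching uses Python's `fnmatch` rules, which may differ from how crawlers actually evaluate their settings.

```python
from fnmatch import fnmatchcase


def select_paths(object_paths, include_patterns, exclude_patterns=()):
    """Return paths that match any include pattern and no exclude pattern.

    Hypothetical helper for planning only; it does not call Collibra or GCS.
    Note that in fnmatch, "*" also matches "/" characters.
    """
    selected = []
    for path in object_paths:
        included = any(fnmatchcase(path, p) for p in include_patterns)
        excluded = any(fnmatchcase(path, p) for p in exclude_patterns)
        if included and not excluded:
            selected.append(path)
    return selected


# Hypothetical object paths in a bucket (assumed for illustration).
paths = [
    "sales/2024/orders.csv",
    "sales/2024/tmp/scratch.csv",
    "hr/salaries.parquet",
]

# Keep everything under sales/, except anything inside a tmp/ folder.
print(select_paths(paths, ["sales/*"], ["*/tmp/*"]))
```

Running a check like this against a listing of your bucket can show whether a planned selection is broader or narrower than intended before the first synchronization.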