Integrated Google Cloud Storage data

After the synchronization, the resulting assets are in the domain that was specified in the crawler. By default, the assets get the Implemented status.

Warning: Do not move the assets to another domain. Doing so may lead to errors during future synchronizations.

Tip: GCS synchronization relies on UUIDs.

Note: If a temporary communication issue causes a partial synchronization, the status of the assets that cannot be synchronized is set to Missing from source. Their previous status is restored if they are found in the source system during the next fully successful synchronization.
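
The status behavior described in the note can be illustrated with a minimal Python sketch. This is not Collibra's implementation: the `Asset` class, the `apply_sync` function, and the asset names are hypothetical, and the sketch models only the two transitions the note mentions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MISSING = "Missing from source"
DEFAULT_STATUS = "Implemented"  # default status after synchronization


@dataclass
class Asset:
    """Hypothetical stand-in for a synchronized asset."""
    name: str
    status: str = DEFAULT_STATUS
    previous_status: Optional[str] = None


def apply_sync(assets: List[Asset], found_names: Iterable[str],
               fully_successful: bool) -> List[Asset]:
    """Update statuses after one synchronization run (illustrative only)."""
    found = set(found_names)
    for asset in assets:
        is_found = asset.name in found
        if not fully_successful and not is_found and asset.status != MISSING:
            # Partial sync: remember the old status, then flag the asset.
            asset.previous_status = asset.status
            asset.status = MISSING
        elif fully_successful and is_found and asset.status == MISSING:
            # Fully successful sync: restore the remembered status.
            asset.status = asset.previous_status or DEFAULT_STATUS
            asset.previous_status = None
    return assets
```

For example, an asset with the status Accepted that cannot be reached during a partial run becomes Missing from source, and it returns to Accepted once a later, fully successful run finds it again.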

By default, the assets are shown in a plain list, but you can enable a multi-path hierarchy to show them in a tree structure. The resulting assets depend on whether you use Google Dataplex.

Synchronization results without Google Dataplex

For the best result, we recommend that you use the following relations:

  1. File Storage contains Storage Container
  2. Storage Container contains Storage Container
  3. Storage Container contains File
  4. Directory contains Directory

The following image shows the resulting hierarchical table.

Synchronization results with Google Dataplex

For the best result, we recommend that you use the following relations:

  1. File Storage contains Storage Container
  2. Storage Container contains Storage Container
  3. Storage Container contains File
  4. Directory contains Directory
  5. Directory contains File Group
  6. File Group contains Table
  7. Table contains Column
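
As a rough illustration of how these relations form a hierarchy, the list can be written as parent and child pairs. The names `RELATIONS_WITH_DATAPLEX`, `allowed_children`, and `is_valid_path` are hypothetical helpers for this sketch and are not part of any Collibra API.

```python
# Parent -> child pairs copied from the recommended relations list.
RELATIONS_WITH_DATAPLEX = [
    ("File Storage", "Storage Container"),
    ("Storage Container", "Storage Container"),
    ("Storage Container", "File"),
    ("Directory", "Directory"),
    ("Directory", "File Group"),
    ("File Group", "Table"),
    ("Table", "Column"),
]


def allowed_children(parent_type, relations=RELATIONS_WITH_DATAPLEX):
    """Return the asset types that parent_type may contain, in list order."""
    return [child for parent, child in relations if parent == parent_type]


def is_valid_path(path, relations=RELATIONS_WITH_DATAPLEX):
    """Check that each consecutive pair in path is a recommended relation."""
    allowed = set(relations)
    return all(pair in allowed for pair in zip(path, path[1:]))
```

With these pairs, `allowed_children("Directory")` returns Directory and File Group, and the path Directory, File Group, Table, Column passes `is_valid_path`. The first four pairs are the same relations recommended when Google Dataplex is not used.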


The synchronization creates a Directory asset named /. This asset is needed so that the File Group assets have a Directory asset that can contain them.

Synchronized metadata per asset type

This table shows the metadata for each GCS asset type.

| Asset type | Synchronized metadata | Resource ID |
|---|---|---|
| GCS Bucket | File Storage contains / is part of Storage Container | 00000000-0000-0000-0001-002600000000 |
| GCS Bucket | Location | 00000000-0000-0000-0000-000000000203 |
| Directory | URL | 00000000-0000-0000-0000-000000000258 |
| Directory | Storage Container contains / is part of Storage Container | 00000000-0000-0000-0001-002600000001 |
| Directory | Directory contains / is part of Directory | 00000000-0000-0000-0001-002600000003 |
| File Group | File Type | 00000000-0000-0000-0001-002500000012 |
| File Group | Directory contains / is part of File Group | 00000000-0000-0000-0001-002600000004 |
| File | URL | 00000000-0000-0000-0000-000000000258 |
| File | Storage Container contains / contained in File | 00000000-0000-0000-0000-000000007060 |
| Table | Description | 00000000-0000-0000-0000-000000003114 |
| Table | File Group contains / is part of Table | 00000000-0000-0000-0001-002600000005 |
| Column | Description | 00000000-0000-0000-0000-000000003114 |
| Column | Technical Data Type | 00000000-0000-0000-0000-000000000219 |
| Column | Column Position | 00000000-0000-0000-0001-000500000020 |
| Column | Is Nullable | 00000000-0000-0000-0001-000500000011 |
| Column | Column is part of / contains Table | 00000000-0000-0000-0000-000000007042 |
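
If you refer to these resource IDs from a script, a small lookup keeps them in one place. The following Python sketch covers only the attribute entries listed above, not the relations. The names `ATTRIBUTE_RESOURCE_IDS` and `resource_id` are hypothetical, and the sketch makes no calls to Collibra. It only checks that each ID parses as a UUID.

```python
import uuid

# Attribute resource IDs copied from the list above, keyed by
# (asset type, attribute name). Relation IDs are left out for brevity.
ATTRIBUTE_RESOURCE_IDS = {
    ("GCS Bucket", "Location"): "00000000-0000-0000-0000-000000000203",
    ("Directory", "URL"): "00000000-0000-0000-0000-000000000258",
    ("File Group", "File Type"): "00000000-0000-0000-0001-002500000012",
    ("File", "URL"): "00000000-0000-0000-0000-000000000258",
    ("Table", "Description"): "00000000-0000-0000-0000-000000003114",
    ("Column", "Description"): "00000000-0000-0000-0000-000000003114",
    ("Column", "Technical Data Type"): "00000000-0000-0000-0000-000000000219",
    ("Column", "Column Position"): "00000000-0000-0000-0001-000500000020",
    ("Column", "Is Nullable"): "00000000-0000-0000-0001-000500000011",
}


def resource_id(asset_type: str, attribute: str) -> uuid.UUID:
    """Look up an attribute's resource ID and return it as a UUID object.

    Raises KeyError if the (asset type, attribute) pair is not listed.
    """
    return uuid.UUID(ATTRIBUTE_RESOURCE_IDS[(asset_type, attribute)])
```

The same attribute type can appear under several asset types. For example, URL has the same resource ID for Directory and File, so the lookup is keyed by both the asset type and the attribute name.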