About working with Google Cloud Platform (GCP)


In Collibra Platform, you can:

  • Register individual Google BigQuery databases via the BigQuery JDBC driver.
  • Integrate a Google Cloud Storage (GCS) file system.
  • Integrate all metadata from your Google Dataplex projects.
  • Integrate Entries and Aspects from Google Dataplex Catalog.

It's important to understand the differences between these methods, because each one produces a different result in Collibra.

The following sections describe each way to work with GCP and the result it produces in Collibra.
Integrating Google Dataplex - Dataplex Catalog ingestion

Google Dataplex is a technical catalog in Google Cloud that provides information about all the data in your Dataplex projects. The Dataplex Catalog integration creates assets in Collibra Platform that represent Views, Entries, and Aspects.

With the Dataplex Catalog integration, you can also retrieve sample data and profile and classify the data. This feature is in preview. To learn how to integrate Google Dataplex Catalog with sampling, profiling, and classification, go to Steps: Integrate Google Dataplex Catalog via Edge.

When you integrate Dataplex Catalog, BigQuery metadata is also integrated. For more information about integrating BigQuery through a Dataplex Catalog integration or the BigQuery JDBC connector, go to Ways to integrate Google BigQuery data sources.

Integrating Google Dataplex - Dataplex ingestion

Google Dataplex is a technical catalog in Google Cloud that provides information about all the data in your Dataplex projects. If you use the Google Dataplex ingestion, Collibra registers and synchronizes the GCP Projects, Dataplex Lakes, Dataplex Zones, Tables, and Columns.

Note: Google Dataplex ingestion is no longer in active development and receives only defect fixes. Consider using the Google Dataplex Catalog ingestion via Edge instead.

The integration creates the complete asset structure representing Dataplex objects such as Project, Lake, Zone, Table, and Column. It also allows filtering based on Lakes and Zones.
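To make the Project > Lake > Zone > Table > Column hierarchy and the Lake and Zone filtering more concrete, here is a minimal Python sketch. The classes, the `filter_project` and `asset_paths` functions, and the sample names are all hypothetical. They illustrate the idea only and are not Collibra's implementation or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, simplified model of the Dataplex hierarchy that the
# Dataplex ingestion mirrors as assets: Project > Lake > Zone > Table > Column.

@dataclass
class Column:
    name: str

@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)

@dataclass
class Zone:
    name: str
    tables: list[Table] = field(default_factory=list)

@dataclass
class Lake:
    name: str
    zones: list[Zone] = field(default_factory=list)

@dataclass
class Project:
    name: str
    lakes: list[Lake] = field(default_factory=list)


def filter_project(project: Project, lakes: set[str] | None = None,
                   zones: set[str] | None = None) -> Project:
    """Return a new Project keeping only the named lakes and zones.

    None means "no filter" at that level. Zone objects are shared with
    the input, not deep-copied.
    """
    kept_lakes = []
    for lake in project.lakes:
        if lakes is not None and lake.name not in lakes:
            continue  # lake excluded by the filter
        kept_zones = [z for z in lake.zones if zones is None or z.name in zones]
        kept_lakes.append(Lake(lake.name, kept_zones))
    return Project(project.name, kept_lakes)


def asset_paths(project: Project) -> list[str]:
    """Flatten the hierarchy into one slash-separated path per asset."""
    paths = [project.name]
    for lake in project.lakes:
        lake_path = f"{project.name}/{lake.name}"
        paths.append(lake_path)
        for zone in lake.zones:
            zone_path = f"{lake_path}/{zone.name}"
            paths.append(zone_path)
            for table in zone.tables:
                table_path = f"{zone_path}/{table.name}"
                paths.append(table_path)
                paths.extend(f"{table_path}/{c.name}" for c in table.columns)
    return paths


# Example with made-up names: keep only the "landing" zone of the "raw" lake.
project = Project("sales-prj", [
    Lake("raw", [Zone("landing", [Table("orders", [Column("id")])]),
                 Zone("staging")]),
    Lake("curated", [Zone("gold")]),
])
filtered = filter_project(project, lakes={"raw"}, zones={"landing"})
print(asset_paths(filtered))
```

In this sketch, filtering happens at the Lake and Zone levels. Every Table and Column under a kept Zone is retained.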

Integrating a Google Cloud Storage file system

The Google Cloud Storage (GCS) file system integration registers GCS as a data source in Collibra and synchronizes its metadata. The integration supports Google Dataplex, a service used for schema discovery. This lets you integrate the schemas, tables, and columns found in the files and create a single File Group asset in Collibra rather than multiple File assets.

The GCS integration ingests data from GCS according to the configured crawler. It also adds the Tables and Columns that Dataplex recognizes, which are related to the corresponding files and file groups.
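The difference between one File Group asset and many File assets can be pictured with a small Python sketch. It assumes, as a deliberate simplification, that all objects in the same folder form one table-like group. Dataplex's real discovery rules are more involved: for example, they take file format and schema into account. The function name `group_into_file_groups` and the object names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from posixpath import dirname


def group_into_file_groups(object_names: list[str]) -> dict[str, list[str]]:
    """Group GCS object names by their parent folder.

    Simplifying assumption: every folder is treated as one table-like
    File Group. Real Dataplex discovery also considers file format and
    schema, which this sketch ignores.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in object_names:
        groups[dirname(name)].append(name)
    # Convert to a plain dict with sorted members for stable output.
    return {folder: sorted(files) for folder, files in groups.items()}


# Made-up object names in a single bucket.
objects = [
    "sales/part-0001.parquet",
    "sales/part-0000.parquet",
    "sales/part-0002.parquet",
    "customers/customers.csv",
]
groups = group_into_file_groups(objects)
print(f"{len(objects)} File assets vs {len(groups)} File Groups")
for folder, files in groups.items():
    print(folder, files)
```

Here, four separate objects collapse into two groups. This mirrors the benefit described above: fewer, more meaningful assets when many files share one logical table.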

Register a Google BigQuery database

If you register a specific Google BigQuery data source via the BigQuery JDBC connector, the resulting assets represent the tables and columns in the database. You can also retrieve sample data and profile and classify the data.

For more information about integrating BigQuery through a Dataplex Catalog integration or the JDBC connector, go to Ways to integrate Google BigQuery data sources.