Ways to integrate Google BigQuery data sources

In Collibra Platform, you can work with BigQuery data sources in the following ways:

  • Integrate all metadata from BigQuery data sources by using Dataplex Catalog integration.
  • Register individual BigQuery data sources by using the BigQuery JDBC connector.

It is important to understand the differences between these methods, as they produce different results in Collibra Platform.

Integrating BigQuery data sources via Dataplex Catalog integration

When you integrate Dataplex Catalog, metadata from BigQuery data sources is also ingested into Collibra Platform for the BigQuery Entry type in Dataplex Catalog. The resulting assets represent BigQuery databases, schemas, tables, and columns, including their associated aspects.

Integrating BigQuery metadata from Dataplex Catalog supports sampling, profiling, and classification (in preview).

For more information, go to Steps: Integrate Google Dataplex Catalog via Edge.

Registering a BigQuery data source via the BigQuery JDBC connector

If you register a specific BigQuery data source by using the BigQuery JDBC connector, the resulting assets represent the columns and the tables in the database.

Registering a BigQuery data source via the JDBC connector also supports sampling, profiling, and classification.

Note When you register a BigQuery data source via the JDBC connector, BigQuery Aspects are not ingested. To ingest Aspects, use a Dataplex Catalog integration.

For more information, go to About data source registration via Edge.
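The differences between the two methods described above can be summarized in a small lookup structure. The following Python sketch is illustrative only: it is not a Collibra API, and the names `IntegrationMethod`, `DATAPLEX`, `JDBC`, and `methods_for` are hypothetical. The values only restate what this page says about each method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationMethod:
    """Hypothetical summary record; not part of any Collibra API."""
    name: str
    asset_levels: tuple[str, ...]   # what the resulting assets represent
    ingests_aspects: bool           # whether BigQuery Aspects are ingested
    profiling_status: str           # sampling, profiling, and classification


# Values below restate the descriptions on this page.
DATAPLEX = IntegrationMethod(
    name="Dataplex Catalog integration",
    asset_levels=("database", "schema", "table", "column"),
    ingests_aspects=True,
    profiling_status="in preview",
)
JDBC = IntegrationMethod(
    name="BigQuery JDBC connector",
    asset_levels=("table", "column"),
    ingests_aspects=False,
    profiling_status="supported",
)


def methods_for(need_aspects: bool) -> list[str]:
    """Return the names of the methods that satisfy the Aspects requirement."""
    return [
        m.name
        for m in (DATAPLEX, JDBC)
        if m.ingests_aspects or not need_aspects
    ]
```

For example, `methods_for(need_aspects=True)` returns only the Dataplex Catalog integration, which matches the note above that Aspects are not ingested via the JDBC connector.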

Combining the ways of working with BigQuery

You can integrate BigQuery by using both the Dataplex Catalog integration and the JDBC connector. Using both methods together allows you to display the desired information in Collibra Platform.

If you first register a BigQuery data source by using the BigQuery JDBC connector, make sure to use the same System asset when you integrate Dataplex Catalog. This ensures that the Dataplex Catalog integration skips any assets already registered via the JDBC connector.

Migrating to use the Dataplex Catalog integration only

If you have previously used both the Dataplex Catalog integration and the BigQuery JDBC synchronization for some data sources, and now want to use only the Dataplex Catalog integration, complete the following steps:

  1. Edit your Google Dataplex Catalog synchronization capability by adding the BigQuery JDBC connection:
    1. Open a site.
      1. On the main toolbar, click the Products icon, and then click the Settings (cogwheel) icon.
        The Settings page opens.
      2. In the tab pane, click Edge.
        The Sites tab opens and shows a table with an overview of your sites.
      3. In the table, click the name of the site whose status is Healthy.
        The site page opens.
    2. In the Capabilities section, click the name of your Google Dataplex Catalog synchronization capability.
    3. Click Edit.
    4. Add your BigQuery JDBC connection in the JDBC GCP Connection (in preview) field.
    5. Click Save.
  2. Synchronize the Dataplex Catalog integration again.
    You can now set up sampling, profiling, and classification for the integrated assets, so that you can profile and classify the data and request sample data.

For more information about integrating Dataplex Catalog and setting up sampling, profiling, and classification, go to Steps: Integrate Google Dataplex Catalog via Edge.