Ways to integrate Google BigQuery data sources

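In Collibra Platform, you can work with Google BigQuery data sources in two ways: by integrating them via the Knowledge Catalog integration, or by registering them via the BigQuery JDBC connector. You can also combine the two methods.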

It is important to understand the differences between these methods, as they produce different results in Collibra Platform.

Integrating BigQuery data sources via Knowledge Catalog integration

When you integrate Knowledge Catalog, metadata from BigQuery data sources is also ingested into Collibra Platform, for entries of the BigQuery Entry type in Knowledge Catalog. The resulting assets represent BigQuery databases, schemas, tables, and columns, and their associated aspects.

The Knowledge Catalog integration supports sampling, profiling, and classification (in preview) for the integrated BigQuery metadata.

For more information, go to Steps: Integrate Google Knowledge Catalog via Edge.

Registering a BigQuery data source via the BigQuery JDBC connector

When you register a specific BigQuery data source by using the BigQuery JDBC connector, the resulting assets represent the schemas, tables, and columns in the database. This registration method also supports sampling, profiling, and classification.

Note: When you register a BigQuery data source via the JDBC connector, BigQuery Aspects are not ingested. To ingest Aspects, use a Knowledge Catalog integration.

For more information, go to Steps overview: Data source registration via Edge.
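The differences between the two methods described above can be summarized in a small data structure. The following Python sketch is only an illustration of the comparison in this topic. The class, field, and function names are hypothetical and are not part of any Collibra API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationMethod:
    """Hypothetical summary of one way to bring BigQuery metadata into Collibra."""
    name: str
    asset_levels: tuple[str, ...]
    ingests_aspects: bool
    # "supported" or "preview", as stated in this topic
    sampling_profiling_classification: str


KNOWLEDGE_CATALOG = IntegrationMethod(
    name="Knowledge Catalog integration",
    asset_levels=("database", "schema", "table", "column"),
    ingests_aspects=True,
    sampling_profiling_classification="preview",
)

JDBC_CONNECTOR = IntegrationMethod(
    name="BigQuery JDBC connector",
    asset_levels=("schema", "table", "column"),
    ingests_aspects=False,
    sampling_profiling_classification="supported",
)


def candidate_methods(need_aspects: bool, allow_preview_features: bool) -> list[str]:
    """Return the names of the methods that meet the given requirements."""
    result = []
    for method in (KNOWLEDGE_CATALOG, JDBC_CONNECTOR):
        if need_aspects and not method.ingests_aspects:
            continue
        if (not allow_preview_features
                and method.sampling_profiling_classification == "preview"):
            continue
        result.append(method.name)
    return result
```

In this sketch, `candidate_methods(need_aspects=True, allow_preview_features=False)` returns an empty list, because no single method meets both requirements. That is the situation in which combining the two methods can be useful.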

Combining the ways of working with BigQuery

You can integrate BigQuery by using both the Knowledge Catalog integration and the BigQuery JDBC connector. Combining the two methods lets you display the information you need in Collibra Platform.

If you first register a BigQuery data source using the BigQuery JDBC connector, make sure to use the same System asset when integrating Knowledge Catalog. This ensures that the Knowledge Catalog integration skips any assets already registered via the JDBC connection.
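The effect of sharing a System asset can be pictured with a toy model. The following Python sketch is a conceptual illustration only. It assumes that assets are matched by full name under a System asset, which this topic does not state, and all names in it are hypothetical. It does not describe how Collibra implements the integration.

```python
def assets_to_ingest(kc_assets, jdbc_assets, kc_system, jdbc_system):
    """Toy model: which Knowledge Catalog assets would still be ingested.

    Assumption (not from the product documentation): assets are matched
    by full name, and only when both methods use the same System asset.
    """
    if kc_system != jdbc_system:
        # Different System assets: nothing matches, so nothing is skipped.
        return list(kc_assets)
    already_registered = set(jdbc_assets)
    return [asset for asset in kc_assets if asset not in already_registered]


# Hypothetical asset names, for illustration only.
kc = ["my-project", "my-project.sales", "my-project.sales.orders"]
jdbc = ["my-project.sales", "my-project.sales.orders"]

same_system = assets_to_ingest(kc, jdbc, "BigQuery Prod", "BigQuery Prod")
other_system = assets_to_ingest(kc, jdbc, "BigQuery Prod", "BigQuery Other")
```

In this model, `same_system` contains only `"my-project"`, because the other two assets were already registered via JDBC. `other_system` contains all three assets, because nothing is recognized as already registered.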

Because the Knowledge Catalog integration supports sampling, profiling, and classification (in preview), you can migrate to using only the Knowledge Catalog integration if you previously combined it with BigQuery JDBC synchronization.

Migrating to use the Google Knowledge Catalog integration only

If you have previously used both the Knowledge Catalog integration and BigQuery JDBC synchronization for some data sources, and now want to use only the Knowledge Catalog integration, complete the following steps:

  1. Edit your Knowledge Catalog capability by adding the BigQuery JDBC connection:
    1. Open a site.
      1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
        The Settings page opens.
      2. In the tab pane, click Edge.
        The Sites tab opens and shows a table with an overview of your sites.
      3. In the table, click the name of the site whose status is Healthy.
        The site page opens.
    2. In the Capabilities section, click the name of your Knowledge Catalog capability.
    3. Click Edit.
    4. Add your BigQuery JDBC connection in the JDBC GCP Connection (in preview) field.
    5. Click Save.
  2. Synchronize the Knowledge Catalog integration again.
    You can now set up sampling, profiling, and classification for the integrated assets, so that you can profile and classify the data and request sample data.

For more information about integrating Knowledge Catalog and setting up sampling, profiling, and classification, go to Steps: Integrate Google Knowledge Catalog via Edge.