Ways to integrate Google BigQuery data sources
In Collibra Platform, you can work with Google BigQuery data sources in the following ways:
- Use the Google Knowledge Catalog (formerly Dataplex Universal Catalog) integration to integrate all metadata from BigQuery data sources. In addition, you can:
  - Ingest BigQuery aspects, which is not supported via the BigQuery JDBC connector.
  - Configure outbound synchronization to push enriched metadata from Collibra to aspects in Knowledge Catalog. This feature is in preview.
- Use the BigQuery JDBC connector to register individual BigQuery data sources.
It is important to understand the differences between these methods, as they produce different results in Collibra Platform.
Integrating BigQuery data sources via Knowledge Catalog integration
When you integrate Knowledge Catalog, metadata for entries of the BigQuery Entry type in Knowledge Catalog is also ingested into Collibra Platform. The resulting assets represent BigQuery databases, schemas, tables, and columns, and their associated aspects.
Integrating BigQuery metadata from Knowledge Catalog supports sampling, profiling, and classification (in preview).
For more information, go to Steps: Integrate Google Knowledge Catalog via Edge.
Registering a BigQuery data source via the BigQuery JDBC connector
If you register a specific BigQuery data source by using the BigQuery JDBC connector, the assets represent the schemas, tables, and columns in the database. Registering BigQuery databases also supports sampling, profiling, and classification.
For more information, go to Steps overview: Data source registration via Edge.
Combining the ways of working with BigQuery
You can integrate BigQuery by using both the Knowledge Catalog integration and the BigQuery JDBC connector. Using the two methods together allows you to display the desired information in Collibra Platform.
If you first register a BigQuery data source using the BigQuery JDBC connector, make sure to use the same System asset when integrating Knowledge Catalog. This ensures that the Knowledge Catalog integration skips any assets already registered via the JDBC connection.
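The effect of sharing a System asset can be pictured with a small conceptual model. This is only an illustrative sketch, not Collibra's implementation: the `Asset` class, the `assets_to_ingest` function, and the assumption that assets are matched on System asset plus full name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a catalog asset; not a Collibra type."""
    system: str      # name of the System asset the asset sits under
    full_name: str   # for example "sales.orders"


def assets_to_ingest(jdbc_assets, catalog_entries):
    """Return the catalog entries that are not already registered via JDBC.

    Assumption: matching is on (system, full_name), so an entry is only
    skipped when both methods use the same System asset.
    """
    registered = {(a.system, a.full_name) for a in jdbc_assets}
    return [e for e in catalog_entries
            if (e.system, e.full_name) not in registered]


# One table was registered earlier through the JDBC connector.
jdbc = [Asset("GCP", "sales.orders")]

# The same BigQuery content, seen under the same or a different System asset.
same_system = [Asset("GCP", "sales.orders"), Asset("GCP", "sales.customers")]
other_system = [Asset("GCP-2", "sales.orders"), Asset("GCP-2", "sales.customers")]

print([a.full_name for a in assets_to_ingest(jdbc, same_system)])
# prints ['sales.customers']
print([a.full_name for a in assets_to_ingest(jdbc, other_system)])
# prints ['sales.orders', 'sales.customers']
```

In this model, using a different System asset means nothing is skipped, which is why the same System asset should be used for both methods.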
Because the Knowledge Catalog integration supports sampling, profiling, and classification (in preview), you can migrate to using only the Knowledge Catalog integration if you previously combined it with BigQuery JDBC synchronization.
Migrating to use the Google Knowledge Catalog integration only
If you have previously used both the Knowledge Catalog integration and BigQuery JDBC synchronization for some data sources, and now want to use only the Knowledge Catalog integration, complete the following steps:
- Edit your Knowledge Catalog capability by adding the BigQuery JDBC connection:
  - Open a site.
    - On the main toolbar, click → Settings.
      The Settings page opens.
    - In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of your sites.
    - In the table, click the name of the site whose status is Healthy.
      The site page opens.
  - In the Capabilities section, click the name of your Knowledge Catalog capability.
  - Click Edit.
  - Add your BigQuery JDBC connection in the JDBC GCP Connection (in preview) field.
  - Click Save.
- Synchronize the Knowledge Catalog integration again.
You can now set up sampling, profiling, and classification to profile and classify the data, and request sample data for the integrated assets.
For more information about integrating Knowledge Catalog and setting up sampling, profiling, and classification, go to Steps: Integrate Google Knowledge Catalog via Edge.