Steps: Integrate Google Dataplex Catalog via Edge
Use the following steps to integrate Google Dataplex Catalog. You can choose to set up sampling, profiling, and classification as needed. This feature is in preview.
If you previously used both the Google Dataplex Catalog integration to integrate BigQuery projects and the BigQuery JDBC synchronization, and you want to use only the Dataplex Catalog integration, complete the steps in Migrating to use the Dataplex Catalog integration only.
# | Step | Description |
---|---|---|
1 | Create the required connections | |
1a | Create a Google Cloud Platform connection. | Creates a connection to the Google Cloud Platform (GCP) in an Edge or Collibra Cloud site. |
1b | Create a BigQuery JDBC connection. | Creates a JDBC connection to BigQuery in an Edge or Collibra Cloud site. Create a BigQuery JDBC connection only if you want to profile and classify the integrated data. If you already created a BigQuery JDBC connection, you can reuse it. |
2 | Add the Google Dataplex Catalog synchronization capability to the Edge or Collibra Cloud site. | Adds the Google Dataplex Catalog synchronization capability to the GCP Edge connection. The capability allows you to retrieve data from the Google Dataplex Catalog projects. If you want to profile and classify the integrated data, and request sample data, select the BigQuery JDBC connection in the Google Dataplex Catalog synchronization capability. |
3 | Synchronize Dataplex Catalog. | You can manually synchronize Dataplex Catalog or you can add a synchronization schedule to automatically synchronize it. If you selected a JDBC connection in the previous step, the synchronization process automatically creates the Catalog JDBC ingestion, JDBC profiling, and Catalog Data Classification capabilities if they do not already exist. When the synchronization is completed, assets are available, and the Profiling tab is available on the Database asset page. |
4 | Optionally, set up and configure data profiling | Goes through the required permissions and steps to prepare Edge and Collibra to profile columns in Dataplex Catalog. |
5 | Optionally, enable and set up Unified Data Classification | Goes through the required permissions and steps to prepare Edge and Collibra to classify columns in Dataplex Catalog via the Unified Data Classification method. |
6 | Optionally, set up and configure the use of sample data | Goes through the required permissions and steps to prepare Edge and Collibra to show sample data for columns in Dataplex Catalog. |
Result | Users with the correct permissions can now configure the profiling options and profile the data, automatically classify the data, or request sample data. | |
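
The dependencies between the steps in the table can be sketched in a few lines of Python. This is only an illustrative model of the rules described above, not Collibra or Google code: the class `IntegrationPlan` and the function `capabilities_created_by_sync` are hypothetical names, and treating any one of profiling, classification, or sample data as requiring the BigQuery JDBC connection is an assumption based on steps 1b and 2. The capability names are taken from step 3.

```python
from __future__ import annotations

from dataclasses import dataclass

# Capability names as listed in step 3 of the table.
AUTO_CREATED_CAPABILITIES = (
    "Catalog JDBC ingestion",
    "JDBC profiling",
    "Catalog Data Classification",
)


@dataclass(frozen=True)
class IntegrationPlan:
    """Hypothetical model of the optional features chosen for the integration."""

    profiling: bool = False
    classification: bool = False
    sample_data: bool = False

    def needs_jdbc_connection(self) -> bool:
        # Assumption: any optional feature requires the BigQuery JDBC
        # connection from step 1b to be selected in step 2.
        return self.profiling or self.classification or self.sample_data


def capabilities_created_by_sync(jdbc_selected: bool, existing: set[str]) -> list[str]:
    """Return the capabilities that step 3 would create.

    Nothing is created without a selected JDBC connection, and capabilities
    that already exist are not created again.
    """
    if not jdbc_selected:
        return []
    return [name for name in AUTO_CREATED_CAPABILITIES if name not in existing]


if __name__ == "__main__":
    plan = IntegrationPlan(profiling=True)
    print(plan.needs_jdbc_connection())
    print(capabilities_created_by_sync(True, {"JDBC profiling"}))
```

For a plan that only synchronizes metadata, `needs_jdbc_connection()` returns `False`, so step 1b can be skipped and no additional capabilities are created during synchronization.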
Integration workflow
The following graphic shows the process of integrating Dataplex Catalog, profiling and classifying the data, and requesting sample data (in preview).