Steps overview: Google Dataplex Catalog ingestion
The steps depend on whether you want to profile and classify the column data after the Google Dataplex Catalog integration.
- Steps to integrate metadata, profile and classify data (in preview)
- Steps to integrate only metadata

Steps to integrate metadata, profile and classify data (in preview)

# | Step | Description |
---|---|---|
1 | Create a GCP connection. | Create a connection to the Google Cloud Platform (GCP) in an Edge or Collibra Cloud site. |
2 | Create a JDBC connection for Google BigQuery. If you previously created a JDBC connection for BigQuery JDBC registration, you can reuse that connection. | Create a JDBC connection to Dataplex Catalog in an Edge or Collibra Cloud site. This connection is used during profiling and classification. |
3 | Add the Google Dataplex Catalog synchronization capability. | Add the Google Dataplex Catalog synchronization capability to the GCP Edge connection. The capability lets you retrieve data from the Google Dataplex Catalog projects. |
4 | Add the JDBC Catalog Ingestion capability. | Add the JDBC Catalog Ingestion capability to the JDBC Dataplex Catalog connection. The capability lets you retrieve the available databases and schemas in Dataplex Catalog during profiling and classification. |
5 | Synchronize Dataplex Catalog. | You can synchronize Dataplex Catalog manually or add a synchronization schedule to synchronize it automatically. Note: Don't use the Metadata Synchronization tab in Database assets to resynchronize the assets added via the Dataplex Catalog ingestion. The tab will be removed in a future release. Always resynchronize the data through the capability. |
6 | Set up and profile the data. | Complete the following steps to profile the data. Before you start, ensure that you have enabled profiling for Edge. |
6.a | Add the JDBC profiling capability for the JDBC connection. | |
6.b | Synchronize Dataplex Catalog again. | |
6.c | Configure the profiling options for the synchronized schemas. | |
6.d | | |
7 | Set up and classify the data. | Complete the following steps to classify the data. |
7.a | Enable and set up Unified Data Classification. | |
7.b | Start the data classification. | |
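
Step 2 in the table above creates a JDBC connection for Google BigQuery. As a rough illustration of what such a connection typically needs, the following sketch assembles a JDBC URL in the style used by the Simba-based BigQuery JDBC driver with service-account authentication. The function name, the example values, and the choice of properties (`ProjectId`, `OAuthType`, `OAuthServiceAcctEmail`, `OAuthPvtKeyPath`) are assumptions for illustration only; the connection form in your Edge or Collibra Cloud site may ask for these values differently, so check the driver and product documentation before relying on it.

```python
def build_bigquery_jdbc_url(project_id, service_account_email, key_path,
                            host="https://www.googleapis.com/bigquery/v2",
                            port=443):
    """Assemble a Simba-style BigQuery JDBC URL (illustrative sketch only)."""
    if not project_id:
        raise ValueError("project_id is required")

    # Assumed driver properties; OAuthType=0 selects service-account
    # authentication in the Simba BigQuery JDBC driver.
    properties = {
        "ProjectId": project_id,
        "OAuthType": "0",
        "OAuthServiceAcctEmail": service_account_email,
        "OAuthPvtKeyPath": key_path,
    }

    # Properties are appended as semicolon-separated key=value pairs.
    joined = ";".join(f"{key}={value}" for key, value in properties.items())
    return f"jdbc:bigquery://{host}:{port};{joined};"


if __name__ == "__main__":
    # Hypothetical project, account, and key path.
    print(build_bigquery_jdbc_url(
        "my-project",
        "svc@my-project.iam.gserviceaccount.com",
        "/path/to/key.json",
    ))
```

Keeping the properties in one dictionary makes it easy to compare them against whatever fields the connection form actually exposes.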

Steps to integrate only metadata

# | Step | Description |
---|---|---|
1 | Create a GCP connection to your Edge or Collibra Cloud site. | Create a connection to the Google Cloud Platform (GCP) in an Edge or Collibra Cloud site. |
2 | Add the Google Dataplex Catalog synchronization capability to your Edge or Collibra Cloud site. | Add the Google Dataplex Catalog synchronization capability to the GCP Edge connection. The capability lets you retrieve data from the Google Dataplex Catalog projects. |
3 | Synchronize via Google Dataplex Catalog ingestion. | You can synchronize Google Dataplex Catalog manually or add a synchronization schedule to synchronize it automatically. |
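
To make the difference between the two paths easier to see, the sketch below models both step lists as plain data and reports which steps are added when you also want profiling and classification. It is only a checklist helper: the step titles are paraphrased from the tables on this page, the function names are hypothetical, and nothing in it calls a Collibra or Google API.

```python
from typing import List

# Top-level steps for the metadata-only integration.
METADATA_ONLY_STEPS = [
    "Create a GCP connection",
    "Add the Google Dataplex Catalog synchronization capability",
    "Synchronize Dataplex Catalog",
]

# Top-level steps when you also profile and classify the data (in preview).
PROFILE_AND_CLASSIFY_STEPS = [
    "Create a GCP connection",
    "Create a JDBC connection for Google BigQuery",
    "Add the Google Dataplex Catalog synchronization capability",
    "Add the JDBC Catalog Ingestion capability",
    "Synchronize Dataplex Catalog",
    "Set up and profile the data",
    "Set up and classify the data",
]


def integration_steps(profile_and_classify: bool) -> List[str]:
    """Return a copy of the step list for the chosen integration path."""
    source = PROFILE_AND_CLASSIFY_STEPS if profile_and_classify else METADATA_ONLY_STEPS
    return list(source)


def additional_steps() -> List[str]:
    """Return the steps needed only for profiling and classification."""
    return [s for s in PROFILE_AND_CLASSIFY_STEPS if s not in METADATA_ONLY_STEPS]


if __name__ == "__main__":
    for number, step in enumerate(integration_steps(True), start=1):
        print(f"{number}. {step}")
    print("Only needed for profiling and classification:", additional_steps())
```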