Steps overview: Google Dataplex Catalog ingestion
The steps differ depending on whether you want to profile and classify the column data after integrating Google Dataplex Catalog.
- Steps to integrate metadata, profile and classify data (in preview)
- Steps to integrate only metadata

Steps to integrate metadata, profile and classify data (in preview)

# | Step | Description |
---|---|---|
1 | Create a GCP connection. | Create a connection to the Google Cloud Platform (GCP) in an Edge or Collibra Cloud site. |
2 | Create a JDBC connection for Google BigQuery. If you previously created a JDBC connection for BigQuery JDBC registration, you can reuse that connection. | Create a JDBC connection to Dataplex Catalog in an Edge or Collibra Cloud site. This connection is used during profiling and classification. |
3 | Add the Google Dataplex Catalog synchronization capability. | Add the Google Dataplex Catalog synchronization capability to the GCP Edge connection. The capability retrieves data from the Google Dataplex Catalog projects. |
4 | Add the JDBC Catalog Ingestion capability. | Add the JDBC Catalog Ingestion capability to the JDBC Dataplex Catalog connection. The capability retrieves the available databases and schemas in Dataplex Catalog during profiling and classification. |
5 | Synchronize Dataplex Catalog. | You can synchronize Dataplex Catalog manually, or add a synchronization schedule to synchronize it automatically. |
6 | Set up and profile the data. | Complete the following steps to profile the data. Before you start, ensure that profiling is enabled for Edge. |
6.a | Add the JDBC profiling capability for the JDBC connection. | |
6.b | Synchronize Dataplex Catalog again. | |
6.c | Configure the profiling options for the synchronized schemas. | |
6.d | | |
7 | Set up and classify the data. | Complete the following steps to classify the data. |
7.a | Enable and set up Unified Data Classification. | |
7.b | Start the data classification. | |

Steps to integrate only metadata

# | Step | Description |
---|---|---|
1 | Create a GCP connection to your Edge or Collibra Cloud site. | Create a connection to the Google Cloud Platform (GCP) in an Edge or Collibra Cloud site. |
2 | Add the Google Dataplex Catalog synchronization capability to your Edge or Collibra Cloud site. | Add the Google Dataplex Catalog synchronization capability to the GCP Edge connection. The capability retrieves data from the Google Dataplex Catalog projects. |
3 | Synchronize via Google Dataplex Catalog ingestion. | You can synchronize Google Dataplex manually, or add a synchronization schedule to synchronize it automatically. |
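
The two step sequences above can also be kept as a small checklist, for example when you track the integration in a runbook. The following is a minimal sketch only: it does not call any Collibra or Google API, and the names `Step`, `METADATA_ONLY`, `WITH_PROFILING_AND_CLASSIFICATION`, and `plan_steps` are hypothetical, introduced here for illustration. The step titles are abbreviated from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One top-level row of a steps table (hypothetical helper type)."""
    number: str
    action: str


# Sequence for integrating only metadata.
METADATA_ONLY = [
    Step("1", "Create a GCP connection"),
    Step("2", "Add the Google Dataplex Catalog synchronization capability"),
    Step("3", "Synchronize Dataplex Catalog"),
]

# Sequence for integrating metadata, then profiling and classifying (in preview).
WITH_PROFILING_AND_CLASSIFICATION = [
    Step("1", "Create a GCP connection"),
    Step("2", "Create a JDBC connection for Google BigQuery"),
    Step("3", "Add the Google Dataplex Catalog synchronization capability"),
    Step("4", "Add the JDBC Catalog Ingestion capability"),
    Step("5", "Synchronize Dataplex Catalog"),
    Step("6", "Set up and profile the data"),
    Step("7", "Set up and classify the data"),
]


def plan_steps(profile_and_classify: bool) -> list[Step]:
    """Return a copy of the step sequence that matches the chosen goal."""
    if profile_and_classify:
        return list(WITH_PROFILING_AND_CLASSIFICATION)
    return list(METADATA_ONLY)


if __name__ == "__main__":
    # Print the checklist for the profiling and classification path.
    for step in plan_steps(profile_and_classify=True):
        print(f"{step.number}. {step.action}")
```

The sketch makes the difference between the two paths explicit: the profiling and classification path adds the two JDBC steps before synchronization and the two setup steps after it.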