Configure Cloud Data Classification Platform
Before you can use the Cloud Data Classification Platform in Data Catalog, you have to configure it.
Depending on your environment, follow this procedure either in Collibra Console or on the Services Configuration tab of the Collibra settings.
Prerequisites
- You have the ADMIN or SUPER role in Collibra Console.
- You have a global role that has the Product Rights > System administration global permission.
- The Services Configuration tab is available in the Collibra settings.
Steps
- Open the Services Configuration tab:
  - On the main toolbar, click → Settings.
    The Settings page opens.
  - Click Services Configuration.
  - Click Edit configuration.
  Open the DGC service settings for editing:
  - Open Collibra Console.
    Collibra Console opens with the Infrastructure page.
  - In the tab pane, expand an environment to show its services.
  - In the tab pane, click the Data Governance Center service of that environment.
  - Click Configuration.
  - Click Edit configuration.
- Go to the Data Classification section.
- Enter the required information:
Setting
Description
Machine Learning platform URL
This setting requires the SUPER role.
The address of the machine learning platform that will classify your data.
Requester Name
This setting requires the SUPER role.
The unique name that identifies the client when using the Machine Learning platform.
API key
This setting requires the SUPER role.
The API key that authorizes the requester when connecting to the Machine Learning platform.
Enable Data Classification
True: Enable Collibra's data classification technology.
False (default): Do not use Collibra's data classification technology.
This setting is no longer in use. For information, go to About Data Classification.
Classification job execution timeout
The maximum amount of time (in seconds) that a data classification job can run before it is canceled.
The default value is 604,800, which is 1 week. This is also the maximum value.
Tip This is a global timeout limit for the entire classification process. In your Edge classification capability, you can also configure timeout limits for individual stages of a job. Timeout limits that you configure in your Edge capability cannot exceed the value that you set in this Console setting.
For more information, go to Set up Unified Data Classification.
Unified Classification enabled
Enables the Unified Data Classification method on Edge.
True (default): The environment uses the new classification method, Unified Data Classification. This has an impact on the available data classes, the required capabilities, and the way you classify data.
Note All existing data classes and classifications become unavailable.
Tip A migration process is available through the Unified Classification migration tool enabled setting.
False: The feature is not enabled.
Unified Classification migration tool enabled
Enables the Unified Data Classification migration process.
True: You can manually start the migration process in Unified Data Classification. The migration process:
- Copies classification information from the old classification methods (the old Edge method and the Cloud Data Classification Platform) into the Unified Data Classification method.
- Creates data classes in the Unified Data Classification method for existing Advanced Data Types (ADTs). ADTs are supported only for Jobserver, which reaches end of life on September 30, 2024.
For more information about the process, go to Migrating to Unified Data Classification.
False (default): The migration process is not available.
- If needed, configure the automatic acceptance and rejection of classification suggestions:
Setting
Description
True: The automatic acceptance and rejection of data classification suggestions is active.
False (default): Data classification suggestions are not automatically accepted or rejected.
Tip Start by manually accepting and rejecting a suggested data class. Only activate the automatic acceptance and rejection feature if you are comfortable with the data classification results.
Automatic acceptance threshold
The confidence level percentage at or above which data classification suggestions are accepted automatically.
If you set this value to 75, then the classification suggestions with a confidence level of 75% or higher are automatically accepted.
If multiple classification suggestions for a column meet the threshold, the suggestion with the highest confidence level is accepted automatically, provided that no other suggestion has the same confidence level.
Example
You set the automatic acceptance threshold to 85%. You classify a table with 2 columns.
- For column A, three classification suggestions are possible, one with confidence level 93%, one with 92%, and one with 90%.
- For column B, two classification suggestions are possible. Their confidence level is the same, 86%.
The results of the automatic acceptance will be:
- For column A, the classification suggestion with 93% will be accepted automatically.
- For column B, no suggestion is accepted automatically because the two suggestions have the same confidence level. Both suggestions remain visible.
The default acceptance threshold is 90.
Automatic rejection threshold
The confidence level percentage at or below which data classification suggestions are rejected automatically.
If you set this value to 49, then all data classification suggestions with a confidence level of 49% or lower are automatically rejected.
The default rejection threshold is 10.
Note If the acceptance threshold and rejection threshold are set to the same value, and a data classification suggestion has this confidence level percentage, the classification suggestion will be rejected.
- Click Save all.
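The acceptance and rejection rules described above can be sketched in a few lines of Python. This is an illustration only, not Collibra's implementation: the Suggestion class, the resolve_suggestions function, and the data class names are hypothetical. The sketch also assumes that suggestions that meet the acceptance threshold but are not accepted (because of a tie, or because another suggestion scored higher) stay visible as pending suggestions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical classification suggestion for a single column."""
    data_class: str
    confidence: float  # confidence level as a percentage (0-100)


def resolve_suggestions(suggestions, accept_threshold=90, reject_threshold=10):
    """Split one column's suggestions into (accepted, rejected, pending).

    Mirrors the documented rules only; the real product logic may differ.
    """
    accepted, rejected, pending, candidates = [], [], [], []

    for s in suggestions:
        # Rejection is checked first, so a suggestion whose confidence
        # equals both thresholds is rejected, as the note describes.
        if s.confidence <= reject_threshold:
            rejected.append(s)
        elif s.confidence >= accept_threshold:
            candidates.append(s)
        else:
            pending.append(s)

    if candidates:
        top = max(c.confidence for c in candidates)
        winners = [c for c in candidates if c.confidence == top]
        if len(winners) == 1:
            # A single highest suggestion is accepted automatically.
            accepted.append(winners[0])
            # Assumption: the remaining candidates stay pending.
            pending.extend(c for c in candidates if c is not winners[0])
        else:
            # Tie at the highest confidence level: nothing is accepted.
            pending.extend(candidates)

    return accepted, rejected, pending


# The example from the text, with an acceptance threshold of 85.
column_a = [Suggestion("Email", 93), Suggestion("Username", 92), Suggestion("URL", 90)]
column_b = [Suggestion("City", 86), Suggestion("Country", 86)]

accepted_a, rejected_a, pending_a = resolve_suggestions(column_a, accept_threshold=85)
accepted_b, rejected_b, pending_b = resolve_suggestions(column_b, accept_threshold=85)
# accepted_a holds only the 93% suggestion; accepted_b is empty.
```

Checking rejection before acceptance is a design choice that reproduces the documented behavior when both thresholds are set to the same value. The source does not say what happens when the rejection threshold is higher than the acceptance threshold, so the sketch makes no claim about that case.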