Classify assets via the Unified Data Classification method
Important: Unified Data Classification is in beta. Activate this feature only in your test environments. Don't enable it in production environments yet, because it is not fully ready.
Manually classify assets
To add a data class to a Column asset, navigate to the Column asset and select the data class for the column.
For an example, go to Example: Manual classification.
The data classes in the drop-down list are the ones defined after you have activated the Unified Data Classification method. For information on configuring new data classes, go to Configuring data classes.
Automatically classify assets
When you start the automatic data classification process, it checks the data in the selected column or columns against the data classes and makes classification suggestions, each with a confidence score. The score is an estimate based on data samples that the data classification process collects, so it can deviate from the exact value.
If the Unified Data Classification setting has been enabled, the automatic classification process uses the newly defined data classes. For information on configuring new data classes, go to Configuring data classes.
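To illustrate why a sample-based score can deviate from the exact value, here is a minimal sketch. It is not the product's implementation: the `match_rate` helper, the `POSTAL_CODE` pattern, and the generated column are all hypothetical, and the score is assumed here to be a simple match fraction.

```python
import random
import re


def match_rate(values, pattern):
    """Return the fraction of values that fully match the regular expression."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if re.fullmatch(pattern, v))
    return hits / len(values)


# Hypothetical column: roughly 90% of the values look like 5-digit codes.
rng = random.Random(42)
column = [
    f"{rng.randint(0, 99999):05d}" if rng.random() < 0.9 else "n/a"
    for _ in range(50_000)
]

POSTAL_CODE = r"\d{5}"  # hypothetical data class rule, for illustration only

# Score over the whole column versus score over a 1,000-value sample.
exact = match_rate(column, POSTAL_CODE)
sample = rng.sample(column, 1000)
estimate = match_rate(sample, POSTAL_CODE)

print(f"exact: {exact:.3f}, estimate from sample: {estimate:.3f}")
```

The two numbers are usually close but rarely identical, which is the kind of deviation the confidence score can show.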
- The automatic data classification process needs at least six values that can be checked to classify a column.
  Example: For data class A, you define a regular expression and indicate that you don't want to consider empty values. If you then classify a column with many null values and only five non-null values, the column is not classified, even if the non-null values match data class A.
- The automatic data classification process extracts a maximum of 1,000 values from the data source. The samples are stored for between 24 and 48 hours in the Edge Site cache. They are not transferred to Collibra. If the Edge Site cache already contains at least 100 samples for this data source, the automatic data classification process uses those.
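The six-value rule from the data class A example can be sketched as follows. This is only an illustration under stated assumptions: the `suggest` function and its `ignore_empty` parameter are hypothetical names, and computing the confidence as a plain match percentage is an assumption, not a documented formula.

```python
import re

MIN_CHECKABLE_VALUES = 6  # threshold stated in the text above


def suggest(values, pattern, ignore_empty=True):
    """Return a confidence percentage, or None if the column cannot be classified.

    Hypothetical helper: the real process may score columns differently.
    """
    if ignore_empty:
        # Empty and null values are not considered, so they do not count
        # toward the minimum number of checkable values.
        checkable = [v for v in values if v not in (None, "")]
    else:
        checkable = ["" if v is None else v for v in values]

    if len(checkable) < MIN_CHECKABLE_VALUES:
        return None  # too few values to check: no classification suggestion

    hits = sum(1 for v in checkable if re.fullmatch(pattern, v))
    return 100.0 * hits / len(checkable)


# Many nulls and only five non-null values: not classified, even though
# every non-null value matches the pattern.
print(suggest([None] * 50 + ["12345"] * 5, r"\d{5}"))  # None

# One more matching value reaches the threshold.
print(suggest([None] * 50 + ["12345"] * 6, r"\d{5}"))  # 100.0
```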
To start this classification process for one column:
- Navigate to the related Column asset.
- Click the Data Profiling tab page.
- Click the Classify button.
The data classification process starts.
If a data class matches the data in the column, a classification suggestion is assigned to the Column asset with a confidence percentage.
To start the classification process for one or more columns from a Table, Schema, or Database asset:
- Navigate to the Table, Schema, or Database asset.
- Select Actions → Classify.
The data classification process starts.
If a data class matches the data in a column of the selected asset, a classification suggestion is assigned to the corresponding Column asset with a confidence percentage.
What's next?
Go to some examples