About data classes in the Unified Data Classification method
Data classes are the different groups you want to use to classify your data, for example, email, phone number, and web browser.
The automatic data classification method uses the classification rules defined in a data class to verify whether an asset can be classified with that data class. A data class can contain multiple classification rules, and the rules can be of different types.
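As a minimal sketch of this idea, the following Python models a data class as a named collection of rules, where each rule is either a regular expression or a list of values. The names `Rule`, `DataClass`, and `matches` are hypothetical and are not part of any Collibra API. The sketch also assumes that a value must match a rule in full and that matching is case-sensitive, which may differ from how Collibra evaluates rules.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical classification rule: a regular expression or a list of values."""
    kind: str                        # "regex" or "list" (illustrative labels)
    pattern: str = ""                # used when kind == "regex"
    values: frozenset = frozenset()  # used when kind == "list"

    def matches(self, value: str) -> bool:
        if self.kind == "regex":
            # Assumption: the whole value must match the pattern.
            return re.fullmatch(self.pattern, value) is not None
        if self.kind == "list":
            return value in self.values
        raise ValueError(f"unknown rule kind: {self.kind}")


@dataclass
class DataClass:
    """Hypothetical data class holding one or more rules of any type."""
    name: str
    rules: list = field(default_factory=list)

    def matches(self, value: str) -> bool:
        # A value counts as a match if at least one rule applies.
        return any(rule.matches(value) for rule in self.rules)


email = DataClass(
    name="Email",
    rules=[Rule(kind="regex", pattern=r"[^@\s]+@[^@\s]+\.[^@\s]+")],
)
browser = DataClass(
    name="Web browser",
    rules=[Rule(kind="list", values=frozenset({"Chrome", "Firefox", "Safari"}))],
)

print(email.matches("jane.doe@example.com"))  # True
print(browser.matches("Firefox"))             # True
print(email.matches("Firefox"))               # False
```

A pattern-based rule suits values with a recognizable format, such as email addresses, while a list-based rule suits a small, closed set of values.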
A data class in the Unified Data Classification method consists of the following elements:
| Data class element | Field | Description |
|---|---|---|
| Name | | The name of the data class. |
| Enabled | | Switch that indicates whether the data class is taken into account during the data classification process. Disabling a data class can be useful if it is not ready for use or is still in a testing phase. |
| Description | | The description of the data class. The description can't exceed 10,000 characters. |
| Details | Minimum confidence threshold | The confidence percentage that must be reached for the data class to be considered a possible classification result. The confidence percentage is the percentage of values in a column that match at least one of the classification rules in the data class, for example, a regular expression. Enter a value between 0 and 100. Example: If you enter 80 in this field, the automatic data classification process suggests this data class only if the confidence percentage is 80 percent or higher. Tip: Confidence scores of 0 are never taken into account. |
| Details | Include empty values | Indicates whether empty values are included in the confidence percentage calculation. Use this option to get a confidence score that reflects all data in a column. Example: You have a column Z with 40 empty values and 60 phone numbers, and a data class A with a regular expression that detects US phone numbers. Assuming that all 60 phone numbers match and that the regular expression does not match empty values, the confidence percentage is 60 percent if empty values are included (60 of 100 values) and 100 percent if they are excluded (60 of 60 values). Important: Some regular expressions are constructed to allow a match with empty values. In that case, empty values can be matched to the data class through the regular expression, which affects the confidence score. |
| Details | Examples | Some examples of values that match the classification rules of the data class. Add one example per line. |
| Classification rules | | A data classification rule is used by the data classification process to calculate the confidence score, which is a percentage that indicates the likelihood that the data class fits the data in an asset. A data class can contain multiple classification rules. Each rule is verified against the data, and the data class is assigned as soon as one of the rules applies. |
| Classification rules | Description | A description of the classification rule. |
| Classification rules | Type | The type of classification rule. The possible values are Regular expression and List of values. Depending on your selection, other fields appear. Tip: Use a regular expression when you can validate a pattern. Email addresses, for example, follow a specific format. |
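
The interplay between the minimum confidence threshold and the Include empty values option can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not Collibra's implementation: `confidence_percentage` and `is_suggested` are hypothetical helpers, empty values are modeled as empty strings or `None`, and the US phone number pattern is a simplified stand-in.

```python
import re

# Simplified, hypothetical pattern for US phone numbers such as 555-123-4567.
US_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def phone_rule(value: str) -> bool:
    """Return True if the whole value looks like a US phone number."""
    return US_PHONE.fullmatch(value) is not None


def confidence_percentage(values, rule_matches, include_empty):
    """Percentage of considered values that match the rule.

    Empty values (None or "") are dropped unless include_empty is True.
    When included, they are passed to the rule as "", so a rule that
    accepts empty input would count them as matches.
    """
    normalized = ["" if v is None else v for v in values]
    considered = normalized if include_empty else [v for v in normalized if v != ""]
    if not considered:
        return 0.0
    matched = sum(1 for v in considered if rule_matches(v))
    return 100.0 * matched / len(considered)


def is_suggested(confidence, threshold):
    """A data class is suggested only if the threshold is reached.

    Confidence scores of 0 are never taken into account.
    """
    return confidence > 0 and confidence >= threshold


# Column Z: 40 empty values and 60 phone numbers.
column_z = [""] * 40 + ["555-123-4567"] * 60

with_empty = confidence_percentage(column_z, phone_rule, include_empty=True)
without_empty = confidence_percentage(column_z, phone_rule, include_empty=False)

print(with_empty, is_suggested(with_empty, 80))        # 60.0 False
print(without_empty, is_suggested(without_empty, 80))  # 100.0 True
```

With a threshold of 80, the data class in this sketch is suggested for column Z only when empty values are left out of the calculation.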