Example: Configuring a data class based on a list of values and starting the automatic classification for a table
In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.
You want to create an extra data class for T-shirt sizes in the Unified Data Classification method. Once that is done, you want to start the classification process for a full table.
Before you begin
Make sure you know which values your organization uses to refer to T-shirt sizes. In this example, we use: XS, S, M, L, XL, XXL, XXXL, Extra small, Small, Medium, Large, X-Large, XX-Large, XXX-Large, 2XL, 3XL.
For more information, go to Add a data class.
Steps
- Create and configure the T-shirt size data class.
- On the main toolbar, click , and then click Stewardship.
- Click the Data Classification tab.
- Add the data class.
- Click Add.
- Enter the name of the data class, in this case T-shirt size.
- Press Enter to add the data class.
- Click Create.
The data class has been created and is available in the list.
- Define the data class parameters.
- In the Data Classification tab, select the row of the new data class.
The data class parameters appear in a pane on the right-hand side.
- Optionally, add a description: click the Description field (or the Edit icon next to it), type the description, and click outside the field.
- Open the Details section.
- Complete the fields as required.
For information on the fields, go to Configuring data classes. In this example, we set the following parameters:
- Minimum confidence threshold: We set this value to 80.
- Include empty values: We leave this field as the default value (False).
- Examples: Small, L
- Open the Classification rules section.
- Click Add new rule.
- In the Type list, select List of values.
Extra fields appear.
- Complete the fields as required.
For information on the fields, go to Configuring data classes. In this example, we set the following parameters:
- Values: We add the following list. Each value must start on a new line.
XS
S
M
L
XL
XXL
XXXL
extra small
small
medium
large
X-large
XX-large
XXX-large
2XL
3XL
- Description: We leave this field empty.
- Click Save.
The classification rule for the data class is configured.
If you expand the Classification rules section, you see the details.
- Start the automatic classification.
- Navigate to a Table asset.
- Select Actions → Classify.
The data classification process starts. For more information, go to Automatically classify assets.
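The data class configured in the steps above can be summarized in a short Python sketch. The class DataClassConfig, the helper parse_values_field, and all field names are hypothetical illustrations, not part of any Collibra API. The sketch also assumes that surrounding spaces and blank lines in the Values field are ignored, that values are compared case-insensitively, and that a confidence score equal to the threshold counts as reaching it.

```python
from dataclasses import dataclass, field


def parse_values_field(text: str) -> set[str]:
    """Turn a Values field (one value per line) into a case-insensitive lookup set."""
    # Assumption: surrounding spaces and blank lines are ignored.
    return {line.strip().casefold() for line in text.splitlines() if line.strip()}


@dataclass
class DataClassConfig:
    """Hypothetical, simplified model of a data class with one list-of-values rule."""
    name: str
    min_confidence_threshold: float  # percentage, set to 80 in this example
    include_empty_values: bool = False  # default value, left unchanged here
    examples: list[str] = field(default_factory=list)
    values: set[str] = field(default_factory=set)  # case-folded list of values

    def meets_threshold(self, confidence: float) -> bool:
        # Assumption: a score equal to the threshold counts as reaching it.
        return confidence >= self.min_confidence_threshold


# The Values field exactly as entered in the classification rule.
VALUES_FIELD = """XS
S
M
L
XL
XXL
XXXL
extra small
small
medium
large
X-large
XX-large
XXX-large
2XL
3XL"""

t_shirt_size = DataClassConfig(
    name="T-shirt size",
    min_confidence_threshold=80,
    examples=["Small", "L"],
    values=parse_values_field(VALUES_FIELD),
)
```

Storing the values as a case-folded set keeps each lookup cheap and makes "small", "Small", and "SMALL" map to the same entry.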
If a data class matches a column in the Table asset, the Column asset receives a data classification suggestion with a confidence percentage. For more information, go to Accepting and rejecting data classification suggestions.
Important: The values are not case-sensitive. For example, the value “small” in the list also matches “Small” and “SMALL”.
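Example: A column contains the values “petite”, “s”, “L”, “xl”, “XL”, “unknown”, “unknown”, and “no size”. After the automatic data classification, the column gets a confidence score of 50% for the T-shirt size data class, because half of the values in the column (“s”, “L”, “xl”, and “XL”) are part of the list of values. Note that this score is below the minimum confidence threshold of 80 set in this example.
The character case does not affect the result: the lowercase values “s” and “xl” match “S” and “XL” in the list.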
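The calculation in this example can be reproduced with a short Python sketch: each column value, duplicates included, is compared case-insensitively with the list of values, and the score is the percentage of values that match. The functions matches and confidence_score are hypothetical illustrations of the described behavior, not Collibra's implementation. Handling of empty values (the Include empty values parameter) is left out, and returning 0 for an empty column is an assumption.

```python
def matches(value: str, allowed: set[str]) -> bool:
    # Case-insensitive comparison; `allowed` holds case-folded entries.
    return value.casefold() in allowed


def confidence_score(column_values: list[str], allowed: set[str]) -> float:
    """Percentage of column values (duplicates included) found in the list."""
    if not column_values:
        return 0.0  # assumption: an empty column has no matches
    matched = sum(1 for v in column_values if matches(v, allowed))
    return 100.0 * matched / len(column_values)


# The list of values from the classification rule, case-folded once.
T_SHIRT_SIZES = {v.casefold() for v in [
    "XS", "S", "M", "L", "XL", "XXL", "XXXL",
    "extra small", "small", "medium", "large",
    "X-large", "XX-large", "XXX-large", "2XL", "3XL",
]}

column = ["petite", "s", "L", "xl", "XL", "unknown", "unknown", "no size"]
score = confidence_score(column, T_SHIRT_SIZES)
print(f"{score:.0f}%")  # prints "50%": 4 of the 8 values match
```

The sketch uses str.casefold rather than str.lower. Both give the same result for these ASCII values; casefold also handles some non-ASCII case variants.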
What's Next?
You can also add an extra classification rule to an existing data class.