Example: Configuring a data class based on a list of values and starting the automatic classification for a table

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

You want to create an extra data class for T-shirt sizes in the Unified Data Classification method. Once that is done, you want to start the classification process for a full table.

Before you begin

Make sure you know which values your organization uses to refer to T-shirt sizes. In this case, we consider: XS, S, M, L, XL, XXL, XXXL, Extra small, Small, Medium, Large, X-Large, XX-Large, XXX-Large, 2XL, 3XL.
For more information, go to Add a data class.
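
Because the Values field used in the steps below expects one value per line, and matching is not case-sensitive, it can help to clean up the list before you paste it. The following Python sketch shows one possible way to do that. The function name prepare_value_list and the raw_values input are illustrative assumptions; they are not part of Collibra.

```python
def prepare_value_list(values):
    """Return the values as newline-separated text.

    Blank entries and case-insensitive duplicates are dropped,
    and the first spelling encountered is kept.
    """
    seen = set()
    unique = []
    for value in values:
        cleaned = value.strip()
        key = cleaned.casefold()  # compare without regard to case
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return "\n".join(unique)


# Hypothetical raw input collected from the organization.
raw_values = [
    "XS", "S", "M", "L", "XL", "XXL", "XXXL",
    "Extra small", "Small", "small ", "Medium", "Large",
    "X-Large", "XX-Large", "XXX-Large", "2XL", "3XL",
]

values_field_text = prepare_value_list(raw_values)
print(values_field_text)
```

The resulting text has one value per line and can be pasted into the Values field of the classification rule.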

Steps

  1. Create and configure the data class T-shirt size.

    1. On the main toolbar, click the Products icon, and then click Stewardship.
    2. Click the Data Classification tab.
    3. Add the data class.
      1. Click Add.
      2. In the Name field, enter the name of the data class. In our case, T-shirt size.
      3. Press Enter to add the data class.
      4. Click Create.
        The data class has been created and is available in the list.
    4. Define the data class parameters.
      1. In the Data Classification tab, select the row of the new data class.
        The data class parameters appear in a pane on the right-hand side.
      2. Optionally, add a description by clicking the Description field, typing the description, and clicking outside the field.
      3. Alternatively, depending on your UI version, add the description by clicking the Edit icon next to the Description field.
      4. Open the Details section.
      5. Complete the fields as required.
        For information on the fields, go to Configuring data classes.
        Data class parameter | Description
        Minimum confidence threshold | We set this value to 80.
        Include empty values | We leave this field as the default value (False).
        Examples | Small, L
      6. Open the Classification rules section.
      7. Click Add new rule.
      8. In the Type list, select List of values.
        Extra fields appear.
      9. Complete the fields as required.
        For information on the fields, go to Configuring data classes.
        Data class parameter | Description
        Values | We add the following list. Each value must start on a new line.
          XS
          S
          M
          L
          XL
          XXL
          XXXL
          extra small
          small
          medium
          large
          X-large
          XX-large
          XXX-large
          2XL
          3XL
        Description | We leave this field empty.
      10. Click Save.
        The classification rule for the data class is configured.
        If you expand the Classification rules section, you see the details.
  2. Start the automatic classification.
    1. Navigate to a Table asset.
    2. Select Actions → Classify.
      The data classification process starts. For more information, go to Automatically classify assets.
      If a data class matches a column in the Table asset, a data classification suggestion with a confidence percentage is assigned to the Column asset. For more information, go to Accepting and rejecting data classification suggestions.

      Important: The values are not case-sensitive. The value “small” in the list also matches the values “Small” and “SMALL”.

      Example: A column contains the values petite, s, L, xl, XL, unknown, unknown, and no size. After the automatic data classification, the column is classified as a T-shirt size with a confidence score of 50% because half of the values in the column (s, L, xl, and XL) are part of the list of values.
      Note that the character case didn’t affect the result.
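
The arithmetic in the example above can be sketched in Python. This is only an illustration of the case-insensitive matching and the resulting percentage as described here. It is not Collibra's implementation, and the names T_SHIRT_SIZES and list_of_values_confidence are assumptions made for this sketch.

```python
# The list of values configured in the classification rule.
T_SHIRT_SIZES = [
    "XS", "S", "M", "L", "XL", "XXL", "XXXL",
    "extra small", "small", "medium", "large",
    "X-large", "XX-large", "XXX-large", "2XL", "3XL",
]


def list_of_values_confidence(column_values, allowed_values):
    """Return the percentage of column values found in the allowed list.

    The comparison ignores character case, as described in the text.
    """
    if not column_values:
        return 0.0
    allowed = {value.casefold() for value in allowed_values}
    matches = sum(1 for value in column_values if value.casefold() in allowed)
    return 100.0 * matches / len(column_values)


# The column from the example: four of the eight values are in the list.
column = ["petite", "s", "L", "xl", "XL", "unknown", "unknown", "no size"]
confidence = list_of_values_confidence(column, T_SHIRT_SIZES)
print(f"{confidence:.0f}%")  # prints 50%
```

The sketch does not model other settings, such as Include empty values or the minimum confidence threshold.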

What's Next?

You can also add an extra classification rule to an existing data class.