Automatically classify assets via the Unified Data Classification method

Important 

In Collibra 2024.05, we've launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

Automatically classify assets

When you start the automatic data classification process, it checks the data in one or more columns against the data classes and suggests classifications, each with a confidence score. This score is an estimate based on the data samples that the process collects, so it can deviate from the exact score. If the Unified Data Classification setting is enabled, the automatic classification process uses the newly defined data classes. For information on configuring new data classes, go to Configuring data classes.

Important 
  • The automatic data classification process needs at least six values that it can check in order to classify a column.
    Example: For data class A, you define a regular expression and indicate that empty values must not be considered.
    If you then classify a column with many null values and only five non-null values, the column is not classified, even if all five non-null values match data class A.
  • The automatic data classification process extracts at most 1,000 values from the data source. These samples are stored temporarily in the Edge site cache and are not transferred to Collibra. If the Edge site cache already contains at least 100 samples for this data source, the process uses those samples.
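To make the thresholds above concrete, here is a minimal Python sketch. It is a toy model of the documented rules only (at least six checkable values, at most 1,000 extracted values, reuse of at least 100 cached samples), not Collibra's implementation or API. The names `DataClass`, `checkable_values`, `all_match`, `can_classify`, and `select_samples`, the regex-plus-`ignore_empty` model of a data class, and treating `None` and `""` as empty values are all hypothetical assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Thresholds stated in the documentation.
MIN_CHECKABLE_VALUES = 6     # a column needs at least six checkable values
MAX_EXTRACTED_VALUES = 1000  # at most 1,000 values are extracted from the source
MIN_CACHED_SAMPLES = 100     # cached samples are used when at least 100 exist


@dataclass
class DataClass:
    """Hypothetical model of a regex-based data class."""
    name: str
    pattern: str
    ignore_empty: bool  # assumption: models "don't consider empty values"


def checkable_values(values: List[Optional[str]], dc: DataClass) -> List[Optional[str]]:
    """Return the values the data class would actually check."""
    if dc.ignore_empty:
        # Assumption: both None and "" count as empty values.
        return [v for v in values if v not in (None, "")]
    return list(values)


def all_match(values: List[Optional[str]], dc: DataClass) -> bool:
    """True if every checkable value fully matches the data class regex."""
    regex = re.compile(dc.pattern)
    return all(v is not None and regex.fullmatch(v) for v in checkable_values(values, dc))


def can_classify(values: List[Optional[str]], dc: DataClass) -> bool:
    """A column is eligible only if it has at least six checkable values."""
    return len(checkable_values(values, dc)) >= MIN_CHECKABLE_VALUES


def select_samples(cached: List[str], extract: Callable[[int], List[str]]) -> List[str]:
    """Use cached samples if there are at least 100; otherwise extract up to 1,000.

    `extract` is a hypothetical callable that fetches up to n values from the source.
    """
    if len(cached) >= MIN_CACHED_SAMPLES:
        return cached
    return extract(MAX_EXTRACTED_VALUES)


# The example from the documentation: many nulls, five matching non-null values.
data_class_a = DataClass(name="A", pattern=r"\d{5}", ignore_empty=True)
column = [None] * 50 + ["10001", "10002", "10003", "10004", "10005"]
print(all_match(column, data_class_a))     # True: all five non-null values match
print(can_classify(column, data_class_a))  # False: only five checkable values
```

In this model, eligibility is checked separately from matching, which mirrors the documented example: the five non-null values all match data class A, yet the column is still not classified because fewer than six values can be checked.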

Requirements and permissions

Make sure you have the required permissions.

Start the classification process for one column

  1. Navigate to the related Column asset.
  2. Do one of the following:
    • In the At a Glance section, click Classify.
    • Click the Data Profiling tab, and then click Classify.
    The data classification process starts.
    If a data class matches the data in the column, a classification suggestion with a confidence percentage is assigned to the Column asset.

Start the classification process for one or more columns from a Table, Schema, or Database asset

  1. Navigate to the Table, Schema, or Database asset.
  2. Select Actions → Classify.
    The data classification process starts.
    If a data class matches a column, a data classification suggestion with a confidence percentage is assigned to the Column asset.
  3. Open the Table asset with the classified columns.
  4. Add the Data Classification column to the table.
    The Data Classification column shows the suggested data classes.
    Figure: Example of data classification result

What's next?

Accepting or rejecting automatic data classification suggestions

For examples, go to Examples.