Feedback on Automatic Data Classification

Important 

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

When Collibra Data Intelligence Platform predicts data classes for a column, the suggestions appear in the Data Classification column on the Table and Column asset pages.


Example of data classification result

  • If no data classes are suggested for a column, Automatic Data Classification could not predict the data class.
  • A single column can receive multiple data class suggestions.
  • The percentage next to the data class indicates the confidence level of the suggestion.
    If automatic data classification acceptance and rejection is active, suggestions whose confidence level falls within the defined thresholds can be accepted or rejected automatically.
  • You can accept or reject the data class suggestions, or add a user-defined class.
    • If you reject a data class suggestion, the data class is removed from the column.

      Important When you reject a data class, the system remembers the rejection. Automatic Data Classification will not suggest this data class again.

    • If you accept a data class suggestion, the data class is added to the column.
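
The threshold behavior described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the two thresholds (`accept_at`, `reject_at`), their values, and the names `Suggestion` and `triage` are hypothetical, and the example data classes are made up. It does not reflect Collibra's actual settings or defaults.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REVIEW = "review"  # left for a person to accept or reject manually


@dataclass(frozen=True)
class Suggestion:
    data_class: str
    confidence: float  # percentage (0-100), as shown next to the data class


def triage(suggestion, accept_at=90.0, reject_at=20.0):
    """Decide what happens to a suggestion.

    Assumption: one upper and one lower threshold. Both values are
    hypothetical, not Collibra defaults.
    """
    if suggestion.confidence >= accept_at:
        return Decision.ACCEPT
    if suggestion.confidence <= reject_at:
        return Decision.REJECT
    return Decision.REVIEW


# Made-up suggestions for a single column.
suggestions = [
    Suggestion("Email Address", 97.0),
    Suggestion("Person Name", 55.0),
    Suggestion("Phone Number", 8.0),
]
decisions = {s.data_class: triage(s) for s in suggestions}
for name, decision in decisions.items():
    print(f"{name}: {decision.value}")
```

In this sketch, only the suggestion in the middle band stays open for a manual decision; the other two are handled automatically.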

Manually accepting and rejecting data classes

To manually accept a data class suggestion, click the Accept icon in the suggestion.
Once accepted, the suggestion icon disappears.

To manually reject a data class suggestion, click the Reject icon in the suggestion.
The data class is removed from the column, and Automatic Data Classification does not suggest it again.

To manually accept or reject a data class, hover over the data class and click the appropriate icon.

If automatic data classification acceptance and rejection is active, suggestions whose confidence level falls within the defined thresholds can be accepted or rejected automatically.

Sending this feedback is important for the Cloud Data Classification Platform: without it, the platform cannot retrain. Accepting a data class is more valuable than rejecting one.

  • When you reject a suggestion, the Cloud Data Classification Platform classification model no longer uses the sample data.
  • When you accept a suggestion, the sample data is permanently added to the Cloud Data Classification Platform classification model to improve future data class predictions.

Note If you use Automatic Data Classification via Edge, the feedback is only stored. It is not used to retrain the classification model or for future reference.
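
As a rough mental model of the difference between the two deployments, the following sketch may help. It is a toy stand-in, not the platform's implementation: the class `FeedbackModel`, its fields, keying samples by column and data class, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackModel:
    """Toy model of how feedback is treated (hypothetical, for illustration)."""
    via_edge: bool = False
    stored_feedback: list = field(default_factory=list)
    training_samples: dict = field(default_factory=dict)

    def record(self, column, data_class, accepted, sample):
        # Assumption: the feedback itself is stored in both deployments.
        self.stored_feedback.append((column, data_class, accepted))
        if self.via_edge:
            return  # Edge: stored only, not used for retraining
        key = (column, data_class)
        if accepted:
            # Cloud, accept: the sample data is kept to improve predictions.
            self.training_samples[key] = sample
        else:
            # Cloud, reject: the sample data is no longer used.
            self.training_samples.pop(key, None)


cloud = FeedbackModel()
cloud.record("customers.email", "Email Address", True, ["a@example.com"])
cloud.record("customers.notes", "Email Address", False, ["call back later"])

edge = FeedbackModel(via_edge=True)
edge.record("customers.email", "Email Address", True, ["a@example.com"])

print(len(cloud.training_samples), len(edge.training_samples))
```

Here the cloud instance ends up with one usable sample (from the accepted suggestion), while the Edge instance keeps the feedback record but no training samples.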

Creating user-defined classes

When columns cannot be classified, you can create user-defined classes.

Take the following guidelines into account when you create user-defined classes:

  • Avoid duplicates. Always check the list of proposed classes before creating a new data class.
  • Avoid vague data classes.
  • Avoid mixed data classes. Accept the single best applicable data class instead.
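
If you keep your own list of data class names, the duplicate check in the first guideline could be approximated along these lines. This is a sketch using only the Python standard library; the helper names `normalize` and `similar_classes`, the similarity cutoff, and the example class names are assumptions, and nothing here calls a Collibra API.

```python
import difflib
import re


def normalize(name):
    """Lowercase and collapse spaces, underscores, and hyphens,
    so 'Phone_Number' compares equal to 'phone number'."""
    return re.sub(r"[\s_\-]+", " ", name).strip().casefold()


def similar_classes(candidate, existing, cutoff=0.8):
    """Return existing class names that look like the candidate.

    The cutoff (0 to 1) is a hypothetical choice; higher means stricter.
    """
    by_normalized = {normalize(name): name for name in existing}
    matches = difflib.get_close_matches(
        normalize(candidate), list(by_normalized), n=3, cutoff=cutoff
    )
    return [by_normalized[m] for m in matches]


existing = ["Phone Number", "Email Address"]  # made-up example classes
print(similar_classes("phone_number", existing))  # ['Phone Number']
print(similar_classes("Passport ID", existing))   # []
```

A non-empty result is a hint to reuse the existing data class rather than create a new one.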

The Cloud Data Classification Platform uses this new information to retrain the classification model and improve future predictions.

Note If you use Automatic Data Classification via Edge, the user-defined classes are only stored. They are not used to retrain the classification model.