Feedback on Automatic Data Classification

When Collibra Data Intelligence Cloud predicts data classes for a column, the predictions appear in the Data Classification column on the Table and Column asset pages.
Example of data classification result

  • If no data classes are suggested for a column, Automatic Data Classification could not predict the data class.
  • Multiple data classes can be suggested for a single column.
  • The percentage next to the data class indicates the confidence level of the suggestion.
    If automatic data classification acceptance and rejection is active, data classification suggestions with a confidence level within the defined thresholds will be accepted or rejected automatically.

You can accept or reject the data classification suggestions, or add a user-defined class.

Accepting and rejecting data classes

You can accept or reject the data classes that are suggested.

  • Reject data class: The data class is removed from the column.
    The system remembers the rejection, and Automatic Data Classification does not suggest this data class again.
  • Accept data class: The data class is added to the column.

To manually accept or reject a data class, hover over the data class and click the appropriate icon.

If automatic data classification acceptance and rejection is active, data classification suggestions with a confidence level within the defined thresholds will be accepted or rejected automatically.
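The threshold behavior described above can be pictured with a minimal sketch. It is not Collibra's implementation: the `Suggestion` type, the `auto_decision` function, the default threshold values of 90 and 10, and the rule that suggestions between the two thresholds are left for manual review are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical stand-in for one data class suggestion on a column."""
    data_class: str
    confidence: float  # confidence level as a percentage, 0 to 100


def auto_decision(suggestion: Suggestion,
                  accept_at: float = 90.0,
                  reject_at: float = 10.0) -> str:
    """Return 'accept', 'reject', or 'manual' for a suggestion.

    Assumption: suggestions at or above accept_at are accepted, suggestions
    at or below reject_at are rejected, and everything in between is left
    for a user to decide. The threshold values are illustrative only.
    """
    if not 0.0 <= reject_at < accept_at <= 100.0:
        raise ValueError("thresholds must satisfy 0 <= reject_at < accept_at <= 100")
    if suggestion.confidence >= accept_at:
        return "accept"
    if suggestion.confidence <= reject_at:
        return "reject"
    return "manual"


if __name__ == "__main__":
    for s in (Suggestion("Email", 95.0),
              Suggestion("City", 55.0),
              Suggestion("IBAN", 4.0)):
        print(s.data_class, s.confidence, auto_decision(s))
```

In this sketch, only the middle band of confidence levels still needs a manual accept or reject.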

For the Cloud Data Classification Platform, sending this feedback is important. Without feedback, the Cloud Data Classification Platform cannot retrain its classification model. Accepting a data class is more valuable than rejecting one.

  • When you reject a suggestion, the Cloud Data Classification Platform classification model no longer uses the sample data.
  • When you accept a suggestion, the sample data is permanently added to the Cloud Data Classification Platform classification model to improve future data class predictions.
Note: If you use Automatic Data Classification via Edge, the feedback is only stored. It is used neither to retrain the classification model nor for future reference.

Creating user-defined classes

When columns cannot be classified, you can create user-defined classes.

Take the following guidelines into account when you create user-defined classes:

  • Avoid duplicates. Always check the list of proposed classes before creating a new data class.
  • Avoid vague data classes.
  • Avoid mixed data classes. Accept the single most applicable data class instead.

The Cloud Data Classification Platform uses the user-defined classes to retrain the classification model and improve future predictions.

Note: If you use Automatic Data Classification via Edge, the user-defined classes are only stored. They are not used to retrain the classification model.
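
The guideline about avoiding duplicate data classes can be illustrated with a small check to run before creating a user-defined class. This is a sketch only: it does not call any Collibra API, the `normalize` and `find_duplicate` helpers are hypothetical, and the list of existing classes is assumed to have been collected by hand from the list of proposed classes.

```python
import re
from typing import Iterable, Optional


def normalize(name: str) -> str:
    """Lowercase a class name and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def find_duplicate(candidate: str, existing: Iterable[str]) -> Optional[str]:
    """Return the existing class name that matches candidate, if any.

    Two names match when they are equal after normalization, so differences
    in case, hyphens, underscores, and extra spaces are ignored.
    """
    key = normalize(candidate)
    for name in existing:
        if normalize(name) == key:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical list of classes that are already available.
    existing_classes = ["E-Mail Address", "Phone Number", "Postal Code"]
    match = find_duplicate("e_mail  address", existing_classes)
    if match is not None:
        print(f"Use the existing data class instead: {match}")
```

A check like this only catches names that differ in formatting. Deciding whether two differently named classes describe the same data still requires reviewing the list manually.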