Automatic data classification via the Data Classification platform

When you register a data source, you can choose to store a data profile and sample data. Both options are required if you want to classify columns in the data set.

The Data Classification platform predicts the data classes of the selected columns and sends them back to Collibra Data Intelligence Cloud, where you confirm or reject the suggested data classes. The Data Classification platform uses your feedback to retrain its classification model and improve future data classifications.

Warning: If you want to use the Data Classification platform, request it via your Collibra contact or create a support ticket.

Limitations

Automatic data classification flow

The following diagram and table describe the steps of an automatic data classification flow.

Data Classification flow

Step Description

Step 1 You select the columns that you want to classify and send their sample data to the Data Classification platform.
Step 2 The Data Classification platform predicts the data classes of the columns.
Step 3 The Data Classification platform sends the predicted data classes to Collibra.
Step 4 You accept or reject the predicted data class of each column, or add your own data classes. The Data Classification platform might predict multiple data classes for a column. In that case, you can accept every predicted data class that is accurate.
Step 5 Your selections are sent to the Data Classification platform. The platform stores your selections, along with the associated sample data, to retrain the classification model and improve future classification predictions.
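
To make the five steps concrete, the following Python sketch simulates the flow with an in-memory stand-in for the platform. It does not call any Collibra API. The names ToyClassifier, predict, record_feedback and classify_columns, as well as the regex-based prediction rules, are assumptions made purely for illustration. The actual platform uses a trained classification model whose interface is not described here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyClassifier:
    """Hypothetical stand-in for the Data Classification platform."""

    # Illustrative rules only: each data class maps to a regex that a
    # sample value must fully match.
    patterns: dict = field(default_factory=lambda: {
        "Email": r"[^@\s]+@[^@\s]+\.[a-z]+",
        "Phone": r"\+?\d[\d\s-]{6,}",
        "Numeric": r"\d+",
    })
    feedback: list = field(default_factory=list)

    def predict(self, samples):
        # Step 2: return every data class that all sample values match.
        # A column can receive more than one predicted data class.
        return [
            name for name, pattern in self.patterns.items()
            if samples and all(re.fullmatch(pattern, s) for s in samples)
        ]

    def record_feedback(self, column, samples, accepted, rejected):
        # Step 5: keep the selections together with the sample data so
        # that a real platform could use them for retraining.
        self.feedback.append({
            "column": column,
            "samples": samples,
            "accepted": accepted,
            "rejected": rejected,
        })


def classify_columns(platform, columns, review):
    """Run steps 1 to 5 for each selected column."""
    results = {}
    for column, samples in columns.items():        # Step 1: send sample data
        predicted = platform.predict(samples)      # Steps 2 and 3: predictions
        accepted = review(column, predicted)       # Step 4: accept or reject
        rejected = [c for c in predicted if c not in accepted]
        platform.record_feedback(column, samples, accepted, rejected)  # Step 5
        results[column] = accepted
    return results


# Example: the reviewer accepts every prediction except "Numeric".
platform = ToyClassifier()
columns = {
    "contact": ["a@example.com", "b@example.org"],
    "phone": ["5551234567", "5559876543"],
}
results = classify_columns(
    platform,
    columns,
    review=lambda column, predicted: [c for c in predicted if c != "Numeric"],
)
print(results)
```

In this sketch the phone column receives two predictions, Phone and Numeric, which mirrors the case in step 4 where a column has multiple predicted data classes. The reviewer keeps one and rejects the other, and both decisions are stored with the sample data.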