Warning  We have announced the end of life of Jobserver and all related Jobserver integrations for September 30, 2024, with the exception of Public Sector customers using GovCloud or on-prem environments. For more information, go to Announcements.

Automatic Data Classification via the Cloud Data Classification Platform

When you register a data source, you can store a data profile and sample data. Both are required if you want to classify columns in the data set. The Cloud Data Classification Platform predicts the data classes of the selected columns and sends them back to Collibra Data Intelligence Platform, where you confirm or reject the suggested data classes. The Cloud Data Classification Platform uses your feedback to retrain the classification model and improve future data classifications.

Warning If you want to use the Cloud Data Classification Platform, request it via your Collibra contact or create a support ticket. See also Cloud Data Classification Platform setup.

Limitations

  • Automatic data classification via the Cloud Data Classification Platform is a cloud service. You can use it from an on-premises environment only if that environment can reach the cloud service.
  • Out of the box, automatic data classification can predict several data classes. You can also create user-defined data classes to increase its prediction quality.
  • The only supported language for data classes is English.
  • The Cloud Data Classification Platform needs sample data and profiling data to predict the data classes.
    Note You can create sample data and profiling data either by registering a data source with those options selected or by importing the data via the Catalog API.
  • The Cloud Data Classification Platform only works for columns of data sources that are registered in Data Catalog with sample data and profiling data.
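
As an illustration only, the last two limitations amount to a simple eligibility check: a column can be classified only if both sample data and profiling data exist for it. The sketch below uses a hypothetical in-memory Column record and a hypothetical can_be_classified helper. Neither is part of Collibra or the Catalog API.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical stand-in for a registered column; not a Collibra type."""
    name: str
    has_sample_data: bool
    has_profiling_data: bool


def can_be_classified(column: Column) -> bool:
    # Both sample data and profiling data are required for a prediction.
    return column.has_sample_data and column.has_profiling_data


columns = [
    Column("email", has_sample_data=True, has_profiling_data=True),
    Column("notes", has_sample_data=False, has_profiling_data=True),
]

# Keep only the columns that satisfy both requirements.
eligible = [c.name for c in columns if can_be_classified(c)]
print(eligible)
```

In this example, only the "email" column qualifies, because "notes" has no sample data.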

Automatic data classification flow via the Cloud Data Classification Platform

The following diagram shows the steps of an automatic data classification flow via the Cloud Data Classification Platform.

Data Classification flow

Step  Description
1     You select the columns that you want to classify and send their sample and profiling data to the Cloud Data Classification Platform. See Classify columns.
2     The Cloud Data Classification Platform predicts the data classes of the columns.
3     The Cloud Data Classification Platform sends the data classes to Collibra.
4     You provide feedback by accepting or rejecting the predicted data class of each column or by adding your own new classes.
      The Cloud Data Classification Platform can predict multiple data classes for one column. If the prediction is accurate, you can accept multiple data classes for one column.
5     Your data class selections are sent to the Cloud Data Classification Platform.
6     The Cloud Data Classification Platform stores your selections, along with the associated sample data, to retrain the classification model and improve future classification predictions.
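
The feedback part of this flow can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the platform's implementation or API: the Prediction and Feedback records, the review function, and the data class names are all hypothetical. It shows that one column can receive several suggested data classes, and that each suggestion is either accepted or rejected, with the option of adding your own classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Prediction:
    """Hypothetical record of the data classes suggested for one column."""
    column: str
    suggested: List[str]


@dataclass
class Feedback:
    """Hypothetical record of the user's decisions for one column."""
    column: str
    accepted: List[str] = field(default_factory=list)
    rejected: List[str] = field(default_factory=list)
    added: List[str] = field(default_factory=list)


def review(prediction: Prediction, accept: Set[str],
           add: Optional[List[str]] = None) -> Feedback:
    # Every suggested class ends up either accepted or rejected.
    accepted = [c for c in prediction.suggested if c in accept]
    rejected = [c for c in prediction.suggested if c not in accept]
    # User-defined classes are recorded separately from the suggestions.
    return Feedback(prediction.column, accepted, rejected, list(add or []))


# One column with several suggested classes (names are made up).
prediction = Prediction("contact", ["Email", "Phone Number", "Country"])
feedback = review(prediction, accept={"Email", "Phone Number"},
                  add=["Customer Contact"])
print(feedback)
```

Here two of the three suggestions are accepted for the same column, one is rejected, and one user-defined class is added. In the flow described above, it is this kind of selection, together with the associated sample data, that is stored for retraining.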