About Automatic Data Classification via Edge

Edge classifies your data when you register a data source via Edge, synchronize one or more schemas, and trigger the profiling and classification job. Edge profiles and classifies the data on the Edge site itself and sends only the profiling results and classification suggestions to Collibra Data Intelligence Cloud. You can accept or reject the predicted data class of each column, or add your own data classes. Automatic Data Classification can predict multiple data classes for a column. If more than one prediction is accurate, you can accept each of them for that column.
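The accept/reject behavior described above can be pictured with a small model. The following is a minimal sketch only, not Collibra's API: the class ColumnClassification, its methods, and the data class names ("Email", "Phone Number", "City", "Customer Contact") are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnClassification:
    """Hypothetical local model of one column's classification state."""

    column: str
    predicted: set[str]  # data classes suggested for this column
    accepted: set[str] = field(default_factory=set)
    rejected: set[str] = field(default_factory=set)

    def accept(self, data_class: str) -> None:
        # Accepting a class that was never predicted models "adding your own class".
        self.rejected.discard(data_class)
        self.accepted.add(data_class)

    def reject(self, data_class: str) -> None:
        # Only a predicted class can be rejected in this sketch.
        if data_class not in self.predicted:
            raise ValueError(f"{data_class!r} was not predicted for {self.column!r}")
        self.accepted.discard(data_class)
        self.rejected.add(data_class)

    def pending(self) -> set[str]:
        # Predictions that have been neither accepted nor rejected yet.
        return self.predicted - self.accepted - self.rejected


# One column with three predictions: two are accepted, one is rejected,
# and one extra class is added manually.
col = ColumnClassification("contact", predicted={"Email", "Phone Number", "City"})
col.accept("Email")
col.accept("Phone Number")
col.reject("City")
col.accept("Customer Contact")
```

The point of the sketch is that a column can hold several accepted data classes at once, and that a manually added class sits alongside the accepted predictions.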

Limitations

  • Automatic Data Classification via Edge is only available for customers using Collibra Data Intelligence Cloud.
  • Currently, data classification on Edge does not retrain the classification model to improve future classification predictions. However, the feedback you provide is stored, and will be valuable once retraining is possible.
  • Out-of-the-box, automatic data classification can predict several data classes. You can also create user-defined data classes. Currently, these user-defined data classes are not taken into account by the automatic classification process. You need to assign user-defined data classes manually.
  • English is the only supported language, but Automatic Data Classification can run on data in other languages as well.
  • Automatic Data Classification needs profiling data to predict the data classes. Data classification is performed automatically after the profiling process on an Edge site. That means that you can only classify columns of data sources registered in Data Catalog via an Edge site that has the JDBC profiling capability.
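The last limitation amounts to a simple eligibility rule: a column can only be classified if its data source was registered via an Edge site that has the JDBC profiling capability. A minimal sketch of that rule follows. The types EdgeSite and DataSource and the capability labels ("jdbc-ingestion", "jdbc-profiling") are assumptions made for this example and do not correspond to real Collibra identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EdgeSite:
    """Hypothetical description of an Edge site and its capabilities."""

    name: str
    capabilities: frozenset[str]


@dataclass(frozen=True)
class DataSource:
    """Hypothetical data source; edge_site is None if not registered via Edge."""

    name: str
    edge_site: Optional[EdgeSite] = None


def can_classify(source: DataSource) -> bool:
    # Classification runs after profiling, so the Edge site must be able to profile.
    site = source.edge_site
    return site is not None and "jdbc-profiling" in site.capabilities


full_site = EdgeSite("site-a", frozenset({"jdbc-ingestion", "jdbc-profiling"}))
ingest_only = EdgeSite("site-b", frozenset({"jdbc-ingestion"}))

eligible = DataSource("sales_db", full_site)
no_profiling = DataSource("hr_db", ingest_only)
not_via_edge = DataSource("legacy_db")
```

Under these assumptions, only sales_db qualifies: hr_db lacks the profiling capability, and legacy_db was not registered via Edge at all.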

Automatic data classification flow via Edge

The following diagram shows the steps of an automatic data classification flow via Edge.

Data Classification flow

Step    Description
Step 1  You create an Edge site with a JDBC connection, a JDBC ingestion capability and a JDBC profiling capability.
Step 2  You register a data source via Edge.
Step 3  You synchronize one or more schemas.
Step 4  You profile and classify.
        Edge sends the profiling results and the data class suggestions to Collibra Data Intelligence Cloud. Sensitive data is automatically anonymized before the metadata is sent to Collibra.
        Once the data is classified, you can provide feedback on the predicted data classes. Note, however, that your feedback is not yet used to retrain the classification model.

Tip You can run the profiling and classification job on a schedule or trigger it after synchronizing a schema.
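The ordering of the four steps can be summarized as a tiny state model. This is an illustrative sketch, not a Collibra interface: the names Step and ClassificationFlow are hypothetical, and the only behavior modeled is that each step requires the previous one to be complete.

```python
from enum import IntEnum


class Step(IntEnum):
    """The four steps of the flow, numbered as in the table above."""

    EDGE_SITE_CREATED = 1
    DATA_SOURCE_REGISTERED = 2
    SCHEMAS_SYNCHRONIZED = 3
    PROFILED_AND_CLASSIFIED = 4


class ClassificationFlow:
    """Hypothetical tracker that enforces the step order."""

    def __init__(self) -> None:
        self.completed: list[Step] = []

    def complete(self, step: Step) -> None:
        # The next allowed step is always the one after the last completed step.
        next_number = len(self.completed) + 1
        if next_number > len(Step) or step != Step(next_number):
            raise RuntimeError(f"{step.name} cannot be completed at this point")
        self.completed.append(step)

    @property
    def classified(self) -> bool:
        return Step.PROFILED_AND_CLASSIFIED in self.completed


flow = ClassificationFlow()
for step in Step:  # IntEnum members iterate in definition order
    flow.complete(step)
```

Trying to complete Step.PROFILED_AND_CLASSIFIED on a fresh ClassificationFlow raises RuntimeError in this sketch, which mirrors the requirement that a schema must be synchronized before the profiling and classification job can run.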