About Data Classification
Data intelligence goes beyond just registering a data source. To get the most out of your data, you need to classify it and connect it to other nodes in the Data Intelligence knowledge graph. Data classification helps you understand the types of data you have and where they are located. By assigning a data class, such as name, phone number, or web browser, to Column assets, you add context to your data.
Tip: Data classification is only the first step toward a better view of your data. The next step is often to identify the columns that contain specific data categories, such as Personally Identifiable Information (PII) or Protected Health Information (PHI). For this reason, Collibra Data Classification provides an automated data category association tool, which can create a relation between a Column asset and a Data Category asset based on the classification of the column. For more information, go to About automatic Data Category association via Data Classification.
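The idea of deriving a data category from a column's data class can be illustrated with a minimal Python sketch. This is not how Collibra implements the association tool: the mapping `CLASS_TO_CATEGORIES`, the function `associate_categories`, and the column and class names are all hypothetical and serve only to show the shape of the logic.

```python
# Hypothetical mapping from data class to data categories.
# Illustrative only; not Collibra's out-of-the-box configuration.
CLASS_TO_CATEGORIES = {
    "Name": ["PII"],
    "Phone Number": ["PII"],
    "Diagnosis Code": ["PHI"],
}

def associate_categories(column_classes):
    """Given {column_name: data_class}, return sorted (column, category) pairs.

    Each pair stands in for a relation between a Column asset and a
    Data Category asset. Columns whose class has no mapped category
    produce no relation.
    """
    relations = set()
    for column, data_class in column_classes.items():
        for category in CLASS_TO_CATEGORIES.get(data_class, []):
            relations.add((column, category))
    return sorted(relations)

# Usage: only columns whose data class maps to a category get a relation.
classified = {
    "customer.phone": "Phone Number",
    "visit.dx": "Diagnosis Code",
    "session.browser": "Web Browser",
}
relations = associate_categories(classified)
```

In this sketch the "Web Browser" column gets no relation because its class is not mapped to any category.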
Data classification can be done manually or automatically.
- Manual data classification means that you assign a data class to a column yourself.
- Automatic data classification analyzes a subset of the data in your data source and suggests a data class for that data, without human input. You can then accept or reject the suggested data class, either manually or automatically.
Note: Automatic data classification analyzes only structured data. Unstructured data is out of scope.
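As a rough illustration of sample-based classification, the following Python sketch scores a subset of column values against a few patterns and turns the score into a decision. It is an assumption-laden toy, not Collibra's algorithm: the regular expressions, the functions `suggest_data_class` and `decide`, the sample size, and the thresholds are all hypothetical.

```python
import re

# Hypothetical data classes with illustrative patterns.
# These are assumptions, not Collibra's data class definitions.
DATA_CLASS_PATTERNS = {
    "Phone Number": re.compile(r"^\+?\d[\d\s\-]{6,14}\d$"),
    "Email Address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def suggest_data_class(values, sample_size=100):
    """Return (data_class, confidence) for the best match, or (None, 0.0).

    Only the first `sample_size` non-empty values are analyzed, so the
    suggestion is based on a subset of the column.
    """
    sample = [v for v in values[:sample_size] if v]
    if not sample:
        return None, 0.0
    best_class, best_conf = None, 0.0
    for name, pattern in DATA_CLASS_PATTERNS.items():
        # Confidence is the share of sampled values that match the pattern.
        conf = sum(1 for v in sample if pattern.match(v)) / len(sample)
        if conf > best_conf:
            best_class, best_conf = name, conf
    return best_class, best_conf

def decide(confidence, accept_at=0.9, reject_below=0.5):
    """Accept or reject automatically, or leave the suggestion for manual review."""
    if confidence >= accept_at:
        return "accepted"
    if confidence < reject_below:
        return "rejected"
    return "needs review"

# Usage: three of the four sampled values look like phone numbers.
column = ["+1 555-123-4567", "+32 2 793 02 19", "555-987-6543", "n/a"]
data_class, confidence = suggest_data_class(column)
outcome = decide(confidence)
```

With these hypothetical thresholds, a column that matches for three out of four sampled values is neither accepted nor rejected automatically and is left for a person to review.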
Collibra Data Classification is available via the following:
- Unified Data Classification method on Edge
  This is the default method for all new environments working on Edge starting from release 2024.02, and for all existing environments from release 2024.07.
  Tip: The old Edge method and the Cloud Data Classification Platform are both end of life. If you were using either, a migration process to the Unified Data Classification method is available.
- Unified Data Classification REST APIs:
  - Data Class Management REST API v1
    This API allows you to manage data classes in Unified Data Classification.
  - Data Class Import REST API v1
    This API allows you to import out-of-the-box data classes in Unified Data Classification.
  - Catalog Data Classification REST API v2
    This API allows you to start the classification process, and to search for and manage the associations between a data class and a data category.
  - Catalog Data Classification REST API v1 (partially deprecated)
    This API allows you to assign data classes to assets, import existing data classifications, and so on.
    Note: Only the ClassificationMatches endpoints are still valid.