About Unified Data Classification

Unified Data Classification (UDC) is the data classification method used at Collibra. It is enabled by default and has replaced the previous methods: data classification via Edge and data classification via the Cloud Data Classification Platform.

Unified Data Classification characteristics

  • With the correct permissions, data stewards can configure data classes and classify data in an environment.

  • Data stewards can create custom data classes or import optional, out-of-the-box data classes and adjust them as needed, for example by changing the name or the classification rules.

  • The method includes an automatic data classification process.

    • The process works via Edge and requires specific setup.
      Because the data doesn't leave your organization's network, the automatic data classification process is secure. The samples used during the automatic data classification process are temporarily added to the Edge site cache. They are not transferred to Collibra.

      Note If you're using a Collibra Cloud site, go to the Collibra Cloud site documentation to check if your data source is supported.

    • The process relies on classification rules specified for each data class.
      The process doesn't rely on machine learning, which makes issues and changes more transparent. Using classification rules also provides high flexibility and allows for customizations.

      Note The automatic data classification process remembers any rejected data class suggestions, meaning a data class will not be suggested again if you have rejected the data class for an asset. Also, once a data classification has been accepted for a column, the data classification won't be automatically updated if you run the data classification process again.

  • UDC is available in the user interface and through REST APIs: Data Classification REST API v2, Data Class Management REST API v1, Data Class Import REST API v1.

    Important The Data Classification REST API v1 ClassificationMatches endpoints remain valid and can be used by UDC. The other endpoints in this API are deprecated.
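
The rule-based behavior described above can be illustrated with a minimal sketch. It is not Collibra's implementation: the `DataClass`, `Column`, and `suggest` names, the single regular-expression rule, and the `min_match_ratio` threshold are all assumptions made for illustration. It only shows the three documented behaviors: suggestions come from rules evaluated against samples, a rejected data class is not suggested again for the same column, and an accepted classification is not changed by a later run.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DataClass:
    """Hypothetical data class with one regex-based classification rule."""
    name: str
    pattern: str            # regular expression a sample value must fully match
    min_match_ratio: float  # share of non-empty samples that must match (assumption)


@dataclass
class Column:
    """Hypothetical column holding sampled values and past decisions."""
    name: str
    samples: List[str]
    accepted: Optional[str] = None                  # accepted data class, if any
    rejected: Set[str] = field(default_factory=set) # data classes rejected for this column


def suggest(column: Column, data_classes: List[DataClass]) -> List[str]:
    """Return the names of the data classes suggested for a column."""
    # An accepted classification is not updated by a later run.
    if column.accepted is not None:
        return []
    values = [v for v in column.samples if v]
    if not values:
        return []
    suggestions = []
    for dc in data_classes:
        # A rejected data class is not suggested again for the same column.
        if dc.name in column.rejected:
            continue
        hits = sum(1 for v in values if re.fullmatch(dc.pattern, v))
        if hits / len(values) >= dc.min_match_ratio:
            suggestions.append(dc.name)
    return suggestions


email = DataClass("Email Address", r"[^@\s]+@[^@\s]+\.[^@\s]+", 0.75)
contact = Column("contact", ["a@example.com", "b@example.org", "n/a",
                             "c@example.net", "d@example.com"])

print(suggest(contact, [email]))     # 4 of 5 samples match the rule
contact.rejected.add("Email Address")
print(suggest(contact, [email]))     # nothing is suggested after the rejection
```

Because every suggestion traces back to an explicit rule and a match ratio, it is easy to see why a data class was or was not suggested, which is the transparency benefit the rule-based approach aims for.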

Steps overview: Data classification

Data classification is available for registered JDBC data sources and for Databricks Unity Catalog and Dataplex Universal Catalog data sources integrated via Edge.

For registered JDBC data sources:

  1. Register a data source via Edge and synchronize one or more schemas.

  2. Make sure your environment is set up for Unified Data Classification via Edge.

  3. Configure your data classes in UDC.

  4. Classify the synchronized data.
     You can classify data in 2 ways.
       • Manual data classification: A data steward assigns a data class to a column.
       • Automatic data classification: Collibra analyzes a subset of the data in a data source and suggests a data class for that data without human input.
         You can choose to accept or reject suggested data classifications manually or automatically.

         Note Automatic data classification looks only at structured data. Unstructured data is out of scope.

For Databricks Unity Catalog and Dataplex Universal Catalog data sources:

  1. Integrate the Databricks Unity Catalog or Dataplex Universal Catalog data sources.
     To allow for classification, add a JDBC connection in the Databricks Unity Catalog synchronization or Dataplex Universal Catalog capability. During synchronization, a Catalog Data Classification capability is created automatically if it does not already exist. As a result, you do not need to create a separate Catalog Data Classification capability to classify data from integrations.

     For more information, go to Steps: Integrate Databricks Unity Catalog via Edge or Steps: Integrate Google Dataplex Universal Catalog via Edge.

  2. Make sure your environment is set up for Unified Data Classification via Edge.

  3. Configure your data classes in UDC.

  4. Classify the synchronized data.
     You can classify data in 2 ways.
       • Manual data classification: A data steward assigns a data class to a column.
       • Automatic data classification: Collibra analyzes a subset of the data in a data source and suggests a data class for that data without human input.
         You can choose to accept or reject suggested data classifications manually or automatically.

         Note Automatic data classification looks only at structured data. Unstructured data is out of scope.
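
The choice between accepting or rejecting suggestions manually or automatically can be pictured as a simple triage policy. The sketch below is illustrative only: the idea of a numeric confidence score per suggestion, the `triage` function, and the two threshold parameters are assumptions, not documented Collibra settings. Suggestions that are neither clearly strong nor clearly weak are left for a data steward to decide.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"   # accepted automatically
    REJECT = "reject"   # rejected automatically
    REVIEW = "review"   # left for a data steward to decide manually


def triage(confidence: float,
           accept_at: float = 0.9,
           reject_below: float = 0.3) -> Decision:
    """Decide what to do with one suggested classification.

    confidence, accept_at, and reject_below are hypothetical values
    between 0.0 and 1.0 used only for this illustration.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if accept_at <= reject_below:
        raise ValueError("accept_at must be greater than reject_below")
    if confidence >= accept_at:
        return Decision.ACCEPT
    if confidence < reject_below:
        return Decision.REJECT
    return Decision.REVIEW


for score in (0.95, 0.5, 0.1):
    print(score, triage(score).value)
```

Setting `accept_at` to a value above 1.0 and `reject_below` to 0.0 in this sketch would route every suggestion to manual review, which corresponds to handling all suggestions by hand.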

Understanding the automatic data classification process

The following image shows, at a high level, how the automatic data classification process works.

Image of the data classification flow showing the various steps in the process

What's next

Set up Unified Data Classification
