About Unified Data Classification

Unified Data Classification (UDC) is the data classification method used in Collibra. It is enabled by default and replaces the two previous methods: data classification via Edge and data classification via the Cloud Data Classification Platform.

Unified Data Classification characteristics

Steps overview: Data classification

Data classification is available for registered JDBC data sources and for Databricks Unity Catalog and Knowledge Catalog data sources integrated via Edge.

For registered JDBC data sources:

Step 1: Register a data source via Edge and synchronize one or more schemas.

Step 2: Make sure your environment is set up for Unified Data Classification via Edge.

Step 3: Configure your data classes in UDC.

Step 4: Classify the synchronized data.
You can classify data in two ways:
  • Manual data classification: A data steward assigns a data class to a column.
  • Automatic data classification: Collibra analyzes a subset of the data in a data source and suggests a data class for that data without human input.
    You can choose to accept or reject suggested data classifications manually or automatically.

    Note: Automatic data classification looks only at structured data. Unstructured data is out of scope.
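To make the idea of automatic classification more concrete, the following Python sketch models it in a simplified way: it samples a subset of a column's values, scores each data class by the fraction of sampled values that match, and then accepts, rejects, or leaves the suggestion pending based on thresholds. This is an illustration only, not Collibra's implementation or API. The data classes, the regular expressions, the `suggest` function, and the threshold values are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical data classes: name -> regex that a whole value must match.
DATA_CLASSES = {
    "Email address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "IPv4 address": re.compile(r"(\d{1,3}\.){3}\d{1,3}"),
}


@dataclass
class Suggestion:
    column: str
    data_class: str
    confidence: float  # fraction of sampled values that matched
    status: str        # "accepted", "rejected", or "pending"


def suggest(column, values, sample_size=100, accept_at=0.9, reject_below=0.5):
    """Suggest a data class for one column from a sample of its values."""
    # Look only at a subset of the data, and skip empty values.
    sample = [v for v in values[:sample_size] if v]
    if not sample:
        return None

    # Keep the data class that matches the largest share of the sample.
    best_name, best_score = None, 0.0
    for name, pattern in DATA_CLASSES.items():
        score = sum(1 for v in sample if pattern.fullmatch(v)) / len(sample)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None

    # Assumed thresholds decide whether a person still has to review it.
    if best_score >= accept_at:
        status = "accepted"
    elif best_score < reject_below:
        status = "rejected"
    else:
        status = "pending"
    return Suggestion(column, best_name, best_score, status)


emails = ["ann@example.com", "bob@example.org", "n/a", "cy@example.net"]
print(suggest("contact", emails))
```

In this sketch, three of the four sampled values match the email pattern, so the suggestion falls between the two thresholds and stays pending for a manual decision. Moving the thresholds changes how many suggestions are handled automatically.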

For Databricks Unity Catalog and Knowledge Catalog data sources:

Step 1: Integrate the Databricks or Knowledge Catalog data sources.

To enable classification, add a JDBC connection in the Databricks Unity Catalog synchronization or Knowledge Catalog capability. During synchronization, a Catalog Data Classification capability is created automatically if it does not already exist. As a result, you do not need to create a separate Catalog Data Classification capability to classify data from integrations.

For more information, go to Steps: Integrate Databricks Unity Catalog via Edge or Steps: Integrate Google Knowledge Catalog via Edge.

Step 2: Make sure your environment is set up for Unified Data Classification via Edge.

Step 3: Configure your data classes in UDC.

Step 4: Classify the synchronized data.
You can classify data in two ways:
  • Manual data classification: A data steward assigns a data class to a column.
  • Automatic data classification: Collibra analyzes a subset of the data in a data source and suggests a data class for that data without human input.
    You can choose to accept or reject suggested data classifications manually or automatically.

    Note: Automatic data classification looks only at structured data. Unstructured data is out of scope.
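The automatic creation of the Catalog Data Classification capability during synchronization follows a "create only if missing" pattern, which is why repeated synchronizations do not produce duplicate capabilities. The Python sketch below models that behavior. It is not Collibra's API: the `EdgeSite` class, the `ensure_capability` and `synchronize` functions, and the connection name are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeSite:
    """Hypothetical stand-in for an Edge site and its capabilities."""
    capabilities: dict = field(default_factory=dict)  # name -> settings

    def ensure_capability(self, name, settings):
        """Create the capability only if it is missing. Return True if created."""
        if name in self.capabilities:
            return False
        self.capabilities[name] = dict(settings)
        return True


def synchronize(site, jdbc_connection):
    """Model one synchronization run. Return True if a capability was created."""
    if jdbc_connection is None:
        # Without a JDBC connection there is nothing to classify against.
        return False
    return site.ensure_capability(
        "Catalog Data Classification", {"connection": jdbc_connection}
    )


site = EdgeSite()
first = synchronize(site, "databricks-jdbc")   # creates the capability
second = synchronize(site, "databricks-jdbc")  # finds it, creates nothing
print(first, second, len(site.capabilities))
```

Running the synchronization twice leaves exactly one capability in place, which mirrors the statement that you do not need to create a separate capability yourself.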

Understanding the automatic data classification process

The following image shows, at a high level, how the automatic classification system works.

[Image: the data classification flow, showing the steps in the process]

What's next

Set up Unified Data Classification
