Enable and calculate data similarity

Before similarity scores can be calculated for your data during profiling, you have to enable and configure the Data similarity feature.

Tip 

Even if data similarity is enabled, you can specify that you don't want to calculate similarity scores for a data source.

Before you begin

Important 

Data similarity is a cloud-only feature and is not certified for FedRAMP.

Steps

1. In the Service Configuration settings, enable the Calculate Data Similarity profiling setting and define the Data similarity threshold profiling setting.

   More details: Data similarity scores can be calculated when you profile a data source via Edge.

2. In the Collibra settings, enable the Data Similarity setting for Data Marketplace.

   More details: If other assets have a similarity score with a Table asset that is higher than the defined threshold, the Similar Data tab is visible in the Data Marketplace preview of that Table asset.

3. Register a data source via Edge and profile the data.

   More details: Similarity scores are calculated for the profiled Table assets.

Important 

If you don't want to calculate similarity scores for a data source during profiling, you can deactivate the calculation via the profiling capability configuration. In the capability, add the following parameter in the Other section:

  • Name: data-similarity
  • Type: Text
  • Encryption: Not encrypted (plain text)
  • Value: false
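
The interaction between the global setting and this per-data-source parameter can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Collibra code: the dictionary keys mirror the labels of the capability form, and the function name `similarity_calculated` and its arguments are hypothetical.

```python
# Illustrative only: the keys mirror the labels in the capability form,
# they are not a documented Collibra API or file format.
opt_out_parameter = {
    "name": "data-similarity",
    "type": "Text",
    "encryption": "Not encrypted (plain text)",
    "value": "false",
}


def similarity_calculated(global_enabled: bool, other_parameters: list[dict]) -> bool:
    """Return True if similarity scores would be calculated for a data source.

    Assumption: a 'data-similarity' parameter with the text value 'false'
    turns the calculation off for that data source; in every other case
    the global Calculate Data Similarity setting decides.
    """
    if not global_enabled:
        return False
    for param in other_parameters:
        if param["name"] == "data-similarity" and param["value"] == "false":
            return False
    return True


# Globally enabled, but this data source opts out.
print(similarity_calculated(True, [opt_out_parameter]))
# Globally enabled, no opt-out parameter on the capability.
print(similarity_calculated(True, []))
```

Note that the value is the text `false`, not a Boolean, because the parameter type is Text.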

What's next?

When a data consumer opens a Table asset preview in Data Marketplace and similar assets are available for that table, the Similar Data tab is shown.
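
The visibility rule for the Similar Data tab can be summarized with a minimal Python sketch. It assumes, purely for illustration, that similarity scores are numbers on the same scale as the Data similarity threshold; the function names, asset names, and score values below are hypothetical and do not come from Collibra.

```python
def similar_assets(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the assets whose similarity score is higher than the threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


def similar_data_tab_visible(scores: dict[str, float], threshold: float) -> bool:
    """The tab is shown only if at least one asset exceeds the threshold."""
    return len(similar_assets(scores, threshold)) > 0


# Hypothetical scores of other Table assets compared with the previewed table.
example_scores = {"CUSTOMERS_BACKUP": 0.92, "ORDERS": 0.35}
example_threshold = 0.8  # assumed scale, for illustration only

print(similar_assets(example_scores, example_threshold))
print(similar_data_tab_visible(example_scores, example_threshold))
```

A score that is exactly equal to the threshold does not count in this sketch, following the "higher than the defined threshold" wording in the steps.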