Enable and calculate data similarity

To show similar assets, data similarity must be enabled in your environment and similarity scores must be calculated for your data.

Tip

Even if data similarity is enabled in your environment, you can choose not to calculate similarity scores for a specific data source.

Before you begin

Important 
  • Data similarity is a cloud-only feature and is not certified for FedRAMP.
  • This feature is in Beta testing.

Steps

1. In the Service Configuration settings, enable the Calculate Data Similarity (Beta) profiling setting.

   More details: Data similarity scores can then be calculated when you profile a data source via Edge.

2. In the Collibra settings, enable the Data Similarity (Beta) setting for Data Marketplace.

   More details: If at least one asset has a similarity score higher than 50% with a Table asset, the Similar Data tab is visible in the Data Marketplace asset preview of that table.

3. Register a data source via Edge and profile the data.

   More details: Similarity scores are calculated for the profiled Table assets.

Important 

If you don't want to calculate similarity scores for a data source during profiling, you can deactivate the calculation via the profiling capability configuration. In the capability, add the following parameter in the Other section:

  • Name: data-similarity
  • Type: Text
  • Encryption: Not encrypted (plain text)
  • Value: false
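
The way the profiling setting and this parameter interact can be sketched in Python. This is a minimal illustration, not Collibra's implementation: the CapabilityParameter class and the similarity_calculated function are hypothetical names, and the case-insensitive match on the value "false" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityParameter:
    """One entry in the Other section of a capability (hypothetical model)."""
    name: str
    type: str
    encrypted: bool
    value: str


# The parameter exactly as listed above.
DISABLE_SIMILARITY = CapabilityParameter(
    name="data-similarity",
    type="Text",
    encrypted=False,  # "Not encrypted (plain text)"
    value="false",
)


def similarity_calculated(setting_enabled: bool,
                          params: list[CapabilityParameter]) -> bool:
    """Return True if similarity scores would be calculated during profiling.

    Assumption: the calculation runs whenever the profiling setting is
    enabled, unless a data-similarity parameter with value "false" is present.
    """
    if not setting_enabled:
        return False
    for p in params:
        if p.name == "data-similarity" and p.value.strip().lower() == "false":
            return False
    return True
```

With the setting enabled and no parameter, similarity_calculated(True, []) returns True. Passing [DISABLE_SIMILARITY] returns False, which mirrors opting a single data source out of the calculation.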

What's next?

When a data consumer opens a Table asset preview in Data Marketplace and similar assets are available for that table, the Similar Data tab is shown.
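
The visibility rule from step 2 (a similarity score higher than 50%) can be illustrated with a short Python sketch. The function names, the asset names, and the representation of scores as fractions between 0 and 1 are assumptions made for illustration. The sketch does not describe how Collibra computes or stores the scores.

```python
# "Higher than 50%" from step 2, expressed as a fraction (assumed representation).
SIMILARITY_THRESHOLD = 0.50


def similar_assets(scores: dict[str, float],
                   threshold: float = SIMILARITY_THRESHOLD) -> list[str]:
    """Return the names of assets scoring strictly above the threshold.

    Results are ordered from highest to lowest score for readability only.
    """
    hits = [(name, score) for name, score in scores.items() if score > threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hits]


def similar_data_tab_visible(scores: dict[str, float]) -> bool:
    """The tab is shown only if at least one asset exceeds the threshold."""
    return bool(similar_assets(scores))


# Hypothetical scores of other Table assets compared with the opened table.
example_scores = {"orders_2023": 0.82, "customers": 0.50, "orders_archive": 0.61}
```

For example_scores, similar_assets returns ["orders_2023", "orders_archive"]. The asset scoring exactly 50% is excluded because the rule requires a score higher than 50%.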