Enable and calculate data similarity

Important: This is a cloud-only feature. It is not supported for Collibra Platform for Government or Collibra Platform Self-Hosted (CPSH) environments.

Before similarity scores can be calculated for your data during profiling, you must enable the Data similarity feature in both the Service Configuration settings and the Collibra settings.

Prerequisites

Steps

  1. In the Service Configuration settings, enable the Calculate Data Similarity profiling setting and define the Data similarity threshold profiling setting.

    Data similarity scores can be calculated when you profile a data source via Edge.

  2. In the Collibra settings, enable the Data Similarity setting for Data Marketplace.
    If one or more assets have a similarity score with a Table asset that is higher than the defined threshold, the Similar Data tab is visible in the Data Marketplace asset preview of that Table asset.

  3. Register a data source via Edge and profile the data.

    Similarity scores are calculated for the profiled Table assets.

    Tip 

    If you don't want to calculate similarity scores for a data source during profiling, you can deactivate the calculation in the profiling capability configuration. To do this, add the following parameter in the Other section of the capability:

    • Name: data-similarity
    • Type: Text
    • Encryption: Not encrypted (plain text)
    • Value: false
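
The way the service-level setting and the capability parameter combine can be sketched in Python as follows. This is only an illustration of the behavior described above, not Collibra code: the function name, the dictionary used to represent the capability parameters, and the assumption that only the exact text value false deactivates the calculation are all hypothetical.

```python
def similarity_calculation_enabled(
    service_setting_enabled: bool,
    capability_parameters: dict[str, str],
) -> bool:
    """Decide whether similarity scores are calculated for one data source.

    service_setting_enabled: state of the Calculate Data Similarity setting
        (hypothetical representation of the Service Configuration setting).
    capability_parameters: the Name/Value pairs from the Other section of
        the profiling capability (hypothetical representation).
    """
    # Without the service-level setting, nothing is calculated at all.
    if not service_setting_enabled:
        return False
    # Assumption: only the exact text value "false" opts the data source out.
    return capability_parameters.get("data-similarity") != "false"
```

With this sketch, a capability without the data-similarity parameter follows the service-level setting, while a capability that sets the parameter to false is skipped even when the service-level setting is enabled.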

What's next

When a data consumer opens the preview of a Table asset in Data Marketplace and similar assets are available for that table, the Similar Data tab is shown.
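
The threshold rule that decides whether the Similar Data tab appears can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function names, the asset names, and the 0 to 1 score scale are hypothetical, and only the "higher than the defined threshold" comparison comes from the description above.

```python
def similar_assets(scores: dict[str, float], threshold: float) -> dict[str, float]:
    """Keep only the assets whose score is strictly higher than the threshold."""
    return {name: score for name, score in scores.items() if score > threshold}


def similar_data_tab_visible(scores: dict[str, float], threshold: float) -> bool:
    """The tab is shown when at least one asset exceeds the threshold."""
    return bool(similar_assets(scores, threshold))


# Hypothetical similarity scores of other Table assets against one table.
example_scores = {"customers_archive": 0.92, "orders": 0.41}

print(similar_assets(example_scores, threshold=0.8))
print(similar_data_tab_visible(example_scores, threshold=0.95))
```

In this example, only customers_archive exceeds a threshold of 0.8, so the tab would list that one asset. With a threshold of 0.95, no asset qualifies and the tab would not be shown.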