Enable and calculate data similarity
The Data similarity feature requires some setup to calculate similarity scores for your data during profiling.
Prerequisites
- You are using Edge.
- If you want to use standalone Data Marketplace, Data Marketplace is enabled.
Steps
- In the Services Configuration settings, enable the Calculate Data Similarity setting and define the Data similarity threshold profiling setting.
Data similarity scores can be calculated when you profile a data source via Edge.
Depending on your environment, follow this procedure either in Collibra Console or on the Services Configuration tab of the Collibra settings:
Prerequisites:
- You have the ADMIN or SUPER role in Collibra Console.
- You have a global role with the Product Rights > System administration global permission.
- The Services Configuration tab is available in the Collibra settings.
Steps:
- Open the configuration for editing in one of the following ways:
  - Open the Services Configuration tab:
    - On the main toolbar, click → Settings.
      The Settings page opens.
    - Click Services Configuration.
    - Click Edit configuration.
  - Open the DGC service settings for editing:
    - Open Collibra Console.
      Collibra Console opens with the Infrastructure page.
    - In the tab pane, expand an environment to show its services.
    - In the tab pane, click the Data Governance Center service of that environment.
    - Click Configuration.
    - Click Edit configuration.
- In the Data profiling section, do the following:
  - Make sure the Calculate Data Similarity setting is enabled. This setting is enabled by default for cloud environments, except for Public sector.
  - In Data similarity threshold, define the similarity score above which Table assets are displayed as similar data.
    The default value is 0.5, which means that Table assets with a similarity score higher than 50% show up as similar data.
    Enter a value between 0.1 and 0.9.
- Click Save all.
- In the Collibra settings, enable the Data Similarity setting for Data Marketplace.
If, for a Table asset, other assets have a similarity score higher than the defined threshold, the Similar Data tab is visible in the Data Marketplace asset preview.
Prerequisites:
- The Settings landing page is enabled.
- You are an administrator in Data Marketplace.
Steps:
- On the main toolbar, click → Settings.
  The Settings page opens.
- In the Search section, click Actions and Preview.
- Click the Data Similarity tab.
- Make sure the Show Similar Data checkbox is selected. This checkbox is selected by default for cloud environments, except for Public sector.
- Click Save.
- Register a data source via Edge and profile the data.
Similarity scores are calculated for the profiled Table assets.
Tip: If you don't want to calculate similarity scores for a data source during profiling, you can deactivate the calculation via the profiling capability configuration. To do this, in the capability, add the following parameter in the Other section:
- Name: data-similarity
- Type: Text
- Encryption: Not encrypted (plain text)
- Value: false
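To make the effect of this parameter concrete, the following Python sketch models it as a plain dictionary. This is an illustration only, not a Collibra API: the dictionary layout and the `similarity_enabled` helper are hypothetical, and the sketch assumes that similarity is calculated unless a `data-similarity` parameter with the value `false` is present.

```python
# Hypothetical in-memory representation of the capability parameter from the Tip.
# The keys mirror the fields in the list above; this is not a Collibra data format.
DISABLE_SIMILARITY_PARAMETER = {
    "name": "data-similarity",
    "type": "Text",
    "encryption": "Not encrypted (plain text)",
    "value": "false",
}


def similarity_enabled(parameters):
    """Return False only if a data-similarity parameter is set to "false".

    Assumption: without this parameter, similarity scores are calculated
    during profiling (as long as the feature itself is enabled).
    """
    for parameter in parameters:
        if parameter["name"] == "data-similarity":
            return parameter["value"].strip().lower() != "false"
    return True
```

For example, `similarity_enabled([])` returns `True`, while `similarity_enabled([DISABLE_SIMILARITY_PARAMETER])` returns `False`.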
If a data consumer in Data Marketplace opens a Table asset preview, and similar assets are available for this table, the Similar Data tab is shown.
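The way the Data similarity threshold decides what appears on the Similar Data tab can be sketched in a few lines of Python. This is a minimal illustration of the rule described above (scores higher than the threshold count as similar, and the threshold must lie between 0.1 and 0.9), not how Collibra implements it. The `SimilarityScore` class, the function names, and the assumption that the bounds are inclusive are hypothetical.

```python
from dataclasses import dataclass

MIN_THRESHOLD = 0.1      # lowest value accepted by the setting
MAX_THRESHOLD = 0.9      # highest value accepted by the setting
DEFAULT_THRESHOLD = 0.5  # default value of the setting


@dataclass(frozen=True)
class SimilarityScore:
    """Hypothetical record: similarity of one Table asset to the table being viewed."""
    table_name: str
    score: float  # 0.0 (no similarity) to 1.0 (identical)


def similar_tables(scores, threshold=DEFAULT_THRESHOLD):
    """Return the names of tables whose score is strictly higher than the threshold."""
    if not MIN_THRESHOLD <= threshold <= MAX_THRESHOLD:
        raise ValueError("threshold must be between 0.1 and 0.9")
    return [s.table_name for s in scores if s.score > threshold]


def show_similar_data_tab(scores, threshold=DEFAULT_THRESHOLD):
    """The tab is shown only if at least one table passes the threshold."""
    return bool(similar_tables(scores, threshold))
```

With the default threshold of 0.5, a table with a score of 0.82 is listed as similar, while tables with scores of 0.5 or 0.12 are not. Lowering the threshold to 0.1 would include all three.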