Enable and calculate data similarity
To calculate similarity scores for your data during profiling, you must first set up the Data similarity feature.
Prerequisites
- You are using Edge.
- If you want to use standalone Data Marketplace, Data Marketplace is enabled.
Steps
- In the Service Configuration settings, enable the Calculate Data Similarity profiling setting and define the Data similarity threshold.
Data similarity scores can be calculated when you profile a data source via Edge.
Depending on your environment, follow this procedure either in Collibra Console or on the Services Configuration tab of the Collibra settings.
Important: You can't edit the service configuration from the Settings page in the latest UI. If you use the latest UI, you can edit the service configuration only in Collibra Console. For more information, go to DGC service configuration settings.
Prerequisites:
- You have the ADMIN or SUPER role in Collibra Console.
- You have a global role with the Product Rights > System administration global permission.
- The Services Configuration tab is available in the Collibra settings.
Steps:
- Open the service configuration for editing, depending on your environment.
To open the Services Configuration tab:
- On the main toolbar, click → Settings.
The Settings page opens.
- Click Services Configuration.
- Click Edit configuration.
To open the DGC service settings in Collibra Console:
- Open Collibra Console.
Collibra Console opens with the Infrastructure page.
- In the tab pane, expand an environment to show its services.
- In the tab pane, click the Data Governance Center service of that environment.
- Click Configuration.
- Click Edit configuration.
- In the Data profiling section, make sure the Calculate Data Similarity setting is enabled. This setting is enabled by default for cloud environments, except for Public sector.
- In Data similarity threshold, define the similarity score above which Table assets are displayed as similar data.
The default value is 0.5, which means that Table assets with a similarity score higher than 0.5 (50%) show up as similar data.
Enter a value between 0.1 and 0.9.
- Click Save all.
- In the Collibra settings, enable the Data Similarity setting for Data Marketplace.
If other assets have a similarity score with a Table asset that is higher than the defined threshold, the Similar Data tab is visible in the Data Marketplace preview of that Table asset.
Prerequisites:
- The Settings landing page is enabled.
- You are an administrator in Data Marketplace.
Steps:
- On the main toolbar, click → Settings.
The Settings page opens.
- In the Data Marketplace section, click Extra Options.
- In the Search section, click Actions and Preview.
- Click the Data Similarity tab.
- Make sure the Show Similar Data checkbox is selected. This checkbox is selected by default for cloud environments, except for Public sector.
- Click Save.
- Register a data source via Edge and profile the data.
Similarity scores are calculated for the profiled Table assets.
Tip: If you don't want to calculate similarity scores for a data source during profiling, you can deactivate the calculation in the configuration of the profiling capability. To do so, add the following parameter to the Other section of the capability:
- Name: data-similarity
- Type: Text
- Encryption: Not encrypted (plain text)
- Value: false
If a data consumer in Data Marketplace opens a Table asset preview, and similar assets are available for this table, the Similar Data tab is shown.
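The rules above can be summarized in a small Python sketch. It is a hypothetical model for illustration, not a Collibra API: the function names, the dictionary of capability parameters, and the table names and scores are all assumptions. It assumes that a score must be strictly higher than the threshold, that the threshold must lie between 0.1 and 0.9, and that calculation is skipped only when the Calculate Data Similarity setting is off or the capability defines the data-similarity parameter with the value false.

```python
"""Hypothetical model of the Data similarity rules; not a Collibra API."""

MIN_THRESHOLD = 0.1
MAX_THRESHOLD = 0.9
DEFAULT_THRESHOLD = 0.5  # default value of the Data similarity threshold


def similarity_calculated(calculate_setting: bool, capability_params: dict[str, str]) -> bool:
    """Assumed rule: scores are calculated when the Calculate Data Similarity
    setting is enabled and the profiling capability does not set the
    data-similarity parameter to "false"."""
    return calculate_setting and capability_params.get("data-similarity") != "false"


def similar_tables(scores: dict[str, float], threshold: float = DEFAULT_THRESHOLD) -> list[str]:
    """Return the Table assets whose score is strictly higher than the threshold,
    highest score first."""
    if not MIN_THRESHOLD <= threshold <= MAX_THRESHOLD:
        raise ValueError(f"threshold must be between {MIN_THRESHOLD} and {MAX_THRESHOLD}")
    matches = [name for name, score in scores.items() if score > threshold]
    # Sort by descending score, then by name so ties have a stable order.
    return sorted(matches, key=lambda name: (-scores[name], name))


def show_similar_data_tab(
    show_setting: bool, scores: dict[str, float], threshold: float = DEFAULT_THRESHOLD
) -> bool:
    """The Similar Data tab is shown only if the Show Similar Data checkbox is
    selected and at least one asset exceeds the threshold."""
    return show_setting and bool(similar_tables(scores, threshold))


if __name__ == "__main__":
    # Hypothetical similarity scores of other Table assets for one Table asset.
    scores = {"sales_2023": 0.82, "orders_copy": 0.67, "customers": 0.5, "audit_log": 0.12}
    print(similar_tables(scores))  # prints ['sales_2023', 'orders_copy']
    print(show_similar_data_tab(True, scores, threshold=0.9))  # prints False
```

In this sketch, customers is not listed with the default threshold of 0.5 because its score is equal to the threshold, not higher than it.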