Configure the recommenders and matchers

Collibra Platform contains recommenders and matchers that recommend data sets or business assets.

You can configure them to optimize the recommendations.

Depending on your environment, follow this procedure either in Collibra Console or on the Services Configuration tab of the Collibra settings:


Important You can't edit the service configuration from the Settings page in the latest UI. If you use the latest UI, you can edit the service configuration only in Collibra Console. For more information, go to Collibra service configuration settings.

Prerequisites

You have the ADMIN or SUPER role in Collibra Console.

You have a global role that has the Product Rights > System administration global permission.

The Services Configuration tab is available in the Collibra settings.

Steps

In the Collibra settings, open the Services Configuration tab:
1. On the main toolbar, click → Settings.
  The Settings page opens.
2. Click Services Configuration.
3. Click Edit configuration.
In Collibra Console, open the DGC service settings for editing:
1. Open Collibra Console.
  Collibra Console opens with the Infrastructure page.
2. In the tab pane, expand an environment to show its services.
3. In the tab pane, click the Collibra Platform service of that environment.
4. Click Configuration.
5. Click Edit configuration.

In the Recommender configuration section, make the necessary changes.

Setting	Impacts	Description
Catalog recommender enabled	All recommendations	True (default): The "Data sets you might like" section is included on the Data Catalog Home page. This section shows data sets you might be interested in, as determined by the recommender, which takes into account your data sets and the data sets of similar users. False: The "Data sets you might like" section is not included on the Data Catalog Home page.
Data set recommender execution time	Recommendations of data sets to users	The schedule (CRON job) by which the data set recommender looks for recommended data sets for a user. By default, the data set recommender does this every night.
Asset recommender execution time	Recommendations of business assets to data assets	The schedule (CRON job) by which the asset recommender looks for suggested relations between business assets and data sets.
Data set matcher execution time	Data set matcher	The schedule (CRON job) by which the data set matcher looks for similar data sets.
Data set similarity threshold	Data set matcher	The proportion of related business assets that two data sets must have in common before the data sets are considered similar. This percentage is expressed as a decimal, where 1.00 equals 100%. Example: If this value is 0.3 and at least 30% of the related business assets are related to both data sets, the data sets are considered similar.
Duplicate schema threshold	Schema matcher	The proportion of related assets that two schemas must have in common before the schemas are considered similar. This percentage is expressed as a decimal, where 1.00 equals 100%.
Fuzzy vs exact matching strategy for business assets	Recommendations of business assets to data sets and of business assets to column assets	The percentage that determines how much weight is given to assets with a similar name. The ranking in the search engine results always has an impact on the suggestion score. However, similarity between the asset names can also be taken into account. If you decrease this percentage, the ranking of the search results becomes more important for the suggestion score, while the similarity between the asset names becomes less important. If you increase the percentage, assets with similar names receive a higher suggestion score. This percentage is expressed as a decimal, where 1.00 equals 100%. You can enter a value greater than 1.00.
Recommendation weights for data sets	Recommendations of data sets to users	An ordered, comma-separated list of values that define the importance of properties for recommendations. The order of the values reflects their importance. This setting is used for data set recommendations only if your Collibra environment does not yet have enough data for the active recommendation algorithms to produce relevant results. Possible values: CERTIFIED: Data sets that are certified are considered more relevant. POPULARITY: The number of visits to the data set page.
Active recommendation algorithms	Recommendations of data sets to users and of business assets to data sets	A comma-separated list of algorithms that calculate recommendations. By default, all available algorithms are listed. Possible values: BASELINE, USER_MEAN, IICF (Item-Item Collaborative Filtering), SLOPE_ONE, WEIGHTED_SLOPE_ONE.
Data set elements threshold	Recommendations of data sets to users	The maximum number of elements per data set that the recommender will use to train the model. The data set elements are taken randomly. Lowering this number can prevent out-of-memory issues but also impacts the accuracy of recommendations for large data sets.
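
To make the Data set similarity threshold more concrete, the following Python sketch shows one possible way to read it. It is an illustration only, not Collibra's implementation: it assumes that the shared percentage is the number of business assets related to both data sets divided by the number related to either one. The function names and the sample asset names are hypothetical.

```python
def shared_fraction(assets_a: set, assets_b: set) -> float:
    """Fraction of related business assets that both data sets share.

    Assumption: shared assets divided by all assets related to either data set.
    The matcher may calculate this differently.
    """
    all_assets = assets_a | assets_b
    if not all_assets:
        return 0.0
    return len(assets_a & assets_b) / len(all_assets)


def is_similar(assets_a: set, assets_b: set, threshold: float = 0.3) -> bool:
    """Compare the shared fraction with the threshold (0.3 means 30%)."""
    return shared_fraction(assets_a, assets_b) >= threshold


# Hypothetical business assets related to two data sets.
sales = {"Customer", "Revenue", "Region"}
marketing = {"Customer", "Revenue", "Churn", "Segment"}

print(shared_fraction(sales, marketing))   # 2 shared out of 5 in total
print(is_similar(sales, marketing, 0.3))
print(is_similar(sales, marketing, 0.5))
```

Under this assumption, raising the threshold yields fewer but closer matches, and lowering it yields more but looser matches.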

Warning If you create an invalid cron pattern, Collibra stops responding.
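
Because an invalid pattern has such a severe effect, it can help to check a pattern before you enter it. The following Python sketch is a shallow pre-check only. It assumes that the schedule fields take Quartz-style cron expressions with six or seven space-separated fields, which this page does not state. The function name is hypothetical, and passing the check does not guarantee that Collibra accepts the pattern.

```python
import re

# Characters that can appear in a Quartz-style cron field (assumption):
# digits, month and day names, L and W, and the symbols * , - / ? #
_FIELD_PATTERN = re.compile(r"^[0-9A-Za-z*,\-/?#]+$")


def looks_like_quartz_cron(expression: str) -> bool:
    """Return True if the expression has a plausible Quartz-style shape.

    Checks only the number of fields and the characters used.
    It does not check value ranges or field combinations.
    """
    fields = expression.split()
    if len(fields) not in (6, 7):
        return False
    return all(_FIELD_PATTERN.match(field) for field in fields)


print(looks_like_quartz_cron("0 0 0 * * ?"))  # six fields, allowed characters
print(looks_like_quartz_cron("0 0 * * *"))    # five fields, rejected
```

A check like this catches only obvious mistakes, such as a missing field or a stray character. Test a new schedule in a non-production environment where possible.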

Click Save all.

Note Depending on the configuration changes that you apply, the recommendations might not update immediately. For example, if you change a schedule, the updated recommendations might appear only the next day.