Configure the recommenders and matchers
Collibra Data Intelligence Platform contains recommenders and matchers that recommend data sets or business assets.
You can configure them to optimize the recommendations.
Depending on your environment, follow this procedure either on the Services Configuration tab of the Collibra settings or in Collibra Console.
Prerequisites
- You have the ADMIN or SUPER role in Collibra Console.
- You have a global role that has the System administration global permission.
- The Services Configuration tab is available in the Collibra settings.
Steps
- Depending on your environment, open the Services Configuration page or the DGC service settings for editing.
  Open the Services Configuration page:
  - On the main toolbar, click [icon], and then click Settings. The Collibra settings page opens.
  - Click Services Configuration.
  - Click Edit configuration.
  Open the DGC service settings for editing:
  - Open Collibra Console. Collibra Console opens with the Infrastructure page.
  - In the tab pane, expand an environment to show its services.
  - In the tab pane, click the Data Governance Center service of that environment.
  - Click Configuration.
  - Click Edit configuration.
- In the Recommender configuration section, make the necessary changes.
Setting: Catalog recommender enabled
Impacts: All recommendations
Description:
- True (default): The "Data sets you might like" section is included on the Data Catalog Home page. This section shows data sets you might be interested in, as determined by the recommender, which takes into account your data sets and the data sets of similar users.
- False: The "Data sets you might like" section is not included on the Data Catalog Home page.
Setting: Data set recommender execution time
Impacts: Recommendations of data sets to users
Description: The schedule (cron pattern) by which the data set recommender looks for recommended data sets for a user. By default, the data set recommender does this every night.
Setting: Asset recommender execution time
Impacts: Recommendations of business assets to data assets
Description: The schedule (cron pattern) by which the asset recommender looks for suggested relations between business assets and data sets.
Setting: Data set matcher execution time
Impacts: Data set matcher
Description: The schedule (cron pattern) by which the data set matcher looks for similar data sets.
Setting: Data set similarity threshold
Impacts: Data set matcher
Description: The proportion of business assets that have to be related to both data sets before the data sets are considered similar. This percentage is expressed as a decimal, where 1.00 equals 100%.
Example: If this value is 0.3 and at least 30% of the related business assets are related to both data sets, the data sets are considered similar.
Setting: Duplicate schema threshold
Impacts: Schema matcher
Description: The proportion of assets that have to be related to both schemas before the schemas are considered similar. This percentage is expressed as a decimal, where 1.00 equals 100%.
Setting: Fuzzy vs exact matching strategy for business assets
Impacts: Recommendations of business assets to data sets and of business assets to column assets
Description: The percentage that determines how much weight the similarity between asset names carries in the suggestion score.
The ranking in the search engine results always affects the suggestion score, but the similarity between asset names can also be taken into account. If you decrease this percentage, the ranking of the search results becomes more important for the suggestion score and the similarity between asset names becomes less important. If you increase the percentage, assets with similar names receive a higher suggestion score.
This percentage is expressed as a decimal, where 1.00 equals 100%. You can enter a value greater than 1.00.
Setting: Recommendation weights for data sets
Impacts: Recommendations of data sets to users
Description: An ordered, comma-separated list of values that define the importance of properties for recommendations. The order of the values reflects their importance.
This setting is only used for data set recommendations if your Collibra environment does not yet have enough data for the active recommendation algorithms to produce relevant results.
Possible values:
- CERTIFIED: Data sets that are certified are considered more relevant.
- POPULARITY: The number of visits to the data set page.
Setting: Active recommendation algorithms
Impacts: Recommendations of data sets to users and of business assets to data sets
Description: A comma-separated list of algorithms that calculate recommendations. By default, all available algorithms are listed.
Possible values:
- BASELINE
- USER_MEAN
- IICF (Item-Item Collaborative Filtering)
- SLOPE_ONE
- WEIGHTED_SLOPE_ONE
Setting: Data set elements threshold
Impacts: Recommendations of data sets to users
Description: The maximum number of elements per data set that the recommender uses to train the model. The elements are selected at random.
Lowering this number can prevent out-of-memory issues, but it also reduces the accuracy of recommendations for large data sets.
Warning: If you create an invalid cron pattern, Collibra Data Intelligence Platform stops responding.
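Because an invalid cron pattern can make the platform stop responding, it may be worth sanity-checking a pattern before you enter it in one of the execution time settings. The following is a minimal sketch, not a Collibra tool. It assumes a Quartz-style pattern with six fields (seconds, minutes, hours, day of month, month, day of week) and an optional seventh year field; confirm the exact format your environment expects. The function name check_cron_pattern is hypothetical, and the check is shallow: it does not verify value ranges.

```python
import re

# Assumption: Quartz-style cron pattern with 6 fields plus an optional year field.
# This is only a shallow sanity check, not a full cron parser.
FIELD_NAMES = ["seconds", "minutes", "hours", "day-of-month",
               "month", "day-of-week", "year"]

# Digits, names such as MON or JAN, and the usual cron special characters.
ALLOWED = re.compile(r"^[0-9A-Za-z*?,\-/#]+$")


def check_cron_pattern(pattern):
    """Return a list of problems found; an empty list means none were spotted."""
    fields = pattern.split()
    if len(fields) not in (6, 7):
        return [f"expected 6 or 7 fields, found {len(fields)}"]
    problems = []
    for name, value in zip(FIELD_NAMES, fields):
        if not ALLOWED.match(value):
            problems.append(
                f"{name} field contains unexpected characters: {value!r}"
            )
    return problems


# A nightly schedule at 02:00 under the assumed format.
print(check_cron_pattern("0 0 2 * * ?"))
# A five-field pattern is rejected under the assumed format.
print(check_cron_pattern("0 2 * * *"))
```

An empty list only means that no obvious problem was found. It does not guarantee that the pattern is accepted.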
- Click Save all.
Note: Depending on the settings that you changed, the recommendation updates may not be visible immediately. For example, if you update a schedule, the effect may only become visible the next day.
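To get a feel for the data set similarity threshold before changing it, the comparison can be approximated with a small calculation. The following is an illustrative sketch only. It assumes that the percentage is the number of business assets related to both data sets divided by the number of business assets related to either one; the source does not state the exact formula the matcher uses. The function names and sample asset names are hypothetical.

```python
def overlap_ratio(assets_a, assets_b):
    """Share of related business assets that are related to both data sets.

    Assumption: the denominator is the union of both asset sets.
    """
    all_assets = assets_a | assets_b
    if not all_assets:
        return 0.0
    return len(assets_a & assets_b) / len(all_assets)


def is_similar(assets_a, assets_b, threshold=0.3):
    """Apply the threshold as a decimal, where 1.00 equals 100%."""
    return overlap_ratio(assets_a, assets_b) >= threshold


# Hypothetical business assets related to two data sets.
sales = {"Customer", "Revenue", "Region"}
marketing = {"Customer", "Revenue", "Churn", "Segment"}

print(overlap_ratio(sales, marketing))    # 2 shared out of 5 in total
print(is_similar(sales, marketing, 0.3))  # threshold of 30%
print(is_similar(sales, marketing, 0.5))  # threshold of 50%
```

Under this assumption, the two sample data sets share 40% of their related business assets, so they count as similar at a threshold of 0.3 but not at 0.5.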