Configure the recommenders and matchers

Collibra Data Intelligence Platform contains recommenders and matchers that recommend data sets or business assets.

You can configure them to optimize the recommendations.

Depending on your environment, follow this procedure either on the Services Configuration tab of the Collibra settings or in Collibra Console.

Important You can't edit the Services Configuration from the Settings page in the latest UI. If you use the latest UI, you can configure settings only in Collibra Console. For more information, go to DGC service configuration settings.

Prerequisites

Steps

  1. Open the Services Configuration page, using the option that applies to your environment.
    From the Collibra settings:
    1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
      The Collibra settings page opens.
    2. Click Services Configuration.
    3. Click Edit configuration.
    From Collibra Console, open the DGC service settings for editing:
    1. Open Collibra Console.
      Collibra Console opens with the Infrastructure page.
    2. In the tab pane, expand an environment to show its services.
    3. In the tab pane, click the Data Governance Center service of that environment.
    4. Click Configuration.
    5. Click Edit configuration.
  2. In the Recommender configuration section, make the necessary changes.
    Each setting is listed with what it impacts, followed by its description.
    Setting: Catalog recommender enabled
    Impacts: All recommendations
    • True (default): The "Data sets you might like" section is included on the Data Catalog Home page. This section shows data sets you might be interested in, as determined by the recommender, which takes into account your data sets and the data sets of similar users.
    • False: The "Data sets you might like" section is not included on the Data Catalog Home page.
    Setting: Data set recommender execution time
    Impacts: Recommendations of data sets to users

    The schedule (cron job) by which the data set recommender looks for recommended data sets for a user.

    By default, the data set recommender runs every night.

    Setting: Asset recommender execution time
    Impacts: Recommendations of business assets to data assets

    The schedule (cron job) by which the asset recommender looks for suggested relations between business assets and data sets.

    Setting: Data set matcher execution time
    Impacts: Data set matcher

    The schedule (cron job) by which the data set matcher looks for similar data sets.
    Setting: Data set similarity threshold
    Impacts: Data set matcher

    The percentage of business assets that have to be related to both data sets before the data sets are considered to be similar.

    This percentage is expressed as a decimal where 1,00 equals 100%.

    Setting: Duplicate schema threshold
    Impacts: Schema matcher

    The percentage of assets that have to be related to both schemas before the schemas are considered to be similar.

    This percentage is expressed as a decimal where 1,00 equals 100%.

    Setting: Fuzzy vs exact matching strategy for business assets
    Impacts: Recommendations of business assets to data sets and of business assets to column assets

    The percentage that determines how much weight the similarity between asset names receives in the suggestion score.

    The ranking in the search engine results always has an impact on the suggestion score. However, similarity between the asset names can also be taken into account. If you decrease this percentage, the ranking of the search results becomes more important for the suggestion score, while the similarity between the asset names becomes less important. If you increase the percentage, assets with similar names receive a higher suggestion score.

    This percentage is expressed as a decimal where 1,00 equals 100%. You can enter a value greater than 1,00.

    Setting: Recommendation weights for data sets
    Impacts: Recommendations of data sets to users

    An ordered, comma-separated list of values that define the importance of properties for recommendations. The order of the values reflects their importance.

    This setting is only used for data set recommendations if your Collibra environment does not yet have enough data for relevant results from the active recommendation algorithms.

    Possible values:

    • CERTIFIED: Data sets that are certified are considered more relevant.
    • POPULARITY: The number of visits to the data set page.
    Setting: Active recommendation algorithms
    Impacts: Recommendations of data sets to users and of business assets to data sets

    A comma-separated list of algorithms that calculate recommendations. By default, all available algorithms are listed.

    Possible values:

    Setting: Data set elements threshold
    Impacts: Recommendations of data sets to users

    The maximum number of elements per data set that the recommender uses to train the model. The data set elements are taken randomly.
    Lowering this number can prevent out-of-memory issues but also impacts the accuracy of recommendations for large data sets.

    Warning If you create an invalid cron pattern, Collibra Data Intelligence Platform stops responding.

  3. Click Save all.
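
Because an invalid cron pattern can make the platform stop responding, it can help to sanity-check a schedule before you save it. The following Python sketch shows one coarse way to do that. It assumes a Quartz-style pattern with six or seven whitespace-separated fields (seconds, minutes, hours, day of month, month, day of week, optional year); this format is an assumption, so confirm the format your environment expects. The function name looks_like_cron is hypothetical and is not part of any Collibra API.

```python
import re

# Assumption: Quartz-style cron pattern with 6 or 7 whitespace-separated
# fields (seconds, minutes, hours, day of month, month, day of week, [year]).
# Each field may only contain digits, letters (for example JAN, MON, L, W)
# and the special characters * ? , - / #
_ALLOWED_FIELD = re.compile(r"^[0-9A-Za-z*?,\-/#]+$")


def looks_like_cron(pattern: str) -> bool:
    """Return True if the pattern passes a coarse structural check.

    This is not a full validator: it only checks the number of fields and
    the characters used, so a pattern can pass here and still be invalid.
    """
    fields = pattern.split()
    if len(fields) not in (6, 7):
        return False
    return all(_ALLOWED_FIELD.match(field) is not None for field in fields)


if __name__ == "__main__":
    # A six-field pattern passes; a five-field (Unix-style) pattern does not.
    print(looks_like_cron("0 0 0 * * ?"))
    print(looks_like_cron("0 0 * * *"))
```

A check like this only catches obvious mistakes such as a missing field or a stray character. It does not verify value ranges, so treat a passing result as a first filter rather than as proof that the pattern is valid.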

Note Depending on the configuration that you applied, you might not see updated recommendations immediately. For example, when you update a schedule, the updated recommendations might only be visible the next day.
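
To illustrate the Data set elements threshold setting from step 2, the following Python sketch caps the number of elements taken from a data set by sampling them at random. It only illustrates the idea of a random cap: the function name sample_elements and its parameters are hypothetical, and the recommender's actual sampling logic is not documented here.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_elements(
    elements: Sequence[T], threshold: int, seed: Optional[int] = None
) -> List[T]:
    """Return at most `threshold` elements, chosen at random.

    Hypothetical illustration only: data sets at or below the threshold are
    returned unchanged, larger ones are reduced to a random subset.
    """
    if threshold < 0:
        raise ValueError("threshold must not be negative")
    if len(elements) <= threshold:
        return list(elements)
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(list(elements), threshold)


if __name__ == "__main__":
    columns = [f"column_{i}" for i in range(1000)]
    subset = sample_elements(columns, threshold=100, seed=42)
    print(len(subset))
```

The sketch shows the trade-off described for this setting: a lower threshold means fewer elements are held in memory, but a large data set is then represented by a smaller random subset of its elements.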