Edit the tokenizer settings

The tokenizer settings determine how the search text entered in the Search field is split into individual terms—called tokens—when a search is performed.

Important: After you edit the tokenizer settings, you need to reindex Collibra Data Intelligence Platform.

Depending on your environment, follow this procedure either on the Services Configuration tab of the Collibra settings or in Collibra Console.

Important: You can't edit the Services Configuration from the Settings page in the latest UI. If you use the latest UI, you can configure settings only in Collibra Console. For more information, go to DGC service configuration settings.

Requirements and permissions

Steps

  1. Open the configuration for editing, using the option that applies to your environment.
    In Collibra settings, open the Services Configuration page:
    1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
      The Collibra settings page opens.
    2. Click Services Configuration.
    3. Click Edit configuration.
    In Collibra Console, open the DGC service settings for editing:
    1. Open Collibra Console.
      Collibra Console opens with the Infrastructure page.
    2. In the tab pane, expand an environment to show its services.
    3. In the tab pane, click the Data Governance Center service of that environment.
    4. Click Configuration.
    5. Click Edit configuration.
  2. In the Search index configuration section, in the Tokenizer subsection, enter the required information.
    Setting: Type
    Description: The tokenizer that determines how the search text is split.

    The Type field must contain either Standard (default) or Character.

    • Standard: Splits the search text into individual terms based on a default set of characters. This tokenizer follows the word break rules from the Unicode Text Segmentation algorithm.
    • Character: Lets you customize where the search text is split into individual terms. For more information, go to When and how to use the Character tokenizer.

    Setting: Parameter map
    Description: A list of characters that the Character tokenizer allows.

    If you entered Character in the Type field, you must add a parameter map.

    To add a parameter map:

    1. Click Add.
      The Add map option dialog box appears.
    2. In the Field key field, enter the following value: allowedCharacters
    3. In the Field value field, enter the set of characters that you want the tokenizer to allow. For more information, go to When and how to use the Character tokenizer.


  3. Click Save all.
    Your changes are saved.
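
To get a feel for what the allowedCharacters value changes, the following Python sketch may help. It is not Collibra's implementation. It assumes, for illustration only, that a character-based tokenizer keeps letters, digits, and the allowed characters together in one token and splits on every other character. The function name character_tokenize and the sample search text are hypothetical. For the actual behavior, go to When and how to use the Character tokenizer.

```python
def character_tokenize(text: str, allowed_characters: str = "") -> list[str]:
    """Illustrative sketch only, not the product's tokenizer.

    Assumption: letters, digits, and any character listed in
    allowed_characters stay inside a token; every other character
    ends the current token.
    """
    tokens = []
    current = []
    for ch in text:
        if ch.isalnum() or ch in allowed_characters:
            # The character is part of the current token.
            current.append(ch)
        elif current:
            # Any other character closes the token that was being built.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# Without allowed characters, the underscore and the period split the text.
print(character_tokenize("customer_id v2.1"))        # ['customer', 'id', 'v2', '1']

# Allowing "_" and "." keeps "customer_id" and "v2.1" as single tokens.
print(character_tokenize("customer_id v2.1", "_."))  # ['customer_id', 'v2.1']
```

Under this assumption, a search for customer_id is matched as one term only if the underscore is among the allowed characters. Because the index is built with the same settings, a change takes effect only after the reindex mentioned at the top of this page.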

What's next?