Edit the tokenizer settings
The tokenizer settings determine how the text entered in the Search field is split into individual terms, called tokens, when a search is performed.
Important: After you edit the tokenizer settings, you must reindex Collibra Data Intelligence Platform.
Depending on your environment, follow this procedure either on the Services Configuration tab of the Collibra settings or in Collibra Console.
Important: You can't edit the Services Configuration from the Settings page in the latest UI. If you use the latest UI, you can configure settings only in Collibra Console. For more information, go to DGC service configuration settings.
Requirements and permissions
- You have the ADMIN or SUPER role in Collibra Console.
- You have a global role that has the System administration global permission.
- The Services Configuration tab is available in the Collibra settings.
Steps
- Open the Services Configuration page or the DGC service settings for editing, depending on your environment.
  In the Collibra settings:
  - On the main toolbar, click [icon], and then click Settings.
    The Collibra settings page opens.
  - Click Services Configuration.
  - Click Edit configuration.
  In Collibra Console:
  - Open Collibra Console.
    Collibra Console opens with the Infrastructure page.
  - In the tab pane, expand an environment to show its services.
  - In the tab pane, click the Data Governance Center service of that environment.
  - Click Configuration.
  - Click Edit configuration.
- In the Search index configuration section, in the Tokenizer subsection, enter the required information.
  Setting: Type
  Description: The tokenizer that determines how the search text is split. The Type field must contain either Standard (default) or Character.
  - Standard: Splits the search text into individual terms based on a default set of characters. This tokenizer follows the word break rules from the Unicode Text Segmentation algorithm.
  - Character: Lets you customize where the search text is split into individual terms. For more information, go to When and how to use the Character tokenizer.
  Setting: Parameter map
  Description: A list of characters that the Character tokenizer allows. If you entered Character in the Type field, you must add a parameter map.
  To add a parameter map:
  - Click Add.
    The Add map option dialog box appears.
  - In the Field key field, enter the following value: allowedCharacters
  - In the Field value field, enter the set of characters that you want the tokenizer to allow. For more information, go to When and how to use the Character tokenizer.
- Click Save all.
Your changes are saved.
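To make the difference between the two Type values more concrete, the following Python sketch illustrates the idea outside of Collibra. It is not Collibra's implementation. The function names are hypothetical, the regular expression is only a rough stand-in for the Unicode Text Segmentation word break rules, and the sketch assumes that the Character tokenizer keeps runs of allowed characters together and splits on every other character. It also assumes that the allowedCharacters value is a plain string of characters. For the actual value format, go to When and how to use the Character tokenizer.

```python
import re
import string

# Hypothetical example value: letters, digits and the hyphen are allowed.
ALLOWED = string.ascii_letters + string.digits + "-"


def standard_like_tokens(text: str) -> list[str]:
    """Rough approximation of a Standard-style tokenizer.

    Runs of word characters stand in for the Unicode word break rules;
    the real algorithm handles many more cases.
    """
    return re.findall(r"\w+", text)


def character_like_tokens(text: str, allowed_characters: str) -> list[str]:
    """Assumed behavior of a Character-style tokenizer.

    A token is a maximal run of allowed characters; any character
    outside the allowed set ends the current token.
    """
    allowed = set(allowed_characters)
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch in allowed:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def check_tokenizer_settings(tokenizer_type: str, parameter_map: dict[str, str]) -> list[str]:
    """Return a list of problems with the settings described in this procedure."""
    problems: list[str] = []
    if tokenizer_type not in ("Standard", "Character"):
        problems.append("Type must be Standard or Character")
    if tokenizer_type == "Character" and not parameter_map.get("allowedCharacters"):
        problems.append("Character requires an allowedCharacters entry in the parameter map")
    return problems


if __name__ == "__main__":
    print(standard_like_tokens("customer-id 42"))
    print(character_like_tokens("customer-id 42", ALLOWED))
    print(check_tokenizer_settings("Character", {}))
```

Under these assumptions, a search text such as customer-id 42 is split at the hyphen by the Standard-like function, while the Character-like function keeps customer-id as one term because the hyphen is in the allowed set.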
What's next?
- Restart the environment to apply your changes. For more information, go to Stop an environment and Start an environment.
- Reindex Collibra.