Edit the tokenizer settings

Important: Tokenizer settings in Collibra Console are being retired. To modify or remove existing configurations, contact Collibra Support.

Tokenizer settings determine how the text entered in the search box is split into individual terms, called tokens, when a search is performed.
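To make the idea of tokens concrete, here is a minimal Python sketch. It is an illustration only, not Collibra's implementation: the function name split_into_tokens is hypothetical, and the regular expression is a rough stand-in for real word-break rules.

```python
import re


def split_into_tokens(search_text: str) -> list[str]:
    # Illustration only: treat each run of letters, digits, or underscores
    # as one token and split on everything else. This approximates
    # word-based splitting; it is not the algorithm Collibra uses.
    return re.findall(r"\w+", search_text)


# The hyphen and the spaces act as split points in this sketch.
print(split_into_tokens("customer-id report 2024"))
# prints ['customer', 'id', 'report', '2024']
```

Each resulting token is matched separately during a search, which is why the choice of split points affects what a search returns.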

Important: After you edit the tokenizer settings, you need to reindex Collibra Platform.

Depending on your environment, follow this procedure either in Collibra Console or on the Services Configuration tab of the Collibra settings.

Important: You can't edit the service configuration from the Settings page in the latest UI. If you use the latest UI, you can edit the service configuration only in Collibra Console. For more information, go to DGC service configuration settings.

Prerequisites

In Collibra Console:
You have the ADMIN or SUPER role in Collibra Console.

In the Collibra settings:
You have a global role that has the Product Rights > System administration global permission.
The Services Configuration tab is available in the Collibra settings.

Steps

In the Collibra settings, open the Services Configuration tab:
1. On the main toolbar, click → Settings.
  The Settings page opens.
2. Click Services Configuration.
3. Click Edit configuration.

In Collibra Console, open the DGC service settings for editing:
1. Open Collibra Console.
  Collibra Console opens with the Infrastructure page.
2. In the tab pane, expand an environment to show its services.
3. In the tab pane, click the Data Governance Center service of that environment.
4. Click Configuration.
5. Click Edit configuration.

In the Search index configuration section, in the Tokenizer subsection, enter the required information.

Setting: Type
This setting requires the SUPER role.
Description: The tokenizer that determines how the search text is split. The Type field must contain either Standard (default) or Character.
Standard: The Standard tokenizer splits the search text into individual terms based on a default set of characters. It follows the word break rules of the Unicode Text Segmentation algorithm.
Character: The Character tokenizer lets you customize when the search text is split into individual terms. For more information, go to When and how to use the Character tokenizer.

Setting: Parameter map
This setting requires the SUPER role.
Description: A list of characters that the Character tokenizer allows. If you entered Character in the Type field, you must add a parameter map.
To add a parameter map:
1. Click Add.
  The Add map option dialog box appears.
2. In the Field key field, enter the following value: allowedCharacters
3. In the Field value field, enter the set of characters that you want the tokenizer to allow. For more information, go to When and how to use the Character tokenizer.
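
The effect of an allowedCharacters value can be sketched in Python. This is a hedged illustration, not Collibra's implementation. It assumes that an allowed character may be part of a token and that any other character acts as a split point. It also assumes that the value is a literal list of characters. The function name character_tokenize and the sample value are hypothetical.

```python
import string


def character_tokenize(search_text: str, allowed_characters: str) -> list[str]:
    # Assumption: runs of allowed characters form tokens, and every
    # character outside the allowed set ends the current token.
    allowed = set(allowed_characters)
    tokens: list[str] = []
    current: list[str] = []
    for ch in search_text:
        if ch in allowed:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# Hypothetical allowedCharacters value: ASCII letters, digits, and the hyphen.
sample_allowed = string.ascii_letters + string.digits + "-"
print(character_tokenize("customer-id report_2024", sample_allowed))
# prints ['customer-id', 'report', '2024']
```

Under these assumptions, allowing the hyphen keeps customer-id together as one token, while the underscore, which is not in the sample value, still splits report_2024 into two tokens.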

Click Save all.
Your changes are saved.

What's next?

1. Restart the environment to apply your changes. For more information, go to Stop an environment and Start an environment.
2. Reindex Collibra Platform.