Customizing the search index

Before you customize the Collibra Data Intelligence Cloud search feature, it is important to learn how the search functionality works.

All text content in Collibra is stored in a search index to allow fast text search. To populate the search index, a component called the tokenizer splits the text into separate words, which are often called tokens.
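As a rough illustration of what tokenizing means, the following Python sketch splits a piece of text into tokens. It is not Collibra's tokenizer: the `tokenize` function, the lowercasing step and the rule of splitting on non-word characters are all assumptions made for this example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (hypothetical rules, for illustration only)."""
    # Assumption: tokens are lowercased and separated by any non-word character.
    return re.findall(r"\w+", text.lower())


tokens = tokenize("Customer Address, as stored in the CRM system")
print(tokens)
```

Under these assumed rules, punctuation is dropped and a hyphenated term such as "data-quality" becomes two tokens, which is why tokenizer behavior affects which searches match.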

Every logical entity is stored in an index document. This document contains information about how many times a specific token occurs in the text. Separate index documents are stored for:

When you search for text, the search text is tokenized in the same way. Each resulting token is then looked up in the entire search index, and a score is calculated for each matching document. The calculation of this score is driven by different factors:

Tip: You can influence this score by changing the boost factor. For more information about the search functionality in Collibra, see Searching in Collibra.
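To make the ideas of index documents, scoring and boosting more concrete, here is a minimal Python sketch. It is a toy model, not Collibra's implementation: the function names, the field names `name` and `description`, the per-field boost values and the scoring rule (token count multiplied by a boost) are all assumptions chosen for illustration. Real search engines take more factors into account when they calculate a score.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build one index document per entity: per-field counts of each token."""
    return {
        doc_id: {field: Counter(tokenize(text)) for field, text in fields.items()}
        for doc_id, fields in docs.items()
    }


def search(index, query, boosts=None):
    """Score each index document as the boosted sum of matching token counts."""
    boosts = boosts or {}
    query_tokens = tokenize(query)  # the query is tokenized the same way
    scores = {}
    for doc_id, fields in index.items():
        score = 0.0
        for field, counts in fields.items():
            boost = boosts.get(field, 1.0)  # assumption: default boost is 1.0
            score += boost * sum(counts[token] for token in query_tokens)
        if score > 0:
            scores[doc_id] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data.
docs = {
    "asset-1": {"name": "Customer Address", "description": "Postal address of a customer"},
    "asset-2": {"name": "Order", "description": "A customer order with a delivery address"},
}
index = build_index(docs)

print(search(index, "customer address"))
print(search(index, "customer address", boosts={"name": 2.0}))
```

In this toy model, raising the boost for the `name` field increases the score of documents whose name matches the search text, while documents that match only in the description keep the same score.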