Customizing the search index

Before you customize the search feature of Collibra Data Intelligence Platform, it is important to understand how search works.

Collibra stores all text content in a search index to enable fast text search. To populate the search index, a component called the tokenizer splits the text into separate words. These words are called tokens.
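To make the tokenizer step concrete, here is a minimal sketch in Python. It assumes a very simple tokenizer that lowercases the text and treats every run of word characters (letters, digits, underscore) as one token. Collibra's actual tokenizer is not described here and may split text differently, for example around punctuation or special characters.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase tokens.

    Assumption: a simplified regex tokenizer, not Collibra's real one.
    Each run of word characters becomes one token.
    """
    return re.findall(r"\w+", text.lower())


print(tokenize("Customer Address, primary"))
```

With this sketch, the text "Customer Address, primary" produces the tokens `customer`, `address`, and `primary`. The comma and the capitalization do not affect which tokens are stored.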

Each logical entity is stored in its own index document, which records how many times each token occurs in that entity's text. Separate index documents are stored for:

  • Asset names
  • Community names
  • Domain names
  • Text attributes
  • Comments
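The following sketch models an index document as one record per logical entity that holds a count for each token. The `IndexDocument` class, its fields, the entity type labels such as `"asset_name"`, and the identifiers are all hypothetical and only illustrate the idea of per-entity token counts. The tokenizer is the same simplified assumption as before.

```python
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase, split into word tokens.
    return re.findall(r"\w+", text.lower())


@dataclass
class IndexDocument:
    # Hypothetical structure: one index document per logical entity.
    entity_type: str       # e.g. "asset_name" or "comment" (illustrative labels)
    entity_id: str         # hypothetical identifier of the entity
    token_counts: Counter  # how many times each token occurs in the text
    length: int            # total number of tokens in the text


def build_document(entity_type: str, entity_id: str, text: str) -> IndexDocument:
    tokens = tokenize(text)
    return IndexDocument(entity_type, entity_id, Counter(tokens), len(tokens))


index = [
    build_document("asset_name", "a1", "Customer Address"),
    build_document("comment", "c1", "The customer address must match the customer record"),
]
print(index[1].token_counts["customer"])
```

Here the comment's index document records that `customer` occurs twice, while the asset name's document records it once. Because each entity type gets its own documents, an asset name and a comment that mention the same word are stored and scored separately.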

When you search, your search text is tokenized in the same way. Collibra then looks up each token in the entire search index and calculates a score for every matching document. The score depends on the following factors:

  • The number of times the searched words occur in the document
  • The size of the match relative to the size of the document
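The two scoring factors can be illustrated with a small, hypothetical formula. In this sketch, `occurrences` counts how often the searched tokens appear in a document, and `coverage` divides that count by the document's length, so a match that makes up most of a short document scores higher than the same match buried in a long one. Multiplying the two values is an assumption chosen for illustration; the actual formula Collibra uses is not documented here.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase, split into word tokens.
    return re.findall(r"\w+", text.lower())


def score(query: str, doc_text: str) -> float:
    """Hypothetical score combining the two factors described above."""
    query_tokens = set(tokenize(query))
    doc_counts = Counter(tokenize(doc_text))
    doc_length = sum(doc_counts.values())
    if doc_length == 0:
        return 0.0
    # Factor 1: number of times the searched words occur in the document.
    occurrences = sum(doc_counts[t] for t in query_tokens)
    # Factor 2: size of the match relative to the size of the document.
    coverage = occurrences / doc_length
    # Illustrative combination only, not Collibra's real formula.
    return occurrences * coverage


docs = [
    "Customer Address",
    "The customer address must match the customer record",
]
for d in sorted(docs, key=lambda d: score("customer address", d), reverse=True):
    print(f"{score('customer address', d):.3f}  {d}")
```

For the search "customer address", the short asset name matches on every token and scores 2.0, while the longer comment contains more occurrences but a smaller share of matching tokens and scores 1.125. This shows how document size can outweigh raw frequency.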

Tip: You can influence this score by changing the boost factor. For more information about the search functionality in Collibra, see Searching in Collibra.
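As a rough illustration of what a boost factor does, the sketch below multiplies a base score by a weight that depends on the type of index document. The `BOOSTS` table, its keys, and its values are hypothetical, and applying the boost as a simple multiplier is an assumption; see Searching in Collibra for how boost factors are actually configured.

```python
# Hypothetical boost factors per index document type; values are illustrative.
BOOSTS = {
    "asset_name": 3.0,
    "community_name": 1.0,
    "domain_name": 1.0,
    "text_attribute": 1.0,
    "comment": 0.5,
}


def boosted_score(base_score: float, entity_type: str) -> float:
    # Assumption: the boost multiplies the base score; unknown types keep 1.0.
    return base_score * BOOSTS.get(entity_type, 1.0)


# (entity type, base score) pairs for two hypothetical matches.
results = [("comment", 1.5), ("asset_name", 1.0)]
ranked = sorted(results, key=lambda r: boosted_score(r[1], r[0]), reverse=True)
print(ranked)
```

Without boosts, the comment would rank first because its base score is higher. With these illustrative values, the asset name's boosted score (3.0) overtakes the comment's (0.75), which is the kind of ranking change a boost factor is meant to produce.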