Customizing search index

Before you customize the search index, it is important to understand how the indexing works.

How search indexing works

All textual content in Collibra is stored in a search index to enable fast text searches. To populate the search index, the textual content is split into individual terms called tokens.

Each logical entity in Collibra, such as an asset name, community name, domain name, text attribute, or comment, is stored in its own index document in the search index. An index document records how many times each token occurs in the entity's content.
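The following minimal Python sketch illustrates the idea of tokenizing content and counting token occurrences in an index document. It is not Collibra's actual implementation: the tokenization rule (lowercase, split on non-alphanumeric characters), the function names, and the sample text are assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index_document(entity_id, text):
    """Return a toy index document: an ID plus per-token occurrence counts."""
    return {"id": entity_id, "token_counts": Counter(tokenize(text))}


# Hypothetical entity content, used only to show the resulting counts.
doc = build_index_document(
    "asset-1", "Data Governance Council: governance of data assets"
)
print(doc["token_counts"]["governance"])  # 2
print(doc["token_counts"]["data"])        # 2
```

Storing counts per token, rather than the raw text alone, is what lets a search engine look up a token directly instead of scanning every entity.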

When you search for text (for example, data governance), the engine splits your text into tokens (data and governance), looks for these tokens in the search index, and calculates a score for each matched index document.

An index document's score reflects how relevant the document is to the search tokens. The score is based on the following factors:

Term frequency: how often the search tokens appear in the document.

Inverse document frequency: how rare the search tokens are across all documents.

Field length: how long the searched field is. Shorter fields are often treated as more relevant.

Documents that contain more of the search tokens, or rarer tokens, score higher.
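To show how term frequency, inverse document frequency, and field length can combine into one score, here is a simplified Python sketch. The formula is a generic TF-IDF variant with length normalization, chosen for readability; the exact algorithm and weights that Collibra uses are not described here, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc_text, all_texts):
    """Toy relevance score: sum of tf * idf per query token, divided by sqrt(field length)."""
    doc_tokens = tokenize(doc_text)
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    n_docs = len(all_texts)
    total = 0.0
    for token in set(tokenize(query)):
        tf = counts[token]  # term frequency in this document
        df = sum(1 for t in all_texts if token in tokenize(t))  # documents containing the token
        idf = math.log(1 + n_docs / (1 + df))  # rarer tokens get a larger weight
        total += tf * idf
    return total / math.sqrt(len(doc_tokens))  # shorter fields score higher


# Hypothetical field contents.
texts = [
    "data governance",
    "data governance policy for the finance team",
    "data quality rules",
    "customer address",
]
for text in texts:
    print(f"{score('data governance', text, texts):.3f}  {text}")
```

In this sketch the two-word field outranks the longer field that contains the same tokens, and a field that matches only the more common token (data) scores lower still.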

The engine returns index documents in descending order of score, so the most relevant information appears at the top of your search results.
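The ordering step can be sketched in a few lines of Python. The document IDs, the scores, the rule that drops non-matching documents, and the `limit` parameter are all hypothetical; they only illustrate sorting by descending score.

```python
def rank(scored_docs, limit=10):
    """Return (doc_id, score) pairs with a positive score, highest score first."""
    matches = [(doc_id, s) for doc_id, s in scored_docs.items() if s > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical scores for four index documents.
scores = {"asset-3": 0.40, "asset-1": 1.09, "asset-4": 0.0, "asset-2": 0.58}
for doc_id, s in rank(scores):
    print(doc_id, s)
```

Here asset-1 is listed first and asset-4 is omitted because it matched none of the search tokens.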
