Semantic search (beta)

By default, search uses a keyword-based approach, also known as lexical search. Keyword search looks for resources that contain the search text you enter. For example, if you search for data governance, keyword search returns results that contain the exact phrase data governance or the individual words data and governance. It doesn't account for synonyms or related concepts. In contrast, semantic search goes beyond the literal keywords by interpreting the context, intent, and meaning of your search text, and therefore returns more nuanced and relevant results.

As an administrator, you can combine semantic search with keyword search. This combined approach gives users a more comprehensive set of results that match not only their search text but also their intent, even if those results don't contain the exact words they searched for. For example, when users search for data governance, semantic search might also return results about information integrity or risk management. This way, they can discover relevant content that a purely keyword-based approach might miss.

Where is semantic search available?

Semantic search is available only to commercial customers in managed Elastic environments. It isn't available on Collibra Cloud for Government (GovCloud) or Collibra Platform Self-Hosted (CPSH).

How does semantic search work?

Semantic metadata

Semantic search enhances traditional keyword search by using semantic metadata to provide more accurate and relevant results. To understand the meaning of an asset, semantic search considers only its Name (the asset's display name), Description, and Definition fields.

Importance of a full reindex

Semantic metadata is generated during a full reindex. Therefore, after you enable semantic search or add assets, you must run a full reindex to add semantic metadata to the search index.

Asynchronous data

When data is updated asynchronously, the keyword index is still updated automatically. However, the semantic metadata for those updates is refreshed only by a full reindex.
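The difference between the keyword index and the semantic metadata can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Collibra's indexing code: the class ToySearchIndex and its methods incremental_update and full_reindex are hypothetical names, and assets are plain dictionaries of field values. It shows that incremental (asynchronous) updates refresh only the keyword text, while semantic text, built only from Name, Description, and Definition, changes only on a full reindex.

```python
from dataclasses import dataclass, field

# Hypothetical toy model for illustration only; not Collibra's implementation.
# Only these fields feed the semantic metadata, as described above.
SEMANTIC_FIELDS = ("Name", "Description", "Definition")


@dataclass
class ToySearchIndex:
    # Text used for keyword (lexical) matching, per asset ID.
    keyword_text: dict[str, str] = field(default_factory=dict)
    # Text from which semantic metadata would be derived, per asset ID.
    semantic_text: dict[str, str] = field(default_factory=dict)

    def incremental_update(self, asset_id: str, asset: dict[str, str]) -> None:
        # Asynchronous updates: only the keyword index is refreshed.
        self.keyword_text[asset_id] = " ".join(asset.values())

    def full_reindex(self, assets: dict[str, dict[str, str]]) -> None:
        # A full reindex rebuilds the keyword index AND the semantic metadata.
        for asset_id, asset in assets.items():
            self.incremental_update(asset_id, asset)
            self.semantic_text[asset_id] = " ".join(
                asset[name] for name in SEMANTIC_FIELDS if name in asset
            )
```

In this model, an asset whose description changes after a full reindex is found by its new keywords right away, but its semantic text keeps the old description until the next full reindex. A newly added asset gets no semantic text at all until then.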

Search results

With semantic search enabled, search results are based on keyword search but enhanced by semantic relevance. Semantic search influences the order of the results, prioritizing those that are more contextually relevant. If no keyword matches are found, semantic search tries to find relevant items. If neither keyword nor semantic matches are found, search may suggest similar terms to correct possible spelling errors (for example, data instead of dato).
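The order of fallbacks described above can be sketched as a small function. This is an illustrative sketch only, assuming that keyword matching and semantic scoring have already happened elsewhere: combined_search, SEMANTIC_THRESHOLD, and the 0.0 to 1.0 relevance scores are hypothetical, and the spelling suggestions use Python's standard difflib rather than whatever Collibra actually uses.

```python
from difflib import get_close_matches

# Hypothetical cutoff for semantic-only matches; not a documented Collibra value.
SEMANTIC_THRESHOLD = 0.5


def combined_search(keyword_hits, semantic_scores, query_words, vocabulary):
    """Toy version of the result flow: keyword, then semantic, then suggestions.

    keyword_hits: IDs of assets that match the search text by keyword.
    semantic_scores: assumed semantic relevance per asset ID (0.0 to 1.0).
    query_words: the words of the search text.
    vocabulary: known terms used for spelling suggestions.
    Returns a tuple (results, suggestions).
    """
    # 1. Keyword matches exist: keep them, ordered by semantic relevance.
    if keyword_hits:
        ranked = sorted(keyword_hits, key=lambda a: semantic_scores.get(a, 0.0), reverse=True)
        return ranked, []

    # 2. No keyword matches: fall back to semantic-only results.
    semantic_hits = [a for a, s in semantic_scores.items() if s >= SEMANTIC_THRESHOLD]
    if semantic_hits:
        return sorted(semantic_hits, key=semantic_scores.get, reverse=True), []

    # 3. No matches at all: suggest the closest known spelling for each word.
    suggestions = []
    for word in query_words:
        suggestions.extend(get_close_matches(word, vocabulary, n=1))
    return [], suggestions
```

With made-up scores, two keyword matches named Data Governance are reordered so that the one whose description mentions risk management comes first; with no keyword matches, an asset such as DIP can still be returned on semantic relevance alone; and with no matches of either kind, dato yields the suggestion data.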

Example 

Suppose that the search algorithm considers data governance and risk management to be related concepts. Consider an asset named DIP that contains the phrase Risk management in its description but doesn't mention data governance anywhere.

If you search for data governance, the following occurs:

  • If there are direct matches for data or governance, those results are shown, but their order is influenced by semantic relevance. For example, an asset named Data Governance with the description Risk management would rank higher than an asset named Data Governance without any mention of risk management.
  • If there are no direct matches, a semantic-only search is performed, and DIP is shown based on its relatedness to data governance.

What search text works and doesn't work for semantic search?

Semantic search is activated only if the search text contains more than one word, because it needs several words to infer meaning and context. For single-word search text, only keyword matches are returned. Enclosing the search text in double quotation marks also bypasses semantic search and returns only exact matches. Finally, semantic search can't identify assets that have very short names or descriptions.
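The activation rules can be expressed as a simple check. This is a minimal sketch, assuming that words are separated by whitespace and that "enclosed in double quotation marks" means the whole search text starts and ends with a double quote; semantic_search_applies is a hypothetical name, and the product's actual parsing rules may differ.

```python
def semantic_search_applies(search_text: str) -> bool:
    """Hypothetical check: would this search text trigger semantic search?"""
    text = search_text.strip()
    # Assumption: text fully wrapped in double quotes requests exact matches only.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return False
    # Semantic search needs more than one word to infer meaning and context.
    return len(text.split()) > 1
```

Under these assumptions, data governance triggers semantic search, while governance on its own and "data governance" in quotation marks return keyword matches only.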

Good to know

  • Semantic search doesn't rely on the synonym relation type.
  • Semantic search works regardless of which search tab is active on the search page.
  • Search results have no visual indicator that shows whether a result comes from semantic search.
  • Semantic search is disabled by default.
  • Semantic search is optimized for English.

How do I enable or disable semantic search?

Prerequisites

  • You have the ADMIN or SUPER role in Collibra Console.
  • The Use managed Elastic setting in Collibra Console is enabled.

Steps

  1. Open Collibra Console.
    Collibra Console opens with the Infrastructure page.
  2. In the tab pane, expand an environment to show its services.
  3. In the tab pane, click the Data Governance Center service of that environment.
  4. Click Configuration.
  5. Click Edit configuration.
  6. Go to the Search index configuration section.
  7. Select the required value in the Semantic search setting.
  8. Click Save all.
  9. Click Rebuild search index and automatic hyperlinks to run the full reindex that generates the semantic metadata.