Understanding search behavior and relevance
Understanding how search behavior and relevance work can help you refine your searches and retrieve the most accurate results. This topic explains the rules and methods that determine how search queries are processed and ranked on the search page. It also provides insights into how relevance scores are calculated to ensure that the most relevant resources appear at the top of your search results.
Search behavior
When you perform a search, specific rules determine how matches are found and ranked in the search results. For example, your search text will be split into smaller parts called tokens, and it may be treated as if it ends with a wildcard (*) even if you don’t explicitly add one. These rules use the following methods in the background to process your search text:
- Standard matching: Processes your search text as a whole by identifying complete tokens to find broader matches. The standard matching method looks for matches in all fields.
- Enhanced matching: Processes your search text by splitting any complex tokens into smaller parts to find additional matches. The final search results combine matches from both the standard and enhanced matching methods, so you receive comprehensive and relevant matches without having to modify your search text. The enhanced matching method looks for matches only in the Name and Tag fields.
Both standard and enhanced matching methods are automatically applied whenever you perform a search.
Search behavior explained with examples
The following examples explain how a search text is processed in standard and enhanced matching methods to return comprehensive and relevant matches. These examples assume that the UI search appends wildcard setting is disabled.
Why does loan return loansize but not sizeloan?
- Standard matching: `loan` is treated as just `loan`. This doesn’t return `loansize` and `sizeloan`.
- Enhanced matching: `loan` is treated as `loan*`. This returns `loansize` but not `sizeloan`.
| Search text | Matching method | Treated as | Returns | Doesn't return |
|---|---|---|---|---|
| `loan` | Standard | `loan` | `loan` | `loansize`, `sizeloan` |
| | Enhanced | `loan*` | `loansize` | `sizeloan` |
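The difference between `loan` and `loan*` can be sketched with Python's standard `fnmatch` module. This is only an illustration under simplifying assumptions: each candidate name is treated as a single token, and the helper name `matches` is hypothetical. It isn't how the search engine is implemented.

```python
from fnmatch import fnmatchcase

def matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that match a shell-style pattern (hypothetical helper)."""
    return [c for c in candidates if fnmatchcase(c, pattern)]

names = ["loan", "loansize", "sizeloan"]

# Standard matching in this example: the text is treated as just "loan".
standard = matches("loan", names)

# Enhanced matching in this example: the text is treated as "loan*".
enhanced = matches("loan*", names)

print(standard)
print(enhanced)
```

Because the wildcard sits only at the end of the text, a name that merely ends with `loan`, such as `sizeloan`, matches neither pattern.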
Why doesn’t size loan return sizeloan?
- Standard matching: The space in `size loan` acts as an OR operator, treating the text as two separate words. This returns results that contain either `size` or `loan`, such as `size case` or `loan amount`. This doesn’t return results that contain both `size` and `loan`, such as `size loan` or `sizeloan`.
- Enhanced matching: The space in `size loan` acts as an AND operator, treating the text as a single phrase, `size loan`. This returns results that contain both `size` and `loan` in that order, such as `size loan`. Additionally, the text is treated as a single phrase with a wildcard at the end, `size loan*`. This returns results that start with `size loan`, such as `size loanamount`. However, this doesn’t return `sizeloan` because it doesn’t contain a space.
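| Search text | Matching method | Treated as | Returns | Doesn't return |
|---|---|---|---|---|
| `size loan` | Standard | `size` or `loan` | `size case`, `loan amount` | `size loan`, `sizeloan` |
| | Enhanced | `size loan`, `size loan*` | `size loan`, `size loanamount` | `sizeloan` |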
Why doesn’t size_loan return sizeable loan?
- Standard matching: The underscore in `size_loan` acts as a literal character, treating the text as a specific phrase, `size_loan`. This returns results only if there is an exact match, that is, `size_loan`.
- Enhanced matching: The underscore in `size_loan` acts as an AND operator, treating the text as a single phrase, `size loan`. This returns results that contain both `size` and `loan` in that order, such as `size loan`. Additionally, the text is treated as a single phrase with a wildcard at the end, `size loan*`. This returns results that start with `size loan`, such as `size loanamount`, `size_loanabc`, and `size.loanabc`. However, this doesn’t return `sizeable loan` because `sizeable` isn’t treated as a match for `size`.
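Note: If `size loanamount` exists in a Comment field, enhanced matching can't detect it.

| Search text | Matching method | Treated as | Returns | Doesn't return |
|---|---|---|---|---|
| `size_loan` | Standard | `size_loan` | `size_loan` | |
| | Enhanced | `size loan`, `size loan*` | `size loan`, `size loanamount`, `size_loanabc`, `size.loanabc` | `sizeable loan` |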
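The enhanced matching behavior in the two preceding examples (a phrase whose last word carries a trailing wildcard) can be approximated as follows. This is a minimal sketch, not the actual implementation. The function names are hypothetical, and the tokenizer simply assumes that any non-alphanumeric character separates tokens.

```python
import re

def tokens(text: str) -> list[str]:
    # Assumption: any non-alphanumeric character (space, "_", ".") separates tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def enhanced_phrase_match(query: str, candidate: str) -> bool:
    """Phrase match in which only the last query token acts as a prefix ("size loan*")."""
    q, c = tokens(query), tokens(candidate)
    if not q:
        return False
    n = len(q)
    for i in range(len(c) - n + 1):
        window = c[i:i + n]
        # Every token except the last must match exactly and in order;
        # the last one only needs to start with the final query token.
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False

candidates = ["size loan", "size loanamount", "size_loanabc",
              "size.loanabc", "sizeloan", "sizeable loan"]
hits = [c for c in candidates if enhanced_phrase_match("size_loan", c)]
print(hits)
```

In this sketch, `sizeable loan` is rejected because the wildcard applies only to the final token, so `size` must match the first token exactly. `sizeloan` is rejected because it is a single token.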
How different search texts are interpreted
The following table shows how complex tokens are processed in the standard and enhanced matching methods.
| Search text | Standard matching | Enhanced matching |
|---|---|---|
| `sizeLoan` | `sizeloan` | `size`, `loan` |
| `size_loan` | `size_loan` | `size`, `loan` |
| `size.loan` | `size.loan` | `size`, `loan` |
| `size-loan` | `size`, `loan` | `size`, `loan` |
| `size=loan` | `size`, `loan` | `size`, `loan` |
| `size123loan` | `size123loan` | `size`, `123`, `loan` |
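The splitting rules in this table can be reproduced with two small tokenizers. This is a sketch inferred only from the rows above. The function names `standard_tokens` and `enhanced_tokens` are hypothetical, and the real tokenizers may treat characters that the table doesn't list differently.

```python
import re

def standard_tokens(text: str) -> list[str]:
    # Lowercase, then split only on whitespace, "-", and "=".
    # "_" and "." stay inside the token, as in the table rows above.
    return [t for t in re.split(r"[-=\s]+", text.lower()) if t]

def enhanced_tokens(text: str) -> list[str]:
    # Split on case changes, letter/digit boundaries, and any
    # non-alphanumeric character, then lowercase each part.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)
    return [p.lower() for p in parts]

for text in ["sizeLoan", "size_loan", "size.loan",
             "size-loan", "size=loan", "size123loan"]:
    print(text, standard_tokens(text), enhanced_tokens(text))
```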
Behavior when the "UI search appends wildcard" setting is enabled
When the UI search appends wildcard setting is enabled, search uses the standard matching method with a wildcard (*) added to the end of your search text. Unlike the enhanced matching method, which looks for matches only in the Name and Tag fields, this looks for matches in all fields.
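The difference in field scope can be illustrated with a small sketch. Everything here is an assumption made for illustration: the resource is a plain dictionary, "all fields" is reduced to three example fields, and `prefix_hit` is a hypothetical helper rather than a Collibra API.

```python
ALL_FIELDS = ("Name", "Tag", "Comment")   # stand-in for "all fields"
ENHANCED_FIELDS = ("Name", "Tag")         # enhanced matching looks only here

def prefix_hit(resource: dict, term: str, fields: tuple) -> bool:
    """True if any whitespace-separated word in the given fields starts with term."""
    term = term.lower()
    return any(
        word.startswith(term)
        for field in fields
        for word in resource.get(field, "").lower().split()
    )

# Hypothetical resource whose only occurrence of the term is in the Comment field.
resource = {
    "Name": "Mortgage report",
    "Tag": "finance",
    "Comment": "Tracks loanamount per region",
}

with_setting_enabled = prefix_hit(resource, "loan", ALL_FIELDS)
enhanced_only = prefix_hit(resource, "loan", ENHANCED_FIELDS)
print(with_setting_enabled, enhanced_only)
```

In this sketch, the trailing wildcard finds `loanamount` only when the Comment field is part of the search scope.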
Search relevance
By default, search results on the search page are sorted in the order of descending relevance.
What is relevance in the context of search results
Relevance is a calculation of the similarity, measured across several lines of comparison, between your search text and the content of the resources in your Collibra environment.
In a set of search results, the relevance of each resource is represented by a positive number or score. The higher the score, the more relevant the resource is to your search text.
How relevance scores are derived
To derive relevance scores, Collibra uses a combination of query clauses.
Query clauses
When you perform a search, Collibra queries the database using various query clauses. Each query clause compares the similarity between your search text and your Collibra resources along a different line of comparison.
Some examples of the objectives of different query clauses are as follows:
- Calculate the similarity between the spelling of your search text and the text found in a field in the database.
- Calculate how frequently your search text appears in a field. The more often it appears, the greater the relevance. A field containing 5 occurrences of a given text is more likely to be relevant than a field containing a single occurrence of the text.
- Calculate the occurrence percentage of a text among all words in a particular field.
For example, if your search text occurs twice in the 10-word description of an asset, that asset will have a higher relevance score than an asset for which your search text occurs twice in its 20-word description.
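The occurrence percentage in this example can be worked through in a few lines. This is a simplified sketch of that single signal, using a hypothetical `occurrence_ratio` function. An actual relevance score combines several query clauses and isn't computed this way alone.

```python
def occurrence_ratio(term: str, text: str) -> float:
    """Share of the words in text that equal term (case-insensitive)."""
    words = text.lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0

# Two made-up descriptions: the term occurs twice in each,
# once among 10 words and once among 20 words.
short_description = " ".join(["loan"] * 2 + ["filler"] * 8)
long_description = " ".join(["loan"] * 2 + ["filler"] * 18)

short_ratio = occurrence_ratio("loan", short_description)  # 2 / 10
long_ratio = occurrence_ratio("loan", long_description)    # 2 / 20
print(short_ratio, long_ratio)
```

The shorter description has the higher ratio, which corresponds to the higher relevance described in the example.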