Elasticsearch "should" feature for boosting specific words


#1

Hi all!

Recently we analyzed the search requirements for a specific project, and we have some considerations:

a) We believe that Enonic can only boost fields, not specific words directly. That means we would need to create a tag field on the articles so that search results can be scored with good relevance. This would mean more work for the customer's editors, since they would have to add tags to every article in which they want specific words boosted.
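
For comparison, the tag-field approach boils down to a field boost in the underlying Elasticsearch query. Below is a minimal sketch in Python that only builds the request body; it is raw Elasticsearch 1.x query DSL, not Enonic's query language, and the field names `tags` and `body` are hypothetical.

```python
import json


def field_boosted_query(user_input: str) -> dict:
    """Build a simple_query_string body that weights a (hypothetical) tags field."""
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                # "^3" is Elasticsearch's field-boost notation: a match in "tags"
                # is weighted three times as heavily as a match in "body".
                "fields": ["tags^3", "body"],
                "default_operator": "and",
            }
        }
    }


print(json.dumps(field_boosted_query("fish cheese"), indent=2))
```

The boost is attached to the field, so editors still have to put the important words into `tags` for it to have any effect, which is the extra work described above.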

b) We have seen that Elasticsearch has a feature, "should", that can boost specific words. We believe Enonic does not implement it today. Is it possible to make it available in Enonic? Can we count on being able to use it, or should we continue with the approach of using tag fields for boosting?

Please also let us know if you see any other ways to meet this requirement.


#2

Hi, sorry for the late answer.

Today we use the Elasticsearch simple-query-string query for our fulltext function. Its advantage is that it never fails: it simply skips whatever it doesn't understand, which makes it very useful for passing along raw user input.

There is also a much richer Elasticsearch query, the query-string query: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/query-dsl-query-string-query.html. Among other things, it supports boosting single words or phrases with the same ^ notation that you can use for fields.
We will expose this query by adding a new function to the query language, e.g. fulltextAdvanced() or something similar, that supports word boosting in the query string, for example:

newFulltextFunction('myField, otherField', 'fish^2 cheese onion', 'AND')

In the above, a match on ‘fish’ is weighted twice as heavily as a match on ‘cheese’ or ‘onion’.
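
As a rough illustration of what such a function could send to Elasticsearch, here is a sketch in Python that builds a query-string body for the example above. It assumes the third argument maps to Elasticsearch's `default_operator`; that mapping and the helper name `word_boosted_query` are hypothetical, not Enonic's actual implementation.

```python
import json


def word_boosted_query(fields, text, operator="AND"):
    """Build an Elasticsearch query_string body (hypothetical helper)."""
    return {
        "query": {
            "query_string": {
                "fields": list(fields),
                # query_string parses Lucene syntax, so "fish^2" boosts the
                # single term "fish" rather than a whole field.
                "query": text,
                # Assumed mapping of the function's third argument.
                "default_operator": operator,
            }
        }
    }


body = word_boosted_query(["myField", "otherField"], "fish^2 cheese onion")
print(json.dumps(body, indent=2))
```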

On the flip side, keep in mind that this function will fail if given invalid input. If everything goes as planned, it will be introduced in version 6.16.0, and that release is not far off.
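
Because the query-string query rejects malformed input, one option for user-facing search is to try it first and fall back to simple-query-string on a parse error. Below is a minimal sketch of that pattern in Python; `search` stands in for whatever executes the query, and `QueryParseError` and `search_with_fallback` are hypothetical names.

```python
from typing import Callable, List


class QueryParseError(Exception):
    """Hypothetical error raised by the search backend on unparseable input."""


def search_with_fallback(search: Callable[[dict], List], fields: List[str], text: str) -> List:
    strict = {"query": {"query_string": {"fields": fields, "query": text}}}
    try:
        # Honors word boosts such as "fish^2", but fails on invalid syntax.
        return search(strict)
    except QueryParseError:
        # simple_query_string skips what it cannot parse instead of failing;
        # the ^ word boosts may not be honored in this fallback.
        lenient = {"query": {"simple_query_string": {"fields": fields, "query": text}}}
        return search(lenient)
```

The trade-off is that a malformed query still returns results, just without the extra word weighting.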

The should notation is already available, since we translate the boolean operator "OR" into separate should clauses. Matching more of the OR-ed parts increases the score of a hit. We do not yet support explicitly boosting individual should clauses.
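
For reference, this is roughly what an explicitly boosted should clause looks like in raw Elasticsearch 1.x query DSL, which, as noted above, Enonic does not expose yet. The sketch below builds a bool query with one match clause per term; the helper name `should_query` and the field name are hypothetical.

```python
import json


def should_query(field, weighted_terms):
    """Build a bool query with one should clause per (term, boost) pair."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": term, "boost": boost}}}
                    for term, boost in weighted_terms
                ],
                # At least one clause must match; each extra matching clause
                # adds to the score, weighted by its boost.
                "minimum_should_match": 1,
            }
        }
    }


q = should_query("myField", [("fish", 2.0), ("cheese", 1.0), ("onion", 1.0)])
print(json.dumps(q, indent=2))
```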