I have a question about Levenshtein distance in fulltext search. We have a query like:
fulltext('data.*', 'XXX~2', 'AND')
where XXX is the search word. For example, we have an object with the title (data.title) ‘Enonic’, and we want to return this object when the user searches for ‘Enon’. Everything works fine for words in Latin characters. However, it doesn’t work for the word ‘Grønnsaker’ when the user searches for ‘Grønnsak’. Is this a limitation of Elasticsearch - https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness ?
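To make the expectation concrete, here is a minimal sketch of plain Levenshtein distance. The function name `levenshtein` is only illustrative, and Elasticsearch's own fuzzy matching may count edits slightly differently (for example, by treating transpositions as a single edit). For the two cases above, both search words are two insertions away from the stored title, so `~2` would be expected to match in both.

```typescript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]; // deleting i chars of a to reach an empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Both pairs differ only by the two trailing characters.
const latinDistance = levenshtein("enon", "enonic");
const nordicDistance = levenshtein("grønnsak", "grønnsaker");
```

Since the distance is the same in both cases, the failing match for ‘Grønnsak’ is unlikely to be a distance limit as such.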
It seems there is a bug when the fuzzy operator is applied to the phrase. All special characters should be ASCII-folded at both index time and query time, meaning that ‘grønnsaker’ should be indexed and queried as “gronnsaker”.
I’ll create an issue for this. In the meantime, you could either use the ngram query in addition to fulltext (this gives a match on word beginnings, but obviously won’t cover everything the fuzzy operator does) or, as a workaround, do the ASCII folding yourself on the search phrase (e.g. https://github.com/mplatt/fold-to-ascii).
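A rough sketch of the second workaround, folding the search phrase before it goes into the query, might look like the following. It does not use the fold-to-ascii library linked above; `FOLD_MAP`, `foldToAscii` and `buildFuzzyQuery` are hypothetical names, and the mapping table is deliberately tiny (a handful of Nordic and German letters). It also assumes precomposed characters and a search word without single quotes.

```typescript
// Hypothetical, partial folding table; a real one would cover far more characters.
const FOLD_MAP: Record<string, string> = {
  "ø": "o", "Ø": "O",
  "æ": "ae", "Æ": "AE",
  "å": "a", "Å": "A",
  "ö": "o", "Ö": "O",
  "ä": "a", "Ä": "A",
  "ü": "u", "Ü": "U",
  "ß": "ss",
};

// Replace each mapped character and leave everything else untouched.
function foldToAscii(text: string): string {
  return text
    .split("")
    .map((ch) => FOLD_MAP[ch] ?? ch)
    .join("");
}

// Build the query expression from the thread with a folded, fuzzy search word.
// Assumption: `term` contains no single quotes, so no escaping is done here.
function buildFuzzyQuery(term: string, maxEdits: number = 2): string {
  return `fulltext('data.*', '${foldToAscii(term)}~${maxEdits}', 'AND')`;
}

const query = buildFuzzyQuery("Grønnsak");
```

With this, the fuzzy term is sent as ‘Gronnsak~2’, which should line up with the folded form “gronnsaker” that is expected to be in the index.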
A bit more info on this one. The issue with fuzzy operators is a known bug in Elasticsearch: see the discussion on their forum, and also the internal discussion in their GitHub issue, where they first deprecated and then undeprecated the fuzzy operator. We could rewrite our analyzer function to handle such queries, but it’s quite a bit of work. We are not dropping it, just saying that it is not coming soon.
The good news is that the second part of the issue (wildcards preventing analysis of search results) seems to be easier to fix. Created a separate issue for that: https://github.com/enonic/xp/issues/7569.