Use Levenshtein distance in query

gomes · February 27, 2020, 12:33pm

Enonic version: 7.2.0
OS: Linux Mint 19.3

Is it possible to use the Levenshtein distance with an expression inside double quotes?

For example, these two queries return different results:

fulltext('_allText', 'utprøvence~2', 'OR') => Returns 95 results
fulltext('_allText', '"utprøvence"~2', 'OR') => No results

I need to use double quotes because some of my expressions contain more than one word.

rymsha · February 27, 2020, 9:33pm

Maybe it is related to this one?
https://discuss-3.enonic.com/t/levenshtein-distance-in-fulltext-search-with-and-letters-o-a-etc/1613

gomes · February 28, 2020, 11:42am

Maybe. But this topic is related to the double quotes. Without double quotes, the “Levenshtein distance” works correctly.

rymsha · February 28, 2020, 1:02pm

I tried fulltext('_allText', '"utprøvence"~2', 'OR') in Data Toolbox and got the same results as for fulltext('_allText', 'utprøvence~2', 'OR'), although there was only one item called utprøvence in the data.
So, I currently can’t reproduce the behavior you describe.

gomes · March 18, 2020, 4:01pm

Another question about this.
Does the Levenshtein distance take the space character into account when matching?
For example, would the query

fulltext('_allText', 'myword~1', 'OR')

return results containing the words "my word"?

rymsha · March 21, 2020, 4:05pm

It won't work, because tokenization uses the space as a delimiter.
I suspect you are trying to handle compound words in Norwegian. Enonic XP uses Elasticsearch, which does not provide this functionality out of the box.
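
To illustrate the tokenization point, here is a minimal Python sketch. It assumes a simplified whitespace tokenizer (the real Elasticsearch analyzer does more than this) and a textbook Levenshtein implementation; the helper names `levenshtein` and `fuzzy_token_match` are hypothetical and not part of Enonic XP or Elasticsearch.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def fuzzy_token_match(query_term: str, text: str, max_edits: int) -> bool:
    """Assumption: the text is split on whitespace and each token is compared separately."""
    tokens = text.lower().split()
    return any(levenshtein(query_term, token) <= max_edits for token in tokens)


# As whole strings, "myword" and "my word" are only one edit apart ...
print(levenshtein("myword", "my word"))
# ... but after tokenization the candidates are "my" and "word", and neither is within 1 edit.
print(fuzzy_token_match("myword", "my word", 1))
```

Under these assumptions the fuzzy term is compared against each token on its own, so the single inserted space never gets counted as one edit.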

KristianD · March 24, 2020, 7:54am

Does the Levenshtein distance not allow a specific word to be misspelled by one character?
fulltext('displayName', 'tannhelsetjeneser~1')

Should this not be equivalent to
fulltext('displayName', 'tannhelsetjenester')

and return the same number of results?

tsi · March 24, 2020, 8:11am

The Levenshtein query should potentially return more hits than the exact one.

KristianD · March 26, 2020, 8:39am

The Levenshtein query is returning fewer hits in this case.
https://www.helsedirektoratet.no/search?searchquery=tannhelsetjeneser (92 hits)
https://www.helsedirektoratet.no/search?searchquery=tannhelsetjenester (197 hits)

fulltext('displayName', 'tannhelsetjeneser~1')

tsi · March 30, 2020, 7:21am

You are also using fewer characters, not just misspelling. The Levenshtein algorithm is handled by Elasticsearch, so you could check what their documentation says about it.

KristianD · March 30, 2020, 8:58am

Could this also occur because the fuzzy matching for the word tannhelsetjenester takes the space character into account too?

KristianD · April 1, 2020, 12:00pm

I also get only 92 results if I add an extra character (re: "You are also using fewer characters, not just misspelling.").

https://www.helsedirektoratet.no/search?searchquery=tannhelsetjenesterr
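
For reference, the edit distances discussed here can be checked with a small Python sketch. It uses a plain memoized recursive Levenshtein function (the name `edit_distance` is hypothetical); this is not how Elasticsearch computes fuzzy matches internally, only a way to verify the distances by hand.

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Memoized recursive Levenshtein distance between a and b."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:
            return j  # insert the remaining j characters of b
        if j == 0:
            return i  # delete the remaining i characters of a
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(d(i - 1, j) + 1,         # deletion
                   d(i, j - 1) + 1,         # insertion
                   d(i - 1, j - 1) + cost)  # substitution or match
    return d(len(a), len(b))


target = "tannhelsetjenester"
for query in ("tannhelsetjeneser", "tannhelsetjenesterr"):
    # A missing character and an extra character each count as one edit.
    print(query, edit_distance(query, target))
```

Both the shortened and the lengthened spelling are one edit away from tannhelsetjenester, so the edit distance alone does not seem to explain the lower hit count; what does explain it is not established in this thread.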
