[Koha-bugs] [Bug 27153] ElasticSearch should search keywords apostrophe blind

bugzilla-daemon at bugs.koha-community.org
Tue Aug 30 02:14:08 CEST 2022


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27153

--- Comment #14 from David Cook <dcook at prosentient.com.au> ---
I had a librarian (from an English-only library) asking for this the other day,
and I was wondering why we have apostrophes replaced with a space (both for CHR
and ICU indexing with Zebra).

But Frido's example makes a lot of sense:

(In reply to Fridolin Somers from comment #6)
> Elision removes the text : l'europe = europe
> This apostrophe filter creates : l'europe = leurope
> 
> Wrong in French but surely ranking will be better for the exact match.

And in English, we'd have "father's" become "fathers", which means "father"
wouldn't match unless you're using right truncation (which Koha typically does
out of the box I suppose). 

But then there's the Ukrainian word під'їзд: if you broke it into під їзд,
you'd get hits for під, which is a totally unrelated word.
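
To make the three behaviours above easier to compare, here is a rough Python
sketch. It is not how Zebra or Elasticsearch implement them; the function
names, the apostrophe set, and the short French elision list are all
assumptions made up for illustration.

```python
import re

# Characters treated as apostrophes (an assumption for this sketch).
APOSTROPHES = "'\u2019\u02bc"
APOSTROPHE_RE = re.compile("[" + APOSTROPHES + "]")

# A small sample of French elided forms; a real elision filter uses a longer list.
FRENCH_ELISIONS = ("l", "d", "j", "m", "n", "s", "t", "qu")
ELISION_RE = re.compile(
    "^(?:" + "|".join(FRENCH_ELISIONS) + ")[" + APOSTROPHES + "](.+)$"
)


def replace_with_space(text):
    """Apostrophe becomes a space, so the word splits into two tokens."""
    return APOSTROPHE_RE.sub(" ", text.lower()).split()


def strip_apostrophe(text):
    """Apostrophe is dropped, so the two halves are glued together."""
    return APOSTROPHE_RE.sub("", text.lower()).split()


def elide(text):
    """Drop a leading French elided article or pronoun, keep other tokens as-is."""
    tokens = []
    for token in text.lower().split():
        match = ELISION_RE.match(token)
        tokens.append(match.group(1) if match else token)
    return tokens


for word in ("l'europe", "father's", "під'їзд"):
    print(word, replace_with_space(word), strip_apostrophe(word), elide(word))
```

Splitting on the apostrophe is what produces the stray під token, while
stripping it keeps the word whole at the cost of turning l'europe into leurope.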

--

I know people have praised Google here, but it's not perfect either. 

Try searching "l'arbre під'їзд" and try searching "під'їзд l'arbre", and you'll
get wildly different results. 

It seems that Google tries to determine the language of the search query
(possibly based off the first word), and then analyze the search string based
off that. 

For "l'arbre під'їзд", you get many results with "L'Arbre", "des arbres",
"arbre", etc. Most results seem to include Russian rather than Ukrainian. 

But for "під'їзд l'arbre", you only get a few results that have the string
"L'Arbre" and most of the results are Ukrainian. 

So Google is likely determining the language of the search string and then
applying a language-specific analyzer. 
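
As a toy illustration of that guess (and only a guess; nothing here reflects
how Google actually works), a search front end could pick an analyzer from the
script of the first word. The names first_word_script, french_like, ANALYZERS
and analyze are hypothetical, and the elision list is deliberately tiny.

```python
import re
import unicodedata

# Tiny sample of French elided forms, ASCII apostrophe only (assumption).
ELISION_RE = re.compile(r"^(?:l|d|j|qu)'(.+)$")


def first_word_script(query):
    """Return the script prefix of the first letter of the first word,
    e.g. 'LATIN' or 'CYRILLIC', taken from the Unicode character name."""
    words = query.split()
    if not words:
        return None
    for ch in words[0]:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            return name.split()[0] if name else None
    return None


def french_like(tokens):
    """Strip a leading elided article from each token."""
    result = []
    for token in tokens:
        match = ELISION_RE.match(token)
        result.append(match.group(1) if match else token)
    return result


def keep_as_is(tokens):
    """Fallback analyzer: leave tokens untouched."""
    return list(tokens)


# Hypothetical mapping from detected script to analyzer.
ANALYZERS = {"LATIN": french_like}


def analyze(query):
    tokens = query.lower().split()
    analyzer = ANALYZERS.get(first_word_script(query), keep_as_is)
    return analyzer(tokens)


print(analyze("l'arbre під'їзд"))
print(analyze("під'їзд l'arbre"))
```

With this kind of design, the same two words are analyzed differently
depending only on their order, which would be consistent with the differing
result sets described above.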

"My mom" in Chinese is 我的妈妈. 的 is the character that denotes possession of mom
(妈妈) by me/I (我). You might think then you could replace 的 with a space to
separate the 2 nouns... except 的 isn't always used that way. The word 目的 means
"goal". Doing anything to 的 would compromise the word, unless you're able to
understand the context that it's used in. 
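
A quick Python sketch of why a context-free rule fails here. The helper
split_on_de is hypothetical and just mimics "replace the character with a
space":

```python
def split_on_de(text):
    """Naively treat 的 as a separator, the way an apostrophe might be
    replaced with a space."""
    return text.replace("的", " ").split()


print(split_on_de("我的妈妈"))  # the two nouns come apart as intended
print(split_on_de("目的"))      # the word for "goal" loses half of itself
```

The first call separates 我 and 妈妈, but the second leaves only 目, so the
original word 目的 can no longer be matched.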

If you google "我的妈妈", eventually it strips off "我的" and just searches for "妈妈"
since that's the main noun in the phrase. 

Search is hard.
