[Koha-bugs] [Bug 21357] Filter elisions from index and search terms in Elasticsearch

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Fri Jan 31 10:14:26 CET 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21357

--- Comment #38 from Ere Maijala <ere.maijala at helsinki.fi> ---
(In reply to Julian Maurice from comment #37)
> Elision might not cause trouble (but what about names like "D'Amato"?).
> I'm thinking about the next step: stemming is very different from one
> language to another and we need to find a way to have stemming for
> multi-language catalogs.

I agree that stemming is difficult, and I've tried to purposefully limit this
bug to elisions.
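
To make the "D'Amato" concern concrete, here is a minimal Python sketch of
what an elision filter does to tokens. It is not the Elasticsearch
implementation: the article list, the whitespace tokenizer, the
case-insensitive matching and the function names are all assumptions made for
illustration.

```python
# Hypothetical article list, chosen for illustration only. A real elision
# filter takes a configurable list of articles.
FRENCH_ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c"}

# Both the ASCII apostrophe and the typographic one are treated as elision marks.
APOSTROPHES = ("'", "\u2019")


def strip_elision(token, articles=FRENCH_ARTICLES):
    """Drop a leading elided article such as l' or d' from a single token."""
    for apostrophe in APOSTROPHES:
        prefix, sep, rest = token.partition(apostrophe)
        # Only strip when an apostrophe was found, something follows it,
        # and the part before it is a known article (case-insensitive here).
        if sep and rest and prefix.lower() in articles:
            return rest
    return token


def analyze(text, articles=FRENCH_ARTICLES):
    """Whitespace-tokenize, strip elisions, then lowercase each token."""
    return [strip_elision(tok, articles).lower() for tok in text.split()]


print(analyze("Images from movie l'Avion"))
print(analyze("D'Amato"))
print(analyze("D'Amato", articles={"l"}))
```

In this sketch "D'Amato" is reduced to "amato" whenever "d" is in the article
list. Because the same analysis is applied to both the indexed text and the
search terms, a search for "D'Amato" still matches, but so does a search for
plain "Amato". Leaving "d" out of the list keeps the name intact.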

> > Separating analysis for different languages also won't work for
> > mixed-language fields.  Think about names and a (very fictional)
> > example phrase "Images from movie l'Avion". You'd get either elision
> > filtering or English stemming but not both. For sure it will still be
> > found with a simple keyword search, but it breaks at least adjacent
> > word searches and relevance ranking.
> 
> Nothing would work perfectly with mixed-language fields. But in this
> particular example, you could have another subfield `lang_en_fr` that does
> English stemming and French elision.

And maybe lang_fi_fr, lang_sv_fr, etc. This gets complicated pretty quickly.
I'm also afraid that separating the language analysis chains doesn't solve the
issue with stemming etc., because you'd need to avoid indexing into the "wrong"
fields, which would require knowing what language the string to be indexed is
in.
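
As a rough illustration of how quickly the combined subfields multiply, the
following Python sketch enumerates one subfield per language pair. The
`lang_xx_yy` naming with alphabetically ordered codes and the helper name are
assumptions for illustration, not an existing Koha mapping.

```python
from itertools import combinations


def pair_subfields(languages):
    """Return one hypothetical subfield name per unordered language pair."""
    # Sort the codes so each pair gets a single, predictable name.
    return ["lang_" + "_".join(pair) for pair in combinations(sorted(languages), 2)]


fields = pair_subfields(["en", "fr", "fi", "sv"])
print(len(fields), fields)
```

Four languages already need six pair subfields, and n languages need
n * (n - 1) / 2, before even considering fields that mix three or more
languages.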
