[Koha-bugs] [Bug 21357] Filter elisions from index and search terms in Elasticsearch

bugzilla-daemon at bugs.koha-community.org
Fri Jan 31 10:02:09 CET 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21357

--- Comment #37 from Julian Maurice <julian.maurice at biblibre.com> ---
(In reply to Katrin Fischer from comment #35)
> Hi Julian, does it mean it searches the different representations
> simultaneously?

Only one query to ES is needed, if that's what you mean by simultaneously.

> I wonder how it would work for English, thinking of words like "can't" or
> "doesn't".

The built-in English analyzer does not do anything with words that end with
"n't", but it should be possible to configure a custom English analyzer that
treats "can't" and "cannot" the same way.
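A minimal sketch of such a custom analyzer, written as a Python dict for the index settings body. It assumes a `mapping` character filter (a real Elasticsearch char filter type) is an acceptable way to rewrite "can't" to "cannot" before tokenization; the names `english_contractions`, `english_stemmer` and `english_custom` are hypothetical, and this is not Koha's actual configuration.

```python
import json


def english_analysis_settings() -> dict:
    """Hypothetical index settings: a custom English analyzer that maps
    "can't" to "cannot" so both forms produce the same tokens."""
    return {
        "analysis": {
            "char_filter": {
                # Mapping char filter: rewrites the raw text before the tokenizer.
                # Both capitalisations are listed because char filters run
                # before the lowercase token filter.
                "english_contractions": {
                    "type": "mapping",
                    "mappings": ["can't => cannot", "Can't => Cannot"],
                }
            },
            "filter": {
                # Built-in stemmer token filter configured for English.
                "english_stemmer": {"type": "stemmer", "language": "english"}
            },
            "analyzer": {
                "english_custom": {
                    "type": "custom",
                    "char_filter": ["english_contractions"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_stemmer"],
                }
            },
        }
    }


if __name__ == "__main__":
    # Print the body as it would be sent when creating the index.
    print(json.dumps({"settings": english_analysis_settings()}, indent=2))
```

Catalog data often uses the typographic apostrophe (’) as well, so a real setup would probably need extra mapping entries for it; other contractions such as "doesn't" would need their own entries too.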

(In reply to Ere Maijala from comment #36)
> I can't really see the benefit since, as far as I can see, elision handling
> is not prone to cause conflicts with other language analysis.

Elision might not cause trouble (but what about names like "D'Amato"?).
I'm thinking about the next step: stemming is very different from one language
to another, and we need to find a way to provide stemming for multi-language
catalogs.

> Separating analysis for different languages also won't work for
> mixed-language fields.  Think about names and a (very fictional)
> example phrase "Images from movie l'Avion". You'd get either elision
> filtering or English stemming but not both. For sure it will still be
> found with a simple keyword search, but it breaks at least adjacent
> word searches and relevance ranking.

Nothing would work perfectly with mixed-language fields. But in this particular
example, you could have another subfield `lang_en_fr` that does English
stemming and French elision.
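A rough sketch of what a `lang_en_fr` subfield could look like, expressed as Python dicts for the index creation and search bodies. It assumes Elasticsearch 7-style typeless mappings and uses the real `elision` and `stemmer` token filters and a multi-field; the field name `title` and the names `english_french`, `french_elision` and `english_stemmer` are hypothetical. The `multi_match` query searches the main field and the subfield in one request.

```python
import json

# French articles stripped by the elision filter (e.g. "l'Avion" -> "Avion").
FRENCH_ARTICLES = ["l", "m", "t", "qu", "n", "s", "j", "d", "c",
                   "jusqu", "quoiqu", "lorsqu", "puisqu"]


def index_body() -> dict:
    """Hypothetical index body with a mixed English/French subfield."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "french_elision": {
                        "type": "elision",
                        "articles_case": True,  # match "L'" as well as "l'"
                        "articles": FRENCH_ARTICLES,
                    },
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                },
                "analyzer": {
                    "english_french": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Strip elided articles, then lowercase, then stem in English.
                        "filter": ["french_elision", "lowercase", "english_stemmer"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    # Multi-field: the same value is indexed a second time
                    # with the mixed-language analyzer.
                    "fields": {
                        "lang_en_fr": {"type": "text", "analyzer": "english_french"}
                    },
                }
            }
        },
    }


def search_body(terms: str) -> dict:
    """A single query over the main field and its lang_en_fr subfield."""
    return {
        "query": {
            "multi_match": {
                "query": terms,
                "fields": ["title", "title.lang_en_fr"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(index_body(), indent=2))
    print(json.dumps(search_body("Images from movie l'Avion"), indent=2))
```

Keeping the plain `title` field alongside the subfield means exact-form matches remain available, while the subfield handles elision and stemming for this particular language pair.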
