[Koha-bugs] [Bug 21357] Filter elisions from index and search terms in Elasticsearch

Fri Jun 23 01:24:01 CEST 2023

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21357

--- Comment #77 from David Cook <dcook at prosentient.com.au> ---
(In reply to Katrin Fischer from comment #76)
> I am not sure I follow that argument. We already have been ignoring ' with
> Zebra for many years and never got a bug filed about it. On the other hand,
> we have a lot of people now that ask for it to ignore the '. 

We don't ignore the apostrophe/single quote in Zebra. We change it to a space.
That's significantly different than replacing it with nothing from a
tokenization perspective.
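
To illustrate the difference, here is a minimal sketch in plain Python. It is not Zebra or Elasticsearch configuration; the two function names are made up for illustration, and whitespace splitting is a simplifying assumption standing in for a real tokenizer.

```python
import re

# Hypothetical helpers for illustration only; neither Zebra nor
# Elasticsearch exposes functions with these names.
APOSTROPHES = re.compile(r"['\u2019]")  # straight and typographic apostrophe


def tokenize_apostrophe_as_space(text):
    """Map each apostrophe to a space, then split on whitespace."""
    return APOSTROPHES.sub(" ", text.lower()).split()


def tokenize_apostrophe_removed(text):
    """Delete each apostrophe outright, then split on whitespace."""
    return APOSTROPHES.sub("", text.lower()).split()


for sample in ("L'amour", "don't stop"):
    print(sample, "->", tokenize_apostrophe_as_space(sample),
          "vs", tokenize_apostrophe_removed(sample))
```

With the space mapping, "L'amour" yields two tokens ("l" and "amour"), so a search for "amour" can match. With removal it yields the single token "lamour", which a search for "amour" would not match unless a separate elision filter strips the leading "l".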

> Would it be possible to make it configurable? If not, I'd say we should just
> go with it. It's a big issue for French. And searching 'dont' is what I feel
> is more common than 'don t'.

I think I've discussed this on a different bug, e.g. indexing terms both with
and without punctuation, so there are probably ways to make this configurable
in Koha. But it would probably be a bit error-prone, as we'd need to change
all of Koha to support it.

I'm not using Elasticsearch yet, so I'm happy for people to try things out.
Just trying to think of any pitfalls. And also hoping that we can try to keep
Zebra and Elasticsearch similar in their configuration - or else we should
think about deprecating Zebra and just focusing on Elasticsearch.

Regarding "don't", I would hope that people would search for "don't" rather
than "dont" or "don t". Note that "dont" is also a French word, so if the
apostrophe is stripped, a search for "don't" would also match titles that
contain the French word "dont". But multilingual indexing is hard anyway...
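
A rough sketch of that collision, again in plain Python rather than real search-engine configuration. The `normalize` and `matches` helpers and the sample titles are hypothetical, and "every query token must appear in the title" is a simplifying assumption about how matching works.

```python
def normalize(text, replacement):
    """Lowercase, swap the apostrophe for `replacement`, split on whitespace."""
    return text.lower().replace("'", replacement).split()


def matches(query, title, replacement):
    """True if every normalized query token appears among the title's tokens."""
    title_tokens = set(normalize(title, replacement))
    return all(token in title_tokens for token in normalize(query, replacement))


french_title = "Ce dont je parle"   # made-up title using the French word "dont"
english_title = "Don't look back"   # made-up title using the English "don't"

# Apostrophe removed: the query "don't" becomes "dont" and hits both titles.
print(matches("don't", french_title, ""), matches("don't", english_title, ""))

# Apostrophe mapped to a space: the query becomes "don" + "t" and only
# hits the English title.
print(matches("don't", french_title, " "), matches("don't", english_title, " "))
```

Under these assumptions, removing the apostrophe makes the English query match the French title, while mapping it to a space does not.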
