[Koha-bugs] [Bug 15541] Prevent normalization during matching/import process

Tue May 31 04:55:43 CEST 2016

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15541

--- Comment #21 from David Cook <dcook at prosentient.com.au> ---
(In reply to Marcel de Rooy from comment #20)
> I understand your point hopefully :)
> If we start using this field: Since it was meant to add a Normalize class, I
> think you should create at least a rudimentary one now.
> Moving the _normalize code and add your none/raw normalizations.

I'm not sure that I understand your point, unfortunately :(.

I think using normalization during matching is a bad idea, as Zebra already
normalizes the query before processing it. Here's the problem:

Custom normalization: Strip hyphens
Zebra normalization: Strip colons

Data: "Mont-Royal: it's a mountain"

Zebra would index it as "Mont-Royal it's a mountain".
You'd retrieve that exact phrase with the query "Mont-Royal: it's a mountain",
which would be transformed into "Mont-Royal it's a mountain" by Zebra.

However, with the custom normalization, your query would be "MontRoyal: it's a
mountain" before going to Zebra, and then it would be "MontRoyal it's a
mountain" when processed by Zebra.

If you're doing an exact search, your search will fail, because the index
contains "Mont-Royal" while the query now contains "MontRoyal", even though
that same data would be found by the regular Koha search.
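
The mismatch above can be sketched in a few lines of Python. This is only an
illustration: zebra_normalize and custom_normalize are hypothetical stand-ins
that reduce each normalization to the single rule used in this example (what
Zebra really does depends on its configuration), and an exact search is
modelled as plain string equality.

```python
def zebra_normalize(text: str) -> str:
    """Hypothetical stand-in for Zebra's normalization: strip colons."""
    return text.replace(":", "")


def custom_normalize(text: str) -> str:
    """Hypothetical stand-in for the custom normalization: strip hyphens."""
    return text.replace("-", "")


data = "Mont-Royal: it's a mountain"

# Zebra normalizes the data once, at indexing time.
indexed = zebra_normalize(data)

# Regular search: only Zebra touches the query.
plain_query = zebra_normalize(data)

# Matching with custom normalization: hyphens are stripped first,
# then Zebra strips the colon.
custom_query = zebra_normalize(custom_normalize(data))

# Exact search modelled as string equality (an assumption of this sketch).
plain_match = plain_query == indexed
custom_match = custom_query == indexed

print(repr(indexed), repr(custom_query), plain_match, custom_match)
```

The index never saw the hyphen-stripping rule, so only the query that went
through Zebra alone lines up with what was indexed.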

We need to normalize consistently at both indexing and retrieval time, and the
way to do that is with Zebra alone.

If there needs to be extra normalization, I think it should be done before the
record even makes it to Koha...
