[Koha-bugs] [Bug 27299] New: Zebra phrase register is incorrectly tokenized

Tue Dec 22 23:56:39 CET 2020

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27299

            Bug ID: 27299
           Summary: Zebra phrase register is incorrectly tokenized
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5 - low
         Component: Searching - Zebra
          Assignee: koha-bugs at lists.koha-community.org
          Reporter: dcook at prosentient.com.au

Recently, I noticed issues with "exact" matching for authority linking when
using Zebra ICU. 

I've documented those issues upstream on the idzebra project on GitHub:
https://github.com/indexdata/idzebra/issues/24

Adam Dickmeiss and I are still working through this, but it seems very
likely to me that the cause is that we are tokenizing strings for the "p"
register when we should not be.

Looking at Zebra CHR, the "p" register is not tokenized. According to Zebra's
own documentation
(https://software.indexdata.com/zebra/doc/querymodel-zebra.html#querymodel-pqf-apt-mapping-structuretype),
the "p" register is supposed to be "Character normalized, but not tokenized
index for phrase matches". 
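
To make the distinction concrete, here is a rough Python sketch of why
tokenizing a phrase register can break "exact" matching. It is only a toy
model and not Zebra's or ICU's actual behaviour: the function names
(normalize, phrase_keys, word_keys, exact_match) and the normalization rules
are all assumptions made up for illustration.

```python
import re


def normalize(text):
    """Toy character normalization (an assumption, not Zebra's rules):
    lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def phrase_keys(field):
    """Normalized but NOT tokenized: the whole field is a single key."""
    return [normalize(field)]


def word_keys(field):
    """Normalized AND tokenized: every word becomes its own key."""
    return normalize(field).split()


def exact_match(query, keys):
    """An 'exact' match succeeds only if the whole query equals a key."""
    return normalize(query) in keys


heading = "Smith, John"

# Untokenized phrase register: only the complete heading matches.
print(exact_match("Smith, John", phrase_keys(heading)))
print(exact_match("Smith", phrase_keys(heading)))

# Tokenized register: a single word now counts as an "exact" match.
print(exact_match("Smith", word_keys(heading)))
```

In this toy model, once the field is split into words, a query for one word
alone is indistinguishable from a match on the complete heading, which is the
kind of false positive an untokenized phrase register is meant to prevent.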

I'm still waiting for Adam to confirm my solution, but I've opened this bug
report to track things on the Koha side, and to include a patch which I hope
will resolve these problems.
