[Koha-devel] More info about Zebra hyphen truncating search (and announcing Zebra 2.0.60)

Mon Feb 9 00:32:22 CET 2015

Hi all:

Just letting everyone know that there is definitely a bug in Zebra 2.0.59
(and in an unknown number of versions after 2.0.44). It causes problems with
tokenization when using ICU and the search term contains a hyphen. For
reference, 2.0.44 is the version in Debian and Ubuntu prior to Jessie/Vivid,
which both use 2.0.59 as of this email.

Basically, the hyphen triggers tokenization, but only the first token is
used. So a search for "Mont-Royal" will actually just be a search for
"Mont", and a search for "up-to-date" will just be a search for "up". This
happens even when using ICU transformation/transliteration rules to remove
the hyphen before tokenizing.
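
To make the symptom concrete, here is a toy Python model of the behaviour
described above. It is only a sketch and not Zebra or ICU code: the function
names are made up for illustration, and splitting on the hyphen is an assumed
stand-in for whatever the real ICU tokenization rules do.

```python
import re


def tokenize(term):
    """Split a term on hyphens (assumed stand-in for ICU tokenization)."""
    return [token for token in re.split(r"-+", term) if token]


def observed_terms(term):
    """Toy model of the symptom: only the first token reaches the search."""
    return tokenize(term)[:1]


def expected_terms(term):
    """What one would expect: every token takes part in the search."""
    return tokenize(term)


if __name__ == "__main__":
    for term in ("Mont-Royal", "up-to-date"):
        print(term, "observed:", observed_terms(term),
              "expected:", expected_terms(term))
```

In this model, "Mont-Royal" is reduced to the single term "Mont", while the
expected behaviour would keep both "Mont" and "Royal".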

However, I reported the issue to IndexData on February 4th, and they wrote a
fix on February 7th:

http://git.indexdata.com/?p=idzebra.git;a=commitdiff;h=704fd190292cb771df94553b0ed6f9f4b71660a6

They've released that fix as part of Zebra 2.0.60,
which is now available via the IndexData repositories. I just tested it on a
Debian Squeeze install (using the IndexData apt repository), and it works
great.

I'm going to ask IndexData if they can provide more information, as Robin
has volunteered to report a bug to Debian. It might be too late to get Zebra
2.0.60 shipped with Jessie, but maybe they can backport the patch or at
least be aware of the issue...

Cheers,

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007