[Koha-devel] Zebra hyphen truncating search

David Cook dcook at prosentient.com.au
Thu Feb 5 09:05:11 CET 2015


Hi Fridolin:

I'm sorry, but I'm not entirely sure what you're saying.

In regards to the following rule,

<transliterate rule="[:Number:] { '-' > '' "/> 

Actually, I was going to email about this rule... On its own, it doesn't
provide an equivalence between numbers with hyphens and without. It
replaces a hyphen "-" that follows a number with a single quote "'" (in ICU
rule syntax, '' is an escaped single quote). That single quote is then
removed by the following rule:

<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>

In effect, these two rules in conjunction create the equivalence between
numbers with hyphens and those without. That is, "99-99" gets transformed
into "9999" when indexing/searching.
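
As a rough illustration of how the two rules combine, here is a minimal
Python sketch. It does not call ICU or YAZ; it only approximates the two
rules with the standard library, and the function names are made up for this
example. Assumptions: \d stands in for [:Number:] (ICU's set is wider), and
str.isspace() plus the Unicode "P*" categories stand in for [:WhiteSpace:]
and [:Punctuation:].

```python
import re
import unicodedata


def hyphen_after_number_to_quote(text):
    # Approximates: [:Number:] { '-' > ''
    # A hyphen preceded by a digit becomes a literal single quote.
    return re.sub(r"(?<=\d)-", "'", text)


def remove_space_and_punct(text):
    # Approximates: [[:WhiteSpace:][:Punctuation:]] Remove
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


step1 = hyphen_after_number_to_quote("99-99")
step2 = remove_space_and_punct(step1)
print(step1)  # 99'99
print(step2)  # 9999
```

The intermediate value shows the single quote that the first rule leaves
behind before the second rule strips it.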

The rule that we "should" be using is the following:

<transliterate rule="[:Number:] { '-' > "/>

You can test it out on a Koha setup, or you can use
http://demo.icu-project.org/icu-bin/translit.

--

In any case, that rule doesn't come into effect in the case of
"UPA-CE14060101", because the rule's context doesn't match: the hyphen
follows a letter ("A"), not a number.

I could use something like the following rule to convert the hyphen to
space:

<transliterate rule="[:Letter:] { '-' > ' ' "/>

However, this doesn't work due to the tokenization problems. The hyphen does
successfully get transliterated into a space, which the tokenize rule then
uses as a break point, but the tokenize rule only returns the first
token.
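
To make the expected behaviour concrete, here is a small Python sketch of
what the two context rules should do to "UPA-CE14060101". Again, this is
only an approximation with the standard library, not ICU or YAZ: the
function names are hypothetical, \d and [^\W\d_] stand in for [:Number:] and
[:Letter:], and plain whitespace splitting stands in for the tokenize rule.

```python
import re


def number_rule(text):
    # Approximates: [:Number:] { '-' >   (delete a hyphen that follows a digit)
    return re.sub(r"(?<=\d)-", "", text)


def letter_rule(text):
    # Approximates: [:Letter:] { '-' > ' '
    # [^\W\d_] matches a word character that is neither a digit nor "_".
    return re.sub(r"(?<=[^\W\d_])-", " ", text)


def tokenize(text):
    # Stand-in for the tokenize step: plain whitespace splitting.
    return text.split()


term = "UPA-CE14060101"
print(number_rule(term))   # UPA-CE14060101 (hyphen follows "A", so no change)
spaced = letter_rule(term)
print(spaced)              # UPA CE14060101
print(tokenize(spaced))    # ['UPA', 'CE14060101']
```

Both tokens are what I'd expect to be searched. The 2.0.47 output quoted
below shows both terms, whereas 2.0.59 only shows term=UPA.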

In regards to the tokenization problem I've observed in 2.0.59, I've already
reported the bug to Indexdata, and they've assigned it an issue number:
YAZ-820.    

As far as I know though, the tokenize rule is an Indexdata thing rather than
an ICU thing per se. It's run by the YAZ ICU utility, but it's their own
thing I believe.

Cheers,

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007

> -----Original Message-----
> From: koha-devel-bounces at lists.koha-community.org [mailto:koha-devel-
> bounces at lists.koha-community.org] On Behalf Of Fridolin SOMERS
> Sent: Tuesday, 3 February 2015 7:17 PM
> To: koha-devel at lists.koha-community.org
> Subject: Re: [Koha-devel] Zebra hyphen truncating search
> 
> In the current Koha code, this rule in words-icu.xml provides an equivalence
> between numbers with hyphens and without (especially ISBNs):
> 
> <transliterate rule="[:Number:] { '-' > '' "/>
> 
> This means this behavior is not the default in ICU.
> 
> Regards,
> 
> On 03/02/2015 07:16, David Cook wrote:
> > Hi Colin!
> >
> >
> >
> > I stumbled across an email you sent to the Indexdata Zebralist in 2013:
> > http://lists.indexdata.dk/pipermail/zebralist/2013-August/002576.html
> >
> >
> >
> > Were you ever able to solve those problems?
> >
> >
> >
> > Also, what version of Zebra were you running? I've noticed this
> > problem with Zebra 2.0.59, but I haven't been able to reproduce it
> > using Zebra 2.0.47. I had the exact same configuration files, MySQL
> > database, and Zebra indexes. Here's an example:
> >
> >
> >
> > 2.0.59:
> >
> > Z> f UPA-CE14060101
> > Sent searchRequest.
> > Received SearchResponse.
> > Search was a success.
> > Number of hits: 12325, setno 1
> > SearchResult-1: term=UPA cnt=12325
> > records returned: 0
> > Elapsed: 0.021128
> >
> >
> >
> > 2.0.47:
> >
> > Z> f UPA-CE14060101
> > Sent searchRequest.
> > Received SearchResponse.
> > Search was a success.
> > Number of hits: 1, setno 3
> > SearchResult-1: term=UPA cnt=12325, term=CE14060101 cnt=1
> > records returned: 0
> > Elapsed: 0.015657
> >
> >
> >
> > --
> >
> >
> >
> > It looks like the packages in Debian 7 and Ubuntu 14.04 are stuck back
> > at Zebra 2.0.44, which is from 2010. However, Indexdata apparently has
> > newer deb packages for both Debian and Ubuntu:
> > http://ftp.indexdata.dk/pub/zebra/.
> > In fact, the server I'm having problems with appears to have installed
> > Zebra 2.0.59 using the following "/etc/apt/sources.list.d/indexdata.list"
> > pointing to "deb http://ftp.indexdata.dk/debian squeeze main".
> >
> >
> >
> > I'm guessing that you were using a Zebra version that was higher than
> > 2.0.47? Possibly a deb package installed from Indexdata's repos?
> >
> >
> >
> > I know Tomas has had good luck contacting Indexdata folk. Maybe he has
> > some ideas about how to find a solution to this issue?
> >
> >
> >
> > David Cook
> >
> > Systems Librarian
> >
> > Prosentient Systems
> >
> > 72/330 Wattle St, Ultimo, NSW 2007
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Koha-devel mailing list
> > Koha-devel at lists.koha-community.org
> > http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> > website : http://www.koha-community.org/ git :
> > http://git.koha-community.org/ bugs : http://bugs.koha-community.org/
> >
> 
> --
> Fridolin SOMERS
> Biblibre - Pôles support et système
> fridolin.somers at biblibre.com



