[Koha-devel] Prevent normalization during matching/import process

David Cook dcook at prosentient.com.au
Mon Jan 11 07:20:22 CET 2016


Hi all:

 

I've opened a bug for preventing normalization during the matching/import
process: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15541

 

I was trying to use a URL field as a matchpoint, and it was going very
badly.

 

1)      By default, punctuation is stripped, leading/trailing spaces are
trimmed, and more than one space is condensed down to one space. This makes
a URL into a string without any spaces or punctuation.

2)      So, I had to add "Identifier-other:u" to biblio-zebra-indexdefs.xsl
but I couldn't access it until I tried "id-other,st-urx" as my match point.
The st-urx is necessary to make it use the ":u" register.

3)      I also added some code to C4/Matcher.pm so that a match point
normalizer of "None" would disable the normalization from #1. 

4)      I also plan to add a flag to C4::Search::SimpleSearch to disable the
s/:/=/g normalization since that also destroys the URL in the query and
makes it fail to match. 
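
To make points 1, 3 and 4 concrete, here is a rough Python sketch (Koha itself
is Perl, so this is not the code in C4/Matcher.pm or C4::Search). The function
names, the punctuation set, and the example URL are all assumptions made up
for illustration; only the behaviour described above is taken from the real
code.

```python
import re
import string

# Assumption: "punctuation" here means ASCII punctuation; the real
# normalizer may use a different character set.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def default_normalize(value):
    """Hypothetical stand-in for the default match point normalization."""
    value = _PUNCT.sub("", value)       # strip punctuation
    value = value.strip()               # trim leading/trailing spaces
    return re.sub(r"\s+", " ", value)   # condense runs of spaces to one


def normalize(value, normalizer="default"):
    """A normalizer of "None" skips the default normalization (point 3)."""
    if normalizer == "None":
        return value
    return default_normalize(value)


def colon_to_equals(query):
    """Mimics the s/:/=/g substitution mentioned in point 4."""
    return query.replace(":", "=")


url = "http://example.org/record?id=42"
print(normalize(url))          # httpexampleorgrecordid42
print(normalize(url, "None"))  # http://example.org/record?id=42
print(colon_to_equals(url))    # http=//example.org/record?id=42
```

With the default normalizer the URL can no longer match what is stored in the
":u" register, and even with normalization disabled the colon substitution
still mangles the scheme separator, which is why both need a way to be
switched off.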

 

So far I've only tested this with CHR, but it works well. I'll probably look
at ICU tomorrow.

 

I'm sure there are other cases besides URLs where we will want to skip the
default normalization when matching, or to normalize the value in a way that
accords with how Zebra normalizes the data in records. For instance, Zebra
will replace punctuation with a space for "phrase" indexes rather than just
stripping it out and leaving nothing behind.
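
The difference between the two behaviours can be sketched in Python as below.
This is only an illustration: the function names and sample string are made
up, and the punctuation set is an assumption rather than Zebra's actual CHR or
ICU rules.

```python
import re
import string

# Assumption: ASCII punctuation only; real Zebra rules come from its
# CHR or ICU configuration.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def strip_punctuation(value):
    """Remove punctuation outright, then tidy whitespace."""
    return " ".join(_PUNCT.sub("", value).split())


def space_punctuation(value):
    """Replace punctuation with a space, then tidy whitespace."""
    return " ".join(_PUNCT.sub(" ", value).split())


title = "Dogs/cats: a guide"
print(strip_punctuation(title))  # Dogscats a guide
print(space_punctuation(title))  # Dogs cats a guide
```

If the match value is normalized one way and the indexed value the other, the
two strings differ ("Dogscats" versus "Dogs cats") and a phrase match fails
even though the underlying data is the same.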

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 
