[Koha-bugs] [Bug 15541] Prevent normalization during matching/import process

bugzilla-daemon at bugs.koha-community.org
Fri Apr 29 02:09:22 CEST 2016


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15541

David Cook <dcook at prosentient.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #47022|0                           |1
        is obsolete|                            |

--- Comment #15 from David Cook <dcook at prosentient.com.au> ---
Created attachment 50971
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=50971&action=edit
Bug 15541 - Prevent normalization during matching/import process

This patch allows you to use the "qualifier,qualifier" syntax
in the Record Matching Rules "Search Index" when using the QueryParser.
While QueryParser doesn't support this syntax, it will now fall back
correctly to non-QueryParser functionality. Without the patch, your search
will just fail silently.
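
As a rough illustration of the fallback described above, here is a minimal
Python sketch. It is not Koha's Perl code: the function name
pick_search_path and its return values are hypothetical, and the only
thing taken from the patch description is the idea that a comma-separated
"qualifier,qualifier" index is routed away from QueryParser.

```python
def pick_search_path(search_index: str, use_queryparser: bool) -> str:
    """Decide which search path handles a match point (hypothetical helper).

    A "qualifier,qualifier" index contains a comma, which QueryParser does
    not support, so such indexes go to the non-QueryParser path.
    """
    if use_queryparser and "," in search_index:
        return "non-queryparser"
    return "queryparser" if use_queryparser else "non-queryparser"


print(pick_search_path("id-other,st-urx", use_queryparser=True))
print(pick_search_path("id-other", use_queryparser=True))
```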

This patch also adds a "skip_normalize" flag to C4::Search::SimpleSearch(),
and sets the flag during C4::Matcher::get_matches(). This prevents
the s/:/=/g and s/=/:/g normalization, which is applied in a heavy-handed
way to produce correct query syntax. However, get_matches() already uses
the correct syntax, so this normalization is unneeded. The normalization
also mangles URLs, which causes failures when using the "url"
(i.e. "u") register in Zebra (see bug 15555).
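
To show why a blanket substitution is a problem for URLs, here is a
minimal Python sketch. It is not Koha's Perl code: normalize_query is a
hypothetical stand-in that only mimics an s/:/=/g substitution and a
skip flag, and the "index=value" query string is assumed for illustration.

```python
def normalize_query(query: str, skip_normalize: bool = False) -> str:
    """Mimic a blanket s/:/=/g substitution unless skip_normalize is set."""
    if skip_normalize:
        return query
    return query.replace(":", "=")


# Assumed query shape for illustration: index qualifiers, "=", then the value.
query = "id-other,st-urx=http://koha-community.org/test"

print(normalize_query(query))
# id-other,st-urx=http=//koha-community.org/test   (URL is mangled)

print(normalize_query(query, skip_normalize=True))
# id-other,st-urx=http://koha-community.org/test   (URL is preserved)
```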

This patch also creates "raw" and "none" normalizers for the
Record Matching Rules, which prevent Koha from stripping
spaces and punctuation before sending queries to Zebra.
This is important for a number of reasons. First and foremost,
Zebra does normalization better than Koha. ICU and CHR normalize
strings differently, so it's better not to try to outsmart
Zebra with pre-normalization of punctuation and spaces, as
it will lead to search problems. Second, when using the "u"
register in Zebra, you don't want to normalize the value, since
it's stored "as is" in the Zebra database. Normalization causes
search failures.
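
A small Python sketch of the difference, under stated assumptions:
strip_spaces_and_punctuation is a hypothetical stand-in for the kind of
pre-normalization described above (it is not Koha's actual normalizer),
and raw simply passes the value through unchanged.

```python
import string


def strip_spaces_and_punctuation(value: str) -> str:
    """Hypothetical pre-normalizer: drop ASCII punctuation and whitespace."""
    drop = set(string.punctuation) | set(string.whitespace)
    return "".join(ch for ch in value if ch not in drop)


def raw(value: str) -> str:
    """Pass the value through untouched, as a "raw" rule would."""
    return value


url = "http://koha-community.org/test"

print(strip_spaces_and_punctuation(url))  # httpkohacommunityorgtest
print(raw(url))                           # http://koha-community.org/test
```

A value stored "as is" in the "u" register can only match the second
form, which is why the stripped form leads to a failed search.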

_TEST PLAN_

1) Apply patches from
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15555
2) Create a record in Koha with a 024 $a http://koha-community.org/test $2 uri
3) Do a full re-index of Zebra
4) Create a Record Matching Rule in Koha with the following details:
    Description: Test
    Match threshold: 100
    Record type: Bibliographic
    Match point 1:
        Search index: id-other,st-urx
        Score: 100
        Tag: 024
        Subfields: a
        Normalization rule: raw
5) Download your record from Koha as a .mrc file (i.e. isomarc, binary marc, etc.)
6) Go to "Stage MARC records for import"
7) Upload your .mrc file.
8) Change your "Record matching rule" to "Test"
9) Click Stage for import
10) It should say: 1 records with at least one match in catalogue per matching
rule "Test".
