[Koha-bugs] [Bug 15541] Prevent normalization during matching/import process
bugzilla-daemon at bugs.koha-community.org
bugzilla-daemon at bugs.koha-community.org
Mon May 23 06:07:02 CEST 2016
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15541
David Cook <dcook at prosentient.com.au> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|Academy |
--- Comment #16 from David Cook <dcook at prosentient.com.au> ---
(In reply to Marcel de Rooy from comment #11)
> [1] Although your changes in SimpleSearch are very small, it would probably
> help if you would add a test case in t/db_dependent/Search.t. (It seems to
> fail some tests currently, separate from your patch.)
When I run "prove t/db_dependent/Search.t", it says that all tests are
successful, although there are lots of warnings generated.
I'll review the test code to see if I can add a test...
> [2] The source_normalizer or norms just lead me to two FIXMEs:
> # FIXME normalize, substr
> # FIXME - default normalizer
These FIXME messages pre-date my patch.
> You introduce none and raw in the regex while this field actually does
> nothing (and should not have been there in this stage).
I don't understand this sentence.
> Your patch starts
> using this field, while the only thing you want is not running _normalize
> for (mainly?) URLs.
URLs are certainly the main thing I want to not normalize, but honestly nothing
should be normalized during the matching, as Zebra already handles
normalization for search queries.
> If we do not really solve the problem of this unused
> field, I think we should not touch it. BTW What would be the difference
> between raw and none? The regex suggests more than what we offer.
There is no difference between raw and none. They're just synonyms.
I'm open to removing _normalize() from the matching completely. Honestly, I
figured that the community would be against it, so I created a way of opting
out of the default _normalize(). That way default system behaviour doesn't
change, while some people will know how to deactivate it.
> [3] I have some doubts about this new condition:
> if ($QParser && $matchpoint->{'index'} !~ m/\w,\w/) {
> Somehow you managed to bump in another unfinished area here too :)
> I would rather leave the condition as it was. In the QParser branch you
> could choose to ignore/remove the second specifier word with a similar regex.
>
It's not really a new condition. If you look at C4::Search::SimpleSearch,
you'll see that it's already there:
$QParser = C4::Context->queryparser if
(C4::Context->preference('UseQueryParser') && ! ($query =~ m/\w,\w|\w=\w/));
Also, you can't remove the second specifier word, because it's essential to the
query. You also can't ignore it, because the QParser can't handle it. That's
why I added the same condition that's found in C4::Search::SimpleSearch. You
need to catch this condition before going into the QParser code block.
--
Can the QA team indicate how to proceed?
I'll look at adding a unit test. However, the QParser condition or one similar
to it needs to stay. As for _normalize(), I'll remove it completely, if that's
a change which will be accepted by the QA team and the RM.
--
You are receiving this mail because:
You are watching all bug changes.
More information about the Koha-bugs
mailing list