[Koha-bugs] [Bug 15541] Prevent normalization during matching/import process

Mon May 23 06:07:02 CEST 2016

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15541

David Cook <dcook at prosentient.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|Academy                     |

--- Comment #16 from David Cook <dcook at prosentient.com.au> ---
(In reply to Marcel de Rooy from comment #11)
> [1] Although your changes in SimpleSearch are very small, it would probably
> help if you would add a test case in t/db_dependent/Search.t. (It seems to
> fail some tests currently, separate from your patch.)

When I run "prove t/db_dependent/Search.t", it says that all tests are
successful, although there are lots of warnings generated.

I'll review the test code to see if I can add a test...

> [2] The source_normalizer or norms just lead me to two FIXMEs:
> # FIXME normalize, substr
> # FIXME - default normalizer

These FIXME messages pre-date my patch. 

> You introduce none and raw in the regex while this field actually does
> nothing (and should not have been there in this stage). 

I don't understand this sentence.

> Your patch starts
> using this field, while the only thing you want is not running _normalize
> for (mainly?) URLs. 

URLs are certainly the main thing I want to not normalize, but honestly nothing
should be normalized during the matching, as Zebra already handles
normalization for search queries.

> If we do not really solve the problem of this unused
> field, I think we should not touch it. BTW What would be the difference
> between raw and none? The regex suggests more than what we offer.

There is no difference between raw and none. They're just synonyms. 

I'm open to removing _normalize() from the matching completely. Honestly, I
figured that the community would be against it, so I created a way of opting
out of the default _normalize(). That way default system behaviour doesn't
change, while some people will know how to deactivate it.

> [3] I have some doubts about this new condition:
>     if ($QParser && $matchpoint->{'index'} !~ m/\w,\w/) {
> Somehow you managed to bump in another unfinished area here too :) 
> I would rather leave the condition as it was. In the QParser branch you
> could choose to ignore/remove the second specifier word with a similar regex.
> 

It's not really a new condition. If you look at C4::Search::SimpleSearch,
you'll see that it's already there:

$QParser = C4::Context->queryparser if
(C4::Context->preference('UseQueryParser') && ! ($query =~ m/\w,\w|\w=\w/));

Also, you can't remove the second specifier word, because it's essential to the
query. You also can't ignore it, because the QParser can't handle it. That's
why I added the same condition that's found in C4::Search::SimpleSearch. You
need to catch this condition before going into the QParser code block.

--

Can the QA team indicate how to proceed? 

I'll look at adding a unit test. However, the QParser condition or one similar
to it needs to stay. As for _normalize(), I'll remove it completely, if that's
a change which will be accepted by the QA team and the RM.

-- 
You are receiving this mail because:
You are watching all bug changes.