[Koha-bugs] [Bug 13064] Indexing problem with ICU on control characters
bugzilla-daemon at bugs.koha-community.org
Tue Oct 14 14:28:28 CEST 2014
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064
Chris Cormack <chris at bigballofwax.co.nz> changed:
What                          |Removed |Added
----------------------------------------------------------------------------
Attachment #32136 is obsolete |0       |1
--- Comment #2 from Chris Cormack <chris at bigballofwax.co.nz> ---
Created attachment 32296
-->
http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=32296&action=edit
Bug 13064 - Indexing problem with ICU on control characters
The ICU configuration files contain a rule to remove control characters:
<transform rule="[:Control:] Any-Remove"/>
This rule is applied before tokenization.
The problem is that the "[:Control:]" character class includes line feed,
carriage return and tab. See
http://www.regular-expressions.info/posixbrackets.html.
So when a value spanning several lines is indexed, the last word of each line
is joined to the first word of the next line. Those words are then not
searchable.
For example:
First line
Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond"
and "line".
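The effect can be reproduced outside Zebra with a rough Python sketch. This is
only a simulation under stated assumptions: it uses the Unicode "Cc" general
category as a stand-in for "[:Control:]" and plain whitespace splitting as a
stand-in for the ICU tokenizer. The helper names are hypothetical, and the
"replace with a space" variant is one possible remedy shown for comparison,
not necessarily what the attached patch does.

```python
import unicodedata


def is_control(ch):
    # Assumption: Unicode general category "Cc" approximates [:Control:].
    # Line feed, carriage return and tab all fall in this category.
    return unicodedata.category(ch) == "Cc"


def remove_controls(text):
    # Mimics "[:Control:] Any-Remove": control characters are simply dropped.
    return "".join(ch for ch in text if not is_control(ch))


def controls_to_spaces(text):
    # Hypothetical alternative: turn control characters into spaces instead,
    # so that line boundaries still separate words.
    return "".join(" " if is_control(ch) else ch for ch in text)


def tokenize(text):
    # Stand-in for the real tokenizer: split on runs of whitespace.
    return text.split()


field = "First line\nSecond line"

removed_tokens = tokenize(remove_controls(field))
spaced_tokens = tokenize(controls_to_spaces(field))

print(removed_tokens)  # ['First', 'lineSecond', 'line']
print(spaced_tokens)   # ['First', 'line', 'Second', 'line']
```

With removal before tokenization, "Second" never appears as a token of its
own, which matches the failed search described in the test plan below.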
Test plan:
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in the chosen field, like:
First line
Second line
- Index this record
=> Without patch the search on "Second" does not return the record
=> With patch the search on "Second" returns the record
- Repeat the same tests with tab and carriage return instead of line feed
Signed-off-by: Chris Cormack <chris at bigballofwax.co.nz>