[Koha-bugs] [Bug 7029] Searching : fuzzy and stemming
bugzilla-daemon at bugs.koha-community.org
Mon May 25 01:36:48 CEST 2020
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7029
David Cook <dcook at prosentient.com.au> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcook at prosentient.com.au
--- Comment #13 from David Cook <dcook at prosentient.com.au> ---
I love indexing problems...
In the Web UI, I just tried "kw, wrdl: chien" and "kw, wrdl: chine" in an
English/Chinese Koha instance I have and got back 117 and 115 results
respectively.
In yaz-client, I tried "kw,wrdl=chien" and "kw,wrdl=chine" and got 0 results
both times. However "chinese" and "chienese" both got 115 results...
I then added "format xml" and "elements zebra::snippet", retried "chienese", and
got the following:
Z> show 1
Sent presentRequest (1+1).
Records: 1
Record type: XML
<record xmlns="http://www.indexdata.com/zebra/">
<snippet name="Any" type="w"><s>Chinese</s></snippet>
</record>nextResultSetPosition = 2
Elapsed: 0.032032
Looking at the Zebra setup, this instance is using CHR indexing (via
word-phrase-utf.chr) rather than ICU indexing.
In word-phrase-utf.chr, I see the following:
equivalent ï(ie)
map ï i
My *guess* is that "ie" also gets searched as "ï", which is in turn treated as
"i", so a search for "chien" would also be run as a search for "chin".
This corresponds with what Katrin and Marjorie are saying.
A bit odd that "chien/chin" would match "chine", but QueryAutoTruncation would
cause that to happen, as "chien" would be treated as "chin" and auto truncated
as "chin*".
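
To make that guess concrete, here's a rough toy model in Python. It is NOT how
Zebra actually processes .chr files; it only imitates the effect I'm guessing
at ("ie" treated as "ï", "ï" mapped to "i", then a prefix match when
QueryAutoTruncation is on). The function names normalize() and matches() are
made up for illustration.

```python
def normalize(term: str) -> str:
    """Toy stand-in for the guessed effect of the two .chr rules."""
    term = term.lower()
    term = term.replace("ie", "ï")  # imitates: equivalent ï(ie)
    term = term.replace("ï", "i")   # imitates: map ï i
    return term


def matches(query: str, indexed_word: str, auto_truncate: bool) -> bool:
    """Compare a query term with an indexed word after normalization.

    With auto_truncate=True the query is treated as a prefix ("chin*"),
    which is the behaviour assumed here for QueryAutoTruncation.
    """
    q = normalize(query)
    w = normalize(indexed_word)
    return w.startswith(q) if auto_truncate else w == q


if __name__ == "__main__":
    # Web UI case (truncation on): "chien" -> "chin" -> "chin*"
    print(matches("chien", "chine", auto_truncate=True))
    print(matches("chien", "Chinese", auto_truncate=True))
    # yaz-client case (no truncation): only the full word matches
    print(matches("chien", "Chinese", auto_truncate=False))
    print(matches("chienese", "Chinese", auto_truncate=False))
```

Under those assumptions, "chien" only reaches "chine" and "Chinese" when
truncation is on, while "chienese" matches "Chinese" either way, which lines up
with what I saw in the Web UI versus yaz-client.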
So... that's probably that one explained.
As for the solution... probably update "map ï i" to "map ï (ie)".