[Koha-bugs] [Bug 7029] Searching : fuzzy and stemming

Mon May 25 01:36:48 CEST 2020

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7029

David Cook <dcook at prosentient.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcook at prosentient.com.au

--- Comment #13 from David Cook <dcook at prosentient.com.au> ---
I love indexing problems...

In the Web UI, I just tried "kw, wrdl: chien" and "kw, wrdl: chine" in an
English/Chinese Koha instance I have and got back 117 and 115 results
respectively. 

In yaz-client, I tried "kw,wrdl=chien" and "kw,wrdl=chine" and got 0 results
both times. However "chinese" and "chienese" both got 115 results...

I added "format xml" and "elements zebra::snippet", retried "chienese", and
got the following:

Z> show 1
Sent presentRequest (1+1).
Records: 1
Record type: XML
<record xmlns="http://www.indexdata.com/zebra/">
  <snippet name="Any" type="w"><s>Chinese</s></snippet>
</record>nextResultSetPosition = 2
Elapsed: 0.032032

Looking at the Zebra setup, that instance is using CHR indexing (via
word-phrase-utf.chr) rather than ICU indexing.

In word-phrase-utf.chr, I see the following:
equivalent ï(ie)
map ï           i

My *guess* is that "ie" also gets searched as "ï", which is then treated as
"i", so "chien" would also be searched as "chin".

This corresponds with what Katrin and Marjorie are saying.

It seems a bit odd that "chien" (treated as "chin") would match "chine", but
QueryAutoTruncation would cause that to happen: "chien" would be folded to
"chin" and then auto-truncated to "chin*", which matches "chine".
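
To make that guess concrete, here is a minimal Python sketch of the behaviour
described above. It is a toy model only, not Zebra's actual implementation:
the normalize() and search() helpers and the index contents are hypothetical,
and the effect of "equivalent ï(ie)" is assumed rather than confirmed.

```python
# Toy model of the guess above; this is NOT how Zebra is implemented.
# Assumption: "equivalent ï(ie)" makes "ie" interchangeable with "ï",
# and "map ï i" then folds "ï" to "i".

def normalize(term: str) -> str:
    """Fold a term the way the two .chr lines are guessed to behave."""
    term = term.lower()
    term = term.replace("ie", "ï")   # guessed effect of: equivalent ï(ie)
    return term.replace("ï", "i")    # effect of: map ï i


def search(query: str, indexed_words: list[str], auto_truncate: bool) -> list[str]:
    """Return the indexed words that match the query after normalization."""
    q = normalize(query)
    hits = []
    for word in indexed_words:
        w = normalize(word)
        # With auto truncation the query behaves like a right-truncated "q*".
        if w == q or (auto_truncate and w.startswith(q)):
            hits.append(word)
    return hits


words = ["Chinese", "chine", "chat"]   # hypothetical index contents

print(search("chien", words, auto_truncate=False))     # []
print(search("chienese", words, auto_truncate=False))  # ['Chinese']
print(search("chien", words, auto_truncate=True))      # ['Chinese', 'chine']
```

Under these assumptions, the first two calls mirror the yaz-client results
(no truncation), and the third mirrors the Web UI with QueryAutoTruncation
enabled.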

So... that's probably that one explained. 

As for the solution... probably update "map ï i" to "map ï (ie)".
