[Koha-bugs] [Bug 26472] Elasticsearch - ES - Authority record results not ordered correctly due to punctuation marks

bugzilla-daemon at bugs.koha-community.org
Fri Sep 15 21:33:37 CEST 2023


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26472

--- Comment #45 from Victor Grousset/tuxayo <victor at tuxayo.net> ---
https://sourceforge.net/p/icu/mailman/icu-support/thread/d1a3c944-02a7-4cc0-87ad-4ad30b773967%40tuxayo.net/#msg37895241

> > I think you might be able to get that by reordering spaces above
> > punctuation, and setting the first non ignorable to be the first space.
> >
> 
> No, I am pretty sure that alternate=shifted compares the primary weight
> with the maxVariable setting before script reordering.
> 
> I just tried this in the ICU Collation Demo
> <https://icu4c-demos.unicode.org/icu-bin/collation.html> with
> 
>    - rule "[reorder punct space]"
>    - alternate=shifted
>    - max variable=punct
> 
> and both spaces and punctuation get shifted/ignored.
> 
> So I don't think we have a way to ignore/shift anything other than
> "anything up to the max variable".
> 
> You might be best off pre-processing the strings, removing punctuation
> characters before sending the string into collation.

This is mostly over my head, but it seems to confirm that there is no way to
get what we want with ICU config alone.
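
For reference, the pre-processing suggested at the end of the quoted reply
could look roughly like the sketch below. It is only an illustration, not what
Koha or the patch does: the helper names (strip_punctuation, sort_key) and the
sample headings are made up, "punctuation" is assumed to mean the Unicode P*
general categories, and plain string comparison stands in for ICU collation.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop characters in a Unicode punctuation category (P*); keep whitespace."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def sort_key(heading: str) -> str:
    """Hypothetical sort key: punctuation removed, whitespace runs collapsed."""
    cleaned = " ".join(strip_punctuation(heading).split())
    # casefold() is a stand-in for a real collator; it is NOT ICU collation.
    return cleaned.casefold()


# Made-up sample headings, only to show the effect on ordering.
headings = [
    "Smith, John",
    "Smith-Jones, Ann",
    "Smith Adams, Bob",
    "Smith. Carl",
]

sorted_headings = sorted(headings, key=sort_key)
for heading in sorted_headings:
    print(heading)
```

With this key, punctuation no longer affects the order, while the space still
sorts before letters, so "Smith Adams, Bob" lands before "Smith-Jones, Ann".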


If the assumptions are correct:
- our need is for punctuation to be ignored, but whitespace not to be ignored
- "strength: quaternary" is still better than "alternate: shifted" and better
than the current sorting
  - it might need more testing, since it managed to masquerade as complying
with the test plan ^^"

Then the immediate way forward is to go ahead with the "strength: quaternary"
patch.

