[Koha-bugs] [Bug 10939] ICU does not transliterate polish special characters
bugzilla-daemon at bugs.koha-community.org
Thu Nov 7 10:40:37 CET 2013
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10939
--- Comment #17 from Jacek Ablewicz <abl at biblos.pk.edu.pl> ---
Ouch, it looks like the problems with 'ł, Ł' are not related to UCA < v5.2 quirks
as I previously thought; not at all. The real problem here is that:
NFD; [:Nonspacing Mark:] Remove; NFC
apparently does not work for Latin letters with a stroke (same problem
with ø, Ø, ƶ, Ƶ etc., not just with Ł, ł). I guess that's because NFD does not
decompose 'ł' to 'l + /' the way it does for accented characters and so on. For
ICU indexing to behave (more or less) like CHR
(zebradb/etc/word-phrase-utf.chr) did, we can:
1) add something like this:
<transliterate rule="{ Ø > o "/>
<transliterate rule="{ ø > o "/>
<transliterate rule="{ Đ > d "/>
<transliterate rule="{ đ > d "/>
<transliterate rule="{ Ħ > h "/>
<transliterate rule="{ ħ > h "/>
<transliterate rule="{ Ł > l "/>
<transliterate rule="{ ł > l "/>
<transliterate rule="{ Ŧ > t "/>
<transliterate rule="{ ŧ > t "/>
<transliterate rule="{ Ƶ > z "/>
<transliterate rule="{ ƶ > z "/>
<transliterate rule="{ Ǥ > g "/>
<transliterate rule="{ ǥ > g "/>
<transliterate rule="{ Ⱥ > a "/>
<transliterate rule="{ ⱥ > a "/>
<transliterate rule="{ Ȼ > c "/>
<transliterate rule="{ ȼ > c "/>
<transliterate rule="{ Ɇ > e "/>
<transliterate rule="{ ɇ > e "/>
<transliterate rule="{ Ɍ > r "/>
<transliterate rule="{ ɍ > r "/>
<transliterate rule="{ Ɏ > y "/>
<transliterate rule="{ ɏ > y "/>
<transliterate rule="{ Ɨ > i "/>
<transliterate rule="{ ɨ > i "/>
<transliterate rule="{ ʉ > u "/>
<transliterate rule="{ Ʉ > u "/>
<transliterate rule="{ Ӕ > ae "/>
<transliterate rule="{ ӕ > ae "/>
<transliterate rule="{ Œ > oe "/>
<transliterate rule="{ œ > oe "/>
to words-icu.xml, or:
2) as Julien suggested, we may use the built-in Latin-ASCII ICU transliterator,
i.e. add:
<transform rule="[:Latin:] Latin-ASCII"/>
Solution 2) looks much better IMO (it's more general-purpose), but it may not be
ideal for everybody, as the Latin-ASCII transliterator is not implemented in
pre-4.6 ICU versions.
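To illustrate why the NFD-based rule misses these letters, here is a minimal Python sketch using only the standard unicodedata module. It approximates the 'NFD; [:Nonspacing Mark:] Remove; NFC' chain outside of ICU, so it is only an illustration and not what Zebra actually runs. The names strip_marks, STROKE_MAP and fold are made up for this example, and the small mapping table mirrors just a few of the rules from option 1).

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Approximate 'NFD; [:Nonspacing Mark:] Remove; NFC' with the stdlib."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (general category Mn) left over after decomposition.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


# Hypothetical explicit fallback for a few stroke letters, in the spirit of
# option 1); these characters have no canonical decomposition, so NFD alone
# leaves them untouched.
STROKE_MAP = str.maketrans({
    "Ł": "l", "ł": "l",
    "Ø": "o", "ø": "o",
    "Đ": "d", "đ": "d",
})


def fold(text: str) -> str:
    """Strip combining marks first, then apply the explicit stroke-letter map."""
    return strip_marks(text).translate(STROKE_MAP)


if __name__ == "__main__":
    print(strip_marks("Łódź"))  # the accents go away, the stroke letter stays
    print(fold("Łódź"))         # the explicit map handles the stroke letter
```

Accented letters such as 'ó' and 'ź' lose their marks in the first step, while 'Ł' passes through unchanged and is only folded by the explicit table.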