[Koha-bugs] [Bug 14759] Replacement for Text::Unaccent

bugzilla-daemon at bugs.koha-community.org
Tue Dec 8 23:11:39 CET 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14759

--- Comment #15 from David Cook <dcook at prosentient.com.au> ---
(In reply to Galen Charlton from comment #14)
> Other way around: Text::Unaccent is not, as it would be much preferable,
> emitting Perl Unicode strings; rather, it is emitting octet-sequences.

Sorry, I must have been unclear; I meant to say that Text::Unaccent is emitting
octet-sequences (which is why using decode() on the string returned by
Text::Unaccent would create a Perl Unicode string).

I also meant that Perl itself was causing problems when it tried to build a new
string by combining an octet-sequence string with a Perl Unicode string.
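
As a rough illustration of that mixing problem (a minimal sketch, not taken
from Koha code): the literal octets below merely stand in for what an
octet-emitting function might hand back, and the variable names are made up
for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for octet output: the UTF-8 bytes of "café" (5 octets).
my $octets = "caf\xC3\xA9";

# A Perl Unicode (character) string: a space followed by U+2603.
my $chars = " \x{2603}";

# Concatenating the two makes Perl treat each octet as a Latin-1 character,
# so the two bytes of "é" become two separate characters.
my $mixed = $octets . $chars;

# Decoding the octets first gives a character string that combines cleanly.
my $fixed = decode('UTF-8', $octets) . $chars;

printf "mixed: %d characters, fixed: %d characters\n",
    length $mixed, length $fixed;
# prints "mixed: 7 characters, fixed: 6 characters"
```

The extra character in $mixed is the mangling: the "é" has turned into "Ã©".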

> A good pattern to aim for is using *only* Unicode strings within core code,
> and relegating use of Encode and friends to input and output; Text::Unaccent
> would get in the way of that.

Fair enough. I'm not in favour of Text::Unaccent per se. I was curious why it
seemed to mangle some strings, and I shared what answers I found. 

I suspect Unicode::Normalize will really be the way to go, as you suggest. It
seems much more comprehensive than Text::Unaccent and Text::Unaccent::PurePerl.
I imagine we just need feedback from people experienced in Arabic, Hebrew, and
CJK languages.
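
For reference, a minimal sketch of how accent stripping with Unicode::Normalize
might look. The helper name strip_accents is hypothetical (it is not part of
Koha or of any CPAN module), and it assumes the input is already a Perl Unicode
string rather than octets.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Hypothetical helper for illustration only.
sub strip_accents {
    my ($text) = @_;              # expects a Perl Unicode (character) string
    my $decomposed = NFD($text);  # split base characters from combining marks
    $decomposed =~ s/\p{Mn}//g;   # drop nonspacing marks (category Mn)
    return NFC($decomposed);      # recompose whatever is left
}

my $plain = strip_accents("Cr\x{E8}me br\x{FB}l\x{E9}e");
# $plain is now "Creme brulee"
```

Note that category Mn also covers Arabic vowel marks, Hebrew points, and the
Japanese voiced sound mark (U+3099), so this sketch would turn U+304C into
U+304B, for example. Whether that is acceptable is exactly the kind of thing
that needs input from people who know those scripts.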
