[Koha-bugs] [Bug 14759] Replacement for Text::Unaccent
bugzilla-daemon at bugs.koha-community.org
Tue Dec 8 04:49:49 CET 2015
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14759
--- Comment #12 from David Cook <dcook at prosentient.com.au> ---
More interesting things...
You can still have a Perl string with a UTF8 flag set, even when you're not
using "use utf8"...
My example:
my $arabic = "\x{0645}";
PV = 0x190e950 "\331\205"\0 [UTF8 "\x{645}"]
Interestingly, if I don't use "use utf8", and use a UTF8 encoded character in
my source code, I get a string without a UTF8 flag:
my $arabic_text = "م";
PV = 0x1ea92b0 "\331\205"\0
I imagine use of the \x{} construct must do a utf8::upgrade...
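For anyone who wants to check the flag without reading a dump, here is a minimal sketch using the core utf8::is_utf8 function. One assumption: the literal character from my source file is replaced by its two UTF-8 octets written as "\xD9\x85", so the result doesn't depend on how the file is saved. As far as I can tell, "\x{0645}" gets the flag simply because the code point is above 0xFF and can't be stored as a single byte.

```perl
use strict;
use warnings;

# A code point above 0xFF can only be stored as a character string,
# so Perl sets the internal UTF8 flag on this scalar.
my $arabic = "\x{0645}";

# Assumption: stand-in for the UTF-8 encoded literal typed into source
# without "use utf8". Same two octets, written as escapes.
my $arabic_text = "\xD9\x85";

printf "arabic:      flag=%d length=%d\n",
    (utf8::is_utf8($arabic) ? 1 : 0), length($arabic);
printf "arabic_text: flag=%d length=%d\n",
    (utf8::is_utf8($arabic_text) ? 1 : 0), length($arabic_text);
```

The first string is one character long with the flag on; the second is two octets long with the flag off.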
--
In any case, if I put $arabic and $arabic_text into the same string, I get the
following:
my $arabic_result = "Arabic = $arabic_text = $arabic";
say $arabic_result;
Arabic = Ù� = م
PV = 0x29feda0 "Arabic = \303\231\302\205 = \331\205"\0 [UTF8 "Arabic = \x{d9}\x{85} = \x{645}"]
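The dump suggests what happened: when the unflagged string is joined with the flagged one, Perl upgrades the two octets as if each were a Latin-1 character, which gives U+00D9 and U+0085 instead of U+0645. A rough sketch that lists the non-ASCII code points of the combined string (again assuming "\xD9\x85" as a stand-in for the literal in my source file):

```perl
use strict;
use warnings;

my $arabic      = "\x{0645}";   # character string, UTF8 flag on
my $arabic_text = "\xD9\x85";   # assumption: stand-in for the undecoded literal

my $arabic_result = "Arabic = $arabic_text = $arabic";

# Collect the code points above ASCII to see what the octets turned into.
my @code_points;
for my $char (split //, $arabic_result) {
    push @code_points, sprintf('U+%04X', ord($char)) if ord($char) > 0x7F;
}
print "@code_points\n";
```

This prints "U+00D9 U+0085 U+0645", which matches the [UTF8 ...] part of the dump above.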
However, if I try "$arabic_text = decode("UTF-8", $arabic_text);", which
according to http://perldoc.perl.org/Encode.html means: $characters =
decode('UTF-8', $octets), then I get the following:
Arabic = م = م
PV = 0x15efe50 "Arabic = \331\205 = \331\205"\0 [UTF8 "Arabic = \x{645} = \x{645}"]
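A self-contained version of the decode approach, as a sketch. The "\xD9\x85" octets are an assumption standing in for the literal in my source file, and the binmode line is my addition so the character string is encoded again on output.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $arabic      = "\x{0645}";   # already a character string
my $arabic_text = "\xD9\x85";   # assumption: UTF-8 octets of the same letter

# Turn the octets into characters before mixing the two strings.
$arabic_text = decode('UTF-8', $arabic_text);

my $arabic_result = "Arabic = $arabic_text = $arabic";

# Character strings should be encoded on the way out.
binmode(STDOUT, ':encoding(UTF-8)');
print "$arabic_result\n";
```

After decoding, both variables hold the single character U+0645, so the combined string is consistent.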
Alternatively, I could have done "$arabic = encode("UTF-8",$arabic);", which
would yield this result:
Arabic = م = م
PV = 0x832210 "Arabic = \331\205 = \331\205"\0
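The encode approach goes the other way and keeps everything as octets. A minimal sketch under the same assumption ("\xD9\x85" standing in for the literal in my source file):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $arabic      = "\x{0645}";   # character string, UTF8 flag on
my $arabic_text = "\xD9\x85";   # assumption: UTF-8 octets of the same letter

# Turn the character into octets so both sides are byte strings.
$arabic = encode('UTF-8', $arabic);

my $arabic_result = "Arabic = $arabic_text = $arabic";

# No output layer here: the octets are written as they are.
print "$arabic_result\n";
printf "flag=%d length=%d\n",
    (utf8::is_utf8($arabic_result) ? 1 : 0), length($arabic_result);
```

Here the result has no UTF8 flag and its length counts octets (16) rather than characters, which is consistent with the dump above showing no [UTF8 ...] part.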
This explains the UTF8 flag a bit:
http://perldoc.perl.org/Encode.html#The-UTF8-flag
So yeah... that's cool... who knew that was a thing, eh?