[Koha-devel] utf-8, probable solution

Paul POULAIN paul.poulain at free.fr
Wed Feb 15 16:08:32 CET 2006


Thanks to Heikki Levanto, Tümer Garip & Mike Rylander: you pointed out 
three things that were useless alone, but very useful when combined.

I think I have the solution to our problem. It's not a Zebra, 
HTML::Template or MARC::Record problem, it's a Perl one!

Let me explain:
I followed my UTF-8 string through my Perl code right up to the point 
where it was printed, and it was always UTF-8 (\x9c...).
But in Firefox, it was ISO-8859-1.

Heikki told me that the first 256 code points are shared by Unicode and 
ISO-8859-1. So I told myself: OK, Paul, add a "true" UTF-8 character to 
your string. I chose \x{263a} (the smiley, because I'm always 
optimistic & that is what is used in perluniintro).

Surprise... now my é was a UTF-8 é in Firefox!
Conclusion: Perl looked at my string before sending it and, as it 
found that it was not "true UTF-8", did something to change it to 
ISO-8859-1.

I also had a brand new message in my log :
 >            Wide character in print at ...
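
A minimal sketch of the behaviour described above, assuming a Perl with 
in-memory filehandles (5.8 or later). The helper bytes_printed is 
hypothetical and only for illustration, it is not Koha code. It prints a 
string to a handle that has no encoding layer and reports the bytes that 
were really written:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Koha): print $string to an in-memory
# filehandle that has no encoding layer, then return the bytes that were
# actually written, as space-separated hex.
sub bytes_printed {
    my ($string) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory file: $!";
    {
        no warnings 'utf8';    # hide "Wide character in print" for this demo
        print {$fh} $string;
    }
    close($fh);
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, $buffer);
}

my $e_acute     = "\x{e9}";            # é alone, fits in one Latin-1 byte
my $with_smiley = "\x{e9}\x{263a}";    # é followed by the smiley

print bytes_printed($e_acute), "\n";        # e9 (a single ISO-8859-1 byte)
print bytes_printed($with_smiley), "\n";    # c3 a9 e2 98 ba (UTF-8 bytes)
```

With é alone, the output is the single byte e9, which a browser expecting 
UTF-8 cannot read as é. Once the smiley is in the string, the whole 
string, é included, goes out as UTF-8.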

Mike R.'s and Tümer G.'s suggestions made me investigate the perldoc on 
Unicode, and here it is:
>        A user of Perl does not normally need to know nor care how Perl happens to encode its internal strings, but it becomes
>        relevant when outputting Unicode strings to a stream without a PerlIO layer -- one with the "default" encoding.  In such a case,
>        the raw bytes used internally (the native character set or UTF-8, as appropriate for each string) will be used, and a "Wide
>        character" warning will be issued if those strings contain a character beyond 0x00FF.
>        For example,
>              perl -e 'print "\x{DF}\n", "\x{0100}\x{DF}\n"'
>        produces a fairly useless mixture of native bytes and UTF-8, as well as a warning:
>             Wide character in print at ...
>        To output UTF-8, use the ":utf8" output layer.  Prepending
>              binmode(STDOUT, ":utf8");
>        to this sample program ensures that the output is completely UTF-8, and removes the program's warning.


GOTCHA! I added binmode(STDOUT, ":utf8") and now, even without 
the smiley, my é, à... are displayed correctly.
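
As a rough illustration of the fix, here is a sketch that applies 
binmode with ":utf8" to an in-memory handle instead of STDOUT, so that 
the bytes can be inspected. The helper name bytes_printed_utf8 is an 
assumption made for this example only:

```perl
use strict;
use warnings;

# Hypothetical helper: print $string to an in-memory filehandle that has
# been given the ":utf8" layer with binmode, then return the bytes that
# were written, as space-separated hex.
sub bytes_printed_utf8 {
    my ($string) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory file: $!";
    binmode($fh, ':utf8');    # same call as binmode(STDOUT, ":utf8")
    print {$fh} $string;
    close($fh);
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, $buffer);
}

print bytes_printed_utf8("\x{e9}"), "\n";            # c3 a9
print bytes_printed_utf8("\x{e9}\x{263a}"), "\n";    # c3 a9 e2 98 ba
```

With the layer in place, é is written as the two UTF-8 bytes c3 a9 
whether or not a wider character is present, and no "Wide character" 
warning is raised.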

I still have to investigate MySQL UTF-8, but it seems that
 > set NAMES=utf8
is useless.

Thanks, everybody, for helping me. I'll continue this thread on 
koha-devel only, as the Zebra & perl4lib lists are probably not 
interested.
-- 
Paul POULAIN and Henri Damien LAURENT
Independent consultants
in free software and library science (http://www.koha-fr.org)




