[Koha-devel] Bug 8375 is unfortunately the wrong solution for the wrong problem
Dobrica Pavlinusic
dpavlin at rot13.org
Thu Jul 19 15:39:37 CEST 2012
I'm trying to make label printing work for utf-8 characters. I noticed
that Bug 8375[1] got included in master, and I think it's the wrong
solution to the problem.
1: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8375
I'm copy/pasting my comment on this bug here to give it wider exposure
because I think this is somewhat critical:
I'm getting
Wide character in compress at /usr/share/perl5/PDF/Reuse.pm line 825.
errors when I try to use label printing with utf-8 characters (with or
without this patch).
However, when changing utf8::decode to utf8::encode in this patch,
everything works. Let me try to explain why:
PDF::Reuse does binmode on its file handle, so it expects bytes and we
shouldn't pass any character strings containing utf-8 characters ("wide
characters") to it. However, this patch proposes using utf8::decode
which (according to perldoc utf8) does:
· $success = utf8::decode($string)
Attempts to convert in-place the octet sequence in UTF-X to
the corresponding character sequence. The UTF-8 flag is turned on
only if the source string contains multiple-byte UTF-X
characters. If $string is invalid as UTF-X, returns false; otherwise
returns true.
So decode is used when we are converting utf-8 encoded bytes into Perl
character strings, not the other way around.
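
To make the direction concrete, here is a minimal sketch of what
utf8::decode does to a byte string. The sample octets (the utf-8
encoding of U+0161) and the variable names are only illustrative and
are not taken from the patch:

```perl
use strict;
use warnings;

# Two octets: the utf-8 encoding of U+0161 (s with caron).
my $octets = "\xC5\xA1";
my $string = $octets;

# In place: utf-8 octets -> one Perl character.
my $ok = utf8::decode($string);

printf "decode ok: %s\n", $ok ? "yes" : "no";
printf "octets: length %d\n", length $octets;
printf "string: length %d, code point U+%04X\n", length $string, ord $string;
```

The result is a single character above 255, which is exactly the kind
of value a binmode'd handle complains about.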
However, utf8::encode does exactly what we need:
· utf8::encode($string)
Converts in-place the character sequence to the corresponding
octet sequence in UTF-X. The UTF8 flag is turned off, so that
after this operation, the string is a byte string. Returns
nothing.
Which is exactly why I'm proposing to change this patch to use encode
instead of decode.
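
A small sketch of the same effect outside of Koha, assuming a plain
temporary file opened with binmode stands in for the handle PDF::Reuse
writes to (the label text and variable names are made up for
illustration). Printing the character string warns, printing the
encoded copy does not:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Stand-in for a byte-oriented output handle.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;

# Character string with two code points above 255.
my $label = "Pavlinu\x{161}i\x{107}";

print {$fh} $label;            # triggers "Wide character in print"

my $octets = $label;
utf8::encode($octets);         # in place: characters -> utf-8 octets
print {$fh} $octets;           # plain bytes, no warning

close $fh;

printf "warnings seen: %d\n", scalar @warnings;
printf "characters: %d, octets: %d\n", length $label, length $octets;
```

The warning text here says "print" rather than "compress" because the
wide characters reach a different function, but the cause is the same:
a character string handed to code that wants bytes.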
I guess that this patch has passed sign-off and QA because it's a
special case:
We are not correctly converting latin-1 characters into utf-8 when using
Z39.50 search, so we get a utf-8 marked string which contains latin-1
umlauts inside it. In that case, utf8::decode correctly strips the utf-8
flag and everything works, but this change also breaks real strings
which have utf-8 characters in them.
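
As a related illustration (not a reproduction of the Z39.50 code path,
which I'm not showing here), this sketch assumes a string holding a
latin-1 umlaut as a single byte and shows that utf8::decode reports
failure on it instead of converting anything:

```perl
use strict;
use warnings;

# "Mueller" with u-umlaut as the single latin-1 byte 0xFC.
# 0xFC followed by "l" is not a valid utf-8 sequence.
my $latin1 = "M\xFCller";
my $copy   = $latin1;

# Returns false when the input is not valid utf-8.
my $ok = utf8::decode($copy);

printf "decode succeeded: %s\n", $ok ? "yes" : "no";
printf "string unchanged: %s\n", $copy eq $latin1 ? "yes" : "no";
```

Checking the return value is the only way to notice that nothing was
decoded, which makes it easy for a test with latin-1 data to look fine.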
--
Dobrica Pavlinusic 2share!2flame dpavlin at rot13.org
Unix addict. Internet consultant. http://www.rot13.org/~dpavlin