[Koha-devel] Bug 8375 is unfortunately the wrong solution to the wrong problem

Thu Jul 19 15:39:37 CEST 2012

I'm trying to make label printing work with utf-8 characters. I noticed
that Bug 8375[1] got included in master, and I think it's the wrong
solution to the problem.

1: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8375

I'm copy/pasting my comment on this bug here to give it wider exposure
because I think this is somewhat critical:

I'm getting

Wide character in compress at /usr/share/perl5/PDF/Reuse.pm line 825.

errors when I try to use label printing with utf-8 characters (with or
without this patch).

However, if I change utf8::decode to utf8::encode in this patch,
everything works. Let me try to explain why:

PDF::Reuse calls binmode on its file handle, so we should pass it only
byte strings, never character strings containing "wide characters"
(code points above 255). However, this patch uses utf8::decode, which
(according to perldoc utf8) does the following:

· $success = utf8::decode($string)

  Attempts to convert in-place the octet sequence in UTF-X to
  the corresponding character sequence.  The UTF-8 flag is turned on
  only if the source string contains multiple-byte UTF-X
  characters.  If $string is invalid as UTF-X, returns false; otherwise
  returns true.

So decode is for converting bytes into character (utf-8 flagged)
strings, not the other way around.
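For comparison, Python's bytes.decode goes in the same direction: octets in, characters out. This is only an illustration of the direction of conversion, not Koha or PDF::Reuse code; Perl's utf8::decode works in place and signals invalid input by returning false rather than raising. The sample name is just test data.

```python
# Python analogue of Perl's utf8::decode (illustration only, not Koha code):
# decoding turns an octet sequence into a character sequence.
raw = "Pavlinušić".encode("utf-8")  # octets, e.g. as read from a socket or file
text = raw.decode("utf-8")          # characters: the direction utf8::decode goes

print(type(raw).__name__, len(raw))    # UTF-8 needs two octets each for š and ć
print(type(text).__name__, len(text))  # one element per character
```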

However, utf8::encode does exactly what we need:

· utf8::encode($string)

  Converts in-place the character sequence to the corresponding
  octet sequence in UTF-X.  The UTF8 flag is turned off, so that
  after this operation, the string is a byte string.  Returns
  nothing.

That is why I'm proposing to change this patch to use encode instead of
decode.
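A rough Python model of the failure and the fix, assuming only that the PDF output is compressed with zlib as the "Wide character in compress" message suggests. The helper compress_stream is hypothetical and stands in for the compression step inside PDF::Reuse; it is not that module's API. Python rejects a character string outright with a TypeError where Perl dies with "Wide character", but the remedy is the same: encode to octets first.

```python
import zlib

# Hypothetical stand-in for the compression step inside PDF::Reuse:
# like Perl's compress, zlib.compress accepts octets only.
def compress_stream(data):
    return zlib.compress(data)

label = "Pavlinušić"

try:
    compress_stream(label)  # a character string: Python's version of "Wide character in compress"
except TypeError as err:
    print("rejected:", err)

# Encoding first (what utf8::encode does in place) yields octets that compress fine.
stream = compress_stream(label.encode("utf-8"))
assert zlib.decompress(stream).decode("utf-8") == label
```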

I suspect this patch passed sign-off and QA because it was tested against
a special case:

We are not correctly converting latin-1 characters into utf-8 when using
Z39.50 search, so we end up with a utf-8 flagged string that contains
latin-1 umlauts. In that case utf8::decode does strip the utf-8 flag and
everything works, but the same change breaks real strings that contain
multi-byte utf-8 characters.
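To see why the latin-1 case behaves differently, consider how a latin-1 umlaut looks as raw octets. The sketch below is a Python illustration, assuming a source that sends latin-1; the name "Müller" is made-up sample data, and the Z39.50 code path itself is not modelled. A lone latin-1 "ü" (0xFC) is not valid UTF-8, which is the "invalid as UTF-X" case the perldoc quote describes. Converting it properly means decoding it as latin-1 first and only then producing UTF-8 octets.

```python
# A latin-1 "ü" is the single octet 0xFC, which is not valid UTF-8 on its own.
latin1 = "Müller".encode("latin-1")  # b'M\xfcller', as a latin-1 source would send it

try:
    latin1.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False  # Perl's utf8::decode would return false here instead of raising

# Converting at the source: decode as latin-1, then encode as UTF-8 octets.
text = latin1.decode("latin-1")
utf8_octets = text.encode("utf-8")
print(valid_utf8, utf8_octets)
```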

-- 
Dobrica Pavlinusic               2share!2flame            dpavlin at rot13.org
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin