[Koha-devel] utf-8 handling in Koha

Mon Oct 24 21:37:42 CEST 2011

In our migration to new koha, we hit bug 6554. We are having similar
problem, but in our case we don't use localized templates but have utf-8
characters inside MySQL which get double utf-8 encoded before they are sent
to browser.

1: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=6554

Below is short summary of changes from patch:

This patch tries to clean up utf-8 handling in Koha.

In current implementation (mostly commented out in this patch)
uses heuristic to guess which strings need decoding from utf-8
to binary representation and doesn't support utf-8 characters
in templates and has problems with utf-8 data from database.

With this changes, Koha perl code always uses utf-8 encoding
correctly. All incomming data from database is allready
correctly marked as utf-8, and decoding of utf8 is required
only from Zebra and XSLT transfers which don't set utf-8 flag
correctly.

For output, standard perl :utf8 handler is used removing various
"wide character" warnings as side-effect.

I would love to hear your thoughts on this approach. So far, I know that
it breaks CGI::Session (which is documented as known bug in it's
documentation) so after first reload library names, shelfs and other
data returned from session isn't encoded correctly.

I would also need to check if this change affect LDAP, Z39.50 encoding
and SIP server, but before I start down this road do you see any reasons
not to persueue it? Compatibility with older perl versions might be
one reason. I'm running perl v5.10.1 from Debian squeeze.

-- 
Dobrica Pavlinusic               2share!2flame            dpavlin at rot13.org
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin