[Koha-devel] utf-8 handling in Koha
Dobrica Pavlinusic
dpavlin at rot13.org
Mon Oct 24 21:37:42 CEST 2011
In our migration to a new Koha, we hit bug 6554 [1]. We have a similar
problem, but in our case we don't use localized templates; instead we have
utf-8 characters inside MySQL which get double utf-8 encoded before they
are sent to the browser.
1: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=6554
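To make the symptom concrete, here is a minimal sketch of double encoding
in plain perl. It is not code from Koha or from the patch; double_encode()
is a hypothetical helper and the sample character is only an illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from Koha): encode a character string to
# UTF-8 bytes, then encode the result again. The second call treats
# each byte as a Latin-1 character, which is the double encoding.
sub double_encode {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);   # characters -> bytes (correct)
    return encode('UTF-8', $bytes);        # bytes encoded again (the bug)
}

my $char  = "\x{010D}";                    # U+010D, small c with caron
my $once  = encode('UTF-8', $char);
my $twice = double_encode($char);

printf "encoded once:  %d bytes\n", length $once;
printf "encoded twice: %d bytes\n", length $twice;
```

The single character becomes two bytes when encoded once and four bytes
when encoded twice, which is what the browser then renders as garbage.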
Below is a short summary of the changes from the patch:
This patch tries to clean up utf-8 handling in Koha.
The current implementation (mostly commented out in this patch)
uses heuristics to guess which strings need decoding from utf-8
to binary representation. It doesn't support utf-8 characters
in templates and has problems with utf-8 data from the database.
With these changes, Koha perl code always uses utf-8 encoding
correctly. All incoming data from the database is already
correctly marked as utf-8, and decoding of utf8 is required
only for Zebra and XSLT transfers, which don't set the utf-8 flag
correctly.
For output, the standard perl :utf8 layer is used, which removes various
"wide character" warnings as a side effect.
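The approach described above (decode once where undecoded bytes enter,
encode once on output) can be sketched roughly as follows. This is an
illustration under assumptions, not the patch itself: decode_external()
is a hypothetical helper, and the byte string merely stands in for data
returned by a source such as Zebra.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the patch): convert raw UTF-8 bytes from
# a source that does not mark its strings into a perl character string.
sub decode_external {
    my ($raw) = @_;
    return decode('UTF-8', $raw);
}

# Encode exactly once, on the way out. Without an output layer, printing
# characters above 0xFF triggers "Wide character" warnings.
binmode STDOUT, ':utf8';

# Stand-in for undecoded bytes: 6 bytes that spell a 4-character word.
my $raw   = "\xC4\x8Da\xC5\xA1a";
my $title = decode_external($raw);

print length($raw), " bytes, ", length($title), " characters: $title\n";
```

The :utf8 layer is the one named in the summary; ':encoding(UTF-8)' is a
stricter alternative that could be substituted here.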
I would love to hear your thoughts on this approach. So far, I know that
it breaks CGI::Session (which is documented as a known bug in its
documentation), so after the first reload, library names, shelves and other
data returned from the session aren't encoded correctly.
I would also need to check whether this change affects LDAP, Z39.50 encoding
and the SIP server, but before I start down this road, do you see any reasons
not to pursue it? Compatibility with older perl versions might be
one reason. I'm running perl v5.10.1 from Debian squeeze.
--
Dobrica Pavlinusic 2share!2flame dpavlin at rot13.org
Unix addict. Internet consultant. http://www.rot13.org/~dpavlin