[Koha-devel] Investigations on Perl, MySQL & UTF-8

Henri-Damien LAURENT laurenthdl at alinto.com
Fri Mar 10 14:05:10 CET 2006


Pierrick LE GALL wrote:
> Hi koha-devel,
>
> Because the story of Perl, MySQL, UTF-8 and Koha is becoming more and
> more complicated, I've decided to start my tests outside of Koha or any
> web server. I wanted to check that Perl and MySQL could communicate
> with UTF-8 data.
>
> What I did:
>
> 1. copy some UTF-8 strings from
> http://www.columbia.edu/kermit/utf8-t1.html and paste them into a UTF-8
> text file, utf8.txt (open/paste in a UTF-8 console, with Vim having
> :set encoding=utf-8)
>
> 2. create a UTF-8 database with a simple table having a TEXT field
>
> $ mysql --user=root --password=xxx
> mysql> CREATE DATABASE `utf8_test` CHARACTER SET utf8;
> mysql> connect utf8_test
> mysql> create table strings (id int, value text);
> mysql> quit
>
> (no need to set the connection character set to UTF-8 at this point,
> the default latin1 is fine)
>
> Note: my MySQL server's default character set is latin1...
>
> $ mysql --user=root --password=xxx utf8_test
> mysql> status
> Server characterset:    latin1
> Db     characterset:    utf8
> Client characterset:    latin1
> Conn.  characterset:    latin1
> mysql> set names 'UTF8';
> mysql> status
> Server characterset:    latin1
> Db     characterset:    utf8
> Client characterset:    utf8
> Conn.  characterset:    utf8
>
> 3. write and execute a Perl script which reads the UTF-8 text file,
> inserts the UTF-8 strings into the database, retrieves them from the
> database, and prints them to STDOUT. See details in attached file
> readfile_insertdb.pl. Important note: "set names 'UTF8';" is mandatory.
>
> Everything is *working fine*. My output is in UTF-8, I'm 100% sure of
> it.
>
> DBD::mysql : 2.9007
>       Perl : 5.8.7
>      MySQL : 4.1.12-Debian_1ubuntu3.1-log
>        DBI : 1.48
>
> (find your local versions with attached script versions.pl)
>
> I suspect that Paul's data stored in MySQL are not truly UTF-8. Maybe
> I'm missing the point, but it seems Perl, MySQL and UTF-8 do not work so
> badly together.
>
> Cheers,
>
>   
WOW.
Indeed, French speakers can find an explanation of SET NAMES here:
http://doc.domainepublic.net/mysql/doc.mysql/charset-connection.html

And here is the English version:
http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
It is clear, once you know what to search for :)
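
To make the effect of SET NAMES concrete, here is a minimal sketch in Perl.
It is an illustration only, not taken from the attached scripts. It assumes,
as the MySQL manual page above describes, that the server converts incoming
data from the declared client character set to the column's character set.
The variable names are hypothetical, and ISO-8859-1 stands in for MySQL's
latin1, which agrees with it for the two bytes used here.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "e acute" (U+00E9) as the two UTF-8 bytes a client would send: 0xC3 0xA9.
my $client_bytes = encode('UTF-8', "\x{e9}");

# Connection declared as latin1 (the default in the quoted setup): each byte
# is taken as one latin1 character and transcoded to UTF-8 for the utf8
# column, so the stored value is double-encoded.
my $stored_latin1_conn = encode('UTF-8', decode('ISO-8859-1', $client_bytes));

# Connection declared as utf8 (after SET NAMES 'utf8'): the bytes already
# match the column's character set and are stored unchanged.
my $stored_utf8_conn = $client_bytes;

print unpack('H*', $stored_latin1_conn), "\n";   # c383c2a9
print unpack('H*', $stored_utf8_conn), "\n";     # c3a9
```

The four-byte result is what another client would see as two garbage
characters if it later read the column over a real UTF-8 connection.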

I will test it myself.
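
The attached readfile_insertdb.pl is not reproduced here, so the following
is only a rough sketch of what such a test could look like, under several
assumptions: the utf8_test database and strings table from step 2 exist,
strings are handled as raw bytes on the Perl side (no :utf8 I/O layer), and
the sub names read_strings and round_trip are hypothetical. Note that it
empties the strings table before inserting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the file as raw bytes, one string per line, skipping empty lines.
sub read_strings {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @strings = <$fh>;
    close($fh);
    chomp(@strings);
    return grep { length } @strings;
}

# Insert the strings, then read them back in insertion order.
sub round_trip {
    my ($dbh, @strings) = @_;
    $dbh->do("SET NAMES 'utf8'");      # declare the connection as UTF-8
    $dbh->do('DELETE FROM strings');   # start from an empty table
    my $insert = $dbh->prepare('INSERT INTO strings (id, value) VALUES (?, ?)');
    my $id = 0;
    $insert->execute(++$id, $_) for @strings;
    my $values = $dbh->selectcol_arrayref('SELECT value FROM strings ORDER BY id');
    return @$values;
}

# Usage: perl sketch.pl utf8.txt USER PASSWORD
if (@ARGV == 3) {
    my ($path, $user, $password) = @ARGV;
    require DBI;                       # loaded only when the test is run
    my $dbh = DBI->connect('DBI:mysql:database=utf8_test', $user, $password,
                           { RaiseError => 1 });
    print "$_\n" for round_trip($dbh, read_strings($path));
    $dbh->disconnect;
}
```

If the output shown in a UTF-8 console matches utf8.txt, the round trip
preserved the bytes.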
-- 
Henri-Damien LAURENT




