[Koha-devel] UTF-8

Dorian Meid dnmeid at gmx.de
Mon Mar 12 05:26:54 CET 2007


On 12.03.2007 at 03:19, Thomas Dukleth wrote:

> TARGET ISSUES.
>
> Are you certain that your Z39.50 target is returning records with UTF-8
> encoding?  If you supply the connection parameters and a test search for
> records which you believe are problematic, then I can test the target
> myself.
>
>
> BYTE CODES NEEDED FOR A RELIABLE CHECK.

Because I don't have another Z39.50 client, I use the scripts we have,
namely z3950/search.pl.
I simply store the retrieved MARC data with:

				open(MARCFILE, ">:raw", "result$i.MARC");
				print MARCFILE $marcdata;
				close MARCFILE;

starting at line 174.
I also tried:
				open(MARCFILE, ">:utf8", "result$i.MARC");
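As a way to look at byte codes without opening the result files in an
editor, here is a minimal sketch. It is not part of z3950/search.pl:
hex_dump is a hypothetical helper name, and the two sample bytes only
stand in for $marcdata, assuming that variable holds a byte string. The
second half shows what the ":utf8" layer does to such a byte string: each
byte above 7F is treated as a Latin-1 character and encoded again.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes of a string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Stand-in for $marcdata: "ü" already encoded as UTF-8 (bytes C3 BC).
my $marcdata = "\xC3\xBC";

# What ">:raw" would write: the bytes unchanged.
print hex_dump($marcdata), "\n";    # C3 BC

# What ">:utf8" would write, captured in an in-memory file.
my $buf = '';
open(my $fh, '>:utf8', \$buf) or die "Cannot open in-memory file: $!";
print {$fh} $marcdata;
close $fh;
print hex_dump($buf), "\n";         # C3 83 C2 BC
```

If $marcdata is instead a decoded character string, the ":utf8" layer
encodes it once, so the two cases can give different output for the same
record.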

When I use a server which claims to send UTF-8, I get:

with raw print: 75 CC 88 C3

with utf8 print: 00 75 03 08 FF FD

Both should be "üß".

When I use a server which autonegotiates the charset, I get a
correctly encoded Latin-1 record.

In the utf8 print above, the ü is OK, but FF FD for ß is definitely wrong.
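To read the four raw bytes quoted above, the following sketch decodes
them with the core Encode module and lists the resulting code points. It
only interprets the bytes given in this message and assumes nothing about
how the server or search.pl produced them.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# The four bytes reported for the raw print.
my $raw = "\x75\xCC\x88\xC3";

# Decode as UTF-8; with the default CHECK, malformed input becomes U+FFFD.
my $chars = decode('UTF-8', $raw);

my @codepoints = map { sprintf 'U+%04X', ord } split //, $chars;
print "@codepoints\n";              # U+0075 U+0308 U+FFFD

# "u" plus combining diaeresis composes to the single character ü.
my $composed = NFC(substr($chars, 0, 2));
printf "U+%04X\n", ord $composed;   # U+00FC
```

Read this way, 75 CC 88 is a decomposed ü (u followed by a combining
diaeresis), and the final C3 is only the first byte of ß, which is C3 9F
in UTF-8. The missing 9F would fit a record that is cut off in the middle
of the ß, although that is an interpretation and not something verified
against the server.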

It seems to me that this encoding thing is a real pain and I'm rather  
new to it.

In this example I used the host "z3950.gbv.de", port "20012", database
"gvk", user "999", password "abc".
I searched for the ISBN "3-552-06027-8".
The title should be "Die Süße des Lebens".

Another strange point is that the field ends at the ß character ("Die
Süß"), but the original title is much longer (this may indicate a
faulty record, but it is the same with all records containing ß).

Unfortunately I don't know any other server which provides UTF-8
encoded MARC21 data. Maybe somebody can tell me one, along with a sample
search and the results to expect.


> Sadly, Firefox is a poor performer for transmitting data in UTF-8.
Yep, I checked my browser here:
http://www.fileformat.info/info/unicode/utf8test.htm


Dorian Meid








