[Koha-devel] encoding of z3950 imported records

Zbigniew Bomert OP zbomert at dominikanie.pl
Thu Apr 29 03:29:03 CEST 2004


Hi,

First, thank you all for this software; I am happy trying to use Koha.

Recently, Benedykt Barszcz reported a problem with the encoding of records imported through z3950. This is a general problem.

1. In fact, the records are imported in ISO-8859-1 encoding. They are
converted from MARC-8/ANSEL or ISO5426/ISO6937 to ISO-8859-1 in the
subroutine 'char_decode' in Biblio.pm. This will not work for libraries
with books in languages that need different charsets. The universal
solution would be to store data in UTF-8. All localized templates should
then be converted to UTF-8 encoding as well.
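
To make the limitation concrete, here is a minimal sketch using the core Encode module (bundled with Perl 5.8 and later). The sample string and the helper hex_bytes are only illustrations and are not taken from Koha.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Show a byte string as space-separated hex (illustrative helper).
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, $bytes);
}

# Sample title "Żółw" written as Unicode code points.
my $title = "\x{17b}\x{f3}\x{142}w";

# ISO-8859-1 has no code for Ż or ł, so by default Encode
# substitutes "?" for them; only ó survives.
my $latin1 = encode('ISO-8859-1', $title);

# UTF-8 can represent every letter, using two bytes for each accented one.
my $utf8 = encode('UTF-8', $title);

print 'ISO-8859-1: ', hex_bytes($latin1), "\n";
print 'UTF-8:      ', hex_bytes($utf8),   "\n";
```

The ISO-8859-1 line shows 3f ("?") in place of the two letters it cannot hold, which is the loss a library with Polish records would see.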

2. In 'char_decode' a string is converted to ISO-8859-1 separately for
UNIMARC and MARC21. LoC uses MARC21 with ANSEL encoding, and European
libraries mostly use UNIMARC with ISO5426 encoding, but that is not
always the case. The Polish National Library's z3950 server in fact
responds with MARC21 records in ISO5426 encoding. Happily, it seems to me
that the two encodings do not overlap, so it is safe to do the
char-decoding for both at once.

For Polish characters the code could look like this:

	   s/(\xe2|\xc2)c/\xc4\x87/gm ;
	   s/(\xe2|\xc2)C/\xc4\x86/gm ;
	   s/(\xe2|\xc2)n/\xc5\x84/gm ;
	   s/(\xe2|\xc2)N/\xc5\x83/gm ;
	   s/(\xe2|\xc2)o/\xc3\xb3/gm ;
	   s/(\xe2|\xc2)O/\xc3\x93/gm ;
	   s/(\xe2|\xc2)s/\xc5\x9b/gm ;
	   s/(\xe2|\xc2)S/\xc5\x9a/gm ;
	   s/(\xe2|\xc2)z/\xc5\xba/gm ;
	   s/(\xe2|\xc2)Z/\xc5\xb9/gm ;
	   #ogonek
	   s/(\xf1|\xce)a/\xc4\x85/gm ;
	   s/(\xf1|\xce)A/\xc4\x84/gm ;
	   s/(\xf1|\xce)e/\xc4\x99/gm ;
	   s/(\xf1|\xce)E/\xc4\x98/gm ;
	   # ł (lowercase), then Ł (uppercase)
	   s/(\xb1|\xf8)/\xc5\x82/gm ;
	   s/(\xa1|\xe8)/\xc5\x81/gm ;
	   #żŻ
	   s/(\xe7|\xc7)z/\xc5\xbc/gm ;
	   s/(\xe7|\xc7)Z/\xc5\xbb/gm ;

For letters with acute:
	   s/(\xe2|\xc2)a/\xc3\xa1/gm ;
	   s/(\xe2|\xc2)A/\xc3\x81/gm ;
	   s/(\xe2|\xc2)e/\xc3\xa9/gm ;
	   s/(\xe2|\xc2)E/\xc3\x89/gm ;

and so on.
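
The same mappings can also be kept in lookup tables and applied in one pass. The following is only a sketch under that assumption: the names lookup_utf8 and polish_marc_to_utf8 are made up for illustration and are not part of Biblio.pm, and the tables cover just the letters listed above.

```perl
use strict;
use warnings;

# Combining-diacritic bytes taken from the substitutions above:
# the first code in each row is ANSEL, the second ISO5426.
my %diacritic = (
    "\xe2" => 'acute',  "\xc2" => 'acute',
    "\xf1" => 'ogonek', "\xce" => 'ogonek',
    "\xe7" => 'dot',    "\xc7" => 'dot',
);

# Diacritic name -> base letter -> UTF-8 byte sequence.
my %composed = (
    acute => {
        a => "\xc3\xa1", A => "\xc3\x81", c => "\xc4\x87", C => "\xc4\x86",
        e => "\xc3\xa9", E => "\xc3\x89", n => "\xc5\x84", N => "\xc5\x83",
        o => "\xc3\xb3", O => "\xc3\x93", s => "\xc5\x9b", S => "\xc5\x9a",
        z => "\xc5\xba", Z => "\xc5\xb9",
    },
    ogonek => {
        a => "\xc4\x85", A => "\xc4\x84", e => "\xc4\x99", E => "\xc4\x98",
    },
    dot => { z => "\xc5\xbc", Z => "\xc5\xbb" },
);

# Letters that are a single byte in the source encodings.
my %single = (
    "\xb1" => "\xc5\x82", "\xf8" => "\xc5\x82",    # ł
    "\xa1" => "\xc5\x81", "\xe8" => "\xc5\x81",    # Ł
);

# Pick the replacement for one regex match (illustrative helper).
sub lookup_utf8 {
    my ($mark, $letter, $lone) = @_;
    return $single{$lone} if defined $lone;
    my $hit = $composed{ $diacritic{$mark} }{$letter};
    return defined $hit ? $hit : $mark . $letter;    # unknown pair: leave as is
}

# Convert one field in a single left-to-right pass (hypothetical name).
sub polish_marc_to_utf8 {
    my ($s) = @_;
    $s =~ s/([\xe2\xc2\xf1\xce\xe7\xc7])([A-Za-z])|([\xb1\xf8\xa1\xe8])/lookup_utf8($1, $2, $3)/ge;
    return $s;
}

# "Łódź" as ANSEL bytes and as ISO5426 bytes gives the same UTF-8 result.
my $from_ansel = polish_marc_to_utf8("\xa1\xe2od\xe2z");
my $from_iso   = polish_marc_to_utf8("\xe8\xc2od\xc2z");
print $from_ansel eq $from_iso ? "same\n" : "different\n";
```

Because the string is scanned only once, bytes that were already written as UTF-8 are never matched again; with a chain of separate substitutions the order matters, since for example the second byte of UTF-8 á (\xc3\xa1) is also the ANSEL code for Ł. One thing to check against the code tables: as far as I know \xe8 is the combining diaeresis in ANSEL, so treating it as Ł (as the lines above and this sketch both do) is only safe when the source is known to be ISO5426.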

3. To see the correct characters in the search results, one should also
add char-decoding of title and author in cgi-bin/z3950/search.pl.

After those changes I can correctly import records from the National
Library, and even Polish records from LoC.

Zbigniew Bomert OP




