[Koha-devel] Re: MARC character encoding

paul POULAIN paul.poulain at free.fr
Thu Jan 23 04:26:06 CET 2003


Ed Summers wrote:

>On Tue, Jan 21, 2003 at 09:15:07AM +0100, paul POULAIN wrote:
>  
>
>>Francois Lemarchand sent me a little script to translate éà... into 
>>ISO 8859-1 standard characters. I've included it in the addbiblio.pl script 
>>(when the system finds a biblio in the breeding farm).
>>It seems to work. Things are definitely strange in char encoding.
>>
>>Uploaded to CVS a few minutes ago.
>>    
>>
>
>I'm looking at the script. From the comments it looks like Francois'
>code is converting from ISO 5426 to ISO 8859-1. How are character sets
>handled in UNIMARC? I'm guessing there are more character sets than ISO
>5426 which can be used.
>
>I just checked and Perl's Encode::* modules don't seem to handle ISO 5426 :(
>which is a shame. It is even more a shame that ISO doesn't make these
>standards public. I'm going to subscribe to perl-unicode at perl.org and
>see if I can find out more.
>
>//Ed
>
Sorry, but I have now looked more closely at Francois' code, and at some
MARC21 and UNIMARC files.

My conclusion is that the following code:
        s/\xe1/\xc1/gm;
        s/\xe2/\xc2/gm;
        s/\xe3/\xc3/gm;
        s/\xe4/\xc4/gm;
        s/\xe8/\xc8/gm;
        s/\xe9/\xc9/gm;
        s/\xf0/\xd0/gm;
is enough to migrate from MARC21 to UNIMARC character encoding. I tried this 
in my marc21->unimarc script, on 30,000 records, and it works fine.
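
For illustration only, the same seven byte substitutions can be written as a single tr/// pass. This is a minimal sketch, not the code from Francois' script or Biblio.pm: the sub name marc21_to_unimarc_bytes and the sample bytes are hypothetical, and the byte pairs are copied unchanged from the s///gm lines above.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from Biblio.pm):
# remaps the seven bytes listed in the message in one pass.
sub marc21_to_unimarc_bytes {
    my ($data) = @_;
    # Same pairs as the s///gm lines:
    # e1->c1, e2->c2, e3->c3, e4->c4, e8->c8, e9->c9, f0->d0
    $data =~ tr/\xe1\xe2\xe3\xe4\xe8\xe9\xf0/\xc1\xc2\xc3\xc4\xc8\xc9\xd0/;
    return $data;
}

# Made-up sample bytes: two of the mapped bytes, each followed by an ASCII letter.
my $in  = "\xe2e\xf0c";
my $out = marc21_to_unimarc_bytes($in);
print unpack("H*", $out), "\n";    # prints c265d063
```

Because none of the replacement bytes is also one of the source bytes, the single-pass tr/// gives the same result as running the seven substitutions one after another.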

So I think we now have 2 complete tables (MARC21 and UNIMARC) in Biblio.pm, 
which I committed a few minutes ago.

-- 
Paul POULAIN
Independent free software consultant
French-language coordinator for Koha (free ILS, http://www.koha-fr.org)
