[Koha-devel] Re: MARC character encoding
paul POULAIN
paul.poulain at free.fr
Thu Jan 23 04:26:06 CET 2003
Ed Summers wrote:
>On Tue, Jan 21, 2003 at 09:15:07AM +0100, paul POULAIN wrote:
>
>
>>Francois Lemarchand sent me a little script to translate éà... into
>>8859-1 standard characters. I've included it in the addbiblio.pl script
>>(when the system finds a biblio in the breeding farm).
>>It seems to work. Things are definitely strange in char encoding.
>>
>>Uploaded to CVS a few minutes ago.
>>
>>
>
>I'm looking at the script. From the comments it looks like Francois'
>code is converting from ISO 5426 to ISO 8859-1. How are character sets
>handled in UNIMARC? I'm guessing there are more character sets than ISO
>5426 which can be used.
>
>I just checked and Perl's Encode::* modules don't seem to handle ISO 5426 :(
>which is a shame. It is even more of a shame that ISO doesn't make these
>standards public. I'm going to subscribe to perl-unicode at perl.org and
>see if I can find out more.
>
>//Ed
>
Sorry, but I've looked more closely at Francois' code, and at some MARC21
and UNIMARC files.
My conclusion is that the following code:
s/\xe1/\xc1/gm;
s/\xe2/\xc2/gm;
s/\xe3/\xc3/gm;
s/\xe4/\xc4/gm;
s/\xe8/\xc8/gm;
s/\xe9/\xc9/gm;
s/\xf0/\xd0/gm;
is enough to migrate from MARC21 to UNIMARC character encoding. I tried
this in my marc21->unimarc script, on 30,000 records, and it works fine.
So I think we now have two complete tables (MARC21 and UNIMARC) in
Biblio.pm, which I committed a few minutes ago.
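
For reference, the seven substitutions above can also be written as a single
tr/// pass. This is only a sketch: the sub name marc21_to_unimarc_bytes is
hypothetical and is not the code that went into Biblio.pm. It assumes the
record is handled as a raw byte string, and it gives the same result as the
s///gm lines because none of the target bytes is also a source byte.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from Biblio.pm: applies the same seven
# byte mappings as the s///gm lines, in one pass over the string.
sub marc21_to_unimarc_bytes {
    my ($data) = @_;
    # Copy first so the caller's string is left untouched.
    (my $out = $data) =~ tr/\xe1\xe2\xe3\xe4\xe8\xe9\xf0/\xc1\xc2\xc3\xc4\xc8\xc9\xd0/;
    return $out;
}

# Byte 0xE2 followed by a plain "e"; only the first byte is remapped.
my $sample = "\xe2e";
print unpack("H*", marc21_to_unimarc_bytes($sample)), "\n";   # prints c265
```

Bytes outside those seven values pass through unchanged, so plain ASCII
fields are not affected.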
--
Paul POULAIN
Independent free software consultant
French-language coordinator for Koha (free ILS, http://www.koha-fr.org)