[Koha-devel] Re: MARC character encoding

paul POULAIN paul.poulain at free.fr
Mon Jan 20 08:46:02 CET 2003


Ed Summers wrote:

>Hi Paul:
>On Wed, Jan 15, 2003 at 03:20:47PM +0100, paul POULAIN wrote:
>  
>
>>Could someone explain how to translate the "MARC21" charset to a more
>>convenient one (and which one is more convenient)?
>>Same question for UNIMARC (which is ISO646 if my docs are right).
>>    
>>
>If we lived in a perfect world we would all be using Unicode (UTF-8)
>since it covers so many of the world's scripts [1]. Unfortunately the
>world is not perfect. MARC has been around longer than Unicode, so the
>MARC-8 character encoding was created to allow non-Latin scripts to live
>in MARC records. I guess the world has bigger problems than character encodings
>(Mr George Bush comes to mind), but I'll leave that particular problem
>alone :)
>
50% of the news here is about Mr George Bush.
Unfortunately for me, the other 50% is NOT about character
encoding :-)))

>I wasn't aware that UNIMARC had defined a different standard for
>character encoding. Isn't ISO646 just a synonym for ASCII? [2] Which docs
>describe the character sets used in UNIMARC?
>
No, you're right. ISO646 IS ASCII.
What I don't understand is how they encode characters above 127 as two bytes.
For example, \xc3\x65 = ê.
Isn't that outside ASCII?
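
One possible reading of the \xc3\x65 example, offered as a sketch rather than a confirmed answer: in ISO 5426, an extended Latin set commonly used with UNIMARC, an accent is stored as a non-spacing byte placed before the letter it modifies, so \xc3 (circumflex) followed by \x65 (e) gives ê. The Python below assumes that byte layout. The three-entry table and the name decode_diacritics are illustrative only and are not part of any MARC library.

```python
import unicodedata

# ASSUMPTION: a small subset of ISO 5426 non-spacing diacritics,
# mapped to the equivalent Unicode combining marks.
COMBINING = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex accent
}


def decode_diacritics(raw: bytes) -> str:
    """Decode bytes where a diacritic byte precedes its base letter."""
    out = []
    pending = []  # combining marks waiting for their base letter
    for byte in raw:
        if byte in COMBINING:
            pending.append(COMBINING[byte])
        elif byte < 0x80:
            # Unicode puts the base letter first, then its combining marks.
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02x} is not in the sketch table")
    if pending:
        raise ValueError("diacritic without a following base letter")
    # NFC folds "e" + combining circumflex into the single character "ê".
    return unicodedata.normalize("NFC", "".join(out))


print(decode_diacritics(b"\xc3\x65"))  # → ê
```

Under this reading the data is not plain ASCII: the letter itself is ASCII, but the accent byte comes from an extension set layered on top of it.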

>>I tried MARC-Charset, which seems to translate from "MARC21" to Unicode,
>>but I don't know what to do with my Unicode ;-(
>>    
>>
>Yes, MARC::Charset is an implementation of the MARC-8 ==> Unicode
>(UTF-8) mappings published by the Library of Congress. [3] In MARC-8
>there is a special way of 'escaping' to other character sets (Hebrew,
>Cyrillic, East-Asian, etc). 
>
Is \xc3\x65 this special way of escaping?

>You mapped all the UNIMARC fields to MARC fields!?! I was under the
>impression that this was quite a big undertaking to do completely. Is
>your code currently checked into CVS? Having a UNIMARC filter in
>MARC::Record (MARC::File::UNIMARC) has been a long term goal. Maybe we
>could roll this work into the MARC::Record package?
>
No, of course not. You're right, this is a BIG job.
I did this only for the few fields/subfields I needed (around 20-25 fields).
It's a 10-20 line script (+ the mapping array) using MARC::Record.
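
To illustrate the "mapping array" approach described above, here is a minimal Python sketch. It is not the script mentioned in this message and does not use MARC::Record. The record representation (a list of tag/subfield/value triples), the names FIELD_MAP and convert, and the choice of three mapped subfields are all assumptions made for the example.

```python
# ASSUMPTION: a hypothetical three-entry mapping, (UNIMARC tag, subfield)
# to (MARC21 tag, subfield). A real mapping would cover many more fields.
FIELD_MAP = {
    ("200", "a"): ("245", "a"),  # title proper
    ("700", "a"): ("100", "a"),  # main author, personal name
    ("010", "a"): ("020", "a"),  # ISBN
}


def convert(record):
    """Convert a list of (tag, subfield, value) triples using FIELD_MAP.

    Subfields with no entry in the mapping are silently dropped.
    """
    converted = []
    for tag, code, value in record:
        target = FIELD_MAP.get((tag, code))
        if target is None:
            continue  # unmapped subfield: skip it
        converted.append((target[0], target[1], value))
    return converted


unimarc = [("200", "a", "Les Misérables"), ("700", "a", "Hugo"), ("999", "z", "local")]
print(convert(unimarc))
```

A table-driven design like this keeps the conversion loop short and puts all the cataloguing knowledge in one place, which fits a script that only handles a limited set of fields.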

- 
Paul POULAIN
Independent free software consultant
French-language lead for Koha (free ILS, http://www.koha-fr.org)
