[Koha-bugs] [Bug 17842] Broken diacritics on records exported as MARC from cart

Tue May 26 01:27:47 CEST 2020

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=17842

David Cook <dcook at prosentient.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcook at prosentient.com.au

--- Comment #19 from David Cook <dcook at prosentient.com.au> ---
Given my bad experience the other day trying to import records converted from
GB2312 to UTF-8 into Koha, I'm extra interested in this. Maybe it's a related
topic.

At a glance, those sample records look fine both in Latin-1 and UTF-8.

MarcEdit can convert the ISO MARC into its MRK format, but I'm failing to
convert it from ISO MARC to MARCXML. 

When I try to read your sample records as UTF-8 using MARC::File::USMARC, I see
the following error:

UTF-8 "\xFC" does not map to Unicode

Using "xxd cart.iso2709", I see that the "fc" byte is the ü in über and für.
Ah, and FC is ü in Latin-1 encoding whereas in UTF-8 it's C3 BC. 
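
For anyone who wants to double-check those byte values, here's a minimal
sketch. Python is used purely for illustration (it isn't part of Koha), and
decodes_as_utf8 is a hypothetical helper name, not an existing API:

```python
# "\u00fc" is the letter u with diaeresis, as in "ueber"/"fuer" in the sample records.
u_umlaut = "\u00fc"

latin1_bytes = u_umlaut.encode("latin-1")  # one byte: FC
utf8_bytes = u_umlaut.encode("utf-8")      # two bytes: C3 BC

print(latin1_bytes.hex(), utf8_bytes.hex())  # prints: fc c3bc


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone FC byte is not valid UTF-8, which matches the kind of error above.
print(decodes_as_utf8(b"f\xfcr"))      # Latin-1 bytes
print(decodes_as_utf8(b"f\xc3\xbcr"))  # UTF-8 bytes
```

The first check fails and the second succeeds, which is consistent with the
"\xFC" complaint from MARC::File::USMARC.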

So it sounds like Koha is exporting as Latin-1 but trying to import as UTF-8
and that's where it's falling over? 
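
To test that theory on a file without reading through xxd output, something
like the following could list the offsets of bytes that aren't valid UTF-8.
This is only a sketch: find_invalid_utf8 is a made-up helper, and it works on
plain bytes rather than on parsed MARC records:

```python
from typing import List


def find_invalid_utf8(data: bytes) -> List[int]:
    """Return the offsets of bytes that cannot be decoded as UTF-8."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            offsets.append(bad)
            pos = bad + 1          # skip the bad byte and keep scanning
    return offsets


sample = b"f\xfcr \xfcber"  # "fuer ueber" with Latin-1 umlauts
print(find_invalid_utf8(sample))

# Transcoding a single field value from Latin-1 gives valid UTF-8 bytes.
fixed = sample.decode("latin-1").encode("utf-8")
print(find_invalid_utf8(fixed))
```

Note that re-encoding changes the byte length of the text (FC becomes C3 BC),
so doing this on raw ISO 2709 data would throw off the record length and
directory offsets. It would have to be done per field with a MARC library.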

Needs more investigating, but I'd say that's the problem with your sample
records.
