[Koha-bugs] [Bug 17842] Broken diacritics on records exported as MARC from cart
bugzilla-daemon at bugs.koha-community.org
Tue May 26 01:27:47 CEST 2020
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=17842
David Cook <dcook at prosentient.com.au> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dcook at prosentient.com.au
--- Comment #19 from David Cook <dcook at prosentient.com.au> ---
Given my bad experience the other day trying to import records converted from
GB2312 to UTF-8 into Koha, I'm especially interested in this. It may be a
related issue.
At a glance, those sample records look fine in both Latin-1 and UTF-8.
MarcEdit can convert the ISO MARC into its MRK format, but I'm failing to
convert it from ISO MARC to MARCXML.
When I try to read your sample records as UTF-8 using MARC::File::USMARC, I see
the following error:
UTF-8 "\xFC" does not map to Unicode
Using "xxd cart.iso2709", I see that the "fc" byte is the ü in über and für.
Ah, and FC is ü in Latin-1 encoding whereas in UTF-8 it's C3 BC.
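
The byte-level difference is easy to check. Here is a rough Python sketch (standard library only, nothing Koha- or MARC-specific) that encodes ü both ways and then tries to decode the Latin-1 byte as UTF-8:

```python
# Illustrative only: shows why a lone 0xFC byte is valid Latin-1 but not UTF-8.
u_umlaut = "ü"

latin1_bytes = u_umlaut.encode("latin-1")  # one byte: FC
utf8_bytes = u_umlaut.encode("utf-8")      # two bytes: C3 BC

print(latin1_bytes.hex(), utf8_bytes.hex())

# Decoding the Latin-1 byte as UTF-8 fails, much like the error quoted above.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print("FC decodes as UTF-8:", decoded_ok)
```
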
So it sounds like Koha is exporting as Latin-1 but trying to import as UTF-8
and that's where it's falling over?
Needs more investigating, but that's the problem with your sample records I'd
say.
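
For anyone who wants to confirm this on their own export without reading hex dumps, the following Python sketch lists every byte offset that is not valid UTF-8. The function name `find_invalid_utf8` is made up for illustration; it is not part of Koha or any MARC library, and it only inspects raw bytes rather than parsing the ISO 2709 structure.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, byte value) pairs where UTF-8 decoding fails."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # If the remainder decodes cleanly, there is nothing more to report.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            bad = pos + err.start          # absolute offset of the offending byte
            problems.append((bad, data[bad]))
            pos = bad + 1                  # resume scanning after the bad byte
    return problems


# Small self-check: "für" in Latin-1 followed by "über" in UTF-8.
sample = "für".encode("latin-1") + b" " + "über".encode("utf-8")
offsets = find_invalid_utf8(sample)
print([(off, hex(value)) for off, value in offsets])
```

To run it against the sample file, read it in binary mode, e.g. `find_invalid_utf8(open("cart.iso2709", "rb").read())`. An empty result would suggest the file is already valid UTF-8. A list of FC, E4, F6 and similar bytes would point towards Latin-1.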
--
You are receiving this mail because:
You are the assignee for the bug.
You are watching all bug changes.