[Koha-bugs] [Bug 25600] Koha doesn't check for warnings when parsing (ISO2709) MARC

bugzilla-daemon at bugs.koha-community.org
Tue May 26 02:02:47 CEST 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25600

--- Comment #1 from David Cook <dcook at prosentient.com.au> ---
Recently, I converted some English/Chinese records in ISO2709 MARC from the
Chinese GB2312 encoding to UTF-8 and imported them into Koha. 

At first, it seemed fine. There were no visible warnings or errors. 

However, when I looked at the records, it was obvious that many of the Chinese
strings were truncated.

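I figured out that MARC::File::USMARC reads the ISO2709 format precisely: it
trusts the byte lengths and offsets recorded in the record label (leader) and
directory. That was a problem because I was converting from a 2-byte encoding
to a 3-byte encoding (GB2312 uses 2 bytes per Chinese character, UTF-8 uses 3).
The ISO2709 record label and directory, still counting GB2312 bytes, were never
going to be correct in that scenario.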
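To make the byte-count problem concrete, here is a small Python sketch (not Koha or MARC::Record code; the title string is made up, and a real field would also carry indicators and subfield codes) showing how a length counted in GB2312 bytes cuts off the same text once it is UTF-8 and a parser reads exactly that many bytes:

```python
# Hypothetical Chinese title: 2 bytes per character in GB2312, 3 in UTF-8.
title = "中文书名"

gb_bytes = title.encode("gb2312")
utf8_bytes = title.encode("utf-8")

# The directory entry was written while the data was still GB2312,
# so it records the GB2312 byte length (8), not the UTF-8 one (12).
directory_length = len(gb_bytes)

# Re-encoding the data without rebuilding the directory means a parser
# that trusts the directory reads only the first 8 UTF-8 bytes.
truncated = utf8_bytes[:directory_length]

# "中文": 书 is cut mid-character (dropped by errors="ignore") and 名 is never read.
recovered = truncated.decode("utf-8", errors="ignore")

print(len(gb_bytes), len(utf8_bytes))  # 8 12
```

Nothing fails loudly here; the parser simply returns a shorter string, which matches the silent truncation described above.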

But Koha didn't complain.

So I ran MARC::File::USMARC directly, checked $record->warnings(), and saw a
flood of parsing problems in my ISO2709 MARC records.

In the end, I worked around it by converting from GB2312 to UTF-8 using iconv,
opening the ISO2709 MARC in MarcEdit (which converted it to its own
"MarcBreaker" MRK format), and then exporting that as MARCXML. Evidently
MarcEdit's MarcBreaker is more skeptical of the ISO2709 MARC format and parses
the data more carefully.
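One way a parser can be "more skeptical" is to stop trusting the directory's lengths and offsets and instead split the data on the field terminator (0x1E), a byte that cannot occur inside a multi-byte GB2312 or UTF-8 character. The Python sketch below illustrates that idea only; it is not MarcEdit's actual algorithm, `split_fields_by_terminator` is a hypothetical helper, and it assumes one directory entry per field, stored in the same order as the data, with no stray 0x1E bytes inside field data.

```python
from __future__ import annotations

FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
LEADER_LENGTH = 24
ENTRY_LENGTH = 12  # tag (3) + field length (4) + starting position (5)


def split_fields_by_terminator(raw: bytes, encoding: str = "utf-8") -> list[tuple[str, str]]:
    """Recover (tag, data) pairs while ignoring directory lengths and offsets.

    Hypothetical helper: only the tags are taken from the directory; field
    boundaries come from the 0x1E terminators in the data itself.
    """
    if raw.endswith(RECORD_TERMINATOR):
        raw = raw[:-1]
    # The directory runs from the end of the leader to the first field terminator.
    dir_end = raw.index(FIELD_TERMINATOR, LEADER_LENGTH)
    directory = raw[LEADER_LENGTH:dir_end]
    tags = [
        directory[i:i + 3].decode("ascii")
        for i in range(0, len(directory) - ENTRY_LENGTH + 1, ENTRY_LENGTH)
    ]
    parts = raw[dir_end + 1:].split(FIELD_TERMINATOR)
    if parts and parts[-1] == b"":
        parts.pop()  # the last field's terminator leaves an empty tail
    if len(parts) != len(tags):
        raise ValueError(f"{len(tags)} directory entries but {len(parts)} fields")
    return [(tag, part.decode(encoding)) for tag, part in zip(tags, parts)]
```

Under those assumptions, this recovers complete UTF-8 fields even when the directory still holds GB2312 byte counts; the trade-off is that it cannot handle records whose data is deliberately stored out of directory order.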

I am tempted to submit a patch to MARC::File::USMARC adding an option for more
cautious parsing, but in the meantime, shouldn't Koha at least surface these
parsing warnings?
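As an illustration of which inconsistencies such warnings point at, here is a Python sketch. It is not Koha's or MARC::File::USMARC's code; `build_iso2709`, `check_iso2709`, and the warning texts are hypothetical, and leader bytes other than the record length (positions 0-4) and base address (12-16) are placeholders. It builds a small GB2312 record with a correct directory, simulates running iconv over the whole file, and then checks the leader and directory against the actual bytes, collecting warnings instead of silently returning truncated fields.

```python
FT = 0x1E  # field terminator
RT = 0x1D  # record terminator


def build_iso2709(fields, encoding):
    """Serialise (tag, data) pairs with lengths counted in `encoding` bytes."""
    directory = b""
    data = b""
    for tag, value in fields:
        chunk = value.encode(encoding) + bytes([FT])
        # Directory entry: tag, 4-digit field length, 5-digit starting position.
        directory += tag.encode("ascii") + b"%04d%05d" % (len(chunk), len(data))
        data += chunk
    directory += bytes([FT])
    base = 24 + len(directory)    # base address of data
    total = base + len(data) + 1  # +1 for the record terminator
    # Placeholder leader: only the record length and base address matter here.
    leader = b"%05dnam a22%05d   4500" % (total, base)
    return leader + directory + data + bytes([RT])


def check_iso2709(raw):
    """Return warnings about leader/directory values that disagree with the bytes."""
    if len(raw) < 24:
        return ["record is shorter than the 24-byte leader"]
    warnings = []
    try:
        record_length = int(raw[0:5])
        base = int(raw[12:17])
    except ValueError:
        return ["leader record length or base address is not numeric"]
    if record_length != len(raw):
        warnings.append(f"leader says {record_length} bytes, record has {len(raw)}")
    if raw[-1] != RT:
        warnings.append("record does not end with the record terminator")
    if not 25 <= base <= len(raw) or raw[base - 1] != FT:
        warnings.append("directory does not end with a field terminator")
        return warnings
    directory = raw[24:base - 1]
    if len(directory) % 12:
        warnings.append("directory length is not a multiple of 12")
    for i in range(0, len(directory) - 11, 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii", "replace")
        try:
            length, start = int(entry[3:7]), int(entry[7:12])
        except ValueError:
            warnings.append(f"tag {tag}: directory entry is not numeric")
            continue
        field = raw[base + start:base + start + length]
        if len(field) != length:
            warnings.append(f"tag {tag}: field runs past the end of the record")
        elif length == 0 or field[-1] != FT:
            warnings.append(f"tag {tag}: field does not end with a field terminator")
    return warnings


gb_record = build_iso2709([("245", "10\x1faChinese title 中文书名")], "gb2312")
# Simulate iconv over the whole file: the data grows, the leader and directory do not.
utf8_record = gb_record.decode("gb2312").encode("utf-8")

print(check_iso2709(gb_record))  # []
for warning in check_iso2709(utf8_record):
    print(warning)
# leader says 65 bytes, record has 69
# tag 245: field does not end with a field terminator
```

A Koha patch would more likely check MARC::Record's existing `warnings()` after decoding and report them, rather than reimplementing the parser; the sketch only shows what those warnings are detecting.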
