[Koha-bugs] [Bug 35104] We should warn when attempting to save MARC records that contain characters invalid in XML

bugzilla-daemon at bugs.koha-community.org
Wed Nov 1 01:43:12 CET 2023


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=35104

--- Comment #18 from David Cook <dcook at prosentient.com.au> ---
(In reply to David Cook from comment #17)
> I'm going to poke around in this a bit more...

TransformHtmlToMarc doesn't seem to affect it...

If I do $record->as_formatted then I see:

likeminded

If I do $record->as_xml then I see:

&#2;

Looking at
https://metacpan.org/dist/MARC-File-XML/source/lib/MARC/File/XML.pm#L378,
there is an escape function that escapes ampersands and angle brackets.

In theory, maybe MARC::File::XML should escape any invalid characters using
character references or remove them since they're invalid.

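But because MARC::File::XML escapes ampersands itself, it's impossible for us
to pre-escape any invalid characters: the "&" in any character reference we
added beforehand would just get escaped again to "&amp;".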
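
To illustrate the pre-escaping problem, here is a rough sketch in Python rather
than Koha's Perl. It assumes that xml.sax.saxutils.escape (which escapes "&",
"<", and ">") behaves roughly like the escape function in MARC::File::XML; the
pre_escape helper and the sample value are made up for the example.

```python
from xml.sax.saxutils import escape


def pre_escape(text):
    """Hypothetical helper: swap U+0002 for a numeric character reference."""
    return text.replace("\x02", "&#2;")


# Sample value invented for illustration, not taken from a real record.
raw_value = "a\x02b"

# Step 1: we try to pre-escape the control character ourselves.
pre_escaped = pre_escape(raw_value)

# Step 2: the serializer escapes ampersands, so our reference gets mangled.
serialized = escape(pre_escaped)

print(pre_escaped)
print(serialized)
```

If that assumption holds, the second step turns the reference into literal
text, so any pre-escaping done before the serializer is lost.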

It feels like MARC::File::XML is essentially holding us hostage. We need to
clean our input data (in whatever format) before it reaches MARC::File::XML,
which seems a bit silly, since it's the XML format which has the
restrictions...

That being said... the XML 1.0 spec is pretty forgiving. After review, it's
really just excluding *some* ASCII control characters (U+0000 to U+001F, apart
from tab, line feed, and carriage return), Unicode surrogates (U+D800 to
U+DFFF), U+FFFE, and U+FFFF. That's a really small number of characters and
none of them are printable characters.
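
Since the excluded set is that small, checking for it before serializing is
cheap. A minimal sketch in Python (not Koha code; the names
INVALID_XML_CHARS, find_invalid_xml_chars, and strip_invalid_xml_chars are
invented for illustration), built from the complement of the XML 1.0 "Char"
production:

```python
import re

# Everything outside the XML 1.0 "Char" production:
# C0 controls except tab/LF/CR, surrogates, U+FFFE, and U+FFFF.
INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for each invalid character."""
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in INVALID_XML_CHARS.finditer(text)
    ]


def strip_invalid_xml_chars(text):
    """Return the text with every invalid character removed."""
    return INVALID_XML_CHARS.sub("", text)


# Sample value invented for illustration.
sample = "a\x02b"
print(find_invalid_xml_chars(sample))
print(strip_invalid_xml_chars(sample))
```

The find function is the sort of thing that could back a warning on save,
while the strip function is the "just remove them" option.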
