[Koha-bugs] [Bug 35104] We should warn when attempting to save MARC records that contain characters invalid in XML
bugzilla-daemon at bugs.koha-community.org
Wed Nov 1 01:43:12 CET 2023
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=35104
--- Comment #18 from David Cook <dcook at prosentient.com.au> ---
(In reply to David Cook from comment #17)
> I'm going to poke around in this a bit more...
The TransformHtmlToMarc doesn't seem to affect it...
If I do $record->as_formatted then I see:
likeminded
If I do $record->as_xml then I see:

Looking at
https://metacpan.org/dist/MARC-File-XML/source/lib/MARC/File/XML.pm#L378 there
is an escape function that escapes ampersands and angle brackets.
In theory, maybe MARC::File::XML should escape any invalid characters using
character references or remove them since they're invalid.
But MARC::File::XML's escaping means it's impossible for us to pre-escape any
invalid characters, because the ampersand in any character reference we add
would itself be escaped to &amp;.
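To illustrate the double-escaping problem, here is a minimal sketch. The
escape_like_marcxml function is a hypothetical stand-in that only mirrors the
behaviour described above (ampersands and angle brackets are escaped); it is
not copied from MARC::File::XML, and the backspace character reference is just
an assumed example of an invalid character.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the escaping described above.
# Assumption: it escapes only &, < and >, as the comment says.
sub escape_like_marcxml {
    my ($string) = @_;
    return '' if !defined $string || $string eq '';
    $string =~ s/&/&amp;/g;
    $string =~ s/</&lt;/g;
    $string =~ s/>/&gt;/g;
    return $string;
}

# Suppose we pre-escape a control character as a character reference
# (U+0008 is an assumed example, not the character from the actual record).
my $pre_escaped = 'like&#x8;minded';

# The ampersand of our reference is escaped again, so the reference is lost.
my $result = escape_like_marcxml($pre_escaped);
print "$result\n";    # like&amp;#x8;minded
```

So whatever we put in the data ahead of time, an escaper of this shape turns
it back into literal text.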
It feels like MARC::File::XML is essentially holding us hostage. We need to
clean our input data (in whatever format) before it reaches MARC::File::XML,
which seems a bit silly, since it's the XML format which has the
restrictions...
That being said... the XML 1.0 spec is pretty forgiving. After review, it's
really just excluding *some* ASCII control characters (everything below U+0020
except tab, line feed, and carriage return), Unicode surrogates, U+FFFE, and
U+FFFF. That's a really small number of characters and none of them are
printable characters.
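As a rough sketch of what cleaning the input could look like, the excluded set
can be written as the complement of the XML 1.0 Char production. The helper
names below are hypothetical (they are not part of Koha or MARC::File::XML),
and the backspace in the usage lines is an assumed example character.

```perl
use strict;
use warnings;

# XML 1.0 allows: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
#                 | [#x10000-#x10FFFF]
# Anything outside that set is invalid, so match the complement.
my $INVALID_XML_CHAR =
    qr/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# Hypothetical helper: true if the string contains a character that
# XML 1.0 forbids (useful for warning the user before saving).
sub has_invalid_xml_chars {
    my ($string) = @_;
    return 0 unless defined $string;
    return $string =~ $INVALID_XML_CHAR ? 1 : 0;
}

# Hypothetical helper: return a copy with the forbidden characters removed.
sub strip_invalid_xml_chars {
    my ($string) = @_;
    return $string unless defined $string;
    ( my $clean = $string ) =~ s/$INVALID_XML_CHAR//g;
    return $clean;
}

# Usage with an assumed control character (U+0008, backspace).
my $value = "like\x{08}minded";
print "needs warning\n" if has_invalid_xml_chars($value);
print strip_invalid_xml_chars($value), "\n";    # likeminded
```

Something along these lines could be run over each subfield value before the
record ever reaches as_xml, either to warn or to strip, depending on what we
decide the behaviour should be.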