[Koha-devel] Finding invalid XML characters in Koha data via SQL
David Cook
dcook at prosentient.com.au
Fri Apr 12 03:36:03 CEST 2024
Hi all,
I just wanted to share a (MariaDB) SQL report that I wrote for finding bib
records with invalid XML characters:
select biblionumber from biblio_metadata where metadata REGEXP
'[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';
Newer versions of Koha strip invalid characters from the XML so that you can
fix your records. I figure this report is very valuable when coupled with
that functionality. In fact, I just advised a library today to use them
together to fix up some bad data in their catalogue.
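The SQL report only returns biblionumbers, so it can also help to see exactly
which characters are being caught. Below is a rough Python sketch of the same
character class (the negation of the XML 1.0 "Char" production). The names
INVALID_XML_CHARS and find_invalid_xml_chars are made up for illustration and
are not part of Koha, and the sketch assumes you have already fetched the
metadata and decoded it to a Unicode string.

```python
import re

# Negated XML 1.0 "Char" production, mirroring the character class used in
# the SQL report. Hypothetical helper names; not part of Koha.
INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters not allowed in XML 1.0."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in INVALID_XML_CHARS.finditer(text)
    ]


if __name__ == "__main__":
    # An ESC control character (U+001B) embedded in a title.
    print(find_invalid_xml_chars("Title\x1b with escape"))  # [(5, 'U+001B')]
```

Knowing the offset and code point makes it easier to locate the bad character
in the record before fixing it.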
--
On a related note, I've noticed that you can have a record with good bib XML
but invalid item XML, and you won't notice until your record fails to be
indexed. So I'm planning on writing a report for that too.
I'm thinking it might be good to add these reports to core Koha, so that
people can find and fix their own metadata problems. What do people think?
David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia
Office: 02 9212 0899
Online: 02 8005 0595