[Koha-devel] Finding invalid XML characters in Koha data via SQL
Magnus Enger
magnus at libriotech.no
Fri Apr 12 08:06:05 CEST 2024
Hi!
On 12.04.2024 03:36, David Cook via Koha-devel wrote:
> Hi all,
>
> I just wanted to share a (MariaDB) SQL report that I wrote for finding
> bib records with invalid XML characters:
>
> select biblionumber from biblio_metadata where metadata REGEXP
> '[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';
>
> Newer versions of Koha strip invalid characters from the XML so that you
> can fix your records. I figure this report is very valuable when coupled
> with that functionality. In fact, I just advised a library today to use
> them together to fix up some bad data in their catalogue.
>
> --
>
> On a related note, I’ve noticed that you can have a record with good bib
> XML but invalid item XML, and you won’t notice until your record fails
> to be indexed. So I’m planning on writing a report for that too.
>
> I’m thinking it might be good to add these reports to core Koha, so that
> people can find and fix their own metadata problems. What do people think?
Sounds like an excellent idea! Sounds kind of similar to the "MARC
bibliographic framework test" at /cgi-bin/koha/admin/checkmarc.pl.
The report could also be added to
https://wiki.koha-community.org/wiki/SQL_Reports_Library so that it is
immediately useful, including for people running older versions of Koha.
Best regards,
Magnus
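You may need to check data outside the database, for example a MARCXML file exported from Koha. In that case the character class in the query above can be reproduced in a script. The ranges it accepts are tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. These are the ranges of the Char production in the XML 1.0 specification, and any other character is flagged. The following is a minimal Python sketch under that assumption. The function names are hypothetical. The stripping helper only illustrates the idea of removing invalid characters; it is not the code Koha itself uses.

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production.
# Same ranges as the MariaDB query's class, written with Python's
# \x / \u / \U escapes instead of PCRE's \x{...}.
XML10_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Hypothetical helper: list (offset, 'U+XXXX') for each invalid character."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in XML10_INVALID.finditer(text)
    ]


def strip_invalid_xml_chars(text: str) -> str:
    """Hypothetical helper: drop every invalid character (not Koha's own code)."""
    return XML10_INVALID.sub("", text)


if __name__ == "__main__":
    sample = "Title\x1fa\x0bend"  # contains U+001F and U+000B
    print(find_invalid_xml_chars(sample))  # [(5, 'U+001F'), (7, 'U+000B')]
    print(repr(strip_invalid_xml_chars(sample)))  # 'Titleaend'
```

The finder reports the code point and offset rather than a plain yes/no match, which makes the offending character easier to track down in a record. MARC control characters such as U+001F (the ISO 2709 subfield delimiter) are one plausible source of such characters.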