[Koha-devel] Finding invalid XML characters in Koha data via SQL
Philippe Blouin
philippe.blouin at inlibro.com
Fri Apr 12 14:44:48 CEST 2024
Something else to add to search_for_data_inconsistencies.pl?
I like Perl-based solutions, and I appreciate centralized ones, even
though I suppose what you're testing for is not strictly an "inconsistency".
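To make the idea of a script-side check concrete, here is a rough sketch of the same test outside SQL. It is written in Python only for illustration (it is not code from search_for_data_inconsistencies.pl or anywhere else in Koha), and the names INVALID_XML_CHARS and find_invalid_xml_chars are hypothetical. The character class simply mirrors the one in the quoted query below, i.e. the complement of the XML 1.0 "Char" production.

```python
import re

# Complement of the XML 1.0 Char production, mirroring the quoted SQL regex:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Hypothetical name; not part of Koha.
INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters not allowed in XML 1.0."""
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in INVALID_XML_CHARS.finditer(text)
    ]


if __name__ == "__main__":
    # A vertical tab (U+000B) is a typical offender in imported MARC data.
    sample = "Title with a stray\x0bcontrol character"
    for offset, codepoint in find_invalid_xml_chars(sample):
        print("invalid character %s at offset %d" % (codepoint, offset))
```

Reporting the offset and code point, rather than only a yes/no match as the SQL report does, would make it easier to see what needs fixing in each record.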
Philippe Blouin
Director of Technology
T 833-INLIBRO (465-4276), ext. 230
C philippe.blouin at inLibro.com
www.inLibro.com <https://inLibro.com>
On 2024-04-11 21:36, David Cook via Koha-devel wrote:
>
> Hi all,
>
> I just wanted to share a (MariaDB) SQL report that I wrote for finding
> bib records with invalid XML characters:
>
> select biblionumber from biblio_metadata where metadata REGEXP
> '[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';
>
> Newer versions of Koha strip invalid characters from the XML so that
> you can fix your records. I figure this report is very valuable when
> coupled with that functionality. In fact, I just advised a library
> today to use them together to fix up some bad data in their catalogue.
>
> --
>
> On a related note, I’ve noticed that you can have a record with good
> bib XML but invalid item XML, and you won’t notice until your record
> fails to be indexed. So I’m planning on writing a report for that too.
>
> I’m thinking it might be good to add these reports to core Koha, so
> that people can find and fix their own metadata problems. What do
> people think?
>
> David Cook
>
> Senior Software Engineer
>
> Prosentient Systems
>
> Suite 7.03
>
> 6a Glen St
>
> Milsons Point NSW 2061
>
> Australia
>
> Office: 02 9212 0899
>
> Online: 02 8005 0595
>