[Koha-devel] Finding invalid XML characters in Koha data via SQL

Philippe Blouin philippe.blouin at inlibro.com
Fri Apr 12 14:44:48 CEST 2024


Something else to add to search_for_data_inconsistencies.pl?

I like Perl-based solutions, and I appreciate centralized ones, even 
though I suppose what you're testing for is not an "inconsistency".
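
As a rough sketch of what a scripted version of the quoted check might look 
like: the character class below mirrors the one in the SQL report, which is 
the negation of the XML 1.0 "Char" production. It is written in Python purely 
for illustration; the function name and the sample strings are made up, and 
this is not how search_for_data_inconsistencies.pl works today.

```python
import re

# Negated XML 1.0 "Char" production: any character matched here is not
# allowed in an XML 1.0 document. Same ranges as the SQL report's REGEXP.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids.

    Hypothetical helper: `text` stands in for one row's metadata column,
    already decoded to a Python str.
    """
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in INVALID_XML_CHARS.finditer(text)
    ]


if __name__ == "__main__":
    # A vertical tab (U+000B) is a typical offender; tab and newline are fine.
    print(find_invalid_xml_chars("title\x0bwith vertical tab"))
    print(find_invalid_xml_chars("clean\ttext\n"))
```

Unlike the SQL report, which only says which biblionumber is affected, a 
scripted check like this can also report where in the record the bad 
character sits, which may make the cleanup easier.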

Philippe Blouin
Director of Technology

T 833-INLIBRO (465-4276), ext. 230
C philippe.blouin at inLibro.com

www.inLibro.com <https://inLibro.com>

On 2024-04-11 21:36, David Cook via Koha-devel wrote:
>
> Hi all,
>
> I just wanted to share a (MariaDB) SQL report that I wrote for finding 
> bib records with invalid XML characters:
>
> select biblionumber from biblio_metadata where metadata REGEXP 
> '[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';
>
> Newer versions of Koha strip invalid characters from the XML so that 
> you can fix your records. I figure this report is very valuable when 
> coupled with that functionality. In fact, I just advised a library 
> today to use them together to fix up some bad data in their catalogue.
>
> --
>
> On a related note, I’ve noticed that you can have a record with good 
> bib XML but invalid item XML, and you won’t notice until your record 
> fails to be indexed. So I’m planning on writing a report for that too.
>
> I’m thinking it might be good to add these reports to core Koha, so 
> that people can find and fix their own metadata problems. What do 
> people think?
>
> David Cook
> Senior Software Engineer
> Prosentient Systems
> Suite 7.03
> 6a Glen St
> Milsons Point NSW 2061
> Australia
>
> Office: 02 9212 0899
> Online: 02 8005 0595
>
>
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.koha-community.org
> https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website :https://www.koha-community.org/
> git :https://git.koha-community.org/
> bugs :https://bugs.koha-community.org/