[Koha-devel] Searching for garbage characters

Cab Vinton bibliwho at gmail.com
Tue Nov 6 15:32:02 CET 2012


We've managed to import a number of MARC records with corrupted
diacritics, and my attempts to retrieve them with a report haven't met
with success. (Sample records are in this list:
http://catalog.splnh.com/cgi-bin/koha/opac-shelves.pl?viewshelf=8.)

My thought is to search the 100, 700, etc. tags for any characters
outside the printable ASCII range (32 through 126), but my regex
skills aren't up to the task. To wit:

SELECT CONCAT('<a href=\"/cgi-bin/koha/catalogue/detail.pl?biblionumber=',
              biblionumber,'\">',biblionumber,'</a>') AS bibnumber, pname
FROM
  (SELECT biblionumber,
          ExtractValue(marcxml,'//datafield[@tag="100"]/subfield[@code>="a"]') AS pname
   FROM biblioitems)
  AS authors
WHERE pname REGEXP '[\W]'

These variations didn't get me any closer either:

WHERE pname REGEXP '[^a-z]'

WHERE pname LIKE '%[^a-zA-Z0-9]%'

WHERE PATINDEX('%[^a-zA-Z0-9]%',pname) > 1
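For reference, the "outside ASCII 32 through 126" idea can be written in MySQL as the bracket expression `[^ -~]`, which matches any character that is not between space (32) and tilde (126). The earlier attempts miss for known reasons: MySQL's pre-8.0 REGEXP has no Perl-style `\W`, and in a MySQL string literal an unrecognized escape such as `\W` is read as a plain `W`; `LIKE '%[...]%'` and `PATINDEX` are SQL Server syntax. The query below is a minimal sketch, assuming Koha 3.x with MARCXML stored in `biblioitems.marcxml`, and checking only 100$a and 700$a:

```sql
-- Sketch: list records whose 100$a or 700$a contains any character
-- outside printable ASCII (32-126). Assumes MARCXML lives in
-- biblioitems.marcxml (Koha 3.x layout).
SELECT CONCAT('<a href="/cgi-bin/koha/catalogue/detail.pl?biblionumber=',
              biblionumber, '">', biblionumber, '</a>') AS bibnumber,
       pname
FROM (
    SELECT biblionumber,
           -- Join the 100$a and 700$a values into one string to test.
           CONCAT_WS(' | ',
               ExtractValue(marcxml, '//datafield[@tag="100"]/subfield[@code="a"]'),
               ExtractValue(marcxml, '//datafield[@tag="700"]/subfield[@code="a"]')
           ) AS pname
    FROM biblioitems
) AS authors
-- [^ -~] matches any character NOT in the range space (32) .. tilde (126).
-- Before MySQL 8.0.4 REGEXP works byte by byte; every byte of a multibyte
-- UTF-8 character is >= 128, so such characters still match.
WHERE pname REGEXP '[^ -~]'
```

Because any non-ASCII character matches, correctly encoded names (for example "Dvořák") will be listed alongside the corrupted ones, so the results still need a manual pass. Other tags can be covered by adding another `ExtractValue(...)` argument to `CONCAT_WS`.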

Any thoughts on how to write this report? I've tried the folks over on
the MarcEdit list, but no solution as yet.

Many thanks,

Cab Vinton, Director
Sanbornton Public Library
Sanbornton, NH

Life is short. Read fast!

