[Koha-devel] Searching for garbage characters

Paul paul.a at aandc.org
Tue Nov 6 18:12:39 CET 2012


At 09:32 AM 11/6/2012 -0500, Cab Vinton wrote:
>We've managed to import a number of MARC records with corrupted
>diacritics and my attempts to retrieve these with a report haven't met
>w/ success. (Sample records in this list:
>http://catalog.splnh.com/cgi-bin/koha/opac-shelves.pl?viewshelf=8.)
>
>My thought is to search for 100, 700, etc. tags containing any
>characters outside of the ASCII 32 through 126 range, but my regex
>skills aren't up to the task. To wit:

Try a last line such as

WHERE NOT HEX(pname) REGEXP '^([0-7][0-9A-F])*$'
or
WHERE pname REGEXP '[^ -~]'

The first matches any value containing a byte above 0x7F; the second
matches anything outside the printable ASCII range (space through tilde),
so it also catches control characters.

But I'm afraid that this may find more than what you're looking for (the
black diamonds): it will match all, or at least most, accented characters
as well. See for example a z39.50 search for L'évolution de l'aéronautique
by Jauneaud at LoC. My understanding (and I definitely stand to be
corrected) is that some cataloguers used (maybe still do?) two characters,
each valid in UTF-8 in itself, as an [accent][letter] combination. I have
asked our people, and they tell me that when they import, they edit these
out (although I can still find a couple that sneaked into our db, but they
do not appear to affect search capability.)
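
To illustrate why a plain non-ASCII test over-matches, here is a rough
Python sketch (not part of the report itself) that sorts a heading into
four buckets. It rests on two assumptions that may not hold for your data:
that the black diamonds are stored as U+FFFD, the Unicode replacement
character, and that the two-character accents are Unicode combining marks.
The function name classify is made up for the example.

```python
import re
import unicodedata

# Same character class as the REGEXP above: anything outside space..tilde.
NON_ASCII = re.compile(r"[^ -~]")

# Assumption: the "black diamond" is stored as U+FFFD (replacement character).
REPLACEMENT = "\ufffd"


def classify(value: str) -> str:
    """Hypothetical helper: label a heading by the kind of non-ASCII it holds."""
    if not NON_ASCII.search(value):
        return "ascii"        # nothing the REGEXP would flag
    if REPLACEMENT in value:
        return "corrupt"      # contains a replacement character
    if any(unicodedata.combining(ch) for ch in value):
        return "combining"    # letter plus a separate combining accent
    return "accented"         # legitimate precomposed accented letters


if __name__ == "__main__":
    samples = [
        "Jauneaud, Marcel",
        "L'\u00e9volution",    # precomposed e-acute
        "L'e\u0301volution",   # e followed by a combining acute accent
        "L'\ufffdvolution",    # replacement character
    ]
    for s in samples:
        print(classify(s), repr(s))
```

All but the first sample would be returned by REGEXP '[^ -~]', yet only
the last one is actually damaged. If the assumption about U+FFFD holds, a
report that looks for that character specifically should be much narrower
than one that looks for everything non-ASCII.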

Best - Paul



>SELECT CONCAT('<a
>href=\"/cgi-bin/koha/catalogue/detail.pl?biblionumber=',biblionumber,'\">',biblionumber,'</a>')
>AS bibnumber, pname
>FROM
>(SELECT biblionumber,
>ExtractValue(marcxml,'//datafield[@tag="100"]/subfield[@code>="a"]')
>AS pname FROM biblioitems)
>AS authors
>WHERE pname REGEXP '[\W]'
>
>These attempts also didn't seem to be getting me any closer:
>
>WHERE pname REGEXP '[^a-z]'
>
>WHERE pname LIKE '%[^a-zA-Z0-9]%'
>
>WHERE PATINDEX('%[^a-zA-Z0-9]%',pname) > 1
>
>Any thoughts on how to write this report? Have tried the folks over on
>the MarcEdit list, but no solution as yet.
>
>Many thanks,
>
>Cab Vinton, Director
>Sanbornton Public Library
>Sanbornton, NH
>
>Life is short. Read fast!

---
Maritime heritage and history, preservation and conservation,
research and education through the written word and the arts.
<http://NavalMarineArchive.com> and <http://UltraMarine.ca>


