[Koha-bugs] [Bug 7284] New: Authority matching algorithm improvements

bugzilla-daemon at bugs.koha-community.org
Thu Dec 1 18:34:41 CET 2011


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

             Bug #: 7284
           Summary: Authority matching algorithm improvements
    Classification: Unclassified
 Change sponsored?: Seeking cosponsors
           Product: Koha
           Version: master
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5 - low
         Component: MARC Authority data support
        AssignedTo: jcamins at cpbibliography.com
        ReportedBy: jcamins at cpbibliography.com
         QAContact: ian.walls at bywatersolutions.com


At present, automatic authority matching for MARC21 is of limited use because
it fails on headings with more than one subfield, ignores subfield codes, and
treats punctuation as significant. An improved matching algorithm should be
able to match the following headings to the correct authorities (these
particular examples are from a local authority file):
=650  #4$aHistory
=650  #4$aHistory
=650  #4$aHistory$xBibliography (the technique of bibliography as applied to the study of history)
=650  #4$aHistory$vBibliography (bibliographies about history)
=650  #4$aHistory$vBibliography.
=650  #4$aHistory$zGreek Empire$vBibliography
=650  #4$aHistory$zGreek Empire$vBibliography.
=650  #0$aHistory.
=650  #7$aHistory.$2abc

Those headings should match the following authorities:
=150  #4$aHistory.
=150  #4$aHistory$xBibliography.
=150  #4$aHistory$vBibliography.
=150  #4$aHistory$zGreek Empire$vBibliography.
=150  #0$aHistory.
=150  #7$aHistory.$2abc
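The requirements above can be illustrated with a minimal matching-key sketch. It is written in Python rather than Koha's Perl and is not Koha's implementation. It assumes headings arrive in the text form used in this report ($ as the subfield delimiter, # for a blank indicator). It also assumes that only trailing punctuation (. , ; : / and whitespace) is insignificant, that a 6XX heading corresponds to the 1XX authority with the same last two digits, and that the thesaurus (second indicator, plus $2 when that indicator is 7) must agree. The helper names parse_field, match_key, build_index and find_authorities are hypothetical.

```python
import re

# Assumption: punctuation such as ".", ",", ";", ":" or "/" at the end of a
# subfield (plus surrounding whitespace) is insignificant for matching.
TRAILING_PUNCT = re.compile(r"[\s.,;:/]+$")


def parse_field(line):
    """Split a line such as '=650  #7$aHistory.$2abc' into
    (tag, indicators, [(code, value), ...])."""
    tag = line[1:4]
    indicators, _, body = line[4:].lstrip().partition("$")
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("$") if chunk]
    return tag, indicators, subfields


def match_key(line):
    """Key that ignores trailing punctuation but keeps subfield codes,
    subfield order and the thesaurus."""
    tag, indicators, subfields = parse_field(line)
    thesaurus = indicators[1:2]  # second indicator, e.g. "0", "4" or "7"
    source = ""                  # $2 (source of heading), used with indicator 7
    heading = []
    for code, value in subfields:
        if code == "2":
            source = value.strip()
        else:
            heading.append((code, TRAILING_PUNCT.sub("", value.strip())))
    # Assumption: a 6XX bibliographic heading corresponds to the 1XX
    # authority heading with the same last two digits (650 -> 150).
    return (tag[1:], thesaurus, source, tuple(heading))


def build_index(authority_lines):
    """Map each match key to every authority line that produces it."""
    index = {}
    for line in authority_lines:
        index.setdefault(match_key(line), []).append(line)
    return index


def find_authorities(heading_line, index):
    """Return all authorities whose key equals the heading's key."""
    return index.get(match_key(heading_line), [])
```

Because subfield codes are part of the key, $xBibliography and $vBibliography stay distinct, while "History" and "History." collapse to the same key. A list with more than one entry from find_authorities signals an ambiguous match.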

Libraries with examples of problematic headings from other authority files are
invited to post them as comments on this bug for use in testing.

Several additional changes are needed before the link_bibs_to_authorities.pl
script, and cataloging with the BiblioAddsAuthorities system preference set to
"allow", work properly:
* An option to link a heading to the first matching authority even when more
than one authority matches (with some sort of warning that the match was
ambiguous).
* Verbose mode in link_bibs_to_authorities.pl should report more information.
* link_bibs_to_authorities.pl should be able to process only a portion of the
catalog.
* Allow machine-created authority records to be used for indexing.
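The first and third items amount to a linking policy, sketched minimally below in Python. This is not the Perl of link_bibs_to_authorities.pl. The names choose_authority, in_portion, link_records, link_to_first, start and end are hypothetical and do not correspond to options of the real script. "A portion of the catalog" is assumed here to mean an inclusive range of record ids.

```python
import logging

log = logging.getLogger("linking_sketch")  # hypothetical logger name


def choose_authority(heading, candidates, link_to_first=False):
    """Pick the authority to link a heading to, or None.

    No candidates -> None; exactly one -> that one; several -> the first
    one only when link_to_first is set, with a warning either way.
    """
    if not candidates:
        log.info("no authority found for %r", heading)
        return None
    if len(candidates) == 1:
        return candidates[0]
    if link_to_first:
        log.warning("%r matched %d authorities; linking to the first (%r)",
                    heading, len(candidates), candidates[0])
        return candidates[0]
    log.warning("%r matched %d authorities; left unlinked",
                heading, len(candidates))
    return None


def in_portion(record_id, start=None, end=None):
    """True if record_id falls in the inclusive range [start, end]."""
    return (start is None or record_id >= start) and (end is None or record_id <= end)


def link_records(records, lookup, link_to_first=False, start=None, end=None):
    """records: iterable of (record_id, [heading, ...]);
    lookup: function returning the list of matching authority ids for a heading.
    Returns {record_id: {heading: authority_id or None}} for records in range."""
    results = {}
    for record_id, headings in records:
        if not in_portion(record_id, start, end):
            continue  # outside the requested portion of the catalog
        results[record_id] = {
            h: choose_authority(h, lookup(h), link_to_first) for h in headings
        }
    return results
```

In this sketch a verbose mode could simply lower the logger's level to INFO so that unmatched headings are reported as well as ambiguous ones.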

Potential future changes that would make these features even more useful:
* A web interface to link_bibs_to_authorities.pl
* An authority record de-duplicator
* An option to correct punctuation when authorizing headings (either via
link_bibs_to_authorities.pl or in the cataloging module)
