[Koha-bugs] [Bug 7284] Authority matching algorithm improvements

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Tue Jan 24 18:09:04 CET 2012


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284

Jared Camins-Esakov <jcamins at cpbibliography.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #7266|0                           |1
        is obsolete|                            |

--- Comment #14 from Jared Camins-Esakov <jcamins at cpbibliography.com> 2012-01-24 17:09:04 UTC ---
Created attachment 7323
  --> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7323
Bug 7284: Authority matching improvements

Squashed patch incorporating all previous patches.

1. Cleaned up the authorities code by removing unused functions, adding
functions that had not yet been implemented, and adding some unit tests.

2. Added an additional box to the authority finder plugin for "Heading match,"
which consults not just the main entry but also See-from and See-also-from
headings.
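
As a rough illustration of what "Heading match" means, here is a minimal
Python sketch. It is not Koha code: Koha performs this search against its
authority indexes, whereas this version scans in-memory dicts. The record
layout and the function name heading_match are assumptions made for the
example.

```python
def heading_match(records, query):
    """Return records whose main entry, See-from, or See-also-from
    heading equals the query (case-insensitive)."""
    wanted = query.casefold()
    hits = []
    for rec in records:
        # Consult all three kinds of heading, not just the main entry.
        candidates = [rec["main_entry"]]
        candidates += rec.get("see_from", [])
        candidates += rec.get("see_also_from", [])
        if any(wanted == c.casefold() for c in candidates):
            hits.append(rec)
    return hits


# Hypothetical sample data for the sketch.
records = [
    {"authid": 1, "main_entry": "Twain, Mark",
     "see_from": ["Clemens, Samuel Langhorne"]},
    {"authid": 2, "main_entry": "Cats", "see_also_from": ["Felidae"]},
]

by_see_from = heading_match(records, "clemens, samuel langhorne")
by_see_also = heading_match(records, "Felidae")
```

A search on the main entry alone would miss both of these queries; the
heading match finds record 1 through its See-from heading and record 2
through its See-also-from heading.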

3. Improved Koha's authority linker cron job (misc/link_bibs_to_authorities.pl)
to make it more useful:

Added the following options to the misc/link_bibs_to_authorities.pl script:
--auth-limit        Only process those headings that match authority records
                    selected by the user-specified WHERE clause.
--bib-limit         Only process those bib records that match the
                    user-specified WHERE clause.
--commit            Commit the results to the database after every N records
                    are processed.
--link-report       Display a report of all the headings that were processed.
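
The idea behind --commit can be sketched as follows. This is an illustrative
Python sketch, not the script's Perl code; the function and parameter names
are hypothetical, and the final commit for a partial batch is an assumption
about sensible behavior rather than something stated above.

```python
def process_in_batches(records, process, commit, commit_every):
    """Call process() on each record and commit() after every
    commit_every records (plus once at the end for any remainder)."""
    pending = 0
    for rec in records:
        process(rec)
        pending += 1
        if pending >= commit_every:
            commit()
            pending = 0
    if pending:
        # Assumption: flush whatever is left over in the last batch.
        commit()


# Usage with stand-in callables that just record what happened.
processed = []
commits = []
process_in_batches(
    range(7),
    process=processed.append,
    commit=lambda: commits.append(len(processed)),
    commit_every=3,
)
```

With seven records and a batch size of three, the stand-in commit runs after
the third, sixth, and seventh records.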

Converted misc/link_bibs_to_authorities.pl to use POD.

Added a detailed report of headings that linked, did not link, and linked
in a "fuzzy" fashion (the exact semantics of fuzzy are up to the individual
linker modules) during the run.
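
A report of this kind amounts to tallying each processed heading under one of
three outcomes. The Python sketch below shows one way to do that; the outcome
labels and the summarize function are assumptions for illustration and do not
reflect the script's actual data structures.

```python
from collections import Counter


def summarize(results):
    """Group (heading, outcome) pairs into per-outcome tallies.

    outcome is one of 'linked', 'unlinked', or 'fuzzy'."""
    report = {"linked": Counter(), "unlinked": Counter(), "fuzzy": Counter()}
    for heading, outcome in results:
        report[outcome][heading] += 1
    return report


# Hypothetical results from a linker run.
report = summarize([
    ("Twain, Mark", "linked"),
    ("Twain, Mark", "linked"),
    ("Cats--History", "fuzzy"),
    ("Unknown heading", "unlinked"),
])
```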

Implemented new C4::Linker functionality to make it possible to easily add
custom authority linker algorithms. Currently available linker options are:
* Default: retains the current behavior of only creating links when there is
  an exact match to one and only one authority record; if the
  'broader_headings' option is enabled, it will try to link headings to
  authority records for broader headings by removing subfields from the end
  of the heading (NOTE: test the results before enabling broader_headings in
  a production system because its usefulness is very much dependent on
  individual sites' authority files)
* First Match: based on Default, creates a link to the *first* authority
  record that matches a given heading, even if there is more than one
  authority record that matches
* Last Match: based on Default, creates a link to the *last* authority
  record that matches a given heading, even if there is more than one record
  that matches
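
The three linker behaviors can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not the C4::Linker code: the
function get_link, the search callable, and the representation of a heading
as a list of subfields are all hypothetical. The sketch also assumes that
broader_headings applies to every behavior and that a link is "fuzzy" when it
came from a broadened heading or from picking among several matches.

```python
def get_link(subfields, search, behavior="default", broader_headings=False):
    """Return (authid, fuzzy) for a heading, or (None, False) if no link.

    subfields: the heading's subfield values, most specific last.
    search: callable mapping a tuple of subfields to a list of authids.
    """
    subfields = list(subfields)
    fuzzy = False
    while subfields:
        matches = search(tuple(subfields))
        if behavior == "default" and len(matches) == 1:
            return matches[0], fuzzy
        if behavior == "first" and matches:
            return matches[0], fuzzy or len(matches) > 1
        if behavior == "last" and matches:
            return matches[-1], fuzzy or len(matches) > 1
        if not broader_headings:
            break
        # Drop the last subfield and retry with the broader heading.
        subfields.pop()
        fuzzy = True
    return None, False


# Hypothetical authority index for the example.
index = {("Cats",): [10], ("Dogs",): [20, 21]}


def search(key):
    return index.get(key, [])


exact_only = get_link(["Cats", "History"], search)
broadened = get_link(["Cats", "History"], search, broader_headings=True)
ambiguous = get_link(["Dogs"], search)
first = get_link(["Dogs"], search, behavior="first")
last = get_link(["Dogs"], search, behavior="last")
```

In this sketch the default behavior refuses to link "Dogs" because two
authority records match, while the first and last behaviors pick one of them.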

Made the linking functionality use SearchAuthorities in C4::AuthoritiesMarc
rather than SimpleSearch in C4::Search. Once C4::Search has been refactored,
SearchAuthorities should be rewritten to simply call into C4::Search. However,
at this time C4::Search cannot handle authority searching. Also fixed numerous
performance issues in SearchAuthorities and the Linker script:
* Correctly destroy ZOOM recordsets in SearchAuthorities when finished. If left
  undestroyed, running time appears to approach O(log n^n)
* Add an optional $skipmetadata flag to SearchAuthorities that can be used to
  avoid additional calls into Zebra when all that is wanted are authority
  records and not statistics about their use
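
Both fixes follow common patterns: release the result set as soon as it has
been read, and make the expensive per-record lookup optional. The Python
sketch below shows the shape of that idea only. FakeResultSet,
search_authorities, and count_uses are stand-ins invented for the example;
they are not the ZOOM or SearchAuthorities interfaces.

```python
from contextlib import closing


class FakeResultSet:
    """Stand-in for a result set that holds resources until closed."""

    open_count = 0

    def __init__(self, records):
        self.records = records
        FakeResultSet.open_count += 1

    def close(self):
        FakeResultSet.open_count -= 1


def search_authorities(run_query, count_uses, skip_metadata=False):
    """Return one dict per record; skip usage counts when not needed."""
    results = []
    # closing() releases the result set even if an error occurs.
    with closing(run_query()) as rs:
        for authid in rs.records:
            entry = {"authid": authid}
            if not skip_metadata:
                # Extra lookup per record; avoided when skip_metadata is set.
                entry["used"] = count_uses(authid)
            results.append(entry)
    return results


lookups = []


def count_uses(authid):
    lookups.append(authid)
    return 0


fast = search_authorities(lambda: FakeResultSet([1, 2]), count_uses,
                          skip_metadata=True)
lookups_after_fast = len(lookups)
full = search_authorities(lambda: FakeResultSet([1, 2]), count_uses)
```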

This patch also adds the following sysprefs:
* AutoCreateAuthorities - When this and BiblioAddsAuthorities are both turned
  on, automatically create authority records for headings that don't have
  any authority link when cataloging. When BiblioAddsAuthorities is on and
  AutoCreateAuthorities is turned off, do not automatically generate authority
  records, but allow the user to enter headings that don't match an existing
  authority. When BiblioAddsAuthorities is off, this has no effect.
* LinkerModule - Chooses which linker module to use for matching headings
  (current options are as described above in the section on linker options:
  "Default," "FirstMatch," and "LastMatch")
* LinkerOptions - A pipe-separated list of options to set for the authority
  linker
* LinkerRelink - When turned on, the linker will confirm the links for headings
  that have previously been linked to an authority record when it runs. When
  turned off, any heading with an existing link will be ignored.
* LinkerKeepStale - When turned on, the linker will never *delete* a link to an
  authority record, though, depending on the value of LinkerRelink, it may
  change the link.
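
One way to read the interaction between LinkerOptions, LinkerRelink, and
LinkerKeepStale is sketched below in Python. This is an interpretation of the
descriptions above, not the patch's Perl code; parse_linker_options and
resolve_link are hypothetical names, and the sketch assumes that a stale link
is removed when LinkerRelink is on, LinkerKeepStale is off, and no match is
found.

```python
def parse_linker_options(value):
    """Split a pipe-separated option string into a set of option names."""
    return {opt.strip() for opt in value.split("|") if opt.strip()}


def resolve_link(current_authid, found_authid, relink, keep_stale):
    """Return the authid a heading should carry after the linker runs."""
    if current_authid is not None and not relink:
        # LinkerRelink off: headings with an existing link are ignored.
        return current_authid
    if found_authid is not None:
        # New link, confirmed link, or changed link.
        return found_authid
    if current_authid is not None and keep_stale:
        # LinkerKeepStale on: never delete an existing link.
        return current_authid
    # Assumption: otherwise the heading ends up unlinked.
    return None


options = parse_linker_options("broader_headings")
untouched = resolve_link(5, 9, relink=False, keep_stale=False)
changed = resolve_link(5, 9, relink=True, keep_stale=True)
kept = resolve_link(5, None, relink=True, keep_stale=True)
dropped = resolve_link(5, None, relink=True, keep_stale=False)
```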

This patch also modifies the authority indexing to remove trailing punctuation
from Match indexes.
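
The effect of that indexing change can be illustrated with a short Python
sketch. In Koha the normalization happens in the index configuration, not in
code like this, and the exact set of characters stripped here is an
assumption chosen for the example.

```python
def normalize_for_match(heading, trailing=" .,;:/"):
    """Strip trailing punctuation and spaces so that headings differing
    only in final punctuation compare equal."""
    return heading.rstrip(trailing)


with_period = normalize_for_match("Twain, Mark, 1835-1910.")
without_period = normalize_for_match("Twain, Mark, 1835-1910")
open_date = normalize_for_match("Smith, John, 1900-")
```

The hyphen is deliberately left out of the stripped characters in this sketch
so that an open date such as "1900-" is preserved.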

4. Replaced the old BiblioAddAuthorities subroutines with calls into the new
C4::Linker routines.

5. Added a simple implementation for C4::Heading::UNIMARC. (With thanks to F.
Demians, 2011.01.09:) Corrected C4::Heading::UNIMARC class loading. Created
the biblio tag to authority types data structure at initialization rather
than querying the DB.

6. Ran perltidy on all changed code.

Signed-off-by: Jared Camins-Esakov <jcamins at cpbibliography.com>
