[Koha-bugs] [Bug 7284] Authority matching algorithm improvements
bugzilla-daemon at bugs.koha-community.org
Sat Jan 21 16:00:19 CET 2012
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284
Jared Camins-Esakov <jcamins at cpbibliography.com> changed:
               What |Removed |Added
----------------------------------------------------------------------------
   Attachment #7083 |0       |1
        is obsolete |        |
   Attachment #7084 |0       |1
        is obsolete |        |
   Attachment #7085 |0       |1
        is obsolete |        |
   Attachment #7086 |0       |1
        is obsolete |        |
   Attachment #7091 |0       |1
        is obsolete |        |
--- Comment #13 from Jared Camins-Esakov <jcamins at cpbibliography.com> 2012-01-21 15:00:19 UTC ---
Created attachment 7266
--> http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=7266
Bug 7284: Authority matching improvements
Squashed patch incorporating all previous patches.
1. Cleaned up the authorities code by removing unused functions and adding
functions that had not yet been implemented, and added some unit tests.
2. Added an additional box to the authority finder plugin for "Heading match,"
which consults not just the main entry but also See-from and See-also-from
headings.
3. Improved Koha's authority linker cron job (misc/link_bibs_to_authorities.pl)
to make it more useful:
Added the following options to the misc/link_bibs_to_authorities.pl script:
  --auth-limit    Only process those headings that match the authorities
                  matching the user-specified WHERE clause.
  --bib-limit     Only process those bib records that match the
                  user-specified WHERE clause.
  --commit        Commit the results to the database after every N records
                  are processed.
  --link-report   Display a report of all the headings that were processed.
Converted misc/link_bibs_to_authorities.pl to use POD.
Added a detailed report of headings that linked, did not link, and linked
in a "fuzzy" fashion (the exact semantics of fuzzy are up to the individual
linker modules) during the run.
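As an illustration of the --commit and report behavior described above, here is
a minimal sketch in Python (not the script's actual Perl code). The names
process_records, link_record and commit are hypothetical: link_record is
assumed to return one outcome string per heading in a record, and commit is
assumed to flush pending changes to the database.

```python
from collections import Counter


def process_records(records, link_record, commit, commit_every=100):
    """Link each record, committing after every `commit_every` records.

    link_record(record) is a hypothetical callable returning a list with one
    outcome per heading: "linked", "unlinked" or "fuzzy".
    commit() is a hypothetical callable that flushes pending changes.
    Returns a Counter tallying the outcomes for the final report.
    """
    report = Counter()
    pending = 0
    for record in records:
        report.update(link_record(record))  # tally one outcome per heading
        pending += 1
        if pending == commit_every:
            commit()
            pending = 0
    if pending:
        commit()  # flush the final partial batch
    return report
```

Committing in batches rather than per record keeps the number of database
commits small, while the leftover partial batch is still flushed at the end.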
Implemented new C4::Linker functionality to make it possible to easily add
custom authority linker algorithms. Currently available linker options are:
* Default: retains the current behavior of only creating links when there is
an exact match to one and only one authority record; if the 'broader_headings'
option is enabled, it will try to link headings to authority records for
broader headings by removing subfields from the end of the heading (NOTE:
test the results before enabling broader_headings in a production system
because its usefulness is very much dependent on individual sites' authority
files)
* First Match: based on Default, creates a link to the *first* authority
record that matches a given heading, even if there is more than one
authority record that matches
* Last Match: based on Default, creates a link to the *last* authority
record that matches a given heading, even if there is more than one record
that matches
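The three linker options above can be sketched roughly as follows. This is an
illustrative Python sketch, not the C4::Linker code itself. The function
link_heading and the search callable are hypothetical, a heading is modeled as
a plain list of subfield values, and the assumption that the broader_headings
fallback is tried whenever the full heading yields no link is mine.

```python
def link_heading(subfields, search, strategy="default", broader_headings=False):
    """Return the authority id to link to, or None if no link is made.

    search(heading) is a hypothetical callable that takes a tuple of subfield
    values and returns the matching authority ids in result order.
    strategy is "default", "first" or "last".
    """
    # With broader_headings, retry with subfields removed from the end.
    if broader_headings:
        lengths = range(len(subfields), 0, -1)
    else:
        lengths = [len(subfields)]
    for end in lengths:
        matches = search(tuple(subfields[:end]))
        if strategy == "default" and len(matches) == 1:
            return matches[0]   # exactly one match required
        if strategy == "first" and matches:
            return matches[0]   # first of possibly many matches
        if strategy == "last" and matches:
            return matches[-1]  # last of possibly many matches
    return None
```

For example, with two authority records matching a full heading, "default"
makes no link, "first" and "last" pick one end of the result list, and
"default" with broader_headings falls back to a shorter heading if that
shorter heading has exactly one match.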
Made the linking functionality use the SearchAuthorities in C4::AuthoritiesMarc
rather than SimpleSearch in C4::Search. Once C4::Search has been refactored,
SearchAuthorities should be rewritten to simply call into C4::Search. However,
at this time C4::Search cannot handle authority searching. Also fixed numerous
performance issues in SearchAuthorities and the Linker script:
* Correctly destroy ZOOM recordsets in SearchAuthorities when finished. If left
undestroyed, running time appears to approach O(log n^n)
* Add an optional $skipmetadata flag to SearchAuthorities that can be used to
avoid additional calls into Zebra when all that is wanted are authority
records and not statistics about their use
This patch also adds the following sysprefs:
* AutoCreateAuthorities - When this and BiblioAddsAuthorities are both turned
on, automatically create authority records for headings that don't have
any authority link when cataloging. When BiblioAddsAuthorities is on and
AutoCreateAuthorities is turned off, do not automatically generate authority
records, but allow the user to enter headings that don't match an existing
authority. When BiblioAddsAuthorities is off, this has no effect.
* LinkerModule - Chooses which linker module to use for matching headings
(current options are as described above in the section on linker options:
"Default," "FirstMatch," and "LastMatch")
* LinkerOptions - A pipe-separated list of options to set for the authority
linker
* LinkerRelink - When turned on, the linker will confirm the links for headings
that have previously been linked to an authority record when it runs. When
turned off, any heading with an existing link will be ignored.
* LinkerKeepStale - When turned on, the linker will never *delete* a link to an
authority record, though, depending on the value of LinkerRelink, it may
change the link.
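To make the interaction of LinkerOptions, LinkerRelink and LinkerKeepStale
concrete, here is a small Python sketch of one possible reading of the
descriptions above. It is not Koha's implementation. parse_linker_options and
resolve_link are hypothetical names, and the behavior when LinkerKeepStale is
off (a link with no current match is dropped) is an inference from the text.

```python
def parse_linker_options(value):
    """Split a pipe-separated LinkerOptions string into a set of names."""
    return {opt.strip() for opt in value.split("|") if opt.strip()}


def resolve_link(existing, found, relink, keep_stale):
    """Return the authority id a heading should carry after a linker run.

    existing: authority id already on the heading, or None.
    found: authority id the linker matched this time, or None.
    relink / keep_stale: values of LinkerRelink and LinkerKeepStale.
    """
    if existing is not None and not relink:
        return existing  # LinkerRelink off: already-linked headings are ignored
    if found is None and existing is not None and keep_stale:
        return existing  # LinkerKeepStale on: never delete a link
    return found         # new link, changed link, or (assumed) removed link
```

Under this reading, parse_linker_options("broader_headings") yields a
one-element set, and a heading linked to one record may be relinked to another
when LinkerRelink is on, but only loses its link entirely when LinkerKeepStale
is off.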
This patch also modifies the authority indexing to remove trailing punctuation
from Match indexes.
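The effect of this normalization can be illustrated with a short Python
sketch. The real change lives in the indexing configuration, not in code like
this, and the exact set of characters removed is an assumption made here for
illustration only.

```python
# Assumed set of trailing characters to drop; the actual indexing rules may differ.
TRAILING = " .,;:/"


def strip_trailing_punctuation(heading):
    """Remove trailing punctuation and spaces so headings compare equal."""
    return heading.rstrip(TRAILING)
```

With this in place, a heading entered as "Twain, Mark, 1835-1910." and an
authority main entry stored as "Twain, Mark, 1835-1910" reduce to the same
match key.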
4. Replace the old BiblioAddAuthorities subroutines with calls into the new
C4::Linker routines.
5. Add a simple implementation for C4::Heading::UNIMARC. (With thanks to F.
Demians, 2011.01.09:) Correct C4::Heading::UNIMARC class loading. Create biblio
tag to authority types data structure at initialization rather than querying
DB.
6. Ran perltidy on all changed code.
Signed-off-by: Jared Camins-Esakov <jcamins at cpbibliography.com>