[Koha-bugs] [Bug 21828] Improve efficiency of C4::Biblio::LinkBibHeadingsToAuthorities

bugzilla-daemon at bugs.koha-community.org
Tue Aug 1 21:20:44 CEST 2023


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21828

--- Comment #1 from Andreas Roussos <a.roussos at dataly.gr> ---
(In reply to Martin Renvoize from comment #0)
> The routine iterates through all fields in a biblio and compares them to a
> list of acceptable tags that may link to an authority.  It may be more
> efficient to do the opposite iteration, working through acceptable authority
> tags and looking for the existence of that field in the biblio. (Note.. I've
> not tested this hypothesis.. it depends heavily on how efficient it is to
> fetch a single MARC tag using Marc::Record)
Hi Martin,

TL;DR: In UNIMARC instances, link_bibs_to_authorities.pl run time
       can be reduced by 80% and the number of DBI calls can be
       reduced by 90% with a very simple fix that optimises the
       C4::Biblio::LinkBibHeadingsToAuthorities routine ;-)

I had a go at optimising this routine recently, as I'm in the process
of writing a Perl script that will perform some housekeeping tasks
on our bibliographic records. [The script searches for "TOPICAL
SUBJECT" authority headings (UNIMARC field 250) using the text found
in our existing "UNCONTROLLED SUBJECT TERMS" (biblio subfield 610$a).
When a match is found, it adds a new "TOPICAL NAME USED AS SUBJECT"
tag (field 606) to the bibliographic record, copies the authority
record contents over, links it via subfield $9, and finally removes
the (unwanted) uncontrolled subject term.]

In developing this script, I investigated the inner workings of the
routine in question, and I'm pleased to report that I've identified
where the inefficiency in LinkBibHeadingsToAuthorities() actually
lies. The patch for fixing this is trivial and backportable, too! ;-)

> The routine iterates through all fields in a biblio and compares them to a
> list of acceptable tags that may link to an authority.
Indeed, that's exactly what happens in LinkBibHeadingsToAuthorities()
with the crucial bit being line 641:

 637     foreach my $field ( $bib->fields() ) {
 638         if ( defined $tagtolink ) {
 639           next unless $field->tag() == $tagtolink ;
 640         }
 641         my $heading = C4::Heading->new_from_field( $field, $frameworkcode );
 642         next unless defined $heading;
[...]

Then, in C4::Heading->new_from_field():

 61 sub new_from_field {
 62     my $class         = shift;
 63     my $field         = shift;
 64     my $frameworkcode = shift; #FIXME this is not used?
 65     my $auth          = shift;
 66     my $marcflavour   = C4::Context->preference('marcflavour');
 67     my $marc_handler = _marc_format_handler($marcflavour);
[...]

277 sub _marc_format_handler {
278     my $marcflavour = uc shift;
279     my $pname = "C4::Heading::$marcflavour";
280     load $pname;
281     return $pname->new();
282 }

This is where things get interesting as the behaviour diverges a bit
depending on the MARC flavour being used: in C4/Heading/MARC21.pm the
object is constructed with:

255 sub new {
256     my $class = shift;
257     return bless {}, $class;
258 }

... and the $bib_heading_fields data structure/hash is statically
set at the top of the module:

 49 my $bib_heading_fields = {
 50     '100' => {
 51         auth_type  => 'PERSO_NAME',
 52         subfields  => 'abcdfghjklmnopqrst',
 53         main_entry => 1
 54     },
 55     '110' => {
 56         auth_type  => 'CORPO_NAME',
 57         subfields  => 'abcdfghklmnoprst',
 58         main_entry => 1
 59     },
[...]

Whilst in C4/Heading/UNIMARC.pm the object is constructed with:

 68 sub new {
 69     my $class = shift;
 70
 71     my $dbh = C4::Context->dbh;
 72     my $sth = $dbh->prepare(
 73         "SELECT tagfield, authtypecode
 74          FROM marc_subfield_structure
 75          WHERE frameworkcode = '' AND authtypecode <> ''"
 76     );
 77     $sth->execute();
 78     $bib_heading_fields = {};
 79     while ( my ( $tag, $auth_type ) = $sth->fetchrow ) {
 80         $bib_heading_fields->{$tag} = {
 81             auth_type => $auth_type,
 82             subfields => 'abcdefghjklmnopqrstvxyz',
 83         };
 84     }
 85
 86     return bless {}, $class;
 87 }

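... thus resetting the $bib_heading_fields hash *on each invocation*,
then populating it again with the results fetched from the database!
Since new_from_field() calls _marc_format_handler() for every field it
is passed, the same query is run once per field of every record.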

Does this information really need to be re-calculated for each field
of the record being saved/updated/linked? I think not, as:

1) Changes to the marc_subfield_structure table are not that frequent,
   and are unlikely to occur in the relatively short timeframe that a
   save/update/linking takes to complete.

2) As per the official Koha manual, you're not really meant to edit
   the Default framework as this will cause problems, but rather
   clone it to a different one which you'll use to catalogue your
   records. [C4::Heading->new_from_field() can actually be fed with
   a $frameworkcode parameter, but it doesn't currently use it.
   I'll leave that for a different bug report, though ;-)]
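
To illustrate the reasoning above, here is a minimal sketch of building
the lookup hash once and reusing it on later constructor calls. It is
not necessarily the patch for this bug: the package name
My::Heading::UNIMARC, the helper fetch_heading_fields() and its sample
rows are all hypothetical, and the helper merely stands in for the
marc_subfield_structure query so that the number of "database" calls
can be counted without a real database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the SELECT on marc_subfield_structure.
# It only counts how often it is called and returns made-up rows.
my $query_count = 0;

sub fetch_heading_fields {
    $query_count++;
    my @rows = ( [ '606', 'SNC' ], [ '700', 'NP' ] );    # sample data only
    my %fields;
    for my $row (@rows) {
        my ( $tag, $auth_type ) = @$row;
        $fields{$tag} = {
            auth_type => $auth_type,
            subfields => 'abcdefghjklmnopqrstvxyz',
        };
    }
    return \%fields;
}

package My::Heading::UNIMARC {    # hypothetical name, not the Koha module

    # Cache shared by every object created in this process.
    my $bib_heading_fields;

    sub new {
        my $class = shift;

        # Run the (simulated) query only if the cache is still undefined.
        $bib_heading_fields //= main::fetch_heading_fields();
        return bless {}, $class;
    }
}

package main;

# Simulate one constructor call per field of a 50-field record.
My::Heading::UNIMARC->new() for 1 .. 50;
print "queries run: $query_count\n";    # prints "queries run: 1"
```

The trade-off of a cache like this is that, in a long-running process,
later changes to the framework would not be picked up until the cached
hash is cleared or the process is restarted.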
