[Koha-bugs] [Bug 21828] Improve efficiency of C4::Biblio::LinkBibHeadingsToAuthorities
bugzilla-daemon at bugs.koha-community.org
Tue Aug 1 21:20:44 CEST 2023
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21828
--- Comment #1 from Andreas Roussos <a.roussos at dataly.gr> ---
(In reply to Martin Renvoize from comment #0)
> The routine iterates through all fields in a biblio and compares them to a
> list of acceptable tags that may link to an authority. It may be more
> efficient to do the opposite iteration, working through acceptable authority
> tags and looking for the existence of that field in the biblio. (Note.. I've
> not tested this hypothesis.. it depends heavily on how efficient it is to
> fetch a single MARC tag using Marc::Record)
Hi Martin,
TL;DR: In UNIMARC instances, link_bibs_to_authorities.pl run time
can be reduced by 80% and the number of DBI calls can be
reduced by 90% with a very simple fix that optimises the
C4::Biblio::LinkBibHeadingsToAuthorities routine ;-)
I had a go at optimising this routine recently, as I'm in the process
of writing a Perl script that will perform some housekeeping tasks
in our bibliographic records. [The script is meant to search for
"TOPICAL SUBJECT" authority headings (UNIMARC field 250) using the
text found in our existing "UNCONTROLLED SUBJECT TERMS" (biblio
subfield 610$a) and, when a match is found, add a new "TOPICAL NAME
USED AS SUBJECT" tag (field 606) to the bibliographic record, copy
the authority record contents over, link it via subfield $9, then
finally remove the (unwanted) uncontrolled subject term.]
In developing this script, I investigated the inner workings of the
routine in question, and I'm pleased to report that I've identified
where the inefficiency in LinkBibHeadingsToAuthorities() actually
lies. The patch for fixing this is trivial and backportable, too! ;-)
> The routine iterates through all fields in a biblio and compares them to a
> list of acceptable tags that may link to an authority.
Indeed, that's exactly what happens in LinkBibHeadingsToAuthorities()
with the crucial bit being line 641:
637  foreach my $field ( $bib->fields() ) {
638      if ( defined $tagtolink ) {
639          next unless $field->tag() == $tagtolink ;
640      }
641      my $heading = C4::Heading->new_from_field( $field, $frameworkcode );
642      next unless defined $heading;
[...]
Then, in C4::Heading->new_from_field():
61  sub new_from_field {
62      my $class         = shift;
63      my $field         = shift;
64      my $frameworkcode = shift; #FIXME this is not used?
65      my $auth          = shift;
66      my $marcflavour   = C4::Context->preference('marcflavour');
67      my $marc_handler  = _marc_format_handler($marcflavour);
[...]
277  sub _marc_format_handler {
278      my $marcflavour = uc shift;
279      my $pname = "C4::Heading::$marcflavour";
280      load $pname;
281      return $pname->new();
282  }
This is where things get interesting as the behaviour diverges a bit
depending on the MARC flavour being used: in C4/Heading/MARC21.pm the
object is constructed with:
255  sub new {
256      my $class = shift;
257      return bless {}, $class;
258  }
... and the $bib_heading_fields data structure/hash is statically
set at the top of the module:
49  my $bib_heading_fields = {
50      '100' => {
51          auth_type  => 'PERSO_NAME',
52          subfields  => 'abcdfghjklmnopqrst',
53          main_entry => 1
54      },
55      '110' => {
56          auth_type  => 'CORPO_NAME',
57          subfields  => 'abcdfghklmnoprst',
58          main_entry => 1
59      },
[...]
Whilst in C4/Heading/UNIMARC.pm the object is constructed with:
68  sub new {
69      my $class = shift;
70
71      my $dbh = C4::Context->dbh;
72      my $sth = $dbh->prepare(
73          "SELECT tagfield, authtypecode
74           FROM marc_subfield_structure
75           WHERE frameworkcode = '' AND authtypecode <> ''"
76      );
77      $sth->execute();
78      $bib_heading_fields = {};
79      while ( my ( $tag, $auth_type ) = $sth->fetchrow ) {
80          $bib_heading_fields->{$tag} = {
81              auth_type => $auth_type,
82              subfields => 'abcdefghjklmnopqrstvxyz',
83          };
84      }
85
86      return bless {}, $class;
87  }
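... thus resetting the $bib_heading_fields hash *on each invocation*,
then populating it again with the results fetched from the database!
Since new_from_field() is called once per field (line 641 above), the
same query is executed once for every field that gets processed.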
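To make the cost concrete, here is a minimal, self-contained sketch of
that call pattern. It does not touch Koha or a database: the package
name Sketch::Handler, the counter, the 'TOPIC' auth type and the list
of tags are all made up for illustration, and the counter merely stands
in for the prepare/execute/fetchrow round trip.

```perl
use strict;
use warnings;

# Hypothetical stand-in for C4::Heading::UNIMARC; nothing here is Koha code.
package Sketch::Handler;

our $query_count = 0;      # how many times the lookup hash was rebuilt
my $bib_heading_fields;    # module-level hash, reset by every new()

sub new {
    my $class = shift;
    $query_count++;        # stands in for the SELECT on marc_subfield_structure
    $bib_heading_fields = {
        '606' => {
            auth_type => 'TOPIC',    # made-up value
            subfields => 'abcdefghjklmnopqrstvxyz',
        },
    };
    return bless {}, $class;
}

package main;

# A made-up record: only the number of fields matters here.
my @record_tags = qw(001 100 200 210 215 606 610 700 801);

# One handler per field, mirroring the per-field call to new_from_field().
Sketch::Handler->new() for @record_tags;

printf "%d fields, %d simulated queries\n",
    scalar @record_tags, $Sketch::Handler::query_count;
```

In this toy setup the number of simulated queries grows in step with
the number of fields in the record.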
Does this information really need to be re-calculated for each field
of the record being saved/updated/linked? I think not, as:
1) Changes to the marc_subfield_structure table are not that frequent,
and are unlikely to occur in the relatively short timeframe that a
save/update/linking takes to complete.
2) As per the official Koha manual, you're not really meant to edit
the Default framework as this will cause problems, but rather
clone it to a different one which you'll use to catalogue your
records. [C4::Heading->new_from_field() can actually be fed with
a $frameworkcode parameter, but it doesn't currently use it.
I'll leave that for a different bug report, though ;-)]
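Given the two points above, one way to avoid the repeated work would be
to build the hash once per process and reuse it afterwards. What follows
is only a sketch of that idea under stated assumptions, not the patch
itself: Sketch::CachedHandler, _load_heading_fields(), auth_type_for()
and the sample rows are hypothetical names and data, and the database
query is again simulated with a counter. It relies on the defined-or
assignment operator (//=), which needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical illustration of caching the lookup hash; not Koha code.
package Sketch::CachedHandler;

our $query_count = 0;      # counts simulated database round trips
my $bib_heading_fields;    # stays undef until the first new()

# Made-up helper standing in for the SELECT on marc_subfield_structure.
sub _load_heading_fields {
    $query_count++;
    my %rows = ( '606' => 'TOPIC', '700' => 'PERSON' );    # made-up rows
    my %fields;
    for my $tag ( keys %rows ) {
        $fields{$tag} = {
            auth_type => $rows{$tag},
            subfields => 'abcdefghjklmnopqrstvxyz',
        };
    }
    return \%fields;
}

sub new {
    my $class = shift;
    # Build the hash only if this process has not built it already.
    $bib_heading_fields //= _load_heading_fields();
    return bless {}, $class;
}

# Return the auth type linked to a tag, or undef if the tag is not linkable.
sub auth_type_for {
    my ( $self, $tag ) = @_;
    return exists $bib_heading_fields->{$tag}
        ? $bib_heading_fields->{$tag}{auth_type}
        : undef;
}

package main;

my @record_tags = qw(001 100 200 210 215 606 610 700 801);
my @handlers    = map { Sketch::CachedHandler->new() } @record_tags;

printf "%d fields, %d simulated query\n",
    scalar @record_tags, $Sketch::CachedHandler::query_count;
```

The trade-off is that a cached hash will not notice edits made to
marc_subfield_structure until the process is restarted, so in a
persistent environment some form of cache invalidation might be needed.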