[Koha-bugs] [Bug 25265] Elasticsearch - Batch editing items on a biblio can lead to incorrect index

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Fri Sep 11 11:38:56 CEST 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25265

Marcel de Rooy <m.de.rooy at rijksmuseum.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|BLOCKED                     |Failed QA

--- Comment #19 from Marcel de Rooy <m.de.rooy at rijksmuseum.nl> ---
General
Module changes, no tests ? I understand that mocking a search engine is a bit
hard, but this is a point for adding at least something ?
Reading the code and searching further in the ElasticSearch code raises lots of
questions. Listing a few.

$indexer->update_index_background( $biblionumbers, $records );
$indexer->delete_index_background( $biblionumbers );
marc_records_to_documents => Koha/SearchEngine/Elasticsearch.pm
We are now fetching MARC records in ModZebra for Elastic. I have the feeling
that if we use Elastic we should be leaving ModZebra as soon as we can ;) Move
this code where it belongs? Somewhere in SearchEngine/Elastic?
The operation of sending biblionumbers and biblio records in two lists makes me
a bit suspicious. There is no check anymore that these lists are in sync. If
there would be a gap somehow, we would be indexing records on the wrong id?
Might be just theoretical but looks suboptimal.

It seems that ModZebra is still inserting records into zebraqueue when we use
Elastic. Why? Should we skip and cleanup ?
The for loop in ModZebra seems to be suboptimal? In a loop you are checking if
a record count is zero and if so, insert one record. 
Why dont you do one DELETE statement and one INSERT ? delete from zebra_queue
where biblio_auth_number IN ($str) etc.
Actually thinking about that: if you do one db call per record, why dont you
add the loop in batch mod without touching ModZebra ?
[Wrote this earlier: see my conclusion too.]

Koha/SearchEngine/Elasticsearch/Indexer.pm dates from 2013 but still includes:
$indexer->update_index_background does just refer to update_index: TODO
implement in the future - I don't know the best way of doing this yet.
Same $indexer->delete_index_background # TODO: Should be made async

C4/AuthoritiesMarc.pm:    ModZebra( $authid, 'specialUpdate',
'authorityserver', $record );
Is this the only use of the record parameter in ModZebra ? Apparently. For
Zebra we dont need it, for Elastic we now do.

This part in ModZebra is really troublesome too. We do not even check if we got
a biblio or an authority record !
            unless ($record) {
                $record = GetMarcBiblio({
                    biblionumber => $biblionumber,
                    embed_items  => 1 });
            }

Conclusion:
Looks like we can still improve a lot of the Elastic code. Adding the current
code in the old ModZebra is not the way to go imo. Move that code to
SearchEngine by just adding a call. Since we only have one use of the record
parameter in ModZebra, I would be inclined to remove it for consistency. I
understand that it is useful as optimization though.
Adding a loop somewhere else around ModZebra is probably the pragmatic way too.
We never call it for lists. Perhaps adding an iteration routine in a module ?

Changing status.

-- 
You are receiving this mail because:
You are watching all bug changes.


More information about the Koha-bugs mailing list