[Koha-bugs] [Bug 25265] Elasticsearch - Batch editing items on a biblio can lead to incorrect index
bugzilla-daemon at bugs.koha-community.org
bugzilla-daemon at bugs.koha-community.org
Fri Sep 11 14:17:32 CEST 2020
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25265
--- Comment #22 from Nick Clemens <nick at bywatersolutions.com> ---
(In reply to Marcel de Rooy from comment #19)
> General
> Module changes, no tests ?
I will work on something here
> We are now fetching MARC records in ModZebra for Elastic.
We were before these patches
> that if we use Elastic we should be leaving ModZebra as soon as we can ;)
Agreed, but that is a bigger job, this is addressing a bug
> The operation of sending biblionumbers and biblio records in two lists makes
> me a bit suspicious. There is no check anymore that these lists are in sync.
> If there would be a gap somehow, we would be indexing records on the wrong
> id? Might be just theoretical but looks suboptimal.
As you note later, we only pass a record from ModAuthority, otherwise we are
building the lists
> It seems that ModZebra is still inserting records into zebraqueue when we
> use Elastic. Why? Should we skip and cleanup ?
Z3950 was the original main reason, we have the z3950responder now, but keeping
zebra in sync
is not a bad thing. Eventually yes, but for now the dafety net has been useful.
Beyond the scope of this bug
> The for loop in ModZebra seems to be suboptimal? In a loop you are checking
> if a record count is zero and if so, insert one record.
> Why dont you do one DELETE statement and one INSERT ? delete from
> zebra_queue where biblio_auth_number IN ($str) etc.
> Actually thinking about that: if you do one db call per record, why dont you
> add the loop in batch mod without touching ModZebra ?
> [Wrote this earlier: see my conclusion too.]
That is essentially what we have now, a loop in the batch operations, calling
ModZebra for each record.
This could be improved, but I don't think this adds complexity so much as
highlighting what we do
>
> Koha/SearchEngine/Elasticsearch/Indexer.pm dates from 2013 but still
> includes:
> $indexer->update_index_background does just refer to update_index: TODO
> implement in the future - I don't know the best way of doing this yet.
> Same $indexer->delete_index_background # TODO: Should be made async
This is work we should do, we don't have sponsors or the infrastructure yet,
hopefully the task queue can help here
>
> C4/AuthoritiesMarc.pm: ModZebra( $authid, 'specialUpdate',
> 'authorityserver', $record );
> Is this the only use of the record parameter in ModZebra ? Apparently. For
> Zebra we don't need it, for Elastic we now do.
This code is weird, I agree, but beyond the scope of this bug
> This part in ModZebra is really troublesome too. We do not even check if we
> got a biblio or an authority record !
> unless ($record) {
> $record = GetMarcBiblio({
> biblionumber => $biblionumber,
> embed_items => 1 });
> }
This code is beyond the scope here, we are in C4/Biblio.pm - we really should
only
deal with biblio, and we do, except for the call above
> Conclusion:
> Looks like we can still improve a lot of the Elastic code.
Yes we definitely can
> Adding the
> current code in the old ModZebra is not the way to go imo.
Rewriting and moving this will be a bigger work and this is a bug that affects
stables
> Move that code to
> SearchEngine by just adding a call.
> Since we only have one use of the record
> parameter in ModZebra, I would be inclined to remove it for consistency. I
> understand that it is useful as optimization though.
Please file a new bug for this
> Adding a loop somewhere else around ModZebra is probably the pragmatic way
> too. We never call it for lists. Perhaps adding an iteration routine in a
> module ?
I don't see why looping in one place or the other matters. In fact, look at
this comment in the code
# true ModZebra commented until indexdata fixes zebraDB crashes (it seems
they occur on multiple updates
# at the same time
> Changing status.
Your comments are well appreciated Marcel, and they are true. We have technical
debt here, and we need to rethink some of our methods.
You have taken a very broad look at the code, and identified issues that affect
many areas.
Currently though, we have a bug that is affecting users of ES, and we need to
support these users so we can continue to test and
improve the ES code. I attempted to take a path here that makes smaller
changes, the basic idea of calling for index updates by batch rather than
individually I
think is sound.
I will work on tests and take a look at the Zebra loop and would ask you to
reconsider some of the larger points as future enhancements
--
You are receiving this mail because:
You are watching all bug changes.
More information about the Koha-bugs
mailing list