[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Thu Nov 29 07:55:41 CET 2018


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #25 from Ere Maijala <ere.maijala at helsinki.fi> ---
(In reply to David Cook from comment #23)
> (In reply to Ere Maijala from comment #20)
> > That means
> > I'd rather change the script so that the main process would only feed
> > children with record ID's and the children would do all the rest.
> 
> That's what I'd think.
> 
> (In reply to Ere Maijala from comment #21)
> > Oh, but then the batching and committing of changes would become difficult.
> > On a second thought I'm not sure ForkManager is quite as suitable for the
> > task as it might seem.
> 
> Why would batching and committing changes be difficult? (That's a genuine
> question. I haven't done much hands-on with Elasticsearch and Solr indexing
> APIs myself, so happy to admit my ignorance there.)

For good indexing performance you need to send records to Elasticsearch in
batches. The current default is to collect 5000 records and then commit the
batch to ES. If we have many workers that each process only one record at a
time, we also need IPC so the main process can collect the records and commit
them in batches.
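As a rough illustration of the batch-and-commit pattern described above, here is a minimal sketch in Python (the function and callback names are hypothetical, not Koha's actual API; the 5000-record threshold matches the current default mentioned):

```python
# Hypothetical sketch of batched indexing: collect records and commit
# them to Elasticsearch once the batch reaches batch_size. The callbacks
# fetch_record and bulk_commit are illustrative placeholders.
def index_in_batches(record_ids, fetch_record, bulk_commit, batch_size=5000):
    batch = []
    for rid in record_ids:
        batch.append(fetch_record(rid))
        if len(batch) >= batch_size:
            bulk_commit(batch)  # one bulk request per full batch
            batch = []
    if batch:
        bulk_commit(batch)  # flush the final partial batch
```

If each worker handled only a single record per fork, this accumulation would have to happen in the parent, which is where the IPC overhead comes in.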

All of that is of course possible, but I'm not sure the considerably more
complex mechanism offers any real benefit over the slice version.
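For comparison, the "slice" version can be sketched like this (an assumed partitioning by ID modulo worker count; the actual scheme in the patch may differ): each of the n children takes every n-th record and batches independently, so no records ever need to flow back to the parent.

```python
# Hypothetical slice partitioning: worker i of num_workers handles the
# record IDs where id % num_workers == i. Each child can then batch and
# commit on its own, with no IPC back to the main process.
def slice_for_worker(record_ids, worker_index, num_workers):
    return [rid for rid in record_ids if rid % num_workers == worker_index]
```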


