[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

bugzilla-daemon at bugs.koha-community.org
Wed Nov 28 11:36:28 CET 2018


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

--- Comment #18 from David Gustafsson <glasklas at gmail.com> ---
(In reply to Ere Maijala from comment #13)
> David, is there a compelling reason to do it with a predefined record range?
> I find it a bit complicated, and it doesn't currently work the same way for
> authorities. 
> 
> I've just attached an implementation along the lines I described earlier. It
> can be used e.g. like this:
> 
> echo -n "1,2,3" | xargs -d "," -I{} -P 3 perl
> misc/search_tools/rebuild_elastic_search.pl -v -b --slice={},3
> 
> This allows one to index the records in parallel without prior knowledge of
> the available record id's and is fairly simple in implementation.

The main reason is that, instead of one long-lived worker per CPU (or 3, as
above), you split the work into many more batches that are balanced across
CPUs at a fixed concurrency level until none are left. This could distribute
load more evenly if, for example, one or more of the long-lived workers
finish early. In practice they would probably finish at almost the same time,
so it does not really matter which of the two models is used.
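
As a rough sketch of the batch model described above (not the attached patch
itself): the work is cut into many small record-id ranges and xargs starts a
new batch whenever one of its slots frees up. The helper make_batches, the
total of 1000 records, the batch size of 100 and the echo standing in for a
real indexing run are all assumptions made for illustration; the range flags
of the actual patch are not shown in this thread, so none are guessed here.

```shell
#!/bin/sh
# Print "start end" pairs that cover record ids 1..total in batches of size.
make_batches() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "$start $end"
        start=$((end + 1))
    done
}

# Hypothetical stand-in for one indexing run; it only reports its range.
# xargs hands each "start end" pair to a fresh sh as $1 and $2.
# -P 4 keeps at most four batches running; a new one starts when one exits.
make_batches 1000 100 |
    xargs -n 2 -P 4 sh -c 'echo "would index records $1 to $2"' _
```

With ten batches and four slots, a slot that finishes early simply picks up
the next range, which is the load-balancing effect mentioned above. The order
of the output lines is not fixed when -P is greater than 1.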

parallel also prints each worker's output in sequence, which could be nice,
but is not all that important either.

I mainly wrote the patch because I knew it would be a quick and dirty way to
get parallel indexing working.

Parallel::ForkManager looks great to me; I would probably have used it instead
of parallel if I had been aware of it. It would probably be quite easy to
implement as part of the rebuild script (with the slice approach) instead of
having to use xargs. Then you could also use a larger number of slices to
produce more workers, since ForkManager takes a $MAX_PROCESSES argument.
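
Until something like that exists in the script, the same idea (more slices
than concurrent workers) can be approximated with xargs. This is only a
dry-run sketch: the values 8 and 3 and the function name run_slices are
assumptions, the slice numbering 1..N follows the quoted example above, and
the leading echo means each worker only prints the command it would run.

```shell
#!/bin/sh
SLICES=8     # assumed number of slices, deliberately larger than WORKERS
WORKERS=3    # concurrency cap, playing the role of $MAX_PROCESSES

# run_slices SLICES WORKERS
# Dry run: prints one rebuild command per slice, at most WORKERS at a time.
# Remove the leading "echo" to actually run the indexer.
run_slices() {
    seq 1 "$1" | xargs -I{} -P "$2" \
        echo perl misc/search_tools/rebuild_elastic_search.pl -v -b --slice={},"$1"
}

run_slices "$SLICES" "$WORKERS"
```

Each of the eight slices is handled by its own short-lived process, but never
more than three run at once, so a worker that finishes early moves on to the
next slice.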
